Authors
Hiroto SAIGO Hisashi KASHIMA Koji TSUDA
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE Transactions on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E96.D, no.8, pp.1766-1773, 2013-08-01 (Released:2013-08-01)
Number of references
22

Apriori-based mining algorithms enumerate frequent patterns efficiently, but the large number of resulting patterns makes it difficult to apply subsequent learning tasks directly. Recently, efficient iterative methods have been proposed for mining discriminative patterns for classification and regression. These methods iteratively execute a discriminative pattern mining algorithm and update the example weights to emphasize examples that received large errors in the previous iteration. In this paper, we study a family of loss functions that induces sparsity in the example weights. Most of the resulting example weights become zero, so we can eliminate those examples from discriminative pattern mining, leading to a significant decrease in search space and time. In computational experiments, we compare and evaluate various loss functions in terms of the amount of sparsity induced and the resulting speed-up.
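The link between a loss function and sparse example weights can be illustrated with a small sketch. The abstract does not say which loss functions the paper studies, so the squared hinge loss used here is an assumption chosen only because its gradient is exactly zero for well-classified examples; the function names and the toy data are hypothetical.

```python
def squared_hinge_weights(labels, scores):
    """Weight of each example: magnitude of the squared-hinge-loss gradient.

    The loss max(0, 1 - y*f)^2 has derivative zero whenever y*f >= 1,
    so well-classified examples receive exactly zero weight.
    (Assumed loss; the paper's family of losses is not given in the abstract.)
    """
    return [2.0 * max(0.0, 1.0 - y * f) for y, f in zip(labels, scores)]


def active_indices(weights):
    """Indices of examples that still need to be passed to the pattern miner."""
    return [i for i, w in enumerate(weights) if w > 0.0]


labels = [1, 1, -1, -1, 1]            # hypothetical class labels (+1 / -1)
scores = [2.0, 0.3, -1.5, 0.4, 1.0]   # hypothetical current model outputs
weights = squared_hinge_weights(labels, scores)
active = active_indices(weights)      # only these examples remain in the search
```

In this toy run, three of the five examples have a margin of at least 1 and drop out, so a mining step would only need to scan the remaining two.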
Authors
Eli Kaminuma Yukino Baba Masahiro Mochizuki Hirotaka Matsumoto Haruka Ozaki Toshitsugu Okayama Takuya Kato Shinya Oki Takatomo Fujisawa Yasukazu Nakamura Masanori Arita Osamu Ogasawara Hisashi Kashima Toshihisa Takagi
Publisher
The Genetics Society of Japan
Journal
Genes & Genetic Systems (ISSN:13417568)
Volume, issue, pages, and publication date
pp.19-00034, (Released:2020-03-26)
Number of references
37
Number of citations
3

Recently, the prospect of applying machine learning tools for automating the process of annotation analysis of large-scale sequences from next-generation sequencers has raised the interest of researchers. However, finding research collaborators with knowledge of machine learning techniques is difficult for many experimental life scientists. One solution to this problem is to utilise the power of crowdsourcing. In this report, we describe how we investigated the potential of crowdsourced modelling for a life science task by conducting a machine learning competition, the DNA Data Bank of Japan (DDBJ) Data Analysis Challenge. In the challenge, participants predicted chromatin feature annotations from DNA sequences with competing models. The challenge engaged 38 participants, with a cumulative total of 360 model submissions. The performance of the top model resulted in an area under the curve (AUC) score of 0.95. Over the course of the competition, the overall performance of the submitted models improved by an AUC score of 0.30 from the first submitted model. Furthermore, the 1st- and 2nd-ranking models utilised external data such as genomic location and gene annotation information with specific domain knowledge. The effect of incorporating this domain knowledge led to improvements of approximately 5%–9%, as measured by the AUC scores. This report suggests that machine learning competitions will lead to the development of highly accurate machine learning models for use by experimental scientists unfamiliar with the complexities of data science.
Authors
Takuya Kuwahara Yukino Baba Hisashi Kashima Takeshi Kishikawa Junichi Tsurumi Tomoyuki Haga Yoshihiro Ujiie Takamitsu Sasaki Hideki Matsushima
Publisher
Information Processing Society of Japan
Journal
Journal of Information Processing (ISSN:18826652)
Volume, issue, pages, and publication date
vol.26, pp.306-313, 2018 (Released:2018-03-15)
Number of references
17
Number of citations
18

Modern vehicles are equipped with Electronic Control Units (ECUs) and external communication devices. The Controller Area Network (CAN), a widely used communication protocol for ECUs, has no security mechanism for detecting improper packets; if attackers exploit a vulnerability in an ECU and manage to inject a malicious message, they can control other ECUs and cause improper operation of the vehicle. With the increasing popularity of connected cars, protecting in-vehicle networks against security threats has become an urgent matter. In this paper, we study the applicability of statistical anomaly detection methods for identifying malicious CAN messages in in-vehicle networks. We focus on intrusion attacks that inject malicious messages. Because the occurrence of an intrusion attack certainly influences the message traffic, we use the number of messages observed in a fixed time window to detect intrusion attacks. We formalize features to represent a message sequence that incorporate the number of messages associated with each receiver ID. We collected CAN message data from an actual vehicle and conducted a quantitative analysis of the methods and the features in practical situations. The results of our experiments demonstrated that our proposed methods provide fast and accurate detection in various cases.
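The window-count features described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the message format (a timestamp in milliseconds paired with an ID), the function name `window_count_features`, and the sample traffic are all hypothetical, and the anomaly detector that would consume these vectors is left out because the abstract does not specify it.

```python
from collections import Counter


def window_count_features(messages, window_ms, ids):
    """Turn a message log into one count vector per fixed time window.

    messages:  list of (timestamp_ms, message_id) pairs (assumed format)
    window_ms: window length in milliseconds
    ids:       the IDs to count, in the order used for the feature vector
    """
    if not messages:
        return []
    start = min(t for t, _ in messages)
    buckets = {}
    for t, msg_id in messages:
        k = (t - start) // window_ms              # index of the window
        buckets.setdefault(k, Counter())[msg_id] += 1
    n_windows = max(buckets) + 1
    # Windows with no traffic yield all-zero vectors.
    return [[buckets.get(k, Counter())[i] for i in ids] for k in range(n_windows)]


# Hypothetical traffic: ID 0x100 appears more often in the second window.
log = [(0, 0x100), (10, 0x200), (20, 0x100),
       (110, 0x100), (120, 0x100), (130, 0x100), (140, 0x200)]
features = window_count_features(log, window_ms=100, ids=[0x100, 0x200])
```

Each row of `features` is a fixed-length vector, so any off-the-shelf statistical detector could be trained on windows from normal driving and then score new windows.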
Authors
Hisashi KASHIMA Tsuyoshi IDE Tsuyoshi KATO Masashi SUGIYAMA
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE Transactions on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E92-D, no.7, pp.1338-1353, 2009-07-01
Number of citations
16

Kernel methods such as the support vector machine are among the most successful algorithms in modern machine learning. Their advantage is that linear algorithms are extended to non-linear scenarios in a straightforward way by the use of the kernel trick. However, naive use of kernel methods is computationally expensive, since the computational complexity typically scales cubically with the number of training samples. In this article, we review recent advances in kernel methods, with emphasis on scalability for massive problems.
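The kernel trick mentioned above can be checked numerically on a small case. The sketch below uses the homogeneous degree-2 polynomial kernel on two-dimensional inputs, chosen here purely for illustration (the abstract names no specific kernel), and compares it with the inner product of an explicit feature map. The function names are hypothetical.

```python
import math


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel k(x, z) = (x . z)^2, computed in input space."""
    dot = sum(a * b for a, b in zip(x, z))
    return dot ** 2


def poly2_features(x):
    """Explicit feature map for 2-D input whose inner product equals poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


x, z = (1.0, 2.0), (3.0, 4.0)
implicit = poly2_kernel(x, z)
explicit = sum(a * b for a, b in zip(poly2_features(x), poly2_features(z)))
```

Both routes give the same value up to rounding, but the kernel never constructs the feature vectors. The cost moves elsewhere: a learner that works with all pairwise kernel values must handle an n-by-n matrix for n training samples, which is where the cubic scaling noted in the abstract comes from.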
Authors
Tetsuji Kuboyama Kouichi Hirata Hisashi Kashima Kiyoko F. Aoki-Kinoshita Hiroshi Yasuda
Publisher
The Japanese Society for Artificial Intelligence
Journal
Transactions of the Japanese Society for Artificial Intelligence (ISSN:13460714)
Volume, issue, pages, and publication date
vol.22, no.2, pp.140-147, 2007 (Released:2007-01-25)
Number of references
17
Number of citations
5 11 27

Learning from tree-structured data has received increasing interest with the rapid growth of tree-encodable data on the World Wide Web, in biology, and in other areas. Our kernel function measures the similarity between two trees by counting the number of shared sub-patterns called tree q-grams, and runs, in effect, in linear time with respect to the number of tree nodes. We apply our kernel function with a support vector machine (SVM) to classify biological data, the glycans of several blood components. The experimental results show that our kernel function performs as well as one exclusively tailored to glycan properties.
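The idea of comparing trees by counting shared sub-patterns can be sketched as below. The abstract does not define tree q-grams, so this example substitutes a simplified stand-in: label sequences along downward paths of exactly q nodes. The tree encoding `(label, [children])` and the function names are hypothetical, and the result should not be read as the paper's kernel.

```python
from collections import Counter


def path_qgrams(tree, q):
    """Count label sequences along downward paths of exactly q nodes.

    tree is a nested tuple (label, [children]). Downward paths are a
    simplified stand-in for the paper's tree q-grams (an assumption).
    """
    counts = Counter()

    def visit(node, ancestors):
        label, children = node
        path = (ancestors + (label,))[-q:]   # keep only the last q labels
        if len(path) == q:
            counts[path] += 1                # one q-gram ends at this node
        for child in children:
            visit(child, path)

    visit(tree, ())
    return counts


def qgram_kernel(t1, t2, q):
    """Inner product of the two q-gram count vectors."""
    c1, c2 = path_qgrams(t1, q), path_qgrams(t2, q)
    return sum(n * c2[gram] for gram, n in c1.items())


t1 = ("a", [("b", [("c", [])]), ("b", [("d", [])])])
t2 = ("a", [("b", [("c", [])])])
similarity = qgram_kernel(t1, t2, q=2)
```

Each node contributes at most one q-gram, so for a fixed q the counting pass visits every node once, which mirrors the linear-time behaviour the abstract describes.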