Authors
Ryo Harada, Keitaro Kume, Kazumasa Horie, Takuro Nakayama, Yuji Inagaki, Toshiyuki Amagasa
Publisher
Information Processing Society of Japan
Journal
IPSJ Transactions on Bioinformatics (ISSN:18826679)
Volume, issue, pages, and publication date
vol.16, pp.20-27, 2023 (Released:2023-07-25)
Number of references
48

Eukaryotic genomes contain exons and introns, and annotating a genome requires accurately identifying the exon-intron boundaries, i.e., splice sites. To address this problem, many previous works have proposed annotation methods and tools based on RNA-seq evidence. Many recent works use neural networks (NNs) as their prediction models, but only a few can be used to generate new genome annotations in practice. In this study, we propose AtLASS, a fully automated method for predicting splice sites from genomic and RNA-seq data using an attention-based Bi-LSTM (Bidirectional Long Short-Term Memory). We use two-stage training on RNA-seq data to address the biased-label problem, thereby reducing false positives. Experiments on the genomes of three species show that the performance of the proposed method alone is comparable to that of existing methods, and that combining the outputs of the proposed method with those of an existing method achieves better performance. The proposed method is the first program specialized in end-to-end splice site prediction using NNs.
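
To make the notion of a splice site concrete, the following is a minimal sketch that lists candidate sites by scanning a sequence for the canonical GT (donor) and AG (acceptor) dinucleotides, which bound most, though not all, eukaryotic introns. It is an illustration only and is not how AtLASS works: the abstract does not describe candidate enumeration, and the function name `candidate_splice_sites` and the example sequence are hypothetical.

```python
def candidate_splice_sites(seq):
    """Return 0-based start indices of GT (donor) and AG (acceptor) dinucleotides.

    Assumption: only canonical GT...AG boundaries are considered; this is a
    naive scan, not a predictor, so most hits are false positives.
    """
    seq = seq.upper()
    donors, acceptors = [], []
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair == "GT":
            donors.append(i)
        elif pair == "AG":
            acceptors.append(i)
    return donors, acceptors


# Hypothetical toy sequence, not taken from any real genome.
donors, acceptors = candidate_splice_sites("ATGGTAAGTCCAGGT")
print(donors)     # [3, 7, 13]
print(acceptors)  # [6, 11]
```

Even on this short sequence the scan reports several candidates, which illustrates why a learned model backed by RNA-seq evidence is needed to separate true splice sites from chance occurrences.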
Authors
Hiroyoshi ITO, Takahiro KOMAMIZU, Toshiyuki AMAGASA, Hiroyuki KITAGAWA
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE Transactions on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E102.D, no.4, pp.810-820, 2019-04-01 (Released:2019-04-01)
Number of references
30
Number of citations
1

Multi-attributed graphs, in which each node is characterized by multiple types of attributes, are ubiquitous in the real world. Detection and characterization of communities of nodes could have a significant impact on various applications. Although previous studies have attempted to tackle this task, it remains challenging because graph structures are difficult to integrate with multiple attributes and because the graphs contain noise. Therefore, in this study, we focus on clusters of attribute values and on strong correlations between communities and attribute-value clusters. The graph clustering methodology adopted in this study involves Community detection, Attribute-value clustering, and deriving Relationships between communities and attribute-value clusters (CAR for short). Based on these concepts, the proposed multi-attributed graph clustering is modeled as CAR-clustering. To achieve CAR-clustering, a novel algorithm named CARNMF is developed based on non-negative matrix factorization (NMF) that can detect CAR in a cooperative manner. Results from experiments on real-world datasets show that CARNMF can detect communities and attribute-value clusters more accurately than existing comparable methods. Furthermore, the clustering results indicate that CARNMF can successfully detect informative communities with meaningful semantic descriptions through correlations between communities and attribute-value clusters.
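
As background for the NMF building block mentioned above, here is a minimal sketch of plain NMF with the standard multiplicative update rules, applied to a toy adjacency matrix. It is not CARNMF: the abstract does not give the CARNMF objective or updates, and this sketch ignores attributes and community-attribute relationships entirely. The function names (`nmf`, `communities`), the toy graph, and the parameter choices are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, rank, iters=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix v (n x m) as w (n x rank) times h (rank x m).

    Uses multiplicative updates for the squared Frobenius error.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


def communities(w):
    """Assign each node to the factor with the largest weight in its row of w."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]


# Hypothetical toy graph: two disconnected pairs of nodes (self-loops included).
adjacency = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
w, h = nmf(adjacency, rank=2)
print(communities(w))
```

On this toy graph, nodes 0 and 1 should fall into one factor and nodes 2 and 3 into the other; which factor index each group receives depends on the random initialization.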