Authors
Hideki Kakeya, Yoshihisa Matsumoto
Publisher
Information Processing Society of Japan
Journal
IPSJ Transactions on Bioinformatics (ISSN:18826679)
Volume, pages, and publication date
vol.15, pp.22-29, 2022 (Released:2022-11-16)
Number of references
35

A method is proposed for calculating the probability that a given bias of mutations occurs naturally, in order to test whether a newly detected virus is a product of natural evolution or of a non-natural process such as genetic manipulation. The probability is calculated based on the neutral theory of molecular evolution and the binomial distribution of non-synonymous (N) and synonymous (S) mutations. Although most conventional analyses, including dN/dS analysis, assume that every kind of point mutation from one nucleotide to another occurs with the same probability, the proposed model takes into account the bias in mutations, using the equilibrium of mutations to estimate the probability of each mutation. The proposed method is applied to evaluate whether the Omicron variant strain of SARS-CoV-2, whose spike protein includes 29 N mutations and only one S mutation, can emerge through natural evolution. The binomial test based on the proposed model shows that the N/S mutation bias in the Omicron spike can occur with a probability of 2.0 × 10⁻³ or less. Even with the conventional model, in which all kinds of mutations are equally probable, the strong N/S mutation bias in the Omicron spike can occur with a probability of 3.7 × 10⁻³, which means that the Omicron variant is highly likely a product of a non-natural process including artifact.
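
The kind of binomial calculation described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' model: the function name `prob_at_most_k_synonymous` and the value `p_syn = 0.25` are hypothetical placeholders, whereas the paper derives its own per-mutation probabilities from the mutation equilibrium. The sketch computes the lower tail P(X ≤ k) for X ~ Binomial(n, p_syn), that is, the probability of seeing at most k synonymous mutations among n mutations.

```python
from math import comb


def prob_at_most_k_synonymous(n: int, k: int, p_syn: float) -> float:
    """Lower tail P(X <= k) for X ~ Binomial(n, p_syn).

    n     : total number of observed mutations (N + S)
    k     : number of synonymous (S) mutations observed
    p_syn : assumed probability that a single mutation is synonymous
            (hypothetical input; not a value taken from the paper)
    """
    return sum(
        comb(n, i) * p_syn**i * (1.0 - p_syn) ** (n - i)
        for i in range(k + 1)
    )


# 29 N mutations + 1 S mutation = 30 mutations in total.
# p_syn = 0.25 is a placeholder chosen only to make the example runnable.
p_value = prob_at_most_k_synonymous(n=30, k=1, p_syn=0.25)
print(f"P(at most 1 synonymous mutation out of 30) = {p_value:.2e}")
```

The result depends strongly on `p_syn`, which is why the abstract distinguishes between the equal-probability model and the bias-aware model.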
Authors
Nobuaki Yasuo, Keisuke Watanabe, Hideto Hara, Kentaro Rikimaru, Masakazu Sekijima
Publisher
Information Processing Society of Japan
Journal
IPSJ Transactions on Bioinformatics (ISSN:18826679)
Volume, pages, and publication date
vol.11, pp.41-47, 2018 (Released:2018-12-10)
Number of references
36
Number of citations
5

Lead optimization is an essential step in drug discovery in which the chemical structures of compounds are modified to improve characteristics such as binding affinity, target selectivity, physicochemical properties, and toxicity. We present a concept for a computational compound optimization system that outputs optimized compounds from hit compounds by using previous lead optimization data from a pharmaceutical company. In this study, to predict the drug-likeness of compounds in the evaluation function of this system, we evaluated and compared the ability of learning-to-rank methods to correctly predict lead optimization strategies.
Authors
Hiroaki Tanaka, Yu Suzuki, Shotaro Yamasaki, Koichiro Yoshino, Ko Kato, Satoshi Nakamura
Publisher
Information Processing Society of Japan
Journal
IPSJ Transactions on Bioinformatics (ISSN:18826679)
Volume, pages, and publication date
vol.11, pp.14-23, 2018 (Released:2018-07-05)
Number of references
43
Number of citations
1

Protein production in plants is a hot topic because it offers many benefits relative to production in bacteria, yeasts, and animals, but the amount of protein expressed in plants is lower. It is argued that editing 5'UTRs increases the amount of translated proteins. However, obtaining such 5'UTRs is difficult because of the cost, time, and effort required in experiments. To solve this, we predict the amount of translated proteins by machine learning. In this paper, we propose a method, named “R-STEINER,” that generates 5'UTRs that increase the amount of proteins of a given gene. The proposed process involves building a model for predicting the amount of translated proteins, generating 5'UTRs, selecting them, and increasing the proteins according to the model. This method enables us to obtain 5'UTRs that increase the amount of translated proteins without real synthesis experiments, resulting in reduced cost, time, and effort. In our study, we built a prediction model for Oryza sativa and synthesized the 5'UTRs generated by R-STEINER. We confirmed that the model can predict the amount of translated proteins with a correlation coefficient of 0.89.
Authors
Ryo Harada, Keitaro Kume, Kazumasa Horie, Takuro Nakayama, Yuji Inagaki, Toshiyuki Amagasa
Publisher
Information Processing Society of Japan
Journal
IPSJ Transactions on Bioinformatics (ISSN:18826679)
Volume, pages, and publication date
vol.16, pp.20-27, 2023 (Released:2023-07-25)
Number of references
48
Number of citations
1

Eukaryotic genomes contain exons and introns, and it is necessary to accurately identify exon-intron boundaries, i.e., splice sites, to annotate genomes. To address this problem, many previous works have proposed annotation methods and tools based on RNA-seq evidence. Many recent works exploit neural networks (NNs) as their prediction models, but only a few can be used to generate new genome annotations in practice. In this study, we propose AtLASS, a fully automated method for predicting splice sites from genomic and RNA-seq data using an attention-based Bi-LSTM (Bidirectional Long Short-Term Memory). We exploit two-stage training on RNA-seq data to address the biased-label problem, thereby reducing false positives. Experiments on the genomes of three species show that the performance of the proposed method alone is comparable to that of existing methods, and that better performance can be achieved by combining the outputs of the proposed method and the existing method. The proposed method is the first program specialized in end-to-end splice site prediction using NNs.
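Authors
Masashi Tsubaki, Masashi Shimbo, Yuji Matsumoto
Publisher
Information Processing Society of Japan
Journal
IPSJ Transactions on Bioinformatics (ISSN:18826679)
Volume, pages, and publication date
vol.10, pp.2-8, 2017 (Released:2017-01-20)
Number of references
31
Number of citations
3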

Predicting the 3D structure of a protein from its amino acid sequence is an important challenge in bioinformatics. Since directly predicting the 3D structure is hard to achieve, classifying a protein into one of the “folds”, which are pre-defined structural labels in protein databases such as SCOP and CATH, is generally used as an intermediate step to determine the 3D structure. This classification task is called protein fold recognition (PFR), and much research has addressed either (i) feature extraction from amino acid sequences or (ii) classification methods for the protein folds. In this paper, we propose a new approach for PFR that (i) learns feature representations with unsupervised methods from a large protein database instead of relying on manual feature selection and external tools, and (ii) learns deep neural architectures, namely recurrent neural networks (RNNs) with long short-term memory (LSTM) units, and re-trains the representations instead of fixing the extracted features. On a benchmark dataset, our approach outperforms existing methods that use various physicochemical features.
Authors
Ayako Ohshiro, Hitoshi Afuso, Takeo Okazaki, Morikazu Nakamura
Publisher
Information Processing Society of Japan
Journal
IPSJ Transactions on Bioinformatics (ISSN:18826679)
Volume, pages, and publication date
vol.10, pp.9-15, 2017 (Released:2017-03-29)
Number of references
27

Various de novo assembly methods based on the concept of k-mers have been proposed. Despite the success of these methods, an alternative approach, referred to as the hybrid approach, has recently been proposed that combines different traditional methods to effectively exploit each of their properties in an integrated manner. However, the results obtained from the traditional methods used in the hybrid approach depend not only on the specific algorithm or heuristics but also on the selection of a user-specified k-mer size. Consequently, the results obtained with the hybrid approach also depend on these factors. Here, we designed a new assembly approach, referred to as rule-based assembly. This approach follows a strategy similar to that of the hybrid approach, but employs specific rules learned from certain characteristics of draft contigs to remove any erroneous contigs and then merges the remaining ones. To construct the most effective rules for this purpose, a learning method based on decision trees, i.e., a complex decision tree, is proposed. Comparative experiments were also conducted to validate the method. The results showed that the proposed method could outperform traditional methods in certain cases.
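
To make the role of the k-mer size concrete, the following minimal sketch counts the k-mers (all substrings of length k) in a set of reads for two values of k. The function name `count_kmers` and the toy reads are hypothetical and are not taken from the paper; the sketch only illustrates why the set of k-mers an assembler works with changes with k.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every substring of length k across all reads."""
    counts = Counter()
    for read in reads:
        # A read shorter than k contributes no k-mers (empty range).
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# Toy reads, for illustration only.
reads = ["ACGTAC", "CGTACG"]
for k in (3, 4):
    kmers = count_kmers(reads, k)
    print(f"k={k}: {len(kmers)} distinct k-mers")
```

A smaller k yields more overlap between reads but more ambiguous repeats, while a larger k does the opposite, which is one reason assembly results are sensitive to this parameter.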
Authors
Yuki Endo, Fubito Toyama, Chikafumi Chiba, Hiroshi Mori, Kenji Shoji
Publisher
Information Processing Society of Japan
Journal
IPSJ Transactions on Bioinformatics (ISSN:18826679)
Volume, pages, and publication date
vol.8, pp.2-8, 2015 (Released:2015-01-27)
Number of references
14

Sequencing the whole genome of various species has many applications, not only in understanding biological systems, but also in medicine, pharmacy, and agriculture. In recent years, the emergence of high-throughput next generation sequencing technologies has dramatically reduced the time and costs for whole genome sequencing. These new technologies provide ultrahigh throughput with a lower per-unit data cost. However, the data are generated from very short fragments of DNA. Thus, it is very important to develop algorithms for merging these fragments. One method of merging these fragments without using a reference dataset is called de novo assembly. Many algorithms for de novo assembly have been proposed in recent years. Velvet and SOAPdenovo2 are well-known assembly algorithms that perform well in terms of memory and time consumption. However, memory consumption increases dramatically as the size of the input fragments grows. Therefore, it is necessary to develop an alternative algorithm with low memory usage. In this paper, we propose an algorithm for de novo assembly with lower memory consumption. In our experiments using E. coli K-12 strain MG1655 and human chromosome 14, the memory consumption of our proposed algorithm was less than that of other popular assemblers.
Authors
Hideaki Umeyama, Mitsuo Iwadate, Y-h. Taguchi
Publisher
Information Processing Society of Japan
Journal
IPSJ Transactions on Bioinformatics (ISSN:18826679)
Volume, pages, and publication date
vol.8, pp.14-20, 2015 (Released:2015-08-19)
Number of references
31

Background: Spleen tyrosine kinase (SYK) is a protein related to various diseases. Aberrant SYK expression often causes the progression and initiation of several diseases including cancer and autoimmune diseases. Despite the importance of inhibiting SYK and identifying candidate inhibitors, no clinically effective inhibitors have been reported to date. Therefore, there is a need for novel SYK inhibitors. Results: Candidate compounds were investigated using in silico screening by chooseLD, which simulates ligand docking to proteins. Using this system, known inhibitors were correctly recognized as compounds with high affinity to SYK. Furthermore, many compounds in the DrugBank database were newly identified as having high affinity to the ATP-binding sites in the kinase domain with a similar affinity to previously reported inhibitors. Conclusions: Many drug candidate compounds from the DrugBank database were newly identified as inhibitors of SYK. Because compounds registered in the DrugBank are expected to have fewer side effects than currently available compounds, these newly identified compounds may be clinically useful inhibitors of SYK for the treatment of various diseases.
Authors
Yuuichi Nakano, Mitsuo Iwadate, Hideaki Umeyama, Y-h. Taguchi
Publisher
Information Processing Society of Japan
Journal
IPSJ Transactions on Bioinformatics (ISSN:18826679)
Volume, pages, and publication date
vol.7, pp.2-15, 2014 (Released:2014-01-17)
Number of references
15
Number of citations
1

Type III secretion system (T3SS) effector proteins are part of bacterial secretion systems. The T3SS exists in both pathogenic and symbiotic bacteria, and it is of interest how the T3SS effector proteins in these two classes differ from each other. In this paper, we successfully discriminated T3SS effector proteins among plant pathogenic, animal pathogenic, and plant symbiotic bacteria based on feature vectors inferred computationally by Yahara et al. from amino acid sequences alone. This suggests that these three classes of bacteria employ distinct T3SS effector proteins. We also hypothesized that the feature vector proposed by Yahara et al. represents protein structure, possibly the protein folds defined in the Structural Classification of Proteins (SCOP) database.
Authors
Junko Sato, Kouji Kozaki, Susumu Handa, Takashi Ikeda, Ryotaro Saka, Kohei Tomizuka, Yugo Nishiyama, Toshiyuki Okumura, Shinichi Hirai, Tadashi Ohno, Mamoru Ohta, Susumu Date, Haruki Nakamura
Publisher
Information Processing Society of Japan
Journal
IPSJ Transactions on Bioinformatics (ISSN:18826679)
Volume, pages, and publication date
vol.6, pp.9-17, 2013 (Released:2013-05-28)
Number of references
13

We developed a new information management system, the Protein Experimental Information Management System (PREIMS), which has ontology-based functions for quality control, validation, scalability, and information sharing. Its contents are mainly experimental protocols for the analysis of protein structures and functions, together with their results. They are stored separately in the PREIMS database (DB) as ontology-based protocol data and result data. The synchrotron experimental information was stored as result data in Extensible Markup Language (XML). Furthermore, we converted those protocols into the Resource Description Framework (RDF) format for integration with other biological information resources.
Authors
Wisnu Ananta Kusuma, Takashi Ishida, Yutaka Akiyama
Publisher
Information Processing Society of Japan
Journal
IPSJ Transactions on Bioinformatics (ISSN:18826679)
Volume, pages, and publication date
vol.4, pp.21-33, 2011 (Released:2011-11-04)
Number of references
27
Number of citations
2

De novo DNA sequence assembly is very important in genome sequence analysis. In this paper, we investigated two of the major approaches for de novo DNA sequence assembly of very short reads: overlap-layout-consensus (OLC) and Eulerian path. From that investigation, we developed a new assembly technique by combining the OLC and the Eulerian path methods in a hierarchical process. The contigs yielded by these two approaches were treated as reads and were assembled again to yield longer contigs. We tested our approach using three real very-short-read datasets generated by an Illumina Genome Analyzer and four simulated very-short-read datasets that contained sequencing errors. The sequencing errors were modeled based on Illumina's sequencing technology. As a result, our combined approach yielded longer contigs than those of Edena (OLC) and Velvet (Eulerian path) in various coverage depths and was comparable to SOAPdenovo, in terms of N50 size and maximum contig lengths. The assembly results were also validated by comparing contigs that were produced by assemblers with their reference sequence from an NCBI database. The results show that our approach produces more accurate results than Velvet, Edena, and SOAPdenovo alone. This comparison indicates that our approach is a viable way to assemble very short reads from next generation sequencers.
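
The N50 size used in this comparison is a standard assembly metric: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembled length. A minimal sketch of that definition follows; the function name `n50` and the example lengths are illustrative and do not come from the paper.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the total length is covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


# Illustrative contig lengths only.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because N50 rewards long contigs regardless of correctness, the abstract's additional validation against reference sequences is a useful complement to it.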
Authors
Tasuku Okui
Publisher
Information Processing Society of Japan
Journal
IPSJ Transactions on Bioinformatics (ISSN:18826679)
Volume, pages, and publication date
vol.13, pp.1-6, 2020 (Released:2020-01-08)
Number of references
26
Number of citations
5

Microbiome data have become relatively easy to obtain in recent years, and various methods for analyzing microbiome data are currently being proposed. Latent Dirichlet allocation (LDA) models, which are frequently used to extract latent topics from words in documents, have also been proposed for extracting information on microbial communities from microbiome data. To extract microbiome topics associated with a subject's attributes, LDA models that utilize supervisory information, including LDA with Dirichlet multinomial regression (DMR topic model) or the supervised topic model (SLDA), can be applied. Further, a Bayesian nonparametric model is often used to automatically decide the number of latent classes for a latent variable model. An LDA can also be extended to a Bayesian nonparametric model using the hierarchical Dirichlet process. Although a Bayesian nonparametric DMR topic model has been previously proposed, it uses a normalized gamma process for generating the topic distribution, and it is unknown whether the number of topics can be automatically decided from data. It is expected that the total number of topics (with relatively large proportions) can be restricted to a smaller value by using the stick-breaking process for generating the topic distribution. Therefore, we propose a Bayesian nonparametric DMR topic model using a stick-breaking process and compare it with existing models using two sets of real microbiome data. The results showed that the proposed model could extract topics that were more strongly associated with the attributes of a subject than existing methods, and that it could automatically decide the number of topics from the data.
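
The stick-breaking process mentioned in this abstract can be illustrated with the generic truncated construction, in which each weight is a Beta(1, alpha)-distributed fraction of whatever remains of a unit-length stick. The sketch below is an assumption-laden illustration of that generic construction only: it is not the paper's DMR topic model, and the function name, the truncation level, and the value of `alpha` are hypothetical.

```python
import random


def stick_breaking_weights(alpha, num_sticks, seed=0):
    """Draw truncated stick-breaking weights.

    Returns (weights, remaining), where remaining is the stick length
    left unassigned after num_sticks breaks.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0
    for _ in range(num_sticks):
        # Break off a Beta(1, alpha) fraction of the remaining stick.
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining


weights, leftover = stick_breaking_weights(alpha=1.0, num_sticks=20)
large = [w for w in weights if w > 0.05]
print(f"{len(large)} of {len(weights)} weights exceed 0.05; leftover = {leftover:.3g}")
```

Because each break takes a fraction of a shrinking remainder, later weights tend to be small, which is consistent with the abstract's expectation that only a limited number of topics receive relatively large proportions.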
Authors
Makito Oku
Publisher
Information Processing Society of Japan
Journal
IPSJ Transactions on Bioinformatics (ISSN:18826679)
Volume, pages, and publication date
vol.12, pp.9-16, 2019 (Released:2019-03-25)
Number of references
16
Number of citations
3

In this paper, I propose two novel methods for extracting synchronously fluctuated genes (SFGs) from transcriptome data. Variability and synchrony in biological signals are generally considered to be associated with the system's stability in some sense. However, a standard method for extracting SFGs from transcriptome data with high reproducibility has not been established. The first proposed method has two steps: selection of remarkably fluctuated genes and extraction of synchronized gene clusters. The second method is based on principal component analysis. It has been confirmed that the two methods have high extraction performance for artificial data and a moderate level of reproducibility for real data. The proposed methods will help to extract candidate genes related to stability and homeostasis in living organisms.
Authors
Keisuke Yanagisawa, Takashi Ishida, Yutaka Akiyama
Publisher
Information Processing Society of Japan
Journal
IPSJ Transactions on Bioinformatics (ISSN:18826679)
Volume, pages, and publication date
vol.8, pp.21-27, 2015 (Released:2015-08-19)
Number of references
21

It is necessary to confirm that a new drug can be appropriately cleared from the human body. However, checking the clearance pathway of a drug in the human body requires clinical trials and is therefore costly. Thus, computational methods for drug clearance pathway prediction have been studied. Previously developed prediction methods were based on supervised learning algorithms, which require clearance pathway information for all drugs in a training set as input labels. However, such data are often insufficient in number because of the high cost of acquiring them. In this paper, we propose a new drug clearance pathway prediction method based on semi-supervised learning, which can use not only labeled data but also unlabeled data. We evaluated the effectiveness of our method, focusing on the cytochrome P450 2C19 enzyme, which is involved in one of the major clearance pathways.
Authors
Nobuaki Yasuo, Masakazu Sekijima
Publisher
Information Processing Society of Japan
Journal
IPSJ Transactions on Bioinformatics (ISSN:18826679)
Volume, pages, and publication date
vol.8, pp.9-13, 2015 (Released:2015-07-08)
Number of references
25
Number of citations
1

We developed a new application to quantitatively evaluate the sequence conservation of ligand-binding sites by integrating information pertaining to protein structures, ligand-binding sites, and amino acid sequences. These data are visualized on protein structures via a Jmol or PyMOL interface. Such visualization is very important for structure-based drug design (SBDD). Key features of this application are the visualization of slight differences in specific ligand-binding sites and a ConservationScore that is comparable among ligand-binding sites. Furthermore, we conducted an experiment to visualize the calculation and comparison of the ConservationScore of four viral proteins as well as an experiment to visualize the differences between proteins belonging to the human β adrenergic receptor family. This application is available at http://www.bio.gsic.titech.ac.jp/visco.html.
Authors
Hisamitsu Akiba, Y-h. Taguchi
Publisher
Information Processing Society of Japan
Journal
IPSJ Transactions on Bioinformatics (ISSN:18826679)
Volume, pages, and publication date
vol.1, pp.35-41, 2008 (Released:2008-11-28)
Number of references
12

The Barcode of Life (BOL) project aims to identify species with no information other than DNA sequence. We assume that BOL includes information on higher taxa. In the present study, we compute a nonmetric distance from BOL barcodes by using the rank order of pairwise distances for 3 distinct examples, namely, Ant Diversity in Northern Madagascar, Survey of Chelicerates, and Birds of North America. This enables us to recognize higher taxa, i.e., genus, family, and order, more easily. For example, the ratio of the mean intrataxa nonmetric distance to the intertaxa distance is smaller than that for the raw (metric) distance. Furthermore, for most pairs of higher taxa, the mean intertaxa distance is more than twice as large as the intrataxa distances. Compared with tree construction by the neighbor-joining method or the maximum parsimony method with the raw distance measure, the nonmetric multidimensional scaling method makes it possible to discriminate higher taxa with an accuracy of 90%, even after leave-one-out cross-validation, when each species is embedded into a space of more than 40 dimensions.
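
One plausible reading of "using the rank order of pairwise distances" is to replace each pairwise distance by its rank among all pairwise distances, so that only the ordering of the distances is kept. The abstract does not give the exact definition, so the sketch below is an assumption for illustration: the function name `rank_transform`, the dense handling of ties, and the toy matrix are all hypothetical. It expects a symmetric distance matrix with a zero diagonal.

```python
def rank_transform(dist):
    """Replace each off-diagonal distance by its rank (1 = smallest).

    dist is a symmetric square matrix given as a list of lists.
    Equal distances share the same rank (dense ranking).
    """
    n = len(dist)
    # Collect the distinct pairwise distances from the upper triangle.
    values = sorted({dist[i][j] for i in range(n) for j in range(i + 1, n)})
    rank_of = {value: rank for rank, value in enumerate(values, start=1)}
    ranked = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            ranked[i][j] = ranked[j][i] = rank_of[dist[i][j]]
    return ranked


# Toy distances between three sequences, for illustration only.
toy = [
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.5],
    [0.9, 0.5, 0.0],
]
for row in rank_transform(toy):
    print(row)
```

A rank-based matrix of this kind is the usual input to nonmetric multidimensional scaling, which tries to preserve the ordering of the distances rather than their magnitudes.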