Authors
Masashi Tsubaki, Masashi Shimbo, Yuji Matsumoto
Publisher
Information Processing Society of Japan
Journal
IPSJ Transactions on Bioinformatics (ISSN:18826679)
Volume, Pages, and Publication Date
vol.10, pp.2-8, 2017 (Released:2017-01-20)
Number of References
31
Number of Citations
3

Predicting the 3D structure of a protein from its amino acid sequence is an important challenge in bioinformatics. Because direct prediction of the 3D structure is difficult, classifying a protein into one of the “folds”, which are pre-defined structural labels in protein databases such as SCOP and CATH, is generally used as an intermediate step toward determining the 3D structure. This classification task is called protein fold recognition (PFR), and much research has addressed either (i) feature extraction from amino acid sequences or (ii) classification methods for protein folds. In this paper, we propose a new approach to PFR that (i) learns feature representations from a large protein database with unsupervised methods, instead of relying on manual feature selection and external tools, and (ii) trains deep neural architectures, namely recurrent neural networks (RNNs) with long short-term memory (LSTM) units, while re-training the representations instead of fixing the extracted features. On a benchmark dataset, our approach outperforms existing methods that use various physicochemical features.