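Authors
Ryohei Eguchi, Naoaki Ono, Hisayuki Horai, Md. Altuf-Ul Amin, Aki Morita Hirai, Jun Kawahara, Shoji Kasahara, Tomoaki Endo, Shigehiko Kanaya
Publisher
Division of Chemical Information and Computer Sciences, The Chemical Society of Japan
Journal
Journal of Computer Aided Chemistry (ISSN: 1345-8647)
Volume, pages, and publication date
vol. 18, pp. 58-75, 2017 (Released: 2017-08-01)
Number of references
86
Number of citations
4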

A systematic representation of alkaloid biosynthetic pathways based on ring skeletons has been proposed, because the skeleton nucleus of an alkaloid is the main criterion for determining its biosynthetic pathway. We extended the ring-skeleton idea to classify and systematize alkaloid compounds, and examined how well this approach predicts biosynthetic pathways from module elements. We constructed a two-dimensional binary matrix of 2,546 SRS against 478 alkaloid compounds with known pathways: if the ith substring skeleton is present in a target compound, the ith element is set to 1; otherwise it is set to 0. The relationship between alkaloid compounds and biosynthetic pathways was examined using the dendrogram produced by applying Ward's clustering method to this matrix. Of the 12,243 alkaloid compounds accumulated in KNApSAcK Core DB (http://kanaya.naist.jp/knapsack_jsp/top.html), 3,124 compounds (25.5%) correspond to pathway-known ring skeletons (187 ring skeletons), but the remaining 9,119 (74.5%) do not. Examining the subring skeleton similarity of these remaining compounds may yield clues to their pathways and to the systematization of all alkaloid compounds. The present work therefore focuses on the comprehensive systematization of alkaloid compounds and on the construction principles of ring skeletons in alkaloids, based on subring skeleton profiling.
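The binary matrix and Ward clustering described in this abstract can be illustrated with a minimal sketch. The compound names, skeleton IDs, and the `ward_merges` helper below are hypothetical toy stand-ins, not the paper's data or code; the merge rule is the standard Ward criterion (smallest increase in within-cluster sum of squares), written naively in pure Python for readability.

```python
from itertools import combinations

# Hypothetical toy data: compound name -> set of substring-skeleton IDs it contains.
compounds = {
    "cmpd_A": {"s1", "s2"},
    "cmpd_B": {"s1", "s2", "s3"},
    "cmpd_C": {"s4"},
    "cmpd_D": {"s4", "s5"},
}
skeletons = sorted(set().union(*compounds.values()))
names = sorted(compounds)

# Binary matrix: one row per compound; element i is 1 if skeleton i is present, else 0.
matrix = [[1 if s in compounds[n] else 0 for s in skeletons] for n in names]


def ward_merges(rows):
    """Naive Ward agglomeration.

    Returns the merge history as (members_a, members_b, cost), where cost is
    the increase in within-cluster sum of squares caused by the merge.
    """
    # cluster id -> (member row indices, centroid)
    clusters = {i: ([i], [float(v) for v in r]) for i, r in enumerate(rows)}
    merges = []
    next_id = len(rows)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            ma, ca = clusters[a]
            mb, cb = clusters[b]
            d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
            cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
            if best is None or cost < best[0]:
                best = (cost, a, b)
        cost, a, b = best
        ma, ca = clusters.pop(a)
        mb, cb = clusters.pop(b)
        n = len(ma) + len(mb)
        # The merged centroid is the size-weighted mean of the two centroids.
        centroid = [(len(ma) * x + len(mb) * y) / n for x, y in zip(ca, cb)]
        clusters[next_id] = (ma + mb, centroid)
        merges.append((ma, mb, cost))
        next_id += 1
    return merges


merges = ward_merges(matrix)
for left, right, cost in merges:
    print([names[i] for i in left], "+", [names[i] for i in right], f"cost={cost:.2f}")
```

In this toy input, compounds that share skeletons are merged before unrelated ones, which is the property a dendrogram over the real matrix would be read for. For matrices of realistic size, a library implementation of Ward linkage would be the practical choice.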
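Authors
Christin Rakers, Daniel Reker, J.B. Brown
Publisher
Division of Chemical Information and Computer Sciences, The Chemical Society of Japan
Journal
Journal of Computer Aided Chemistry (ISSN: 1345-8647)
Volume, pages, and publication date
vol. 18, pp. 124-142, 2017 (Released: 2017-08-01)
Number of references
72
Number of citations
14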

The identification of new compound-protein interactions has long been a fundamental quest in medicinal chemistry. As the amount of biochemical data grows, advanced machine learning techniques such as active learning have proven beneficial for building high-performance prediction models from subsets of such complex data. In a recently published paper, chemogenomic active learning was applied to the interaction spaces of kinases and G protein-coupled receptors, which together feature over 150,000 compound-protein interactions. Prediction models were actively trained by random forest classification using 500 decision trees per experiment. In a new direction for chemogenomic active learning, we address the question of how forest size influences model evolution and performance. In addition to the original finding that highly predictive models can be constructed from a small fraction of the available data, we find here that model complexity, as measured by forest size, can be reduced to one-fourth or one-fifth of the previously investigated forest size while still maintaining reliable prediction performance. Thus, chemogenomic active learning can yield predictive models of reduced complexity based on only a fraction of the data available for model construction.
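The idea of actively training an ensemble and then varying its size can be sketched as follows. This is an illustration under stated assumptions, not the authors' protocol: a bagged ensemble of decision stumps stands in for a random forest, the data are synthetic two-feature points rather than compound-protein descriptors, and picking the pool sample with the most evenly split vote is an assumed selection strategy (the abstract does not state which one was used). All function names are hypothetical.

```python
import random


def make_data(rng, n=200):
    """Hypothetical toy data: 2 features; label is 1 when their sum exceeds 1."""
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [1 if a + b > 1.0 else 0 for a, b in X]
    return X, y


def fit_stump(X, y, rng):
    """Fit one decision stump on a bootstrap sample, using one random feature."""
    idx = [rng.randrange(len(X)) for _ in X]
    f = rng.randrange(len(X[0]))
    best = None
    for t in sorted({X[i][f] for i in idx}):
        for hi in (0, 1):  # label predicted when the feature exceeds the threshold
            errors = sum((hi if X[i][f] > t else 1 - hi) != y[i] for i in idx)
            if best is None or errors < best[0]:
                best = (errors, f, t, hi)
    return best[1:]  # (feature, threshold, label above threshold)


def fit_forest(X, y, labeled, n_trees, rng):
    """Train n_trees stumps on the currently labeled samples only."""
    Xl = [X[i] for i in labeled]
    yl = [y[i] for i in labeled]
    return [fit_stump(Xl, yl, rng) for _ in range(n_trees)]


def vote_fraction(forest, x):
    """Fraction of ensemble members voting for class 1."""
    return sum(hi if x[f] > t else 1 - hi for f, t, hi in forest) / len(forest)


def active_learning(X, y, n_trees, n_rounds, seed=0):
    """Pool-based active learning: add one label per round, then retrain."""
    rng = random.Random(seed)
    labeled = rng.sample(range(len(X)), 2)
    pool = [i for i in range(len(X)) if i not in labeled]
    for _ in range(n_rounds):
        forest = fit_forest(X, y, labeled, n_trees, rng)
        # Assumed strategy: query the sample whose vote is closest to a 50/50 split.
        pick = min(pool, key=lambda i: abs(vote_fraction(forest, X[i]) - 0.5))
        pool.remove(pick)
        labeled.append(pick)
    return fit_forest(X, y, labeled, n_trees, rng), labeled


def accuracy(forest, X, y):
    preds = [1 if vote_fraction(forest, x) >= 0.5 else 0 for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


X, y = make_data(random.Random(1))
for n_trees in (25, 5):  # compare a larger ensemble with one a fifth of its size
    forest, labeled = active_learning(X, y, n_trees=n_trees, n_rounds=20)
    print(n_trees, "stumps,", len(labeled), "labels, accuracy",
          round(accuracy(forest, X, y), 3))
```

The loop at the end mirrors the question posed in the abstract: the same active learning procedure is run with two ensemble sizes so their accuracy after the same number of labels can be compared. The toy numbers say nothing about the paper's results.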