Authors
Kohei Iijima, Y-h. Taguchi
Journal
Research Report: Bioinformatics (BIO)
Volume, issue, pages, date
vol.2012, no.18, pp.1-2, 2012-06-21

Changes in gene expression during cell differentiation are key to understanding the mechanism of development. However, conventional gene expression analysis cannot distinguish expression in individual cells. In this paper, we re-analyze single-cell gene expression measurements obtained by next-generation sequencing during differentiation from mouse ES cells to MEFs.
Authors
小薮 駿, 大川 剛直
Journal
Research Report: Bioinformatics (BIO)
Volume, issue, pages, date
vol.2012, no.15, pp.1-8, 2012-06-21

Semi-supervised learning based on tentative label prediction is effective for extracting protein-protein interaction (PPI) information from the literature when sufficient training data cannot be prepared. In this type of semi-supervised learning, assigning incorrect tentative labels degrades accuracy, so predicting tentative labels accurately is crucial. In this paper, we propose a method that determines tentative labels from the consensus of multiple classifiers, introducing two measures for evaluating each classifier: the similarity among the classifiers and the reliability of each learning method. In PPI extraction experiments, the proposed method achieved more accurate extraction when the dataset was relatively large. Compared with conventional methods, its F-measure and recall were comparable or slightly lower, but its precision was higher.
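The abstract does not give the weighting formula, so the following is only a minimal sketch of how a reliability- and similarity-weighted consensus could decide one tentative label. The names `tentative_label`, `reliability`, `similarity` and `threshold`, and the rule that divides each classifier's reliability by one plus its summed similarity to the other classifiers, are hypothetical choices for illustration, not the authors' formulas. An instance receives a tentative label only when the weighted vote is decisive; otherwise it stays unlabeled, which favors precision over recall.

```python
from typing import Optional, Sequence


def tentative_label(
    votes: Sequence[int],
    reliability: Sequence[float],
    similarity: Sequence[Sequence[float]],
    threshold: float = 0.8,
) -> Optional[int]:
    """Hypothetical consensus rule: return 1 or 0 as a tentative label, or None.

    votes[i]:         label (0 or 1) predicted by classifier i for one unlabeled instance
    reliability[i]:   assumed reliability score of classifier i, in [0, 1]
    similarity[i][j]: assumed similarity between classifiers i and j, in [0, 1]
    """
    n = len(votes)
    weights = []
    for i in range(n):
        # A classifier that closely resembles the others adds little independent
        # evidence, so its reliability is divided by its redundancy.
        redundancy = sum(similarity[i][j] for j in range(n) if j != i)
        weights.append(reliability[i] / (1.0 + redundancy))
    total = sum(weights)
    if total == 0.0:
        return None
    # Weighted fraction of support for the positive label.
    support = sum(w for w, v in zip(weights, votes) if v == 1) / total
    if support >= threshold:
        return 1
    if support <= 1.0 - threshold:
        return 0
    return None  # consensus too weak: keep the instance unlabeled


# Two near-duplicate classifiers say "interaction", one independent classifier disagrees.
sim = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(tentative_label([1, 1, 0], [1.0, 1.0, 1.0], sim))  # -> None
```

Raising `threshold` labels fewer instances but makes each tentative label more trustworthy.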
Authors
車谷 奈都実, 大川 剛直
Journal
Research Report: Bioinformatics (BIO)
Volume, issue, pages, date
vol.2012, no.14, pp.1-6, 2012-06-21

To clarify the relationship between protein function and structure, it is important to compare protein 3D structures and identify locally similar sites on their molecular surfaces. In this paper, we propose a method for comparing local protein structures in which molecular surface data are converted into 3D images, keypoints are detected in the images, and local features are computed at each keypoint. We applied the method to the prediction of protein binding sites, and it correctly predicted the binding sites of six out of eleven proteins, confirming its effectiveness.
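The abstract does not name the keypoint detector or the local feature, so the sketch below substitutes the simplest stand-ins on an already voxelized surface image: a keypoint is a voxel strictly brighter than all 26 neighbours, and its feature is the mean-centred, unit-length 3×3×3 patch around it, compared by cosine similarity. The function names and these choices are assumptions, not the authors' method; a practical detector would usually smooth the image and search over several scales.

```python
from typing import List, Tuple

import numpy as np

Voxel = Tuple[int, int, int]


def detect_keypoints(image: np.ndarray, min_value: float = 0.0) -> List[Voxel]:
    """Voxels whose value exceeds min_value and is strictly larger than all 26 neighbours."""
    img = np.asarray(image, dtype=float)
    padded = np.pad(img, 1, mode="constant", constant_values=-np.inf)
    nx, ny, nz = img.shape
    is_max = img > min_value
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                if dx == dy == dz == 0:
                    continue
                # Neighbour at offset (dx, dy, dz), taken for every voxel at once.
                neighbour = padded[1 + dx:1 + dx + nx, 1 + dy:1 + dy + ny, 1 + dz:1 + dz + nz]
                is_max &= img > neighbour
    return [tuple(int(c) for c in p) for p in np.argwhere(is_max)]


def descriptor(image: np.ndarray, p: Voxel, radius: int = 1) -> np.ndarray:
    """Mean-centred, unit-length (2r+1)^3 patch around voxel p (zeros outside the image)."""
    img = np.asarray(image, dtype=float)
    padded = np.pad(img, radius, mode="constant")
    x, y, z = p
    size = 2 * radius + 1
    patch = padded[x:x + size, y:y + size, z:z + size].ravel()
    patch = patch - patch.mean()
    norm = np.linalg.norm(patch)
    return patch / norm if norm > 0 else patch


def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two unit-length descriptors."""
    return float(np.dot(a, b))


# Two isolated peaks of different height have the same normalized local shape.
img = np.zeros((7, 7, 7))
img[3, 3, 3] = 1.0
img[1, 1, 1] = 0.5
points = detect_keypoints(img)
print(points, similarity(descriptor(img, points[0]), descriptor(img, points[1])))
```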
Authors
Shigeharu Ishida, Hideaki Umeyama, Mitsuo Iwadate, Y-h. Taguchi
Journal
Research Report: Bioinformatics (BIO)
Volume, issue, pages, date
vol.2012, no.12, pp.1-6, 2012-06-21

Drug discovery for autoimmune diseases has recently been recognized as an important task. In this study, we perform structure prediction of proteins whose gene promoter regions were previously reported to be specifically methylated or demethylated in common across three autoimmune diseases: systemic lupus erythematosus, rheumatoid arthritis, and dermatomyositis. FAMS was employed for this purpose, and we could predict three-dimensional structures with sufficiently small P-values. Most of these proteins are suggested to be autoimmunity-related and are expected to be important drug target candidates. We also found some proteins that form complexes with each other, which suggests a new type of drug target: suppression of protein complex formation.
Authors
Tomoshige Ohno, Shigeto Seno, Yoichi Takenaka, Hideo Matsuda
Journal
Research Report: Bioinformatics (BIO)
Volume, issue, pages, date
vol.2012, no.13, pp.1-7, 2012-06-21

Alternative splicing plays an important role in eukaryotic gene expression by producing diverse proteins from a single gene, so predicting how genes are transcribed is of great biological interest. Massively parallel whole-transcriptome sequencing, often referred to as RNA-Seq, is becoming widely used and is revolutionizing the cataloging of isoforms using vast numbers of short mRNA fragments called reads. Conventional RNA-Seq analysis methods typically align reads onto a reference genome (mapping) to determine which isoforms each gene yields and how much of each isoform is expressed. However, a considerable number of reads cannot be mapped uniquely. These so-called multireads, which map to multiple locations because of short read length and similar sequences, increase the uncertainty about how genes are transcribed, causing inaccurate gene expression estimates and incorrect isoform predictions. To cope with this problem, we propose a method for isoform prediction by iterative mapping. The positions from which multireads originate can be estimated from expression levels, whereas quantifying isoform-level expression requires accurate mapping. Because these procedures are mutually dependent, reads must be remapped. By iterating this cycle, our method estimates gene expression levels more precisely and hence improves the prediction of alternative splicing. It simultaneously estimates isoform-level expression by using an EM algorithm within each gene to compute how many reads originate from each candidate isoform. To validate the method, we compared its performance with conventional methods on an RNA-Seq dataset derived from human brain.
The proposed method had a precision of 66.7% and outperformed conventional methods in terms of the isoform detection rate.
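The within-gene quantification step can be sketched as the standard EM update for allocating ambiguous reads among compatible isoforms. This is a minimal sketch under simplifying assumptions: each read is given as the list of isoform indices it is compatible with, isoform lengths and the genome-wide remapping of multireads are ignored, and the name `em_isoform_abundance` is hypothetical. The E-step splits each read among its compatible isoforms in proportion to the current abundances; the M-step re-estimates abundances from these fractional counts.

```python
from typing import List, Sequence


def em_isoform_abundance(
    compatible: Sequence[Sequence[int]],
    n_isoforms: int,
    max_iter: int = 500,
    tol: float = 1e-10,
) -> List[float]:
    """Estimate relative isoform abundances from reads, each listed with the isoforms it fits."""
    if not compatible:
        raise ValueError("no reads given")
    if any(len(isoforms) == 0 for isoforms in compatible):
        raise ValueError("every read must be compatible with at least one isoform")
    n_reads = len(compatible)
    theta = [1.0 / n_isoforms] * n_isoforms  # start from uniform abundances
    for _ in range(max_iter):
        # E-step: split each read among its compatible isoforms in proportion to theta.
        counts = [0.0] * n_isoforms
        for isoforms in compatible:
            denom = sum(theta[k] for k in isoforms)
            for k in isoforms:
                counts[k] += theta[k] / denom
        # M-step: new abundances are the expected fractions of reads per isoform.
        new_theta = [c / n_reads for c in counts]
        converged = max(abs(a - b) for a, b in zip(new_theta, theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta


# 6 reads unique to isoform 0, 2 unique to isoform 1, 4 shared by both.
reads = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 4
theta = em_isoform_abundance(reads, 2)
print([round(t, 3) for t in theta])  # -> [0.75, 0.25]
```

In the iterative scheme described in the abstract, abundances estimated in this way would in turn inform where multireads are placed, and the two steps would be repeated.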
Authors
Y-h. Taguchi, Yoshiki Murakami
Journal
Research Report: Bioinformatics (BIO)
Volume, issue, pages, date
vol.2012-BIO-28, no.13, pp.1-6, 2012-03-21

Blood-based disease biomarkers are clinically important because blood is easy to obtain from patients and sampling causes relatively little stress. However, blood generally reflects not only the targeted disease but also the whole-body status of patients, so it matters which blood components are considered. Recently, miRNAs in blood, the blood-borne miRNome, have turned out to be promising candidates for blood-based disease biomarkers. In this paper, we propose a new method based on principal component analysis to identify better candidate miRNAs for blood-based biomarkers using patients' miRNA expression profiles. The method yields a blood-borne miRNome that better discriminates diseases from healthy controls: hsa-miR-425, hsa-miR-15b, hsa-miR-185, hsa-miR-92a, hsa-miR-140-3p, hsa-miR-320a, hsa-miR-486-5p, hsa-miR-16, hsa-miR-191, hsa-miR-106b, hsa-miR-19b, and hsa-miR-30d, all of which have previously been extensively reported as cancer/disease-related miRNAs. We found that these common miRNAs are significantly up- or downregulated in most diseases/cancers, but in a disease/cancer-specific combination, which enables good discrimination of cancers/diseases from healthy controls.
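The abstract does not spell out how principal component analysis is used to pick the miRNAs. One reading, assumed here and not confirmed by the text, is to treat each miRNA as a point described by its expression across all samples, project the points onto the leading principal components, and keep the miRNAs lying farthest from the origin. The function name `select_by_pca`, the parameters `n_components` and `top_k`, and the synthetic data below are hypothetical.

```python
from typing import List, Sequence

import numpy as np


def select_by_pca(
    expression: np.ndarray,
    names: Sequence[str],
    n_components: int = 2,
    top_k: int = 10,
) -> List[str]:
    """Rank miRNAs (rows) by distance from the origin in the leading PC space.

    expression: miRNA x sample matrix, e.g. log-scale expression in patients and controls
    """
    # Centre each sample (column) so that the cloud of miRNAs sits around the origin.
    centred = expression - expression.mean(axis=0, keepdims=True)
    # Rows of u * s are the principal-component scores of the miRNAs.
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    distance = np.linalg.norm(scores, axis=1)
    order = np.argsort(distance)[::-1][:top_k]  # farthest from the origin first
    return [names[i] for i in order]


# Synthetic example: two of ten hypothetical miRNAs deviate strongly across all samples.
rng = np.random.default_rng(0)
data = rng.normal(0.0, 0.1, size=(10, 6))
data[2] += 5.0
data[7] -= 5.0
names = [f"miR-{i}" for i in range(10)]  # hypothetical names
print(select_by_pca(data, names, top_k=2))
```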
Authors
Y-h. Taguchi
Journal
Research Report: Bioinformatics (BIO)
Volume, issue, pages, date
vol.2012-BIO-28, no.3, pp.1-6, 2012-03-21

miRNAs have recently been recognized as critical players in cell senescence through the regulation of their target genes, so inferring which miRNAs critically regulate target genes is important. However, such miRNAs have usually been assumed to be those showing significant fold changes, typically upregulation, during cell senescence. In this study, we consider target gene regulation by miRNAs together with changes in miRNA expression during senescence of IMR90 fibroblasts. We found that considering both criteria simultaneously yields more plausible miRNAs, i.e., miRNAs more often reported to be down- or upregulated and/or having biological backgrounds that induce cell senescence. We therefore recommend that the amount of target gene regulation, which can be inferred with the recently developed MiRaGE Server, also be considered when estimating which miRNAs critically contribute to cell senescence.
Authors
Y-h. Taguchi, Akira Okamoto
Journal
Research Report: Bioinformatics (BIO)
Volume, issue, pages, date
vol.2012-BIO-28, no.16, pp.1-6, 2012-03-21

Proteomic analysis is a very useful way to understand how bacterial behavior changes in response to the external environment, because most of the genomic information of bacteria is devoted to coding enzymes that control the metabolic networks inside each cell. In this paper, we performed a proteomic analysis of Streptococcus pyogenes, which is known as a flesh-eating bacterium and can cause several life-threatening human diseases. Its proteome during the growth phase was measured at four time points under two incubation conditions, with and without shaking, in order to understand its adaptation to oxidative stress. Principal component analysis was applied and turned out to be useful for identifying biologically important proteins in both the supernatant and the cell components.
Authors
Y-h. Taguchi, Kazunari Yokoyama
Journal
Research Report: Bioinformatics (BIO)
Volume, issue, pages, date
vol.2012-BIO-28, no.10, pp.1-6, 2012-03-21

Soil disease suppression is an important worldwide issue for realizing a stable food supply. Nevertheless, no established indicator of soil disease suppression has been found yet, which prevents us from managing soil so that diseases do not occur. In this paper, we propose a new biological indicator of soil disease suppression: the ability of soil bacteria to consume carbon resources, which can be observed automatically with the OmniLog ID System over one or two days. This indicator turned out to distinguish disease-suppressive soils from others. We modeled the characteristic time courses of carbon resource consumption with a simple ecological model in which bacteria compete with each other for carbon resources, and the measured ecological structure of soil bacteria fits the theoretical prediction well. To find characteristic features of each soil, the observed time courses were embedded into a two-dimensional space by non-metric multidimensional scaling, which resulted in an almost one-dimensional arrangement of the embedded points. Analyzing the spatial distribution of each carbon resource in the embedded space showed that healthy soil has a mostly uniform distribution along this one-dimensional arrangement. Since sick soil and non-soil samples have rather localized distributions, the ecological systems in more disease-suppressive soils are both more diverse and more uniform. Because this indicator can be obtained automatically, easily, and quickly, it is expected to be useful for validating efforts to improve soil before harvest.
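The abstract does not give the equations of its competition model. As a hedged illustration only, the sketch below integrates a generic consumer-resource model in which several bacterial populations draw on a single carbon resource; the equations, the parameter names `uptake`, `yield_` and `n0`, and the explicit Euler scheme are assumptions, not the authors' model. The returned curve is the cumulative amount of the resource consumed over time.

```python
from typing import List, Sequence


def consumption_curve(
    uptake: Sequence[float],
    yield_: Sequence[float],
    n0: Sequence[float],
    r0: float = 1.0,
    dt: float = 0.01,
    steps: int = 1000,
) -> List[float]:
    """Cumulative consumption of one carbon resource by competing bacterial populations.

    Assumed (hypothetical) model, integrated with the explicit Euler method:
        dR/dt   = -R * sum_i uptake[i] * N_i
        dN_i/dt =  yield_[i] * uptake[i] * N_i * R
    """
    r = r0
    n = list(n0)
    consumed = [0.0]
    for _ in range(steps):
        fluxes = [u * ni * r for u, ni in zip(uptake, n)]  # resource taken by each population
        demand = sum(fluxes) * dt
        # Within one step the populations cannot take more than what is left;
        # if they would, every flux is scaled down by the same factor.
        scale = 1.0 if demand <= r else r / demand
        r = max(r - demand * scale, 0.0)
        n = [ni + y * f * scale * dt for ni, y, f in zip(n, yield_, fluxes)]
        consumed.append(r0 - r)
    return consumed


curve = consumption_curve(uptake=[1.0, 0.5], yield_=[0.5, 0.5], n0=[0.1, 0.1])
print(curve[::250])
```

In this toy model the faster population depletes the resource that the slower one also needs, which is the kind of competition the abstract refers to.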
Authors
坂口 琢哉
Journal
Research Report: Bioinformatics (BIO)
Volume, issue, pages, date
vol.2011, no.24, pp.1-2, 2011-11-24

One popular use of BBSs is sharing the viewing experience with other viewers by posting comments about TV programs while watching them. In this study, we collect such comment data and propose a model that self-organizes the comments based on their similarity in order to extract representative comments. Applying the model to comments on a live broadcast of a professional baseball game, we extracted comments with high activation values at important scenes such as scoring plays, which demonstrates the model's effectiveness and its potential for video summarization.
Authors
石田 武志
Journal
Research Report: Bioinformatics (BIO)
Volume, issue, pages, date
vol.2010-BIO-20, no.11, pp.1-8, 2010-02-25

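Elucidating the mechanism of self-replication phenomena, such as those of cells, mathematically and formulating it as a general theory would lead to various applications, including the mass production of molecular machines and the synthesis of artificial cells. For self-replicating machines, von Neumann proved the theoretical possibility, but realizations have been limited to Langton's self-replication of simple shapes. In this study, we simulate the self-replication of a cell-like shape on a two-dimensional cellular automaton and reproduce a phenomenon in which a cell membrane is constructed and a gene-like information code inside the cell is self-replicated.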
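The rule set of the authors' cell-shaped replicator is not given in the abstract. For contrast, the sketch below implements a far simpler, well-known example of self-replication on a two-dimensional cellular automaton, the parity rule often called Fredkin's replicator: a cell becomes live exactly when an odd number of its four orthogonal neighbours is live. Because the rule is linear over GF(2), after 2^k steps any initial pattern reappears as four copies shifted by 2^k cells up, down, left and right, provided the copies do not overlap. This is not the model of the paper, which also reproduces a membrane and an internal genetic code; the function names are illustrative.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def step(live: Set[Cell]) -> Set[Cell]:
    """One update of the parity rule on an unbounded grid, given the set of live cells."""
    counts: Dict[Cell, int] = {}
    for x, y in live:
        # Each live cell adds one to the neighbour count of its four orthogonal neighbours.
        for dx, dy in NEIGHBOURS:
            cell = (x + dx, y + dy)
            counts[cell] = counts.get(cell, 0) + 1
    # A cell is live at the next step iff an odd number of its neighbours is live now.
    return {cell for cell, k in counts.items() if k % 2 == 1}


def run(live: Set[Cell], steps: int) -> Set[Cell]:
    for _ in range(steps):
        live = step(live)
    return live


# An L-shaped seed: after 2**2 = 4 steps it appears four times, shifted by 4 cells.
seed = {(0, 0), (1, 0), (0, 1)}
copies = {(x + dx * 4, y + dy * 4) for x, y in seed for dx, dy in NEIGHBOURS}
print(run(seed, 4) == copies)  # -> True
```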
Authors
東野 正行, 熊野 雅仁, 木村 昌弘, 斉藤 和巳
Publisher
Information Processing Society of Japan
Journal
Research Report: Bioinformatics (BIO) (ISSN: 0919-6072)
Volume, issue, pages, date
vol.2009, no.51, pp.1-8, 2009-12-10

We propose a method for extracting groups of hot-topic documents from a document stream by extending the SR method, a network-core extraction method. Experiments on newspaper article stream data and synthetic document stream data show that the proposed method is more accurate than conventional methods.
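Authors
田中 翔, 加藤 有己, 関 浩之
Publisher
Information Processing Society of Japan (general incorporated association)
Journal
Research Report: Bioinformatics (BIO) (ISSN: 0919-6072)
Volume, issue, pages, date
vol.2009, no.25, pp.37-40, 2009-02-26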

Several methods for predicting RNA secondary structure including pseudoknots have been proposed based on parsing algorithms for formal grammars, such as MCFG and TAG, whose generative power is greater than that of context-free grammar (CFG). In addition, to improve generality and accuracy, several secondary structure prediction methods based on comparative analysis of multiple primary sequences have been proposed. In this paper, we define the pair stochastic multiple context-free grammar (Pair-SMCFG), an extension of MCFG that enables comparative sequence analysis, and propose an RNA secondary structure prediction method based on it. In experiments on RNA sequences of about 70 bases, our method achieved a precision of 63.2% and a recall of 62.0%, even though the grammar was not specialized for any particular RNA family.
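Precision and recall for secondary structure prediction are usually computed over base pairs. Assuming that convention (the abstract does not state it), the sketch below parses dot-bracket strings, using a separate bracket type for each pseudoknotted layer, and compares the predicted and reference pair sets. The function names, the bracket alphabet and the example structures are illustrative choices, not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]
# Closing bracket -> opening bracket; extra bracket types encode pseudoknotted stems.
BRACKETS = {")": "(", "]": "[", "}": "{", ">": "<"}


def base_pairs(structure: str) -> Set[Pair]:
    """Parse a dot-bracket string (pseudoknots via extra bracket types) into (i, j) pairs."""
    stacks: Dict[str, List[int]] = {opener: [] for opener in BRACKETS.values()}
    pairs: Set[Pair] = set()
    for i, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(i)
        elif ch in BRACKETS:
            opener_stack = stacks[BRACKETS[ch]]
            if not opener_stack:
                raise ValueError(f"unmatched {ch!r} at position {i}")
            pairs.add((opener_stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return pairs


def precision_recall(predicted: str, reference: str) -> Tuple[float, float]:
    """Base-pair precision = TP / |predicted pairs|, recall = TP / |reference pairs|."""
    pred, ref = base_pairs(predicted), base_pairs(reference)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    return precision, recall


# Reference with a pseudoknot; the prediction gets the first stem right
# but places its single pseudoknot pair wrongly and misses the others.
print(precision_recall("((..[...))..].", "((..[[..))..]]"))  # -> (0.6666666666666666, 0.5)
```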