Authors
So Nakagawa Toshiaki Katayama Lihua Jin Jiaqi Wu Kirill Kryukov Rise Oyachi Junko S Takeuchi Takatomo Fujisawa Satomi Asano Momoka Komatsu Jun-ichi Onami Takashi Abe Masanori Arita
Publisher
The Genetics Society of Japan
Journal
Genes & Genetic Systems (ISSN:13417568)
Volume, issue, pages / publication date
pp.23-00085 (Released:2023-10-14)
Number of references
53

Since the early phase of the coronavirus disease 2019 (COVID-19) pandemic, a number of research institutes have been sequencing and sharing high-quality severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes to trace the route of infection in Japan. To provide insight into the spread of COVID-19, we developed a web platform named SARS-CoV-2 HaploGraph to visualize the emergence timing and geographical transmission of SARS-CoV-2 haplotypes. Using data from the GISAID EpiCoV database as of June 4, 2022, we created a haplotype naming system by determining the ancestral haplotype for each epidemic wave and showed prefecture- or region-specific haplotypes in each of four waves in Japan. The SARS-CoV-2 HaploGraph allows for interactive tracking of virus evolution and of geographical prevalence of haplotypes, and aids in developing effective public health control strategies during the global pandemic. The code and the data used for this study are publicly available at: https://github.com/ktym/covid19/.
Authors
Takashi Okubo Takahiro Tsukui Hiroko Maita Shinobu Okamoto Kenshiro Oshima Takatomo Fujisawa Akihiro Saito Hiroyuki Futamata Reiko Hattori Yumi Shimomura Shin Haruta Sho Morimoto Yong Wang Yoriko Sakai Masahira Hattori Shin-ichi Aizawa Kenji V. P. Nagashima Sachiko Masuda Tsutomu Hattori Akifumi Yamashita Zhihua Bao Masahito Hayatsu Hiromi Kajiya-Kanegae Ikuo Yoshinaga Kazunori Sakamoto Koki Toyota Mitsuteru Nakao Mitsuyo Kohara Mizue Anda Rieko Niwa Park Jung-Hwan Reiko Sameshima-Saito Shin-ichi Tokuda Sumiko Yamamoto Syuji Yamamoto Tadashi Yokoyama Tomoko Akutsu Yasukazu Nakamura Yuka Nakahira-Yanaka Yuko Takada Hoshino Hideki Hirakawa Hisayuki Mitsui Kimihiro Terasawa Manabu Itakura Shusei Sato Wakako Ikeda-Ohtsubo Natsuko Sakakura Eli Kaminuma Kiwamu Minamisawa
Publisher
Japanese Society of Microbial Ecology / Japanese Society of Soil Microbiology / Taiwan Society of Microbial Ecology / Japanese Society of Plant Microbe Interactions / Japanese Society for Extremophiles
Journal
Microbes and Environments (ISSN:13426311)
Volume, issue, pages / publication date
pp.1203230372 (Released:2012-03-28)
Number of references
1
Number of citations
37 53

Bradyrhizobium sp. S23321 is an oligotrophic bacterium isolated from paddy field soil. Although S23321 is phylogenetically close to Bradyrhizobium japonicum USDA110, a legume symbiont, it is unable to induce root nodules in siratro, a legume often used for testing Nod factor-dependent nodulation. The genome of S23321 is a single circular chromosome, 7,231,841 bp in length, with an average GC content of 64.3%. The genome contains 6,898 potential protein-encoding genes, one set of rRNA genes, and 45 tRNA genes. Comparison of the genome structure between S23321 and USDA110 showed strong colinearity; however, the symbiosis islands present in USDA110 were absent in S23321, whose genome lacked a chaperonin gene cluster (groELS3) for symbiosis regulation found in USDA110. A comparison of sequences around the tRNA-Val gene strongly suggested that S23321 contains an ancestral-type genome that precedes the acquisition of a symbiosis island by horizontal gene transfer. Although S23321 contains a nif (nitrogen fixation) gene cluster, the organization, homology, and phylogeny of the genes in this cluster were more similar to those of photosynthetic bradyrhizobia ORS278 and BTAi1 than to those on the symbiosis island of USDA110. In addition, we found genes encoding a complete photosynthetic system, many ABC transporters for amino acids and oligopeptides, two types (polar and lateral) of flagella, multiple respiratory chains, and a system for lignin monomer catabolism in the S23321 genome. These features suggest that S23321 is able to adapt to a wide range of environments, probably including low-nutrient conditions, with multiple survival strategies in soil and rhizosphere.
Authors
Yasuhiro TANIZAWA Takatomo FUJISAWA Eli KAMINUMA Yasukazu NAKAMURA Masanori ARITA
Publisher
BMFH Press
Journal
Bioscience of Microbiota, Food and Health (ISSN:21863342)
Volume, issue, pages / publication date
vol.35, no.4, pp.173-184, 2016 (Released:2016-10-28)
Number of references
49
Number of citations
188

Quality assurance and correct taxonomic affiliation of data submitted to public sequence databases have been an everlasting problem. The DDBJ Fast Annotation and Submission Tool (DFAST) is a newly developed genome annotation pipeline with quality and taxonomy assessment tools. To enable annotation of ready-to-submit quality, we also constructed curated reference protein databases tailored for lactic acid bacteria. DFAST was developed so that all the procedures required for DDBJ submission could be done seamlessly online. The online workspace would be especially useful for users not familiar with bioinformatics skills. In addition, we have developed a genome repository, DFAST Archive of Genome Annotation (DAGA), which currently includes 1,421 genomes covering 179 species and 18 subspecies of two genera, Lactobacillus and Pediococcus, obtained from both DDBJ/ENA/GenBank and Sequence Read Archive (SRA). All the genomes deposited in DAGA were annotated consistently and assessed using DFAST. To assess the taxonomic position based on genomic sequence information, we used the average nucleotide identity (ANI), which showed high discriminative power to determine whether two given genomes belong to the same species. We corrected mislabeled or misidentified genomes in the public database and deposited the curated information in DAGA. The repository will improve the accessibility and reusability of genome resources for lactic acid bacteria. By exploiting the data deposited in DAGA, we found intraspecific subgroups in Lactobacillus gasseri and Lactobacillus jensenii, whose variation between subgroups is larger than the well-accepted ANI threshold of 95% to differentiate species. DFAST and DAGA are freely accessible at https://dfast.nig.ac.jp.
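To illustrate how an ANI threshold can separate species, here is a minimal sketch that groups genomes whose pairwise ANI is at or above 95%. It is not DFAST's implementation: the function name `group_by_ani`, the genome labels, and the ANI values are all hypothetical, and computing ANI itself (which requires genome alignment) is out of scope here.

```python
from itertools import combinations

# Percent identity; the commonly used species boundary mentioned in the abstract.
ANI_SPECIES_THRESHOLD = 95.0


def group_by_ani(genomes, ani, threshold=ANI_SPECIES_THRESHOLD):
    """Group genomes linked by pairwise ANI >= threshold (single linkage).

    `ani` maps a (genome, genome) pair to a percent identity. Pairs that are
    missing from the mapping are treated as not linked.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent pointers to the cluster representative,
        # shortening the path along the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        value = ani.get((a, b), ani.get((b, a)))
        if value is not None and value >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical ANI values (percent), made up for illustration only.
example_ani = {
    ("genome_A", "genome_B"): 98.7,
    ("genome_A", "genome_C"): 93.1,
    ("genome_B", "genome_C"): 93.4,
}
example_groups = group_by_ani(
    ["genome_A", "genome_B", "genome_C"], example_ani
)
```

Single linkage is only one possible grouping rule; it can chain genomes together through intermediates, so a stricter rule may be preferable when subgroups sit close to the threshold.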
Authors
Eli Kaminuma Yukino Baba Masahiro Mochizuki Hirotaka Matsumoto Haruka Ozaki Toshitsugu Okayama Takuya Kato Shinya Oki Takatomo Fujisawa Yasukazu Nakamura Masanori Arita Osamu Ogasawara Hisashi Kashima Toshihisa Takagi
Publisher
The Genetics Society of Japan
Journal
Genes & Genetic Systems (ISSN:13417568)
Volume, issue, pages / publication date
pp.19-00034 (Released:2020-03-26)
Number of references
37
Number of citations
3

Recently, the prospect of applying machine learning tools for automating the process of annotation analysis of large-scale sequences from next-generation sequencers has raised the interest of researchers. However, finding research collaborators with knowledge of machine learning techniques is difficult for many experimental life scientists. One solution to this problem is to utilise the power of crowdsourcing. In this report, we describe how we investigated the potential of crowdsourced modelling for a life science task by conducting a machine learning competition, the DNA Data Bank of Japan (DDBJ) Data Analysis Challenge. In the challenge, participants predicted chromatin feature annotations from DNA sequences with competing models. The challenge engaged 38 participants, with a cumulative total of 360 model submissions. The performance of the top model resulted in an area under the curve (AUC) score of 0.95. Over the course of the competition, the overall performance of the submitted models improved by an AUC score of 0.30 from the first submitted model. Furthermore, the 1st- and 2nd-ranking models utilised external data such as genomic location and gene annotation information with specific domain knowledge. The effect of incorporating this domain knowledge led to improvements of approximately 5%–9%, as measured by the AUC scores. This report suggests that machine learning competitions will lead to the development of highly accurate machine learning models for use by experimental scientists unfamiliar with the complexities of data science.
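For readers unfamiliar with the AUC metric used to rank submissions, the sketch below computes the area under the ROC curve for a binary classifier from its pairwise-ranking definition. It is an illustrative assumption, not the challenge's scoring code; the function name `roc_auc` and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Computed as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and model scores, made up for illustration only.
example_labels = [1, 0, 1, 0, 1]
example_scores = [0.9, 0.8, 0.7, 0.3, 0.6]
example_auc = roc_auc(example_labels, example_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The quadratic pairwise loop is chosen for clarity; a rank-based computation is more practical for large datasets.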