Performance Verification of the t-LDA Method for Extracting Bursty Latent Topics from Document Streams

Authors: 水田昌孝, 熊野雅仁, 小野景子, 木村昌弘
Publisher: Information Processing Society of Japan (情報処理学会)
Journal: 研究報告数理モデル化と問題解決 (MPS) (ISSN: 1884-0930)
Volume, issue, pages, publication date: vol.2010, no.10, pp.1-6, 2010-12-09

We previously proposed the t-LDA method, which extracts bursty latent topics from a document stream. The method uses Latent Dirichlet Allocation (LDA), a probabilistic generative model of documents, to extract latent topics, and introduces a time filter to identify bursty topics. Based on LDA and the time filter, it constructs a similarity measure between two time-stamped documents, and extracts bursty latent topics from the document stream by applying a hierarchical agglomerative clustering method. In this paper, we quantitatively verify the effectiveness of the t-LDA method through experiments on synthetic data, and demonstrate its effectiveness through experiments on real online news data.
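The abstract does not give the form of the time filter, the similarity measure, or the linkage criterion, so the following is only a minimal sketch of the similarity-plus-clustering stage under stated assumptions. It assumes each document already has a topic-proportion vector (for example, one inferred by LDA), uses cosine similarity between those vectors, a hypothetical exponential-decay time filter with scale `tau`, and average-linkage agglomerative clustering that stops at a hypothetical `threshold`. None of these choices is taken from the paper.

```python
import math


def topic_similarity(theta_a, theta_b):
    """Cosine similarity between two topic-proportion vectors (assumed choice)."""
    dot = sum(a * b for a, b in zip(theta_a, theta_b))
    norm_a = math.sqrt(sum(a * a for a in theta_a))
    norm_b = math.sqrt(sum(b * b for b in theta_b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def time_filter(t_a, t_b, tau):
    """Hypothetical time filter: weight decays exponentially with the time gap."""
    return math.exp(-abs(t_a - t_b) / tau)


def similarity(doc_a, doc_b, tau):
    """Similarity of two (topic_vector, timestamp) documents: topic term times time term."""
    (theta_a, t_a), (theta_b, t_b) = doc_a, doc_b
    return topic_similarity(theta_a, theta_b) * time_filter(t_a, t_b, tau)


def agglomerate(docs, tau=1.0, threshold=0.5):
    """Average-linkage agglomerative clustering over document indices.

    Merging stops once the best average similarity drops below `threshold`.
    """
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        total = sum(similarity(docs[i], docs[j], tau) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Toy input: (topic proportions, timestamp). Values are illustrative only.
docs = [
    ([0.9, 0.1], 0.0),
    ([0.8, 0.2], 0.5),
    ([0.9, 0.1], 10.0),   # same topic as the first two, but far away in time
    ([0.1, 0.9], 10.2),
    ([0.2, 0.8], 10.4),
]
print(agglomerate(docs, tau=1.0, threshold=0.5))
```

In this toy input, the third document shares its topic with the first two but is separated from them in time, so the time term keeps it out of their cluster. This is the kind of separation a time filter is meant to provide when identifying bursts.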


https://ci.nii.ac.jp/naid/40019909267
