Publication list: IPSJ Transactions on Databases (TOD) (journal)

A Study on Estimating Users' Occupations Based on Microblog Posting Times

Authors: 田中成典, 中村健二, 加藤諒, 寺口敏生
Journal: IPSJ Transactions on Databases (TOD) (ISSN: 1882-7799)
Volume, issue, pages, date: vol.6, no.5, pp.71-84, 2013-12-27

Techniques for capturing users' reactions to specific topics from microblogs are being studied. To use microblogs effectively as social sensors, the characteristics of each user must be known. However, microblog users often do not disclose their attributes, so these characteristics cannot be determined directly. For this reason, research on estimating the attributes of microblog users has been drawing attention. Existing methods, however, focus mainly on post content alone and do not exploit the real-time nature of microblog posting for attribute estimation. This study proposes a method that clusters users by the number of posts in each unit of time and estimates their occupation from post content, lifestyle habits, and posting time of day. In experiments, the proposed method, which also considers temporal features, was compared with an existing method that uses post content only, and its usefulness was confirmed.
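The clustering step in this abstract can be illustrated with a minimal sketch. It assumes a one-hour unit of time, a normalized 24-bin histogram as the user feature, and plain k-means with caller-supplied initial centroids. None of these choices is stated in the abstract, and the user names and timestamps are hypothetical.

```python
from datetime import datetime
from math import dist


def hourly_profile(timestamps):
    """Return a 24-bin histogram of posting hours, normalized to sum to 1."""
    counts = [0] * 24
    for ts in timestamps:
        counts[ts.hour] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's k-means; the initial centroids are supplied by the caller."""
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                  for p in points]
        # Update step: mean of the points assigned to each centroid.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def posts(hours):
    """Build hypothetical post timestamps from a list of hours."""
    return [datetime(2013, 1, 1, h) for h in hours]


# Hypothetical users: two post in the daytime, two late at night.
users = {
    "day_a": posts([9, 10, 12, 13, 15]),
    "day_b": posts([8, 10, 11, 14]),
    "night_a": posts([0, 1, 2, 23]),
    "night_b": posts([1, 2, 3, 22, 23]),
}
profiles = [hourly_profile(ts) for ts in users.values()]
labels = kmeans(profiles, centroids=[list(profiles[0]), list(profiles[2])])
print(dict(zip(users, labels)))
```

In this toy data the daytime and night-time users fall into separate clusters. The occupation estimation described in the abstract would then combine such cluster membership with post content, which this sketch does not attempt.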

https://ci.nii.ac.jp/naid/110009656663

[OA] A Study on Estimating Users' Occupations Based on Microblog Posting Times

Authors: 田中成典, 中村健二, 加藤諒, 寺口敏生
Journal: IPSJ Transactions on Databases (TOD) (ISSN: 1882-7799)
Volume, issue, pages, date: vol.6, no.5, pp.71-84, 2013-12-27

http://id.nii.ac.jp/1001/00096964/

A Survey of Deep Learning for New Material Discovery

Authors: 奥野智也, 佐々木勇和, 鈴木雄太
Journal: IPSJ Transactions on Databases (TOD) (ISSN: 1882-7799)
Volume, issue, pages, date: vol.13, no.3, pp.22-31, 2020-07-16

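Searching for new materials with desired physicochemical properties is an important problem in fields such as chemistry, drug discovery, and materials science. Conventional approaches depend heavily on researchers' intuition and experience and are costly in time. To make the search more efficient, many studies have adopted information science techniques such as machine learning and data mining, and in recent years deep learning has further improved accuracy. This paper comprehensively surveys deep learning techniques for new material discovery and organizes them systematically. It divides discovery techniques into (1) classification and regression techniques that identify properties from a material's structure and (2) generative techniques that derive materials from properties, and for each it describes the application fields, the types of data, and the deep learning models. It also describes the limitations and problems of existing techniques and clarifies open challenges.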

http://id.nii.ac.jp/1001/00206155/

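[OA] A Method for Automatically Generating Normalization Rules for Accurate Analysis of Casual Expressions

Authors: 池田和史, 柳原正, 松本一則, 滝嶋康弘
Journal: IPSJ Transactions on Databases (TOD) (ISSN: 1882-7799)
Volume, issue, pages, date: vol.3, no.3, pp.68-77, 2010-09-28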

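Documents on blogs contain many casual expressions, such as colloquialisms and idiosyncratic spellings, so a general-purpose morphological analyzer cannot achieve sufficient accuracy on them. Casual expressions are usually registered in a dictionary by hand, but this incurs high labor costs and requires specialist knowledge. This paper proposes a method that achieves accurate morphological analysis by correcting casual expressions into standard ones. The method automatically searches documents with few casual expressions for candidate correction strings and generates correction rules. To select, from the many generated rules, the one that suits the context, the paper also proposes scoring each rule using the frequency of the candidate string in the search results, the edit distance between the strings before and after correction, and a comparison of the morphological analysis results of the sentence before and after correction. An experiment comparing the proposed and conventional methods quantitatively evaluated the rate of unknown words, the accuracy of word segmentation, and the change in sentence meaning before and after correction. The proposed method reduced the number of unknown words in the target text by 36.1% while maintaining word segmentation accuracy comparable to the conventional method. This reduction is more than 2.5 times that achieved by the conventional method.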
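Two of the scoring signals named in this abstract, candidate frequency and edit distance, can be sketched as follows. The Levenshtein distance is standard. The way the two signals are combined in `score_rule`, the example strings, and the frequencies are assumptions made for illustration only. The abstract does not give the actual scoring formula, and the third signal (comparison of morphological analysis results) is omitted.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def score_rule(casual, candidate, frequency):
    """Hypothetical score: frequent candidates close to the original rank higher."""
    return frequency / (1 + edit_distance(casual, candidate))


# Hypothetical correction candidates for a casual spelling, with made-up frequencies.
candidates = {"すごい": 120, "すこし": 30, "すごく": 40}
best = max(candidates, key=lambda c: score_rule("すげえ", c, candidates[c]))
print(best)
```

A real system would also need the context-sensitive check described in the abstract, since frequency and string distance alone cannot tell which correction preserves the sentence's meaning.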

http://id.nii.ac.jp/1001/00070540/

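Discovering Coordinate Terms and Their Contexts Using a Web Search Engine Index

Authors: 大島裕明, 小山聡, 田中克己
Publisher: Information Processing Society of Japan
Journal: IPSJ Transactions on Databases (TOD) (ISSN: 1882-7799)
Volume, issue, pages, date: vol.47, no.19, pp.98-112, 2006-12-15
Cited by: 15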

We propose a method that, given a single query term from the user, discovers coordinate terms and their contexts using only the information held by a Web search engine. Coordinate terms are terms that share a common hypernym. Many previous studies acquire coordinate terms, hypernyms, or hyponyms, but they, including those that use Web documents, analyze huge corpora to obtain large numbers of results. Our method uses only information held by the search engine, such as the titles and snippets of Web pages, obtained with a small number of Web searches, and analyzes it to discover coordinate terms. It exploits the fact that coordinate terms of a term are joined to it by the Japanese coordinating particle "ya" (や): it builds search queries from this pattern and extracts coordinate terms from the search results alone. The method requires no advance preparation and can find coordinate terms for terms in any field. It also obtains the context that lies behind the query term and each discovered coordinate term. Coordinate term discovery of this kind can be used in a wide range of settings, such as query expansion and recall support in Web search, and finding objects to compare against when investigating something.
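The "ya" pattern described here can be sketched with a regular expression over snippet text. The snippets below are hypothetical stand-ins for search results. Treating a run of kanji, katakana, or ASCII letters as one term is a crude assumption that replaces proper Japanese tokenization, and the context extraction part of the method is not shown.

```python
import re
from collections import Counter

# Assumption: a run of kanji, katakana, or ASCII letters counts as one candidate term.
TERM = r"[一-龥ァ-ヶーA-Za-z]+"


def coordinate_candidates(query, snippets):
    """Count terms that appear as 'query や TERM' or 'TERM や query' in the snippets."""
    after = re.compile(re.escape(query) + "や(" + TERM + ")")
    before = re.compile("(" + TERM + ")や" + re.escape(query))
    counts = Counter()
    for text in snippets:
        counts.update(after.findall(text))
        counts.update(before.findall(text))
    return counts


# Hypothetical snippets, as if returned for queries built around "京都や" and "や京都".
snippets = [
    "京都や奈良の寺を巡る旅",
    "奈良や京都では紅葉が見ごろ",
    "大阪や京都へのアクセス",
]
result = coordinate_candidates("京都", snippets)
print(result.most_common())
```

Counting a candidate in both positions (before and after the query term) is one simple way to favor terms that are genuinely interchangeable with the query.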

https://ci.nii.ac.jp/naid/110006160105

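A Study on Activity Estimation Based on User Habits Extracted from Microblogs

Authors: 田中成典, 中村健二, 寺口敏生, 中本聖也, 加藤諒
Publisher: Information Processing Society of Japan
Journal: IPSJ Transactions on Databases (TOD) (ISSN: 1882-7799)
Volume, issue, pages, date: vol.6, no.3, pp.73-89, 2013-06-28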

With the spread of mobile devices, services that provide information in real time according to the user's situation are attracting attention. Accordingly, studies have estimated users' activities from location information obtained by GPS or from the content of their microblog posts. In addition to these, the authors examined an estimation approach that focuses on users' habitual behavior. This study proposes a method that extracts behavioral patterns from changes in the content and number of a user's microblog posts and estimates the user's habitual behavior in a designated time period. The method can estimate a user's behavior in the designated period even when the posts contain no description of that behavior. Experiments compared a method that uses post content only with the proposed method, which also considers habitual behavior, and examined the usefulness of the proposed method.

https://ci.nii.ac.jp/naid/110009579667

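A Web Content Recommendation System Using Similarity Between Content Clusters in Social Bookmarks

Authors: 佐々木祥, 宮田高道, 稲積泰宏, 小林亜樹, 酒井善則
Publisher: Information Processing Society of Japan
Journal: IPSJ Transactions on Databases (TOD) (ISSN: 1882-7799)
Volume, issue, pages, date: vol.48, no.20, pp.14-27, 2007-12-15
References: 16
Cited by: 4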

Social bookmarking, a rapidly spreading service that lets users share bookmarks, is attracting attention as a new tool for gathering information. Users can attach free-form keywords called tags to Web content, and existing studies have proposed Web content recommendation systems that focus on tag names. However, a user's preferences are expressed not by the tag names themselves but by the groups of Web content associated with each tag (hereafter, content clusters). This study therefore computes the similarity between content clusters as a hypothesis testing problem and proposes a Web content recommendation system based on the resulting similarities. Verification experiments confirmed that the method is effective. For example, it can make useful recommendations even to users whose tag names differ from those of other users.
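One way to treat cluster similarity as a hypothesis test is sketched below. The abstract does not say which test the authors use, so the hypergeometric tail probability here is an assumption: it asks how likely an overlap at least as large as the observed one would be if the second cluster were drawn at random from all bookmarked pages. The cluster contents and universe size are hypothetical.

```python
from math import comb


def overlap_p_value(cluster_a, cluster_b, universe_size):
    """P(overlap >= observed) if cluster_b were a random sample from the universe."""
    a, b = len(cluster_a), len(cluster_b)
    observed = len(cluster_a & cluster_b)
    total = comb(universe_size, b)
    # Hypergeometric tail: choose k shared items from cluster_a, the rest from outside it.
    tail = sum(comb(a, k) * comb(universe_size - a, b - k)
               for k in range(observed, min(a, b) + 1))
    return tail / total


# Hypothetical clusters of page ids under two different tag names.
tag_python = set(range(0, 10))
tag_programming = set(range(5, 15))   # shares 5 pages with tag_python
tag_cooking = set(range(50, 60))      # shares none

p_similar = overlap_p_value(tag_python, tag_programming, universe_size=100)
p_unrelated = overlap_p_value(tag_python, tag_cooking, universe_size=100)
print(p_similar, p_unrelated)
```

A small p-value suggests the two tags label related content even though their names differ, which matches the motivation given in the abstract.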

https://ci.nii.ac.jp/naid/110006533398

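A Similar-Scene Detection Method Based on MPEG Coding Information

Authors: 片岡良治, 遠藤斉
Publisher: Information Processing Society of Japan
Journal: IPSJ Transactions on Databases (TOD) (ISSN: 1882-7799)
Volume, issue, pages, date: vol.41, no.3, pp.37-45, 2000-05-15
References: 11
Cited by: 8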

This paper describes a method that accurately detects similar scenes in video using feature information obtained directly from MPEG-encoded data. Similar scenes are scenes that have the same logical meaning but differ in physical composition. For instance, all home run scenes in a baseball broadcast mean "home run" while each contains different image data. Accurate detection of similar scenes makes it possible to assign the same tag to all detected scenes at once, which makes indexing a video database more efficient. The proposed method mainly targets tagging of sports video and detects scenes by exploiting the camera work that similar sports scenes have in common. To cope with differences in scene length among similar scenes, it applies continuous DP matching, an algorithm proposed in the field of speech recognition, to the comparison of camera work features. Experiments on actual baseball broadcast video show that the method provides higher precision and recall than conventional methods.
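Continuous DP matching can be sketched as a dynamic program in which the reference pattern may start at any position of a longer stream. This sketch uses the simplest step pattern with unit costs. The camera-work labels ("pan", "zoom", and so on) are hypothetical, and the paper's actual features, weights, and slope constraints are not given in the abstract.

```python
def continuous_dp(reference, stream, cost=lambda a, b: 0 if a == b else 1):
    """Return (best_distance, end_index) of the stream segment that best matches reference.

    The first row carries no accumulated cost from earlier stream positions,
    which lets a match begin anywhere in the stream.
    """
    m = len(stream)
    inf = float("inf")
    prev = [cost(reference[0], s) for s in stream]
    for i in range(1, len(reference)):
        cur = [inf] * m
        for j in range(m):
            best_prev = prev[j]                       # reference advances, stream stays
            if j > 0:
                # diagonal step, or stream advances while the reference stays
                best_prev = min(best_prev, prev[j - 1], cur[j - 1])
            cur[j] = cost(reference[i], stream[j]) + best_prev
        prev = cur
    end = min(range(m), key=lambda j: prev[j])
    return prev[end], end


# Hypothetical camera-work labels per shot segment.
reference = ["pan", "zoom", "tilt"]
stream = ["fix", "fix", "pan", "pan", "zoom", "tilt", "fix"]
print(continuous_dp(reference, stream))
```

Here the repeated "pan" in the stream is absorbed by the warping step, which is how this kind of matching tolerates scenes of different lengths.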

https://ci.nii.ac.jp/naid/110002725457

Overlapping Community Discovery in Large-Scale Networks Based on Topic Models

Authors: 野沢健人, 若林啓
Journal: IPSJ Transactions on Databases (TOD) (ISSN: 1882-7799)
Volume, issue, pages, date: vol.9, no.2, pp.1-10, 2016-06-29

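Community discovery in graphs is an important technique that makes it possible to extract and analyze groups of nodes that are functionally or structurally cohesive from social media, co-authorship relations, product purchase data, and similar sources. Because very large graphs are analyzed increasingly often, community discovery methods that scale with graph size are in demand. This study first discusses a method that treats the set of nodes within a fixed distance of a node as a document, trains a topic model on these documents, and discovers communities using the predictive distribution of nodes for each topic. It then proposes an overlapping community discovery method that scales well with data size by training the topic model with stochastic variational Bayes. Experiments show that, even on a large network of 60 million nodes and 1.8 billion edges, the proposed method discovers communities faster than existing methods.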
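The first step of this approach, turning each node's neighborhood into a "document", can be sketched with a bounded breadth-first search. Whether a node's own id belongs in its document is not stated in the abstract, so including it is an assumption, and the toy graph is hypothetical. Training the topic model with stochastic variational Bayes is not shown.

```python
from collections import deque


def neighborhood_documents(adjacency, max_distance):
    """For each node, list the nodes reachable within max_distance hops (itself included)."""
    documents = {}
    for source in adjacency:
        hops = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if hops[node] == max_distance:
                continue  # do not expand beyond the distance limit
            for neighbor in adjacency[node]:
                if neighbor not in hops:
                    hops[neighbor] = hops[node] + 1
                    queue.append(neighbor)
        documents[source] = sorted(hops)
    return documents


# Hypothetical path graph a - b - c - d, stored as adjacency lists.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
docs = neighborhood_documents(graph, max_distance=2)
print(docs)
```

Each resulting list plays the role of a bag of words, so nodes that share many neighbors end up with similar documents and, after topic modeling, tend to load on the same topics.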

http://id.nii.ac.jp/1001/00165254/

A Word Filtering Method for Estimating Tweet Posting Locations

Authors: 森國泰平, 吉田光男, 岡部正幸, 梅村恭司
Journal: IPSJ Transactions on Databases (TOD) (ISSN: 1882-7799)
Volume, issue, pages, date: vol.8, no.4, pp.16-26, 2015-12-28

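By associating features contained in tweets with location information, Twitter can be used as a sensor that observes the real world. However, few tweets carry location information, which is one obstacle to using Twitter as a sensor. This study therefore aims to estimate the location from which a tweet was posted and to attach accurate location information to more tweets. To this end, it proposes a filtering method that removes words that act as noise in tweets, and a smoothing method that smooths the geographic distribution of words. The paper shows that these methods work more effectively than conventional methods and discusses the results.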

http://id.nii.ac.jp/1001/00146951/

[OA] A Pluggable Database Engine Acceleration Mechanism Based on Out-of-Order Query Execution

Authors: 早水悠登, 合田和生, 喜連川優
Journal: IPSJ Transactions on Databases (TOD) (ISSN: 1882-7799)
Volume, issue, pages, date: vol.7, no.2, pp.104-116, 2014-06-30

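Out-of-order query execution is a query execution scheme based on dynamic task decomposition and asynchronous I/O issuing. Compared with conventional execution based on synchronous I/O and sequential processing, it is known to perform well on selective queries over large-scale data. This paper proposes a database engine acceleration mechanism based on out-of-order query execution that raises the performance of an existing database engine to the level of out-of-order execution without changing the engine's query execution behavior. The mechanism executes the same query out of order, cooperatively and in parallel with the existing engine's execution, and supplies database pages ahead of time through the buffer pool. This shortens the engine's I/O wait time and achieves a large speedup. The paper presents the design of PgBooster, a prototype of the mechanism for the open-source database management system PostgreSQL, and demonstrates its speed in experiments on an environment consisting of a midrange server and disk storage.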

http://id.nii.ac.jp/1001/00101890/

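A Context-Aware Information Recommendation System Based on Context-Dependent User Preference Modeling

Authors: 奥健太, 中島伸介, 宮崎純, 植村俊亮
Publisher: Information Processing Society of Japan
Journal: IPSJ Transactions on Databases (TOD) (ISSN: 1882-7799)
Volume, issue, pages, date: vol.48, no.11, pp.162-176, 2007-06-15
References: 24
Cited by: 12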

This paper proposes a prototype of a context-aware recommendation system that provides users with information appropriate to their situation. Recommendation systems have been studied as a way to select, from vast amounts of information, items that match a user's preferences, but it is not easy to adapt flexibly to preferences that change with the user's current context (for example, time of day, weather, companions, and budget). We previously proposed a method for modeling user preferences that change with context. This paper presents a prototype context-aware recommendation system that applies this modeling method and evaluates it through experiments. The experiments clarify the effectiveness of, and the differences between, the two proposed techniques, Context-Aware Information Filtering and Context-Aware Collaborative Filtering. The paper also discusses optimization of the feature parameters of the target content.

https://ci.nii.ac.jp/naid/110006317693

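N.M-gram: A Full-Text Search Method Using an N-gram Index with Hash Values

Authors: 平林幹雄, 江渡浩一郎
Publisher: Information Processing Society of Japan
Journal: IPSJ Transactions on Databases (TOD) (ISSN: 1882-7799)
Volume, issue, pages, date: vol.48, no.7, pp.29-37, 2007-03-15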

Inverted indexes for full-text search systems commonly use tokens extracted from text by the N-gram method as search keys. This approach has the advantages of language neutrality and perfect recall, but it has the drawback that the index file grows large and space efficiency suffers. To determine whether the tokens extracted from a query are also adjacent in the text of a target document, the index must record the position of each token within the document, and this position information is one cause of the index growth. This paper proposes the N.M-gram method to improve the space efficiency of N-gram indexes. Instead of each token's position in the document, the N.M-gram method stores a hash value of the succeeding token, which improves space efficiency while retaining the language neutrality and perfect recall of the N-gram method.
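The core idea, replacing token positions with a hash of the succeeding token, can be sketched as below. The abstract does not specify the hash function, the hash width, or how N and M are defined, so the use of `zlib.crc32`, the 8-bit width, and bigram tokens are all assumptions, and the documents are hypothetical. Because hashes can collide and positions are not stored, a search may return false positives that would need to be verified against the text.

```python
import zlib
from collections import defaultdict


def successor_hash(text, bits=8):
    """Small hash of the text following a token (function and width are arbitrary choices)."""
    return zlib.crc32(text.encode("utf-8")) & ((1 << bits) - 1)


def build_index(documents, n=2):
    """Map each n-gram to {doc_id: set of hashes of the n-gram that follows it}."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, text in documents.items():
        for i in range(len(text) - n + 1):
            token = text[i:i + n]
            following = text[i + n:i + 2 * n]
            index[token][doc_id].add(successor_hash(following))
    return index


def search(index, query, n=2):
    """Return ids of documents consistent with the query (false positives are possible)."""
    if len(query) < n:
        return set()
    result = None
    for i in range(len(query) - n + 1):
        token = query[i:i + n]
        following = query[i + n:i + 2 * n]
        postings = index.get(token, {})
        if len(following) == n:
            # Adjacency check: the document must have seen this successor after the token.
            wanted = successor_hash(following)
            docs = {d for d, hashes in postings.items() if wanted in hashes}
        else:
            # Near the end of the query there is no full successor to check.
            docs = set(postings)
        result = docs if result is None else result & docs
    return result


documents = {1: "全文検索システム", 2: "検索エンジンの索引", 3: "システム全文"}
index = build_index(documents)
print(search(index, "全文検索"))
```

Each posting stores a few bits per occurrence instead of a full offset, which is where the space saving described in the abstract would come from.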

https://ci.nii.ac.jp/naid/110006242981

[OA] Development of a Web Access Literacy Scale

Authors: 山本祐輔, 山本岳洋, 大島裕明, 川上浩司
Journal: IPSJ Transactions on Databases (TOD) (ISSN: 1882-7799)
Volume, issue, pages, date: vol.12, no.1, pp.24-37, 2019-01-16

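This paper proposes a scale and questionnaire for measuring "Web access literacy," the ability to scrutinize information and collect accurate Web information using information access systems such as Web search engines. To evaluate the reliability and validity of the proposed scale, an online survey of 534 Web users was conducted through a crowdsourcing service. Factor analysis showed that the scale has a seven-factor structure. The total score on the scale showed a weak positive correlation (r=0.32, p<.001) with scores on a health literacy scale, which is considered related to it, and a weak negative correlation (r=-0.20, p<.001) with trust in Web information. Scores also differed significantly depending on whether respondents had taken a course related to information literacy (F(1, 525) = 8.82, p<.01). Cronbach's α coefficient, which indicates reliability, was 0.8 or higher for six factors and 0.76 for the remaining one.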

http://id.nii.ac.jp/1001/00193701/

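Extracting Regional Characteristics from Crowd Behavior Through Tweet Analysis

Authors: 李龍, 若宮翔子, 角谷和俊
Publisher: Information Processing Society of Japan
Journal: IPSJ Transactions on Databases (TOD) (ISSN: 1882-7799)
Volume, issue, pages, date: vol.5, no.2, pp.36-52, 2012-06-29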

Characterizing urban space is part of the information processing people perform in their daily lives and activities, and it is important for decision-making in cities. Conventional approaches characterize cities by their appearance, such as urban functions based on physical configuration or people's perceptions of the city, but supporting the activities of people who actually live in a city requires characterization centered on their lifestyles. People's activities in a city are diverse and hard to monitor. However, with the recent development of social networking services (SNS) and the spread of smartphones, many users now voluntarily share their activities and feelings together with their whereabouts, so the lifestyles of urban crowds can be observed through location-based SNS. This study proposes a method that monitors crowd behavior in urban space using the large volume of users' spatio-temporal lifelogs accumulated in location-based SNS and extracts the regional characteristics of cities. Specifically, it represents crowd behavior monitored through geo-tagged tweets on Twitter as vectors and analyzes a matrix of regions and crowd behavior features to extract characteristic behavioral patterns and the urban areas that correspond to them. To interpret the regional characteristics extracted from a large set of geo-tagged tweets, the experiments report a survey of the genres of shops and facilities listed by Yahoo! ロコ.

https://ci.nii.ac.jp/naid/110009419836

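Thesaurus Construction by Link Co-occurrence Analysis of Wikipedia

Authors: 伊藤雅弘, 中山浩太郎, 原隆浩, 西尾章治郎
Publisher: Information Processing Society of Japan
Journal: IPSJ Transactions on Databases (TOD) (ISSN: 1882-7799)
Volume, issue, pages, date: vol.48, no.20, pp.39-49, 2007-12-15
Cited by: 2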

Wikipedia, a Web encyclopedia whose content is edited collaboratively by its users, has attracted great attention as a useful corpus for knowledge processing. In previous work, the authors showed that an accurate thesaurus can be constructed by analyzing the link structure of Wikipedia. However, analyzing Wikipedia, with its enormous number of articles, required further improvement in scalability while maintaining high accuracy. This study therefore focuses on link co-occurrence analysis and proposes a scalable thesaurus construction method. Experiments show that the co-occurrence-based method constructs an accurate thesaurus in less computation time than previous methods, and that combining co-occurrence analysis with tfidf achieves even higher accuracy.
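Link co-occurrence can be illustrated with a small sketch: two articles are treated as related when links to both appear in the same source articles. The cosine-style normalization below is an assumption, since the abstract does not give the paper's scoring formula, and the page names are hypothetical. The tfidf combination mentioned in the abstract is not shown.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence_relatedness(outlinks):
    """outlinks: {article: set of linked articles}. Returns {(a, b): score} with a < b."""
    link_count = Counter()   # number of articles that link to each target
    pair_count = Counter()   # number of articles that link to both targets of a pair
    for targets in outlinks.values():
        link_count.update(targets)
        pair_count.update(combinations(sorted(targets), 2))
    # Normalize the co-occurrence count by the geometric mean of the individual counts.
    return {pair: n / sqrt(link_count[pair[0]] * link_count[pair[1]])
            for pair, n in pair_count.items()}


# Hypothetical outgoing links of three articles.
outlinks = {
    "p1": {"Kyoto", "Nara"},
    "p2": {"Kyoto", "Nara", "Osaka"},
    "p3": {"Osaka", "Tokyo"},
}
scores = cooccurrence_relatedness(outlinks)
print(sorted(scores.items(), key=lambda item: -item[1]))
```

A single pass over each article's link set is enough to gather the counts, which hints at why a co-occurrence approach can scale better than methods that traverse the link graph.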

https://ci.nii.ac.jp/naid/110006533400

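An Analysis of Social Bookmark Characteristics and a Re-ranking Method for Web Search Results Based on It

Authors: 山家雄介, 中村聡史, アダム ヤトフト, 田中克己
Publisher: Information Processing Society of Japan
Journal: IPSJ Transactions on Databases (TOD) (ISSN: 1882-7799)
Volume, issue, pages, date: vol.1, no.1, pp.88-100, 2008-06-26
Cited by: 2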

As blogs and other Web applications broaden the base of people who publish information, it is becoming ever harder to find useful pages in Web search results. Recently, social bookmarking, which identifies valuable pages by aggregating users' bookmarking activity, has been gaining popularity. This paper proposes a re-ranking method that uses information from social bookmarks, such as the number of bookmarks a page has, to present the more useful pages in the search results at higher ranks. The method is then applied to a large number of queries, and the rank changes and types of pages in the search results are examined and classified to clarify for which search purposes the approach is effective.

https://ci.nii.ac.jp/naid/110007990004

[OA] Extracting Regional Characteristics from Crowd Behavior Through Tweet Analysis

Authors: 李龍, 若宮翔子, 角谷和俊
Journal: IPSJ Transactions on Databases (TOD) (ISSN: 1882-7799)
Volume, issue, pages, date: vol.5, no.2, pp.36-52, 2012-06-29

http://id.nii.ac.jp/1001/00082756/

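[IR] A Corpus and Evaluation Measures for Evaluating Extractive Multi-Document Summarization

Authors: 平尾努, 奥村学, 福島孝博, 難波英嗣, 野畑周, 磯崎秀樹
Publisher: Information Processing Society of Japan
Journal: IPSJ Transactions on Databases (TOD) (ISSN: 1882-7799)
Volume, issue, pages, date: vol.48, no.14, pp.60-68, 2007-09-15
References: 17
Cited by: 1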

The document sets targeted by multi-document summarization often contain sentences that are semantically similar or even identical to one another, especially when the documents come from multiple sources. However, conventional corpora do not annotate links between such similar or identical sentences, which is a critical problem for defining measures to evaluate sentence extraction. This paper proposes an annotation scheme for corpora that takes this redundancy into account and, based on it, two evaluation measures: "coverage," which measures the information content of an extract, and "redundancy," which measures how redundant the important sentences in an extract are. The rank correlation coefficients between the rankings of extracts produced by these measures and rankings by human subjects were both 0.7 or higher, showing a high correlation with human judgment.

https://ci.nii.ac.jp/naid/110006390951

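Data Models and Query Languages for Semistructured Data

Authors: 田島敬史
Publisher: Information Processing Society of Japan
Journal: IPSJ Transactions on Databases (TOD) (ISSN: 1882-7799)
Volume, issue, pages, date: vol.40, no.3, pp.152-170, 1999-02-15
References: 54
Cited by: 26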

This paper surveys research from the past several years on data models and query languages for semistructured data and compares and discusses the main proposals. This work is also related to earlier research on object-oriented databases and hypertext, so comparisons with that research are made as well. From these comparisons, we consider two points to be key in designing data models and query languages for semistructured data. First, the data model should handle, in a uniform way, both "data" in the traditional sense and the data that corresponds to schema information in traditional data models. Second, the query language should be able to express not only "selecting queries," which extract a substructure from a database, but also "restructuring queries," which transform the data in a database into another structure. Expressing restructuring queries requires some mechanism for pointer manipulation. The paper closes with a brief outlook on future research on semistructured data.

https://ci.nii.ac.jp/naid/110002724886
