Authors
佐々木 健太 長野 伸一 長 健太 川村 隆浩
Publisher
人工知能学会
Journal
人工知能学会全国大会論文集 (ISSN:13479881)
Volume, issue, pages / publication date
vol.25, 2011

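In recent years, Web data known as lifestreams (Twitter, blogs, and the like) have attracted considerable attention. However, in addition to events that users actually experienced, these data contain many plain facts and opinions. This study therefore proposes a method for extracting from lifestreams only the information about actions that the users themselves carried out. We also describe an evaluation of the method based on a comparison with data from Wikipedia.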
Authors
グェン ミンティ 川村 隆浩 中川 博之 田原 康之 大須賀 昭彦
Publisher
一般社団法人 人工知能学会
Journal
人工知能学会論文誌 (ISSN:13460714)
Volume, issue, pages / publication date
vol.26, no.1, pp.166-178, 2011 (Released:2011-01-06)
Number of references
31
Number of citing works
2

In our definition, a human activity can be expressed by five basic attributes: actor, action, object, time, and location. The goal of this paper is to describe a method that automatically extracts all of the basic attributes, and the transitions between activities, from sentences in Japanese web pages. Previous work has several limitations: high setup costs, inability to extract all attributes, restrictions on the types of sentences that can be handled, and insufficient consideration of the interdependency among attributes. To resolve these problems, this paper proposes a novel approach that uses conditional random fields and self-supervised learning. Given a small corpus sample as input, the approach automatically builds its own training data and a feature model. Based on the feature model, it automatically extracts all of the attributes and the transitions between activities in each sentence retrieved from the Web corpus. The approach treats activity extraction as a sequence labeling problem, and its advantages include domain independence, scalability, and no need for human input. Because the number of elements in a tuple does not have to be fixed, the approach can extract all of the basic attributes and the transitions between activities in a single pass. Additionally, by converting complex sentences retrieved from the Web into simpler ones, the approach can handle them as well. In an experiment, the approach achieves high precision (activity: 88.9%, attributes: over 90%, transition: 87.5%).
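To illustrate what treating activity extraction as a sequence labeling problem can look like, the following is a minimal sketch, not the authors' implementation. It assumes that a trained labeler (for example a conditional random field) has already assigned one label per token, and it only shows how such labels could be grouped into the five attributes. The BIO-style label names, the `decode_activity` function, and the example tokenization are hypothetical and do not come from the paper.

```python
from collections import defaultdict

# The five basic attributes named in the abstract.
ATTRIBUTES = {"ACTOR", "ACTION", "OBJECT", "TIME", "LOCATION"}


def decode_activity(tokens, labels):
    """Group per-token labels into attribute spans.

    Labels follow a hypothetical BIO scheme: "B-X" starts a span of
    attribute X, "I-X" continues it, and "O" marks tokens outside any
    attribute. Attributes that never occur are simply absent from the
    result, so the number of elements in the tuple is not fixed.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    spans = defaultdict(list)
    current_attr, current_tokens = None, []

    def flush():
        # Close the span that is currently open, if any.
        nonlocal current_attr, current_tokens
        if current_attr is not None:
            # Japanese text has no spaces, so tokens are joined directly.
            spans[current_attr].append("".join(current_tokens))
        current_attr, current_tokens = None, []

    for token, label in zip(tokens, labels):
        prefix, _, attr = label.partition("-")
        if prefix == "B" and attr in ATTRIBUTES:
            flush()
            current_attr, current_tokens = attr, [token]
        elif prefix == "I" and attr == current_attr:
            current_tokens.append(token)
        else:
            # "O" or an inconsistent label ends the open span.
            flush()
    flush()
    return dict(spans)


# Hypothetical tokenization and labels for "昨日京都で寿司を食べた".
tokens = ["昨日", "京都", "で", "寿司", "を", "食べ", "た"]
labels = ["B-TIME", "B-LOCATION", "O", "B-OBJECT", "O", "B-ACTION", "I-ACTION"]
activity = decode_activity(tokens, labels)
print(activity)
```

In this example the sentence has no explicit actor, so the decoded result contains only the time, location, object, and action attributes. This mirrors the abstract's point that the number of elements in a tuple does not need to be fixed in advance.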