Authors
村田 順平 岩沼 宏治 大塚 尚貴
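Publisher
The Japanese Society for Artificial Intelligence
Journal
Transactions of the Japanese Society for Artificial Intelligence (ISSN:13460714)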
Volume, issue, pages, and publication date
vol.25, no.3, pp.464-474, 2010 (Released:2010-04-06)
Number of references
19

In this paper, we propose new methods and present a system, called IFMAP, for extracting interesting patterns from long sequential data based on frequency and self-information, and we experimentally evaluate the proposed methods on a newspaper article corpus. Sequential data mining methods based on frequency have been studied intensively so far. These methods, however, are neither effective nor valuable for applications in which almost all high-frequency patterns should be regarded as meaningless noise. The concept of information gain is important for suppressing such noisy patterns, and its integration with a frequency criterion has already been studied. Yang et al. presented a sequential mining system, InfoMiner, which can find periodic synchronous patterns that are interesting and well balanced from the viewpoints of both frequency and self-information. In this paper, we refine and extend the InfoMiner technologies in the following respects: first, our method can handle ordinary, i.e., asynchronous and non-periodic, patterns by using a sliding window mechanism, whereas InfoMiner cannot; second, we give several combination measures for choosing valuable patterns based on frequency and self-information, while InfoMiner has just one measure, which, as we show in this paper, is neither appropriate nor effective for handling newspaper article corpora; third, we propose a new unified method for pruning the search space of sequential data mining, which can be applied uniformly to any of the combination measures proposed here. We conduct experiments to evaluate the effectiveness and efficiency of the proposed method with respect to runtime and the amount of noisy patterns excluded.
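
As a rough illustration of the ideas named in the abstract, the following minimal Python sketch counts how many sliding windows contain a pattern as a (possibly non-contiguous) subsequence and combines that count with the summed self-information of the pattern's items. All function names, the window-based frequency definition, and the simple product used as the combined score are assumptions made for this sketch; the abstract does not specify the actual measures used by IFMAP or InfoMiner.

```python
import math
from collections import Counter


def self_information(item, counts, total):
    """Self-information -log2(p) from the item's empirical probability.

    Assumes the item occurs at least once in the sequence.
    """
    return -math.log2(counts[item] / total)


def occurs_in_window(pattern, window):
    """True if `pattern` is a (not necessarily contiguous) subsequence of `window`."""
    it = iter(window)
    # Each `any` call consumes the shared iterator, so item order is respected.
    return all(any(x == p for x in it) for p in pattern)


def window_frequency(pattern, sequence, width):
    """Number of sliding windows of length `width` that contain `pattern`."""
    n_windows = max(len(sequence) - width + 1, 0)
    return sum(
        occurs_in_window(pattern, sequence[i:i + width])
        for i in range(n_windows)
    )


def combined_score(pattern, sequence, width):
    """Hypothetical combined measure: window frequency times total self-information."""
    counts = Counter(sequence)
    total = len(sequence)
    info = sum(self_information(x, counts, total) for x in pattern)
    return window_frequency(pattern, sequence, width) * info


sequence = list("abababac")
print(window_frequency(("a", "b"), sequence, 3))
print(window_frequency(("a", "c"), sequence, 3))
print(combined_score(("a", "c"), sequence, 3))
```

In this toy sequence, the pattern `("a", "b")` is frequent but built from common items, while `("a", "c")` is rare but carries more self-information per occurrence. A combined measure is meant to balance these two aspects; other combinations than the product are equally conceivable.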