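Authors
後藤 匡史 長木 悠太 鈴木 英之進
Publisher
The Japanese Society for Artificial Intelligence
Journal
人工知能学会論文誌 = Transactions of the Japanese Society for Artificial Intelligence : AI (ISSN:13460714)
Volume, issue, pages, and publication date
vol.16, pp.193-201, 2001-11-01
Number of references
13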

This paper presents a novel decision-tree induction method for a multi-objective data set, i.e. a data set with a multi-dimensional class. Inductive decision-tree learning is one of the most frequently used methods for a single-objective data set, i.e. a data set with a single-dimensional class. In real data analysis, however, we usually have multiple objectives, and a classifier that explains them simultaneously would be useful. A conventional decision-tree inducer requires transforming a multi-dimensional class into a single-dimensional class, but such a transformation can considerably worsen both accuracy and readability. To circumvent this problem, we propose a bloomy decision tree, which deals with a multi-dimensional class without such transformations. A bloomy decision tree consists of a set of decision nodes, each of which splits examples according to their attribute values, and a set of flower nodes, each of which decides a dimension of the class for examples. A flower node appears not only at the fringe of a tree but also inside it. Our pruning is executed during tree construction and evaluates each dimension of the class based on Cramér's V. The proposed method has been implemented as D3-B (Decision tree in Bloom) and tested with eleven benchmark data sets from the machine learning community. The experiments showed that D3-B achieves higher accuracy than C4.5 on nine data sets and ties with it on the other two. In terms of readability, D3-B has fewer decision nodes on all data sets and thus outperforms C4.5. Moreover, experts in agriculture evaluated bloomy decision trees induced from agricultural data sets and found them appropriate and interesting.
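The abstract names Cramér's V as the statistic used to evaluate each class dimension but does not spell out how D3-B applies it. As a minimal sketch of the statistic alone (not of the D3-B pruning procedure), the following computes Cramér's V between an attribute and one class dimension. The function name `cramers_v` and the sample values are illustrative assumptions.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Cramér's V between two equal-length sequences of categorical values.

    Illustrative helper (not from the paper): returns a value in [0, 1],
    where 0 means no association and 1 means perfect association.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0

    joint = Counter(zip(xs, ys))  # observed counts per (x, y) cell
    row = Counter(xs)             # marginal counts of x values
    col = Counter(ys)             # marginal counts of y values

    k = min(len(row), len(col))
    if k < 2:
        # With a single category on either side there is no association to measure.
        return 0.0

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for r, n_r in row.items():
        for c, n_c in col.items():
            expected = n_r * n_c / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected

    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical attribute values and one class dimension for four examples.
attribute = ["a", "a", "b", "b"]
class_dim = ["x", "x", "y", "y"]
print(cramers_v(attribute, class_dim))
```

A value near 1 suggests that the attribute determines that class dimension almost completely, and a value near 0 suggests that it carries little information about it.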
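Authors
中本 和岐 山田 悠 鈴木 英之進
Publisher
The Japanese Society for Artificial Intelligence
Journal
Transactions of the Japanese Society for Artificial Intelligence (人工知能学会論文誌) (ISSN:13460714)
Volume, issue, pages, and publication date
vol.18, no.3, pp.144-152, 2003 (Released:2003-03-04)
Number of references
12
Number of citing works
1 2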

This paper proposes a fast clustering method for time-series data based on an average time sequence vector. A clustering procedure based on exhaustive search is time-consuming, although its result typically exhibits high quality. BIRCH, which reduces the number of examples by data squashing based on a data structure called the CF (Clustering Feature) tree, is an effective solution for such a method when the data set consists of numerical attributes only. For time-series data, however, a straightforward application of BIRCH based on the Euclidean distance between a pair of sequences fails badly, since such a distance typically differs from human perception. A dissimilarity measure based on DTW (Dynamic Time Warping) is desirable, but to the best of our knowledge no methods have been proposed for time-series data in the context of data squashing. To circumvent this problem, we propose the DTWS (Dynamic Time Warping Squashed) tree, which employs a dissimilarity measure based on DTW and compresses time sequences into an average time sequence vector. An average time sequence vector is obtained by a novel procedure that estimates the correct shrinkage of a DTW result. Experiments using the Australian sign language data demonstrate the superiority of the proposed method in terms of clustering correctness, while its degradation in time efficiency is negligible.
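To illustrate why a DTW-based dissimilarity can match perception better than a pointwise distance, here is a minimal sketch of the textbook DTW recurrence for two numeric sequences. It is not the DTWS tree and not the paper's averaging procedure. The function name `dtw_distance`, the absolute-difference local cost, and the sample sequences are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two numeric sequences.

    Illustrative helper: uses |x - y| as the local cost and allows the
    three standard steps (match, stretch a, stretch b).
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    inf = float("inf")
    # prev[j] holds the best cost of aligning a[:i-1] with b[:j].
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


# Two hypothetical sequences with the same shape, shifted by one time step.
s1 = [1, 2, 3, 3]
s2 = [1, 1, 2, 3]
pointwise = sum(abs(x - y) for x, y in zip(s1, s2))
print(pointwise, dtw_distance(s1, s2))
```

The pointwise sum penalizes the shift, whereas DTW warps the time axis so that the two sequences align without cost.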
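Authors
後藤 匡史 長木 悠太 鈴木 英之進
Publisher
The Japanese Society for Artificial Intelligence
Journal
Transactions of the Japanese Society for Artificial Intelligence (人工知能学会論文誌) (ISSN:13460714)
Volume, issue, pages, and publication date
vol.16, no.2, pp.193-201, 2001 (Released:2002-02-28)
Number of references
13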

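Authors
鈴木 英之進 安藤 晋
Publisher
Yokohama National University
Journal
Grant-in-Aid for Scientific Research (B)
Volume, issue, pages, and publication date
2004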

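As a data mining method for multi-viewpoint, multi-granularity knowledge discovery, we devised a method that summarizes the important parts of the data by probabilistic clustering and assigns hues based on an information criterion. This method extends our prototype lines, whose effectiveness had been shown on medical test data. We examined its effectiveness on web page data, which combine text and images, and showed that it outperforms Google in recall, precision, and discovery time. We refined and extended this method into the final method, applied it to web page data, network intrusion data, and other data, and evaluated its effectiveness quantitatively. The experiments with web page data addressed the task of grasping the content of many web pages from a display that fits on a single A4 sheet. Because the task posed many questions within a fixed time, we used the number of correct answers given by the subjects as the evaluation measure, and achieved an increase of about 35% over Google. Although separate processing of images and keywords is required, we believe we achieved the original goal of visualizing information at multiple viewpoints and granularities appropriate for knowledge discovery. The experiments with network intrusion data addressed a prediction problem based on access histories of web pages. We obtained good results in the recall and precision of unauthorized-access detection, in the discovery of rare unauthorized accesses, and in the legibility of the visualization. In the course of the research we also developed a multi-objective search method, information-based evaluation measures, and a clustering method for predicate data, and confirmed their effectiveness. In addition, in cooperation with the University of Caen in France, we developed a visualization method for itemset transaction data and obtained good results. We also pursued applications to various kinds of spatio-temporal data, typified by soccer data, and obtained results in both visualization and knowledge discovery.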
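Authors
山田 悠 鈴木 英之進 横井 英人 高林 克日己
Publisher
Information Processing Society of Japan
Journal
IPSJ SIG Technical Reports, Intelligence and Complex Systems (ICS) (情報処理学会研究報告知能と複雑系) (ISSN:09196072)
Volume, issue, pages, and publication date
vol.2003, no.30, pp.141-146, 2003-03-13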

This paper proposes a novel approach for learning a decision tree from a data set with time-series attributes. A time-series attribute takes, as its value, a sequence of values each of which is associated with a time stamp, and can be considered important since it appears frequently in real-world applications. Our time-series tree has a time sequence in its internal node and splits examples based on similarities between a pair of time sequences. We first define our standard example split test based on dynamic time warping, then propose a decision tree induction procedure for the split test. Experimental results confirm that our induction method, unlike other methods, constructs comprehensible and accurate trees. Moreover, a medical application shows that our time-series tree is promising for knowledge discovery.
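The abstract describes an internal node that holds a time sequence and splits examples by their similarity to it, without giving the exact form of the test. One plausible reading, sketched below under explicit assumptions, is to send an example to one branch when its DTW cost to the node's sequence falls below a threshold. The names `dtw`, `split_by_standard_example`, `standard`, and `threshold` are hypothetical, the sequences are assumed to be uniformly sampled numbers (time stamps are ignored), and how the paper chooses the sequence and the threshold is not shown.

```python
def dtw(a, b):
    """Compact dynamic time warping cost with |x - y| as the local cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        cur = [inf]
        for j, y in enumerate(b, start=1):
            cur.append(abs(x - y) + min(prev[j], cur[j - 1], prev[j - 1]))
        prev = cur
    return prev[-1]


def split_by_standard_example(examples, standard, threshold, dist=dtw):
    """Partition sequences by their distance to a reference sequence.

    Assumed form of the split test: examples whose distance to `standard`
    is below `threshold` go to the first list, all others to the second.
    """
    near, far = [], []
    for seq in examples:
        if dist(seq, standard) < threshold:
            near.append(seq)
        else:
            far.append(seq)
    return near, far


# Hypothetical data: two rising sequences and one flat, high sequence.
examples = [[1, 2, 3], [1, 1, 2, 3], [5, 5, 5]]
near, far = split_by_standard_example(examples, standard=[1, 2, 3], threshold=1.0)
print(near, far)
```

Because the test refers to a concrete sequence, a reader can inspect that sequence directly at each node, which is consistent with the comprehensibility claim in the abstract.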