後藤 匡史 長木 悠太 鈴木 英之進
一般社団法人 人工知能学会
人工知能学会論文誌 = Transactions of the Japanese Society for Artificial Intelligence : AI (ISSN:13460714)
vol.16, pp.193-201, 2001-11-01

This paper presents a novel decision-tree induction method for a multi-objective data set, i.e., a data set with a multi-dimensional class. Inductive decision-tree learning is one of the most frequently used methods for a single-objective data set, i.e., a data set with a single-dimensional class. In real data analysis, however, we usually have multiple objectives, and a classifier that explains them simultaneously would be useful. A conventional decision-tree inducer requires transforming a multi-dimensional class into a single-dimensional class, but such a transformation can considerably worsen both accuracy and readability. To circumvent this problem, we propose a bloomy decision tree, which deals with a multi-dimensional class without such a transformation. A bloomy decision tree consists of a set of decision nodes, each of which splits examples according to their attribute values, and a set of flower nodes, each of which decides a dimension of the class for examples. A flower node appears not only at the fringe of a tree but also inside it. Our pruning is executed during tree construction and evaluates each dimension of the class based on Cramér's V. The proposed method has been implemented as D3-B (Decision tree in Bloom) and tested with eleven benchmark data sets from the machine learning community. The experiments showed that D3-B achieves higher accuracy than C4.5 on nine data sets and ties with it on the other two. In terms of readability, D3-B has fewer decision nodes on all data sets and thus outperforms C4.5. Moreover, experts in agriculture evaluated bloomy decision trees induced from agricultural data sets and found them appropriate and interesting.
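The abstract names Cramér's V as the statistic used to evaluate each class dimension during pruning but does not say how D3-B applies it. As a rough illustration of the statistic alone, the following Python sketch computes Cramér's V between two categorical sequences using the textbook definition V = sqrt(chi^2 / (n (k - 1))), where k is the smaller of the two numbers of distinct values. The function name `cramers_v` and the choice to return 0.0 for a constant variable are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Cramér's V between two equal-length categorical sequences.

    Illustrative helper (hypothetical name); it is not the D3-B pruning rule.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")

    row_counts = Counter(xs)          # marginal counts of the first variable
    col_counts = Counter(ys)          # marginal counts of the second variable
    joint = Counter(zip(xs, ys))      # contingency-table cell counts

    k = min(len(row_counts), len(col_counts))
    if k < 2:
        # Assumption of this sketch: a constant variable carries no association.
        return 0.0

    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for r, n_r in row_counts.items():
        for c, n_c in col_counts.items():
            expected = n_r * n_c / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected

    return math.sqrt(chi2 / (n * (k - 1)))
```

A value near 1 indicates that one variable almost determines the other, and a value near 0 indicates near independence; for instance, an attribute that perfectly predicts one class dimension would score 1 against that dimension.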
中本 和岐 山田 悠 鈴木 英之進
一般社団法人 人工知能学会
人工知能学会論文誌 (ISSN:13460714)
vol.18, no.3, pp.144-152, 2003 (Released:2003-03-04)

This paper proposes a fast clustering method for time-series data based on an average time sequence vector. A clustering procedure based on exhaustive search is time-consuming, although its result typically exhibits high quality. BIRCH, which reduces the number of examples by data squashing based on a data structure called the CF (Clustering Feature) tree, is an effective solution for such a method when the data set consists of numerical attributes only. For time-series data, however, a straightforward application of BIRCH based on the Euclidean distance between a pair of sequences fails badly, since such a distance typically differs from human perception. A dissimilarity measure based on DTW (Dynamic Time Warping) is desirable, but to the best of our knowledge no data squashing methods have been proposed for time-series data. To circumvent this problem, we propose the DTWS (Dynamic Time Warping Squashed) tree, which employs a dissimilarity measure based on DTW and compresses time sequences into an average time sequence vector. An average time sequence vector is obtained by a novel procedure that estimates the correct shrinkage of a DTW result. Experiments using the Australian sign language data demonstrate the superiority of the proposed method in terms of clustering correctness, while its degradation in time efficiency is negligible.
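To make the contrast between the Euclidean distance and a DTW-based dissimilarity concrete, here is a minimal Python sketch of the classic dynamic-programming formulation of DTW. It covers only the dissimilarity measure. The DTWS tree and the shrinkage-estimating averaging procedure are not reproduced, because the abstract does not specify them. The function names and the use of the absolute difference as the local cost are assumptions of this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping dissimilarity.

    Local cost is |x - y| (an assumption of this sketch); sequences may
    differ in length.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched to an already-used b[j-1]
                cost[i][j - 1],      # b[j-1] is matched to an already-used a[i-1]
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# The same peak, shifted by one time step.
peak = [0, 0, 1, 2, 1, 0, 0]
shifted = [0, 1, 2, 1, 0, 0, 0]
```

With these two sequences, `euclidean(peak, shifted)` is 2.0 while `dtw(peak, shifted)` is 0.0, which illustrates why a pointwise distance can disagree with the intuition that both sequences show the same shape.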
後藤 匡史 長木 悠太 鈴木 英之進
一般社団法人 人工知能学会
人工知能学会論文誌 (ISSN:13460714)
vol.16, no.2, pp.193-201, 2001 (Released:2002-02-28)

鈴木 英之進 安藤 晋

山田 悠 鈴木 英之進 横井 英人 高林 克日己
情報処理学会研究報告知能と複雑系(ICS) (ISSN:09196072)
vol.2003, no.30, pp.141-146, 2003-03-13

This paper proposes a novel approach for learning a decision tree from a data set with time-series attributes. A time-series attribute takes as its value a sequence of values, each of which is associated with a time stamp, and can be considered important since it appears frequently in real-world applications. Our time-series tree has a time sequence in its internal node and splits examples based on the similarity between a pair of time sequences. We first define our standard-example split test based on dynamic time warping, then propose a decision-tree induction procedure for the split test. Experimental results confirm that our induction method, compared with other methods, constructs comprehensible and accurate trees. Moreover, a medical application shows that our time-series tree is promising for knowledge discovery.
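The abstract does not give the exact form of the split test, so the following Python sketch shows one plausible reading under stated assumptions: a test is defined by a reference ("standard") time sequence and a threshold, and an example goes to the near branch when its DTW dissimilarity to the reference is below the threshold. Candidate tests are scored here by plain information gain on a single class label. The selection criterion, the function names, and the midpoint thresholds are all hypothetical choices for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def dtw(a, b):
    """DTW dissimilarity with |x - y| local cost, using two rolling rows."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf]
        for j, y in enumerate(b, start=1):
            curr.append(abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1]))
        prev = curr
    return prev[-1]


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels; 0 for an empty list."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_by_reference(examples, reference, threshold):
    """Partition (sequence, label) pairs by DTW dissimilarity to `reference`."""
    near = [e for e in examples if dtw(reference, e[0]) < threshold]
    far = [e for e in examples if dtw(reference, e[0]) >= threshold]
    return near, far


def best_split(examples):
    """Exhaustively pick (gain, reference, threshold) maximizing information gain.

    Every example serves once as the candidate reference; thresholds are the
    midpoints between consecutive distinct dissimilarities. Returns None when
    no threshold separates the examples.
    """
    base = entropy([label for _, label in examples])
    best = None
    for reference, _ in examples:
        dists = sorted({dtw(reference, seq) for seq, _ in examples})
        for lo, hi in zip(dists, dists[1:]):
            threshold = (lo + hi) / 2
            parts = split_by_reference(examples, reference, threshold)
            remainder = sum(
                len(p) / len(examples) * entropy([label for _, label in p])
                for p in parts
            )
            gain = base - remainder
            if best is None or gain > best[0]:
                best = (gain, reference, threshold)
    return best


# Toy data: two shifted peaks and two constant sequences.
data = [
    ([0, 0, 1, 2, 1, 0], "peak"),
    ([0, 1, 2, 1, 0, 0], "peak"),
    ([0, 0, 0, 0, 0, 0], "flat"),
    ([1, 1, 1, 1, 1, 1], "flat"),
]
```

Calling `best_split(data)` on the toy data selects a peak sequence as the reference and separates the two classes, and applying `split_by_reference` recursively to each branch would grow a tree. Because the reference stored in an internal node is an actual time sequence from the data, a reader can inspect it directly, which is consistent with the comprehensibility claim in the abstract.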