Authors
Hino Hideitsu, Murata Noboru
Publisher
Elsevier Ltd.
Journal
Neural networks (ISSN:08936080)
Volume, issue, pages, and publication date
vol.46, pp.260-275, 2013-10
Number of citing documents
11 2

The Shannon information content is a valuable numerical characteristic of probability distributions. Estimating the information content from an observed dataset is an important problem in statistics, information theory, and machine learning. This paper proposes information estimators and demonstrates some of their applications. When the given data are associated with weights, each datum contributes differently to the empirical average of statistics; the proposed estimators can handle such weighted data. Like other conventional methods, the proposed information estimator contains a parameter to be tuned and is computationally expensive. To overcome these problems, the estimator is further modified so that it is more computationally efficient and has no tuning parameter. The proposed methods are also extended to estimate cross-entropy, entropy, and Kullback–Leibler divergence. Simple numerical experiments show that the information estimators work properly. The estimators are then applied to two specific problems: distribution-preserving data compression and weight optimization for ensemble regression.
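To make the idea of an entropy estimator for weighted data concrete, here is a minimal sketch. It is NOT the estimator proposed in the paper; it swaps in a generic leave-one-out Gaussian kernel density (resubstitution) estimator for 1-D data, in which each datum's weight enters both the density estimate and the empirical average. The function name `weighted_loo_kde_entropy` and the bandwidth `h` are hypothetical choices for illustration; `h` plays the role of the tuning parameter mentioned in the abstract.

```python
import numpy as np


def weighted_loo_kde_entropy(x, w, h):
    """Hypothetical sketch: differential entropy (in nats) of 1-D weighted data.

    Estimates H = -sum_i w_i * log p_{-i}(x_i), where p_{-i} is a weighted
    Gaussian KDE with bandwidth h built from all samples except x_i.
    Assumes w >= 0 and that no single sample carries all of the weight.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()  # normalize weights so they sum to 1

    # Pairwise Gaussian kernel values k[i, j] = N(x_i - x_j; 0, h^2).
    diff = x[:, None] - x[None, :]
    k = np.exp(-0.5 * (diff / h) ** 2) / (np.sqrt(2.0 * np.pi) * h)
    np.fill_diagonal(k, 0.0)  # leave-one-out: drop each sample's own kernel

    # Weighted density at x_i from the other samples, renormalized by the
    # total weight of those other samples (1 - w_i).
    dens = (k @ w) / (1.0 - w)

    # Weighted empirical average of -log density.
    return float(-np.sum(w * np.log(dens)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(2000)
    # For N(0, 1) data the true entropy is 0.5 * log(2 * pi * e), about 1.419.
    print(weighted_loo_kde_entropy(x, np.ones_like(x), h=0.2))
```

Because the sketch builds an n-by-n kernel matrix, it costs O(n^2) time and memory, and its output depends on `h` (smoothing inflates the estimate slightly); these are the same two drawbacks, tuning and computational cost, that the abstract says the modified estimator is designed to remove.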
Authors
Hino Hideitsu, Wakayama Keigo, Murata Noboru
Publisher
Elsevier B.V.
Journal
Computational statistics & data analysis (ISSN:01679473)
Volume, issue, pages, and publication date
vol.67, pp.105-114, 2013-11
Number of citing documents
5 2

Dimension reduction has become increasingly important as the size of the data available in many fields has grown. Appropriately reducing the dimension of raw data helps reduce computation time and exposes the intrinsic structure of complex data. Sliced inverse regression is a well-known dimension reduction method for regression; it assumes an elliptical distribution for the explanatory variable and ingeniously reduces the dimension reduction problem to a simple eigenvalue problem. Because sliced inverse regression relies on strong assumptions about the data distribution and the form of the regression function, a number of methods have been proposed to relax or remove these assumptions and thereby extend the applicability of inverse regression. However, each of these methods is known to have drawbacks, either theoretical or empirical. To alleviate these drawbacks, a dimension reduction method for regression based on conditional entropy minimization is proposed. Using entropy as a measure of the dispersion of data, a low-dimensional subspace is estimated without assuming any specific distribution or regression function. Experiments on artificial and real-world datasets show that the proposed method performs comparably to or better than conventional methods.
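The abstract describes sliced inverse regression (SIR) as reducing dimension reduction to an eigenvalue problem. The following minimal sketch shows that classical baseline; it is NOT the conditional-entropy method proposed in the paper. It assumes equal-count slices of the sorted response, and the function name and parameters (`n_slices`, `n_directions`) are hypothetical.

```python
import numpy as np


def sliced_inverse_regression(X, y, n_slices=10, n_directions=1):
    """Hypothetical sketch of classical SIR.

    Returns a (p, n_directions) matrix whose unit-norm columns estimate
    directions of the reduced subspace in the original X coordinates.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Standardize predictors: Z = (X - mean) @ Sigma^{-1/2}.
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ inv_sqrt

    # Slice the data into roughly equal-count groups by sorted y.
    slices = np.array_split(np.argsort(y), n_slices)

    # Weighted covariance of slice means: M = sum_h (n_h / n) m_h m_h^T.
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # The eigenvalue problem: leading eigenvectors of M in Z-space.
    w, v = np.linalg.eigh(M)
    top = v[:, np.argsort(w)[::-1][:n_directions]]

    # Map back to the original coordinates and normalize each column.
    B = inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((2000, 5))
    beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
    y = (X @ beta) ** 3 + 0.1 * rng.standard_normal(2000)
    print(sliced_inverse_regression(X, y).ravel())
```

In this toy example, the predictors are Gaussian (so the elliptical assumption holds) and the link is monotone, which is the setting where SIR works well. A symmetric link such as `y = (X @ beta) ** 2` would make the slice means nearly zero, one of the known failure modes that motivates relaxing SIR's assumptions.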