Authors
Yata Kazuyoshi, Aoshima Makoto
Publisher
Elsevier
Journal
Journal of multivariate analysis (ISSN:0047259X)
Volume, issue, pages, and publication date
vol.122, pp.334-354, 2013-11
Citation count
20

In this paper, we propose a general spiked model called the power spiked model in high-dimensional settings. We derive relations among the data dimension, the sample size and the high-dimensional noise structure. We first consider asymptotic properties of the conventional estimator of eigenvalues. We show that the estimator is affected by the high-dimensional noise structure directly, so that it becomes inconsistent. In order to overcome such difficulties in a high-dimensional situation, we develop new principal component analysis (PCA) methods called the noise-reduction methodology and the cross-data-matrix methodology under the power spiked model. We show that the new PCA methods can enjoy consistency properties not only for eigenvalues but also for PC directions and PC scores in high-dimensional settings.
Authors
Yata Kazuyoshi, Aoshima Makoto
Publisher
Elsevier
Journal
Journal of multivariate analysis (ISSN:0047259X)
Volume, issue, pages, and publication date
vol.117, pp.313-331, 2013-05
Citation count
17

In this paper, we consider tests of correlation when the sample size is much smaller than the dimension. We propose a new estimation methodology called the extended cross-data-matrix methodology. By applying the method, we give a new test statistic for high-dimensional correlations. We show that the test statistic is asymptotically normal when p→∞ and n→∞. We propose a test procedure along with sample size determination to ensure both prespecified size and power for testing high-dimensional correlations. We further develop a multiple testing procedure to control both the familywise error rate and power. Finally, we demonstrate how the test procedures perform in actual data analyses by using two microarray data sets.
Authors
Yata Kazuyoshi, Aoshima Makoto
Publisher
Springer US
Journal
Methodology and computing in applied probability (ISSN:13875841)
Volume, issue, pages, and publication date
vol.14, no.3, pp.459-476, 2012-09
Citation count
3 5

We focus on inference about high-dimensional mean vectors when the sample size is much smaller than the dimension. Such data situations occur in many areas of modern science, such as genetic microarrays, medical imaging, text recognition, finance, and chemometrics. First, we give a given-radius confidence region for mean vectors. This inference can be utilized for variable selection in high-dimensional data. Next, we give a given-width confidence interval for the squared norm of mean vectors. This inference can be utilized in a classification procedure for high-dimensional data. In order to assure a prespecified coverage probability, we propose a two-stage estimation methodology and determine the required sample size for each inference. Finally, we demonstrate how the new methodologies perform by using a microarray data set.

Authors’ Response (open access)

Authors
Aoshima Makoto, Yata Kazuyoshi
Publisher
Taylor & Francis
Journal
Sequential analysis (ISSN:07474946)
Volume, issue, pages, and publication date
vol.30, no.4, pp.432-440, 2011-11
Citation count
6

In this article, we respond to the comments made by the 10 discussants on “Two-Stage Procedures for High-Dimensional Data.” We also give some new results along with their brief explanations.
Authors
Aoshima Makoto, Yata Kazuyoshi
Publisher
Taylor & Francis
Journal
Sequential analysis (ISSN:07474946)
Volume, issue, pages, and publication date
vol.30, no.4, pp.356-399, 2011-11
Citation count
51 13

In this article, we consider a variety of inference problems for high-dimensional data. The purpose of this article is to suggest directions for future research and possible solutions to p ≫ n problems by using new types of two-stage estimation methodologies. This is the first attempt to apply sequential analysis to high-dimensional statistical inference ensuring prespecified accuracy. We offer the sample size determination for inference problems by creating new types of multivariate two-stage procedures. To develop theory and methodologies, the most important and basic idea is the asymptotic normality when p → ∞. By developing asymptotic normality when p → ∞, we first give (a) a given-bandwidth confidence region for the square loss. In addition, we give (b) a two-sample test to assure prespecified size and power simultaneously together with (c) an equality-test procedure for two covariance matrices. We also give (d) a two-stage discriminant procedure that controls misclassification rates to be no more than a prespecified value. Moreover, we propose (e) a two-stage variable selection procedure that provides screening of variables in the first stage and selects a significant set of associated variables from among a set of candidate variables in the second stage. Following the variable selection procedure, we consider (f) variable selection for high-dimensional regression that compares favorably with the lasso in terms of the assurance of accuracy and the computational cost. Further, we consider variable selection for classification and propose (g) a two-stage discriminant procedure after screening some variables. Finally, we consider (h) pathway analysis for high-dimensional data by constructing a multiple test of correlation coefficients.
Authors
Aoshima Makoto, Yata Kazuyoshi
Publisher
Springer
Journal
Annals of the Institute of Statistical Mathematics (ISSN:00203157)
Volume, issue, pages, and publication date
vol.62, no.3, pp.571-600, 2010-06
Citation count
11

We consider fixed-size estimation for a linear function of means from independent and normally distributed populations having unknown and respective variances. We construct a fixed-width confidence interval with required accuracy about the magnitude of the length and the confidence coefficient. We propose a two-stage estimation methodology having the asymptotic second-order consistency with the required accuracy. The key is the asymptotic second-order analysis about the risk function. We give a variety of asymptotic characteristics about the estimation methodology, such as asymptotic sample size and asymptotic Fisher-information. With the help of the asymptotic second-order analysis, we also explore a number of generalizations and extensions of the two-stage methodology to problems such as bounded risk point estimation, multiple comparisons among components between the populations, and power analysis in equivalence tests to plan the appropriate sample size for a study.