Author
Sumio Watanabe (渡辺 澄夫), Precision and Intelligence Laboratory, Tokyo Institute of Technology
Journal
Journal of Japanese Society for Artificial Intelligence (人工知能学会誌) (ISSN: 0912-8085)
Volume, issue, pages / publication date
vol.16, no.2, pp.308-315, 2001-03-01

The parameter space of a hierarchical learning machine is not a Riemannian manifold, because the rank of the Fisher information metric depends on the parameter. In a previous paper, we proved that the stochastic complexity is asymptotically equal to λ log n - (m-1) log log n, where λ is a rational number, m is a natural number, and n is the number of empirical samples. We also proved that both λ and m can be calculated by resolution of singularities. However, both λ and m depend on the parameter representation and the size of the true distribution. In this paper, we study Jeffreys' prior distribution, which is coordinate-free, and prove that 2λ is equal to the dimension of the parameter set and m = 1, independently of the parameter representation and the singularities. This fact indicates that Jeffreys' prior is useful in model selection and knowledge discovery, even though it makes the prediction error larger than positive distributions do.
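
The asymptotic form stated in the abstract can be written out as a short worked equation. This is only a restatement under assumed notation: F(n) for the stochastic complexity and d for the dimension of the parameter set are symbols introduced here for illustration, not taken from the abstract, and lower-order terms are left unspecified.

```latex
% General case (previous paper): lambda rational, m a natural number
F(n) \simeq \lambda \log n - (m-1)\log\log n

% Jeffreys' prior (this paper): 2\lambda = d and m = 1,
% so the \log\log n term vanishes
F(n) \simeq \frac{d}{2}\,\log n
```

Substituting m = 1 removes the log log n term, and 2λ = d fixes the coefficient of log n at half the parameter dimension, regardless of the parameter representation.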
Publisher
Japanese Society for Artificial Intelligence (人工知能学会)
Number of references
25
Number of citing works
20
