Abstract
July 1 - 11.00-11.50
K. Fukumizu
- Singularities of statistical models: from the estimation viewpoint
It is known that some important statistical models, such as finite mixture
models and multilayer neural networks, are not necessarily smooth
manifolds. These models have singularities at the points corresponding to
densities realizable by a smaller model. In the Gaussian mixture model
with two components, for example, the standard normal density is
represented by a high-dimensional subset of the parameter space, and this
point is a singularity of the model when the model is regarded as a subset
of a functional space. This work discusses singularities of a model from
the viewpoint of statistical estimation. If the true density is located at
such a singularity, the behavior of an estimator for a sample from that
density does not follow the standard theory. This problem has been known
as unidentifiability of a parameter, and has been studied extensively.
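As an illustration of this unidentifiability (a minimal sketch, not part of the original abstract; the function `mixture_pdf` is hypothetical), several distinct parameter values of a two-component Gaussian mixture with unit variances all realize the standard normal density:

```python
import math

def mixture_pdf(x, w, mu1, mu2):
    """Density of a two-component Gaussian mixture with unit variances:
    w * N(mu1, 1) + (1 - w) * N(mu2, 1)."""
    phi = lambda z: math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    return w * phi(x - mu1) + (1 - w) * phi(x - mu2)

# The standard normal N(0, 1) is realized by every parameter with
# mu1 = mu2 = 0 (any weight w), and by any mu2 when its weight is 0:
for x in [-2.0, -0.5, 0.0, 1.3]:
    p0 = mixture_pdf(x, 0.5, 0.0, 0.0)  # means collapsed, w free
    p1 = mixture_pdf(x, 0.2, 0.0, 0.0)  # different w, same density
    p2 = mixture_pdf(x, 1.0, 0.0, 3.7)  # second component has weight 0
    assert abs(p0 - p1) < 1e-12 and abs(p0 - p2) < 1e-12
```

The set of parameters mapping to this single density is thus a high-dimensional subset of the parameter space, which is the singularity discussed above.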
However, little has been clarified about the general asymptotics of the
maximum likelihood estimator (MLE) around a singularity. It is known
that in some cases the likelihood ratio test statistic (LRTS)
diverges to infinity as the sample size goes to infinity, a clear
departure from the ordinary chi-square asymptotics. This
divergence result also implies that the degrees of freedom around a
singularity can be infinite, if measured by the dimensionality of
the fluctuation of the MLE around the singularity. The main results of
this work concern the asymptotic order of the MLE for an i.i.d. sample of
size n, assuming that the true density is located at a singularity. In
particular, I focus on nonlinear regression models and multilayer neural
networks. First, as an extension of Hartigan's idea [1], a simple but
useful sufficient condition for the divergence of the LRTS is derived from
a geometric viewpoint, using the framework of locally conic models
proposed by Dacunha-Castelle and Gassiat [2]. Next, a universal upper
bound O_p(log n) on the LRTS is derived under the assumptions that the
model is a nonlinear regression with binary output or with Gaussian noise,
each regression function is bounded, and the family of regression
functions has finite Vapnik-Chervonenkis dimension.
Finally, these results are applied to multilayer perceptrons, one of the
most successful neural network models, showing that the LRTS is of larger
order than O_p(1) if the model has surplus hidden units for realizing the
true function. I also derive a log n lower bound in the case that the
model has at least two surplus hidden units for the true function, which
means that the asymptotic order of the LRTS is exactly log n in such
cases.
References
[1] Hartigan, J. A. (1985) A failure of likelihood asymptotics for normal
mixtures. In Proceedings of the Berkeley Conference in Honor of Jerzy
Neyman and Jack Kiefer, Vol. II.
[2] Dacunha-Castelle, D. and Gassiat, E. (1997) Testing in locally conic
models, and application to mixture models. ESAIM: Probability and
Statistics, 1, 285-317.