Abstract
July 1 - 11.00-11.50
K. Fukumizu
- Singularities of statistical models: from the estimation viewpoint
It is known that some important statistical models, such as finite mixture
models and multilayer neural networks, are not necessarily smooth
manifolds. These models have singularities at the points corresponding to
densities realizable by a smaller model. In the Gaussian mixture model
with two components, for example, the standard normal density is
represented by a high-dimensional subset of the parameter space, and this
point is a singularity of the model when the model is regarded as a subset
of a functional space. This work discusses singularities of a model from
the viewpoint of statistical estimation. If the true density is located at
such a singularity, the behavior of an estimator for a sample from that
density does not follow the standard theory. This problem has been known
as unidentifiability of a parameter, and has been studied extensively.
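As an illustration of this unidentifiability (a minimal sketch, not part of the original abstract; the function `mixture_pdf` is hypothetical), several distinct parameter values of a two-component Gaussian mixture with unit variances all realize the standard normal density:

```python
import math

def mixture_pdf(x, w, mu1, mu2):
    """Density of a two-component Gaussian mixture with unit variances:
    w * N(mu1, 1) + (1 - w) * N(mu2, 1)."""
    phi = lambda z: math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    return w * phi(x - mu1) + (1 - w) * phi(x - mu2)

# The standard normal N(0, 1) is realized by every parameter with
# mu1 = mu2 = 0 (any weight w), and by any mu2 when its weight is 0:
for x in [-2.0, -0.5, 0.0, 1.3]:
    p0 = mixture_pdf(x, 0.5, 0.0, 0.0)  # means collapsed, w free
    p1 = mixture_pdf(x, 0.2, 0.0, 0.0)  # different w, same density
    p2 = mixture_pdf(x, 1.0, 0.0, 3.7)  # second component has weight 0
    assert abs(p0 - p1) < 1e-12 and abs(p0 - p2) < 1e-12
```

The set of parameters mapping to this single density is thus a high-dimensional subset of the parameter space, which is the singularity discussed above.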
However, little has been clarified about the general asymptotics of the
maximum likelihood estimator (MLE) around a singularity. It is known
that in some cases the likelihood ratio test statistic (LRTS)
diverges to infinity as the sample size goes to infinity, a clear
departure from the ordinary chi-square asymptotics. This
divergence result also implies that the degrees of freedom around a
singularity can be infinite, if measured by the dimensionality of
the fluctuation of the MLE around the singularity. The main results of
this work concern the asymptotic order of the MLE for an i.i.d. sample of
size n, assuming that the true density is located at a singularity. In
particular, I focus on nonlinear regression models and multilayer neural
networks. First, as an extension of Hartigan's idea [1], a simple but
useful sufficient condition for the divergence of the LRTS is derived from
a geometric viewpoint, using the framework of locally conic models
proposed by Dacunha-Castelle and Gassiat [2]. Next, a universal upper
bound O_p(log n) on the LRTS is derived under the assumptions that the
model is a nonlinear regression with binary output or with Gaussian noise,
each regression function is bounded, and the family of regression
functions has finite Vapnik-Chervonenkis dimension.
Finally, these results are applied to multilayer perceptrons, one of the
most successful neural network models, showing that the LRTS is of larger
order than O_p(1) if the model has surplus hidden units for realizing the
true function. I also derive a log n lower bound in the case that the
model has at least two surplus hidden units for the true function, which
means that the asymptotic order of the LRTS is exactly log n in such
cases.
References
[1] Hartigan, J. A. (1985) A failure of likelihood asymptotics for normal
mixtures. In Proceedings of the Berkeley Conference in Honor of Jerzy
Neyman and Jack Kiefer, Vol. II.
[2] Dacunha-Castelle, D. and Gassiat, E. (1997) Testing in locally conic
models, and application to mixture models. ESAIM: Probability and
Statistics, 1, 285-317.