Abstracts
July 1 - 9.30-10.30
S.Amari
- Information Geometry of Singular Statistical Models
Information geometry studies manifolds of probability distributions, or statistical models. The intrinsic structure of regular finite-dimensional statistical models has been well investigated, and many applications have emerged in a wide range of fields such as information theory, control systems theory, optimization, neural networks, and belief propagation. Many of these models are hierarchical, in the sense that smaller models are included in larger models as submanifolds. Typical examples are multilayer perceptrons, ARMA time series models, and Gaussian mixtures. In such a model there exist critical regions corresponding to the smaller models, on which the parameters become unidentifiable and the Fisher metric degenerates. Geometrically, such models include algebraic singularities. The present talk analyzes the structure of singularities by using a simple model of Gaussian mixtures. Cusp-type singularities are found in this case, and the accuracy of parameter estimation is analyzed when the true distribution is close to a singularity. We also use a simple toy model (the cone model) to show some other properties of singularities and their effects on learning.
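The degeneracy can be seen in the smallest example of this kind; as an illustrative sketch (not taken from the talk), consider the two-component Gaussian mixture

```latex
p(x;\,t,\mu) \;=\; (1-t)\,\varphi(x) \;+\; t\,\varphi(x-\mu),
\qquad \varphi(x)=\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}.
```

On the critical set {t = 0} ∪ {mu = 0} the density reduces to the standard normal phi(x) regardless of the other parameter, so the parameters are unidentifiable there; for instance at mu = 0 the score ∂p/∂t = phi(x - mu) - phi(x) vanishes identically, and the Fisher metric degenerates.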
|
July 3 - 11.40-12.30
N.Ay
- On the Geometry of Complexity: an Approach to Neural Information Processing
I am following the general concept that complexity should
somehow quantify the deviation of a composed system from being
the unrelated collection of its individual constituents.
Information geometry provides a powerful framework for a
mathematical elaboration of this concept. The aim of my talk
is to present analytical results on complex systems and
illustrate them by computer simulations. Applied to the field of neural networks, my approach leads to a generalized version of the infomax principle of Linsker.
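One standard way to make the "deviation from an unrelated collection of constituents" concrete, offered here as an illustrative sketch rather than the speaker's own measure, is the multi-information: the KL divergence from a joint distribution to the product of its marginals.

```python
import numpy as np

def multi_information(p):
    """KL divergence from the joint p(x, y) to the product of its marginals.

    Zero iff the two variables are independent, so it quantifies how far a
    composed system is from an unrelated collection of its constituents.
    """
    px = p.sum(axis=1, keepdims=True)   # marginal of the first variable
    py = p.sum(axis=0, keepdims=True)   # marginal of the second variable
    q = px * py                          # fully factorized reference model
    mask = p > 0                         # avoid log(0) on zero-probability cells
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Two fair bits, perfectly correlated: the complexity is log 2 nats.
coupled = np.array([[0.5, 0.0], [0.0, 0.5]])
# Two independent fair bits: the complexity is 0.
independent = np.full((2, 2), 0.25)
print(multi_information(coupled), multi_information(independent))
```

For the coupled pair this prints log 2 (about 0.693) against 0.0 for the independent pair.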
|
July 4 - 11.40-12.30
D.C. Brody, G.W. Gibbons
- Cone Geometry in Statistical Mechanics
The idea of a convex cone is a very simple one, but it has a surprisingly
large number of applications in
both mathematics and physics. This talk will cover an elementary
introduction to the geometry of convex
homogeneous cones. A natural connection to the information geometry of
statistical mechanics arises
in this context, which will also be discussed.
|
July 1 - 15.50-16.20
A. Cena
- Christoffel symbols of the alpha-connections on the alpha-bundles over the Exponential Statistical Manifolds.
Gibilisco and Pistone (1998) showed that the pretangent and tangent bundles over the Exponential Statistical Manifold are the natural domains on which to define, respectively, the mixture and the exponential connection in the non-parametric case. They then defined the infinite-dimensional version of the alpha-connections on a suitable family of vector bundles. In this context we evaluate the Christoffel symbols of the exponential and mixture connections. After studying the regularity of the sphere in the Lebesgue spaces and the natural connection on its tangent bundle, we are able to give the Christoffel symbols of the alpha-connections.
|
July 2 - 11.40-12.30
J.M. Corcuera, F. Giummolè
- Simultaneous prediction
In the present work the problem of prediction is considered in a multidimensional setting. Extending an idea presented in Barndorff-Nielsen and Cox (1996), a predictive density for the future multivariate random variable is proposed. This density has the form of an estimative density plus a correction term which is easily calculated. It gives simultaneous prediction regions with coverage error of smaller asymptotic order than the estimative density. Several examples with a simulation study are presented, showing how the proposed solution improves on the estimative one.
|
July 2 - 17.10-17.40
A. De Sanctis
- Exact asymptotics on Zoll surfaces
One of the procedures used to derive asymptotic expansions of the characteristic function is the method of stationary phase (Barndorff-Nielsen and Cox, 1989). Using Morse theory, this method requires that we locate the critical points of the original function and then approximate the characteristic function by certain sums depending on the values of the function and its higher derivatives at the critical points.
The problem of finding random variables for which the method of stationary phase produces the exact value of the characteristic function involves topological properties of the statistical manifold. For this reason, the spheres of even dimension are privileged with respect to those of odd dimension (Donald St. P. Richards, 1995).
We prove that the classical exactness result on the two-dimensional sphere also holds for particular perturbations of the metric, and hence of the random variable, the so-called "Zoll metrics".
|
July 1 - 11.50-12.40
S.Eguchi
- Information Geometry of Bregman Divergences
The class of Bregman divergences and its applications to statistical methods including PCA, ICA, Gaussian mixtures and so forth have been proposed. It is shown that this class offers a special structure on the information geometry, in contrast with that associated with the alpha-divergences. Within the class, one of the dual connections is always the mixture connection, which enables us to obtain easily the empirical form of the divergence. Thus the objective function to be optimised becomes a linear functional of the empirical distribution. This structure determines the statistical performance of the proposed methods. We also apply this discussion to classification problems. By using the dual form of the optimisation problem for the empirical Bregman distance over a linear combination of weak learners, we propose the class of U-boost methods, including AdaBoost, and investigate their performance structure from the statistical point of view.
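As a minimal illustration of the class (the generator and distributions below are hypothetical examples, not taken from the talk), the Bregman divergence of the negative Shannon entropy recovers the Kullback-Leibler divergence on the probability simplex:

```python
import numpy as np

def bregman(f, grad_f, x, y):
    """Bregman divergence D_f(x, y) = f(x) - f(y) - <grad f(y), x - y>."""
    return f(x) - f(y) - np.dot(grad_f(y), x - y)

# Generator: negative Shannon entropy (a standard example; any strictly
# convex differentiable f defines a member of the class).
f = lambda p: float(np.sum(p * np.log(p)))
grad_f = lambda p: np.log(p) + 1.0

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])
kl = float(np.sum(p * np.log(p / q)))   # Kullback-Leibler divergence
print(bregman(f, grad_f, p, q), kl)     # the two values agree
```

The agreement is exact because on the simplex the linear terms of the Bregman expansion cancel, leaving sum p log(p/q).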
|
July 1 - 11.00-11.50
K. Fukumizu
- Singularities of statistical models: from the estimation viewpoint
It is known that some important statistical models such as finite mixture
models and multilayer neural networks are not necessarily smooth
manifolds. These models have singularities at the points corresponding to
density functions of a smaller size. In the Gaussian mixture model with
two components, for example, a point of the standard normal distribution
can be represented by high dimensional subsets of the parameter space, and
it is a singularity of the model if the model is considered in a
functional space. This work discusses singularities of a model from the
viewpoint of statistical estimation. If the true density is located at
such a singularity, the behavior of an estimator for a sample from the
density does not follow the standard theory. This problem has been known as unidentifiability of a parameter and has been studied extensively.
However, little has been clarified on the general asymptotics of the
maximum likelihood estimator (MLE) around a singularity. It has been
known that in some cases the likelihood ratio test statistics (LRTS)
diverges to infinity as the sample size goes to infinity, which shows a
clear difference from the ordinary chi-square asymptotics. This
divergence result implies also that the degree of freedom around a
singularity can be infinite, if it is measured by the dimensionality of
the fluctuation of the MLE around the singularity. The main results of this work concern the asymptotic order of the MLE for an i.i.d. sample of size n, assuming that the true density is located at a singularity. I focus in particular on nonlinear regression models and multilayer neural networks. First, as an extension of Hartigan's idea [1], a simple but useful sufficient condition for the divergence of the LRTS is derived from a geometric viewpoint, using the framework of locally conic models proposed by Dacunha-Castelle and Gassiat [2]. Next, a universal upper bound O_p(log n) on the LRTS is derived under the assumptions that the model is nonlinear regression with binary output or with Gaussian noise, that each regression function is bounded, and that the family of functions has finite Vapnik-Chervonenkis dimension.
Finally, these results are applied to multilayer perceptrons, one of the most successful neural network models, showing that the LRTS is of larger order than O_p(1) if the model has surplus hidden units to realize the true function. I also derive a log n lower bound in the case where the model has at least two surplus hidden units for the true function, which means the asymptotic order of the LRTS is exactly log n in such cases.
References
[1] Hartigan, J. A. (1985). A failure of likelihood asymptotics for normal mixtures. In Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, Vol. II.
|
July 1 - 17.40-18.10
D. P.K. Ghikas
- Killing Symmetries in Information Geometry
We address the question of the possible interpretation and usefulness of the existence of Killing symmetries of information manifolds, both classical and quantum. These symmetries are isometries under the action of Lie transport on the Fisher information metric. In the classical case we conjecture that they are related to the transformation models of Barndorff-Nielsen, while in the quantum case we expect them to be related to isoentropic transformations. As first results towards a general proof, we show that for the normal family the Killing symmetry is generated by sl(2,R), which is in fact the symmetry of the hyperbolic geometry of this family, while for two models of quantum information geometry, the SO(3) and SL(2,R) models of Nencka and Streater, these isometries give the isoentropic directions. Finally we discuss some possible applications of these results.
References :
- M.K. Murray, J.W. Rice: "Differential Geometry and Statistics"
- H. Nencka, R.F. Streater: "Information Geometry for some Lie Algebras", Infinite-Dimensional Analysis, Quantum Probability and Related Topics, 2, pp. 441-460, World Scientific.
- B. Schutz: "Geometrical Methods of Mathematical Physics"
|
July 5 - 11.40-12.30
P. Gibilisco, T. Isola
- Some open problems in noncommutative Information Geometry
We discuss some open problems in the theory of noncommutative alpha-connections and of noncommutative monotone metrics. As an example, we show how it is possible to calculate the geodesic distance associated with the Wigner-Yanase information by an approach that mimics the classical pull-back approach to Fisher information (note that only one other explicit formula of this kind exists, namely the formula for the geodesic distance of the Bures metric).
References
[1] P. Gibilisco, T. Isola, Monotone metrics on statistical manifolds of density matrices by geometry of noncommutative L^2-spaces, in Disordered and Complex Systems, eds. A.C. Coolen, L. Hughston, P. Sollich, R.F. Streater (AIP, 2001), 129-140.
[2] P. Gibilisco, T. Isola, A characterisation of Wigner-Yanase skew information among statistically monotone metrics, Infinite Dimensional Analysis, Quantum Probability and Related Topics, Vol. 4, No. 4 (2001), 553-557.
|
July 4 - 10.50-11.40
M.Grasselli, R.F. Streater
- Monotonicity, Duality and Uniqueness of the WYD Metrics
In a previous work, we found that the Bogoliubov-Kubo-Mori (BKM) metric is the only monotone metric on finite-dimensional quantum systems for which the exponential and mixture connections are mutually dual.
It is well established that both the $\pm$-connections and the BKM metric
are limiting cases of the more general class of $\alpha$-connections and
Wigner-Yanase-Dyson metrics.
The present paper extends the uniqueness result mentioned above to this more general class: namely, for each value of $\alpha \in (-1,1)$, we prove that the only monotone metrics for which the $\pm\alpha$-connections are mutually dual are scalar multiples of the Wigner-Yanase-Dyson metric.
|
July 5 - 10.50-11.40
H.Hasegawa
- On the Dual Geometry of Wigner-Yanase-Dyson Information Quantities
The Wigner-Yanase-Dyson conjecture appeared about forty years ago as a subject
of mathematical physics concerning the convexity of a matrix-valued
information quantity. Lieb gave an affirmative answer to the conjecture
in 1973 in the more general context of operator algebras. Another proof of
the so-called Wigner-Yanase-Dyson-Lieb concavity was given by Uhlmann in
1977. What interests us about this well-established subject is its
information-geometrical significance: it provides us with a typical
example of quantum Fisher information, and furthermore this example
carries Amari's concept of duality. In the present talk I wish to show
that this concept: (a) enables us to sharpen Petz's classification theorem
of monotone metrics; (b) characterizes the associated quasi-entropy; (c)
introduces naturally (in the framework of matrix analysis) a connection
that conforms to Amari's dual connection.
|
July 3 - 10.50-11.40
S.Ikeda, T. Tanaka, S. Amari
- Information Geometry of Turbo Codes and Low-density Parity-Check Codes
Since the proposal of turbo codes in 1993, many studies have appeared on these simple new error-correcting codes, which give a powerful and practical method for error correction. The essential point of turbo codes is their iterative decoding algorithm; however, the main properties of the decoding algorithm obtained so far are mostly empirical. The essence of turbo decoding has not been fully understood theoretically.
Beyond the experimental studies, a clue has been sought in other iterative methods closely related to turbo codes. One of these is another class of error-correcting codes called low-density parity-check (LDPC) codes, originally proposed by Gallager in the 1960s. Related ideas are found even in different fields, one in artificial intelligence and another in statistical physics. McEliece et al. showed that the turbo decoding algorithm is equivalent to belief propagation applied to a belief diagram with loops; MacKay showed that LDPC decoding is also equivalent to belief propagation; and Kabashima and Saad showed that the iterative process of the Bethe approximation in statistical physics is the same as that of belief propagation. However, the efficiency of these methods also remains something of a mystery, and they have not helped clarify the mathematical structure of turbo codes.
In this presentation, we focus on turbo and LDPC decoding and investigate the mathematical structure of the iterative decoding methods from the information-geometrical viewpoint. We first formulate the problem of error-correcting codes as an m-projection of a given distribution to an e-flat submanifold consisting of factorizable distributions. Since the exact m-projection is usually computationally intractable, it is approximated through iterative algorithms. We then express the turbo and LDPC decoding algorithms as combinations of m-projections and e-projections.
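A toy sketch of the m-projection step (the joint distribution below is an arbitrary example; the closed form used is the standard fact that the m-projection onto the factorizable distributions is the product of the marginals):

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    """Kullback-Leibler divergence between strictly positive arrays."""
    return float(np.sum(p * np.log(p / q)))

# A joint distribution over two binary symbols (strictly positive).
p = np.array([[0.30, 0.10], [0.15, 0.45]])

# m-projection of p onto the e-flat family of factorizable distributions:
# known in closed form as the product of the marginals of p.
proj = np.outer(p.sum(axis=1), p.sum(axis=0))

# Any other factorizable q is at least as far from p in KL divergence.
for _ in range(200):
    a, b = rng.uniform(0.05, 0.95, size=2)
    q = np.outer([a, 1 - a], [b, 1 - b])
    assert kl(p, q) >= kl(p, proj) - 1e-12

print(kl(p, proj))  # the minimum divergence, attained at the projection
```

In the decoding problem the exact projection is intractable because the joint distribution lives on exponentially many codewords; the iterative algorithms approximate this minimization.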
|
July 4 - 16.40-17.30
A. Jencova
- Information geometry in the standard representation of matrix spaces
The algebra of operators acting on a Hilbert space is standardly
represented on the space W of Hilbert-Schmidt
operators. The aim of the present contribution is to show how (in finite
dimensions) the basic structures
of quantum information geometry are lifted to W. It was shown by Dittmann
and Uhlmann that the monotone
Riemannian metrics are related to certain real vector subspaces in W. We
show that there is a natural duality of such subspaces, which suggests a
duality of the corresponding metrics. We also introduce dual parallel
transports, related to the exponential and mixture connections. As
examples, we treat the smallest (Bures) and the largest monotone metric
and the smallest WYD metric. In these cases, we also show that the
corresponding one-dimensional exponential families are related to positive
cones in W.
|
July 2 - 14.30-15.20
P. Jupp
- Yoke Geometry in Parametric Inference
A basic structure in the differential-geometric approach to higher-order
statistical asymptotics is that of a yoke. The role of yoke geometry will
be illustrated by three topics:
(i) cubic modifications of score tests,
(ii) parameterisation-invariant versions of Wald tests,
(iii) modifications of likelihood functions.
|
July 2 - 10.50-11.40
F. Komaki
- Information Geometry of Statistical Prediction
Bayesian predictive distributions are investigated from the viewpoint of information geometry. The Kullback-Leibler divergence from the true distribution to a predictive distribution is adopted as the loss function. We show that there are many examples where the Bayesian predictive distribution based on the Jeffreys prior is dominated by Bayesian predictive distributions based on other priors. It is shown that the Bayesian predictive distribution based on the right invariant measure is the best invariant predictive distribution when a model has a group structure. Furthermore, we show that there exist shrinkage predictive distributions asymptotically dominating Bayesian predictive distributions based on the Jeffreys prior or other vague priors if the model manifold satisfies some differential geometric conditions. We show several examples where shrinkage predictive distributions exactly dominate Bayesian predictive distributions based on vague priors.
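A minimal worked example of how a Bayesian predictive distribution can dominate a plug-in one under KL loss, using assumptions not taken from the talk (normal location model with known unit variance and a flat prior; the risks below follow from the closed-form KL divergence between normals):

```python
import math

# Normal location model N(mu, 1), i.i.d. sample of size n, KL loss from
# the true density to the prediction.
# Estimative (plug-in) density N(xbar, 1):         risk E[KL] = 1/(2n)
# Bayes predictive under a flat prior,
# N(xbar, 1 + 1/n):                                risk E[KL] = log(1 + 1/n)/2
# Since log(1 + x) < x for x > 0, the predictive risk is always smaller.
for n in (1, 5, 50, 500):
    risk_estimative = 1.0 / (2 * n)
    risk_predictive = 0.5 * math.log(1.0 + 1.0 / n)
    assert risk_predictive < risk_estimative  # predictive dominates
    print(n, risk_estimative, risk_predictive)
```

The widened predictive variance 1 + 1/n accounts for estimation uncertainty in the mean, which is exactly what the plug-in density ignores.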
|
July 3 - 9.30-10.20
F.Matus, I. Csiszar
- Information Projections and MLE in Exponential Families Revisited
The goal of this contribution is to complete results available about I-projections, reverse I-projections, and their generalized versions, with focus on linear and exponential families. Pythagorean-like identities and inequalities are revisited and generalized, and generalized maximum likelihood estimates for exponential families are introduced. Regularity conditions that have frequently been imposed can be removed. The main tool is a new concept of extension of exponential families, based on our earlier results on convex cores of measures. Given a sample from an unknown distribution in an exponential family, the maximum likelihood estimate (MLE) exists if and only if the sample mean of the directional statistic belongs to the relative interior of the domain of the convex conjugate of the cumulant generating function. We show for each point of that domain that `approximate MLEs' converge to a unique member of an information closure of the exponential family. This follows from a new refinement of the Fenchel inequality. The MLE in that closure and in extensions of exponential families will be related to minimization of the information divergence in the second coordinate.
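The existence criterion can be sketched in the simplest exponential family; the Bernoulli example below is an illustration of the general statement, not an example taken from the contribution:

```python
import math

# Bernoulli as an exponential family: p(x; theta) ∝ exp(theta * x) for
# x in {0, 1}, with cumulant function psi(theta) = log(1 + e^theta) and
# mean parameter psi'(theta) = e^theta / (1 + e^theta) in (0, 1).
# The MLE solves psi'(theta) = sample mean, and exists exactly when the
# sample mean lies in the relative interior (0, 1) of the mean domain.
def mle_theta(xbar):
    if not 0.0 < xbar < 1.0:
        return None  # boundary sample mean: no MLE within the family
    return math.log(xbar / (1.0 - xbar))  # the logit inverts psi'

print(mle_theta(0.75))   # interior point: the MLE exists
print(mle_theta(1.0))    # all successes: the MLE does not exist
```

A sample of all successes drives theta to infinity; the limiting distribution (the point mass at 1) lies only in a closure or extension of the family, which is where the generalized MLE of the contribution lives.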
|
July 4 - 15.20-16.10
H. Nagaoka
- Quantum Information Geometry and Statistical Inference on Quantum States
Statistical inference problems such as parameter estimation and hypothesis
testing on quantum states bring strong motivations to differential
geometrical study of a quantum state space just as in the classical
information geometry. I would like to talk about how such geometrical
concepts as Riemannian metric, duality of affine connections, autoparallel
submanifold (geodesic in particular), etc. are related to several basic
problems concerning statistical inference on quantum states. The talk is
partly based on a joint work with Akio Fujiwara.
|
July 1 - 14.30-15.20
A.Ohara
- Dualistic Differential Geometry on Symmetric Cones and its Applications
We discuss dually flat structures on symmetric (i.e., homogeneous and self-dual) cones associated with Euclidean Jordan algebras. First we exploit relations between dual connections on symmetric cones and Euclidean Jordan algebras. In particular, we introduce the property called "doubly autoparallelism" and show how doubly autoparallel submanifolds are characterized by Jordan subalgebras. Next we define means on symmetric cones in an axiomatic way following the Kubo-Ando theory, and then discuss them from the viewpoint of the dualistic differential structure. We show that various means are expressed by the midpoints of geodesics with respect to the corresponding dualistic structures, by elucidating the relation between the geodesics and the operator monotone functions that generate means.
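In the one-dimensional symmetric cone of positive reals (a toy case chosen for illustration, not an example from the talk), the geodesic midpoints with respect to the two flat connections of the standard dually flat structure recover familiar means:

```python
import math

# On the cone of positive reals, the e-geodesic is linear in the affine
# coordinate theta = log x, while the m-geodesic is linear in eta = x.
def e_midpoint(a, b):
    """Midpoint of the e-geodesic: the geometric mean."""
    return math.exp(0.5 * (math.log(a) + math.log(b)))

def m_midpoint(a, b):
    """Midpoint of the m-geodesic: the arithmetic mean."""
    return 0.5 * (a + b)

a, b = 2.0, 8.0
print(e_midpoint(a, b), m_midpoint(a, b))  # 4.0 and 5.0
```

For positive definite matrices the analogous e-midpoint gives the matrix geometric mean, which is the kind of correspondence between means and geodesics the talk develops.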
|
July 5 - 14.30-15.30
D.Petz
- Covariance and Fisher information in quantum mechanics
Variance and Fisher information are ingredients of the Cramer-Rao
inequality. We regard Fisher information as a Riemannian metric on a
quantum statistical manifold and choose monotonicity under coarse graining
as the fundamental property of variance and Fisher information. In this
approach we show that there is a kind of dual one-to-one correspondence
between the candidates for the two concepts. We emphasize that Fisher informations are obtained from relative entropies as contrast functions on the state space, and argue that the scalar curvature might be interpreted as an uncertainty density on a statistical manifold.
|
July 1 - 15.20-15.50
G.Pistone
- Recent Results on Exponential Statistical Manifolds
In a paper published in 1995 with C. Sempi, a definition of the manifold structure of the positive probability densities was introduced. Such a manifold is modeled on Orlicz spaces with an exponential Young function and is based on the representation of probabilities as non-parametric exponential models. The idea was further developed in a paper with M.-P. Rogantin (1999), with improvements of the basic construction and a few results on the expectation parameterization on submanifolds. The theory still lacks important features, and the basic approach, e.g. the use of Banach spaces of Orlicz type as local models in the framework of standard manifold theory, has been questioned.
On the positive side, a number of new results have been derived recently and old results have been improved; it is expected that some of these improvements will be presented by the author during the meeting.
We will give a short presentation of the basic theory as we know it now, recalling what is already known and adding the new features, especially on the regularity of the change of coordinates, the cumulant function, submanifolds, and alternative structures. Other important chapters, e.g. the theory of the tangent bundle with submanifolds and connections, or the relation with information theory, will be presented by other authors.
|
July 4 - 14.30-15.20
M. B. Ruskai
- Monotone Metrics on Density Matrices
The distance between two density matrices in quantum information theory can be measured in many ways, including the trace norm, the relative entropy (which is not a true metric) and the Bures metric. All of these contract under completely positive, trace-preserving maps. We describe a general framework for monotone metrics using convex operator functions. Each function in the class defines a symmetric relative entropy pseudo-distance, a Riemannian metric on the tangent space, and a geodesic distance.
[Contraction of Relative Entropy, Riemannian Metrics and Related Measures of Distance between States on Non-commutative Probability Spaces (PDF)]
[Examples of monotone metrics and related quantities (PDF)]
|
July 2 - 9.30-10.20
A. Salvan, L. Pace
- The geometric structure of likelihood expansions in the presence of nuisance parameters
Stochastic expansions of likelihood quantities are usually derived through ordinary Taylor expansions, rearranging terms according to their asymptotic order. The most convenient form for such expansions involves the score function, the expected information, higher-order log-likelihood derivatives and their expectations. Expansions of this form are called expected/observed. If the quantity expanded is a tensor under a group of transformations on the parameter space, the entire contribution of a given asymptotic order to the expected/observed expansion will follow the same transformation law. When there are no nuisance parameters, explicit representations through appropriate tensors are available. In this contribution, we analyse the geometric structure of expected/observed likelihood expansions when nuisance parameters are present. We outline the derivation of likelihood quantities which behave as tensors under interest-respecting reparameterisations. This allows us to write the usual stochastic expansions of profile likelihood quantities in an explicitly tensorial form.
|
July 4 - 9.30-10.20
R. F. Streater
- Dual structures on a quantum information manifold.
We find conditions on a manifold M of states of the W*-algebra B(H) such that
both the (+1) and the (-1) affine structures are defined on the tangent
space. Sufficient conditions are that M consists of density operators D
such that D^p is of trace class for all p>0, and that the topology is such
that a neighbourhood of a point D(0) consists of all points D of M such
that there exist c, C such that 0 < cD < D(0) < CD holds. An equivalent
condition in terms of the Connes cocycle is derived.
|
July 1 - 16.50-17.40
J.Takeuchi, S. Amari
- Alpha-parallel prior and its properties
It is known that the Jeffreys prior plays an important role in statistical inference. In this paper, we generalize the Jeffreys prior from the point of view of information geometry, introducing a one-parameter family of prior distributions which we name alpha-parallel priors. The alpha-parallel prior is defined as the parallel volume element with respect to the alpha-connection and coincides with the Jeffreys prior when alpha=0. Further, we analyze the asymptotic behavior of various estimators, such as the projected Bayes estimator (the estimator obtained by projecting the Bayes predictive density onto the original class of distributions) and the MDL estimator, when the alpha-parallel prior is used. The correction term due to the alpha-prior is shown to be regulated by an invariant vector field of the statistical model. Although the Jeffreys prior always exists, the existence of the alpha-parallel prior with non-zero alpha is not always guaranteed. Hence we consider conditions for the existence of the alpha-parallel prior, elucidating the conjugate symmetry in a statistical model.
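For a concrete anchor point of the family, here is the alpha = 0 member (the Jeffreys prior) for a Bernoulli model; an illustrative sketch, not an example from the paper:

```python
import math

# The alpha = 0 member of the family is the Jeffreys prior, proportional
# to sqrt(det Fisher information). For a Bernoulli(p) model the Fisher
# information is 1/(p(1-p)), so the Jeffreys prior is the Beta(1/2, 1/2)
# density, with normalizing constant pi.
def jeffreys_unnormalized(p):
    fisher = 1.0 / (p * (1.0 - p))
    return math.sqrt(fisher)

# Midpoint-rule estimate of the normalizing constant over (0, 1).
n = 200_000
z = sum(jeffreys_unnormalized((k + 0.5) / n) / n for k in range(n))
print(z)  # numerically close to pi, since B(1/2, 1/2) = pi
```

The prior puts extra mass near p = 0 and p = 1, where the Fisher information diverges; the alpha-parallel priors of the talk deform this volume element using the alpha-connection instead of the Levi-Civita one.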
|
July 5 - 9.30-10.20
A.Uhlmann
- The Bures Distance and its Riemannian Metric
In the classification of monotone metrics by D. Petz, the Bures metric seems to be the simplest. I follow the way from the Bures distance to its metric form, and try to explain what is physically important. A few open problems will also be presented.
|
July 2 - 15.20-16.10
P. Vos
- Dual geometries in statistics
An overview of the role of dual geometries in statistics is given,
beginning with the classical result on the relative
information loss of a statistic expressed in terms of two curvatures.
This important result is used to illustrate
the various contributions dual geometry can make in statistics. Other
topics, including maximum likelihood estimation,
sufficiency, and generalized linear models, are also discussed.
|
July 2 - 16.40-17.10
J. Zhang
- Information Divergence and Convex Analysis
An observation is made that information divergences in various forms (Amari, 1985; Zhu and Rohwer, 1995; Kass and Vos, 1997) arise naturally from basic inequalities and duality in convex analysis. Some new families of divergences can be introduced that include the alpha-divergence as a special case.
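A minimal sketch of the mechanism (the generator below is an arbitrary convex function chosen for illustration, not one from the talk): the Fenchel-Young inequality f(x) + f*(y) >= xy turns any convex f into a nonnegative divergence.

```python
import math

# D(x, y) = f(x) + f*(y) - x*y is >= 0 by the Fenchel-Young inequality,
# with equality iff y = f'(x), so it behaves as a divergence.
f = lambda x: x * math.log(x) - x      # convex on x > 0
f_star = lambda y: math.exp(y)         # its convex conjugate

def D(x, y):
    return f(x) + f_star(y) - x * y

print(D(2.0, math.log(2.0)))  # ~0 (equality at y = f'(x) = log x)
print(D(2.0, 0.5) > 0.0)      # strictly positive elsewhere
```

With this particular generator, writing y = log q recovers the generalized KL term x log x - x + q - x log q, hinting at how divergence families fall out of the duality.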
|