Kybernetika 48 no. 4, 637-689, 2012

Generalized minimizers of convex integral functionals, Bregman distance, Pythagorean identities

Imre Csiszár and František Matúš


Integral functionals based on convex normal integrands are minimized subject to finitely many moment constraints. The integrands are finite on the positive and infinite on the negative numbers, strictly convex but not necessarily differentiable. The minimization is viewed as a primal problem and studied together with a dual one in the framework of convex duality. The effective domain of the value function is described by a conic core, a modification of the earlier concept of convex core. Minimizers and generalized minimizers are explicitly constructed from solutions of modified dual problems, not assuming the primal constraint qualification. A generalized Pythagorean identity is presented using Bregman distance and a correction term for lack of essential smoothness in integrands. Results are applied to minimization of Bregman distances. Existence of a generalized dual solution is established whenever the dual value is finite, assuming the dual constraint qualification. Examples of `irregular' situations are included, pointing to the limitations of generality of certain key results.


maximum entropy, moment constraint, generalized primal/dual solutions, normal integrand, minimizing sequence, convex duality, Bregman projection, conic core, generalized exponential family, inference principles


94A17, 49J53, 49K30, 62B10, 65K10, 90C46


  1. S. M. Ali and S. D. Silvey: A general class of coefficients of divergence of one distribution from another. J. Roy. Statist. Soc. Ser. B 28 (1966) 131-142.   CrossRef
  2. S. Amari and H. Nagaoka: Methods of Information Geometry. Transl. Math. Monographs 191, Oxford Univ. Press, 2000.   CrossRef
  3. S. Amari and A. Cichocki: Information geometry of divergence functions. Bull. Polish Acad. Sci. 58 (2010) 183-194.   CrossRef
  4. O. Barndorff-Nielsen: Information and Exponential Families in Statistical Theory. Wiley, 1978.   CrossRef
  5. H. H. Bauschke and J. M. Borwein: Legendre functions and the method of random Bregman projections. J. Convex Anal. 4 (1997), 27-67.   CrossRef
  6. H. H. Bauschke, J. M. Borwein and P. L. Combettes: Essential smoothness, essential strict convexity, and Legendre functions in Banach spaces. Comm. Contemp. Math. 3 (2001), 615-647.   CrossRef
  7. A. Ben-Tal and A. Charnes: A dual optimization framework for some problems of information theory and statistics. Problems Control Inform. Theory 8 (1979), 387-401.   CrossRef
  8. J. M. Borwein and A. S. Lewis: Duality relationships for entropy-like minimization problems. SIAM J. Control Optim. 29 (1991), 325-338.   CrossRef
  9. J. M. Borwein and A. S. Lewis: Convergence of best entropy estimates. SIAM J. Optim. 1 (1991), 191-205.   CrossRef
  10. J. M. Borwein and A. S. Lewis: Partially-finite programming in $L_1$ and the existence of maximum entropy estimates. SIAM J. Optim. 3 (1993), 248-267.   CrossRef
  11. J. M. Borwein, A. S. Lewis and D. Noll: Maximum entropy spectral analysis using derivative information. Part I: Fisher information and convex duality. Math. Oper. Res. 21 (1996), 442-468.   CrossRef
  12. L. M. Bregman: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. and Math. Phys. 7 (1967), 200-217.   CrossRef
  13. M. Broniatowski and A. Keziou: Minimization of $\phi$-divergences on sets of signed measures. Studia Sci. Math. Hungar. 43 (2006), 403-442.   CrossRef
  14. J. P. Burg: Maximum entropy spectral analysis. Paper presented at 37th Meeting of Soc. Explor. Geophysicists, Oklahoma City 1967.   CrossRef
  15. J. P. Burg: Maximum entropy spectral analysis. Ph.D. Thesis, Dept. Geophysics, Stanford Univ., Stanford 1975.   CrossRef
  16. Y. Censor and S. A. Zenios: Parallel Optimization. Oxford University Press, New York 1997.   CrossRef
  17. N. N. Chentsov: Statistical Decision Rules and Optimal Inference. Transl. Math. Monographs 53, American Math. Soc., Providence 1982. Russian original: Nauka, Moscow 1972.   CrossRef
  18. I. Csiszár: Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Publ. Math. Inst. Hungar. Acad. Sci. 8 (1963), 85-108.   CrossRef
  19. I. Csiszár: Information-type measures of difference of probability distributions and indirect observations. Studia Sci. Math. Hungar. 2 (1967), 299-318.   CrossRef
  20. I. Csiszár: $I$-divergence geometry of probability distributions and minimization problems. Ann. Probab. 3 (1975), 146-158.   CrossRef
  21. I. Csiszár: Sanov property, generalized $I$-projection and a conditional limit theorem. Ann. Probab. 12 (1984), 768-793.   CrossRef
  22. I. Csiszár: Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. Ann. Statist. 19 (1991), 2031-2066.   CrossRef
  23. I. Csiszár: Generalized projections for non-negative functions. Acta Math. Hungar. 68 (1995), 1-2, 161-185.   CrossRef
  24. I. Csiszár, F. Gamboa and E. Gassiat: MEM pixel correlated solutions for generalized moment and interpolation problems. IEEE Trans. Inform. Theory 45 (1999), 2253-2270.   CrossRef
  25. I. Csiszár and F. Matúš: Convex cores of measures on $\R^d$. Studia Sci. Math. Hungar. 38 (2001), 177-190.   CrossRef
  26. I. Csiszár and F. Matúš: Information projections revisited. IEEE Trans. Inform. Theory 49 (2003), 1474-1490.   CrossRef
  27. I. Csiszár and F. Matúš: Generalized maximum likelihood estimates for infinite dimensional exponential families. In: Proc. Prague Stochastics'06, Prague 2006, pp. 288-297.   CrossRef
  28. I. Csiszár and F. Matúš: Generalized maximum likelihood estimates for exponential families. Probab. Theory Related Fields 141 (2008), 213-246.   CrossRef
  29. I. Csiszár and F. Matúš: On minimization of entropy functionals under moment constraints. In: Proc. ISIT 2008, Toronto, pp. 2101-2105.   CrossRef
  30. I. Csiszár and F. Matúš: On minimization of multivariate entropy functionals. In: Proc. ITW 2009, Volos, Greece, pp. 96-100.   CrossRef
  31. I. Csiszár and F. Matúš: Minimization of entropy functionals revisited. In: Proc. ISIT 2012, Cambridge, MA, pp. 150-154.   CrossRef
  32. D. Dacunha-Castelle and F. Gamboa: Maximum d'entropie et problème des moments. Ann. Inst. H. Poincaré Probab. Statist. 26 (1990), 567-596.   CrossRef
  33. A. P. Dawid and P. D. Grünwald: Game theory, maximum entropy, minimum discrepancy, and robust Bayesian decision theory. Ann. Statist. 32 (2004), 1367-1433.   CrossRef
  34. S. Eguchi: Information geometry and statistical pattern recognition. Sugaku Expositions, Amer. Math. Soc. 19 (2006), 197-216.   CrossRef
  35. B. A. Frigyik, S. Srivastava and M. R. Gupta: Functional Bregman divergence and Bayesian estimation of distributions. IEEE Trans. Inform. Theory 54 (2008), 5130-5139.   CrossRef
  36. F. Gamboa and E. Gassiat: Bayesian methods and maximum entropy for ill-posed inverse problems. Ann. Statist. 25 (1997), 1, 328-350.   CrossRef
  37. E. T. Jaynes: Information theory and statistical mechanics. Physical Review Ser. II 106 (1957), 620-630.   CrossRef
  38. L. Jones and C. Byrne: General entropy criteria for inverse problems with application to data compression, pattern classification and cluster analysis. IEEE Trans. Inform. Theory 36 (1990), 23-30.   CrossRef
  39. S. Kullback: Information Theory and Statistics. John Wiley and Sons, New York 1959.   CrossRef
  40. S. Kullback and R. A. Leibler: On information and sufficiency. Ann. Math. Statist. 22 (1951), 79-86.   CrossRef
  41. C. Léonard: Minimizers of energy functionals. Acta Math. Hungar. 93 (2001), 281-325.   CrossRef
  42. C. Léonard: Minimizers of energy functionals under not very integrable constraints. J. Convex Anal. 10 (2003), 63-68.   CrossRef
  43. C. Léonard: Minimization of entropy functionals. J. Math. Anal. Appl. 346 (2008), 183-204.   CrossRef
  44. C. Léonard: Entropic projections and dominating points. ESAIM: Probability and Statistics 14 (2010), 343-381.   CrossRef
  45. F. Liese and I. Vajda: Convex Statistical Distances. Teubner Texte zur Mathematik 95, Teubner Verlag, Leipzig 1986.   CrossRef
  46. N. Murata, T. Takenouchi, T. Kanamori and S. Eguchi: Information geometry of U-Boost and Bregman divergence. Neural Computation 16 (2004), 1437-1481.   CrossRef
  47. R. T. Rockafellar: Integrals which are convex functionals. Pacific J. Math. 24 (1968), 525-539.   CrossRef
  48. R. T. Rockafellar: Convex integral functionals and duality. In: Contributions to Nonlinear Functional Analysis (E. H. Zarantonello, ed.), Academic Press, New York 1971, pp. 215-236.   CrossRef
  49. R. T. Rockafellar: Convex Analysis. Princeton University Press, Princeton 1970.   CrossRef
  50. R. T. Rockafellar and R. J.-B. Wets: Variational Analysis. Springer Verlag, Berlin - Heidel\-berg - New York 2004.   CrossRef
  51. M. Teboulle and I. Vajda: Convergence of best $\phi$-entropy estimates. IEEE Trans. Inform. Theory 39 (1993), 297-301.   CrossRef
  52. F. Topsoe: Information-theoretical optimization techniques. Kybernetika 15 (1979), 8-27.   CrossRef
  53. I. Vajda: Theory of Statistical Inference and Information. Kluwer Academic Puplishers, Dordrecht 1989.   CrossRef