Kybernetika 49 no. 2, 199-215, 2013

Optimally approximating exponential families

Johannes Rauh


This article studies exponential families $\mathcal{E}$ on finite sets such that the information divergence $D(P\|\mathcal{E})$ of an arbitrary probability distribution from $\mathcal{E}$ is bounded by some constant $D>0$. A particular class of low-dimensional exponential families that have low values of $D$ can be obtained from partitions of the state space. The main results concern optimality properties of these partition exponential families. The case where $D=\log(2)$ is studied in detail. This case is special, because if $D<\log(2)$, then $\mathcal{E}$ contains all probability measures with full support.


exponential family, information divergence


94A15, 62B10, 94A17


  1. N. Ay: An information-geometric approach to a theory of pragmatic structuring. Ann. Probab. 30 (2002), 416-436.   CrossRef
  2. N. Ay: Locality of global stochastic interaction in directed acyclic networks. Neural Computat. 14 (2002), 2959-2980.   CrossRef
  3. L. Brown: Fundamentals of Statistical Exponential Families: With Applications in Statistical Decision Theory. Institute of Mathematical Statistics, Hayworth 1986.   CrossRef
  4. T. Cover and J. Thomas: Elements of Information Theory. First edition. Wiley, 1991.   CrossRef
  5. I. Csiszár and P. Shields: Information Theory and Statistics: A Tutorial. First edition. Foundations and Trends in Communications and Information Theory. Now Publishers, 2004.   CrossRef
  6. I. Csiszár and F. Matúš: Generalized maximum likelihood extimates for exponential families. Probab. Theory Rel. Fields 141 (2008), 213-246.   CrossRef
  7. S. Della Pietra, V. Della Pietra and J. Lafferty: Inducing features of random fields. IEEE Trans. Pattern Analysis and Machine Intelligence 19 (1997), 380-393.   CrossRef
  8. M. Drton, B. Sturmfels and S. Sullivant: Lectures on algebraic statistics. In: Oberwolfach Seminars 39, Birkh{ä}user, Basel 2009.   CrossRef
  9. D. Geiger, C. Meek and B. Sturmfels: On the toric algebra of graphical models. Ann. Statist. 34 (2006), 5, 1463-1492.   CrossRef
  10. E. T. Jaynes: Information theory and statistical mechanics. Phys. Rev. 106 (1957), 4, 620-630.   CrossRef
  11. J. Juríček: Maximization of information divergence from multinomial distributions. Acta Univ. Carolin. 52 (2011), 1, 27-35.   CrossRef
  12. S. L. Lauritzen: Graphical Models. First edition. Oxford Statistical Science Series, Oxford University Press, 1996.   CrossRef
  13. R. Linsker: Self-organization in a perceptual network. IEEE Computer 21 (1988), 105-117.   CrossRef
  14. F. Matúš and N. Ay: On maximization of the information divergence from an exponential family. In: Proc. WUPES'03, University of Economics, Prague 2003, pp. 199-204.   CrossRef
  15. F. Matúš and J. Rauh: Maximization of the information divergence from an exponential family and criticality. In: 2011 IEEE International Symposium on Information Theory Proceedings (ISIT2011), 2011.   CrossRef
  16. G. Montúfar, J. Rauh and N. Ay: Expressive power and approximation errors of Restricted Boltzmann Machines. In: NIPS, 2011.   CrossRef
  17. J. Oxley: Matroid Theory. First edition. Oxford University Press, New York 1992.   CrossRef
  18. J. Rauh: Finding the Maximizers of the Information Divergence from an Exponential Family. Ph.D. Dissertation, Universität Leipzig, 2011.   CrossRef
  19. J. Rauh: Finding the maximizers of the information divergence from an exponential family. IEEE Trans. Inform. Theory 57 (2011), 6, 3236-3247.   CrossRef
  20. J. Rauh, T. Kahle and N. Ay: Support sets of exponential families and oriented matroids. Internat. J. Approx. Reasoning 52 (2011), 5, 613-626.   CrossRef
  21. S. C. Zhu, Y. N. Wu and D. Mumford: Minimax entropy principle and its application to texture modeling. Neural Computation 9 (1997), 1627-1660.   CrossRef