Kybernetika 50 no. 2, 234-245, 2014

Scaling of model approximation errors and expected entropy distances

Guido F. Montúfar and Johannes Rauh

DOI: 10.14736/kyb-2014-2-0234


We compute the expected value of the Kullback-Leibler divergence of various fundamental statistical models with respect to Dirichlet priors. For the uniform prior, the expected divergence of any model containing the uniform distribution is bounded above by the constant $1-\gamma$, where $\gamma$ is the Euler-Mascheroni constant. For the models that we consider, this bound is approached as the cardinality of the sample space tends to infinity, provided the model dimension remains relatively small. For Dirichlet priors with reasonable concentration parameters, the expected values of the divergence behave in a similar way. These results serve as a reference for ranking the approximation capabilities of other statistical models.
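The simplest instance of the abstract's claim can be checked numerically: for the one-point model consisting of the uniform distribution $u$ on $n$ states, $D(p\,\|\,u) = \log n - H(p)$, and under the uniform Dirichlet prior the expected entropy has the closed form $E[H(p)] = H_n - 1$ (with $H_n$ the $n$-th harmonic number), so the expected divergence $\log n - H_n + 1$ tends to $1-\gamma$ from below. The sketch below (function name illustrative, divergence measured in nats) compares a Monte Carlo estimate against this closed form and the limit:

```python
import numpy as np

def expected_kl_to_uniform(n, num_samples=20000, seed=0):
    """Monte Carlo estimate of E[D(p || u)] for p ~ Dirichlet(1,...,1)
    on an n-element sample space, where u is the uniform distribution.
    Uses D(p || u) = log n - H(p), with entropy H in nats."""
    rng = np.random.default_rng(seed)
    p = rng.dirichlet(np.ones(n), size=num_samples)
    entropy = -np.sum(p * np.log(p), axis=1)
    return np.log(n) - entropy.mean()

n = 100
mc = expected_kl_to_uniform(n)

# Closed form under the uniform prior: E[H(p)] = H_n - 1, hence
# E[D(p || u)] = log n - H_n + 1, which tends to 1 - gamma as n -> infinity.
harmonic = np.sum(1.0 / np.arange(1, n + 1))
exact = np.log(n) - harmonic + 1.0
print(mc, exact, 1.0 - np.euler_gamma)
```

Already at $n = 100$ the exact value ($\approx 0.418$) is within $0.005$ of the asymptotic bound $1-\gamma \approx 0.423$, consistent with the convergence described in the abstract.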


exponential families, KL divergence, MLE, Dirichlet prior


62F25, 68T30


  1. N. Ay: An information-geometric approach to a theory of pragmatic structuring. Ann. Probab. 30 (2002), 416-436.
  2. M. Drton, B. Sturmfels and S. Sullivant: Lectures on Algebraic Statistics. Birkhäuser, Basel 2009.
  3. B. A. Frigyik, A. Kapila and M. R. Gupta: Introduction to the Dirichlet Distribution and Related Processes. Technical Report, Department of Electrical Engineering, University of Washington, 2010.
  4. F. Matúš and N. Ay: On maximization of the information divergence from an exponential family. In: Proc. WUPES'03, University of Economics, Prague 2003, pp. 199-204.
  5. F. Matúš and J. Rauh: Maximization of the information divergence from an exponential family and criticality. In: Proc. ISIT, St. Petersburg 2011, pp. 903-907.
  6. G. Montúfar, J. Rauh and N. Ay: Expressive power and approximation errors of restricted Boltzmann machines. In: Advances in NIPS 24, MIT Press, Cambridge 2011, pp. 415-423.
  7. I. Nemenman, F. Shafee and W. Bialek: Entropy and inference, revisited. In: Advances in NIPS 14, MIT Press, Cambridge 2001, pp. 471-478.
  8. J. Rauh: Finding the Maximizers of the Information Divergence from an Exponential Family. Ph.D. Thesis, Universität Leipzig 2011.
  9. J. Rauh: Optimally approximating exponential families. Kybernetika 49 (2013), 199-215.
  10. D. Wolpert and D. Wolf: Estimating functions of probability distributions from a finite set of samples. Phys. Rev. E 52 (1995), 6841-6854.