Kybernetika 49 no. 1, 23-39, 2013

Mixture decompositions of exponential families - using a decomposition of their sample spaces

Guido Montúfar


We study the problem of finding the smallest $m$ such that every element of an exponential family can be written as a mixture of $m$ elements of another exponential family. We propose an approach based on coverings and packings of the face lattice of the corresponding convex support polytopes and results from coding theory. We show that $m=q^{N-1}$ is the smallest number for which any distribution of $N$ $q$-ary variables can be written as mixture of $m$ independent $q$-ary variables. Furthermore, we show that any distribution of $N$ binary variables is a mixture of $m = 2^{N-(k+1)}(1+ 1/(2^k-1))$ elements of the $k$-interaction exponential family.


mixture model, non-negative tensor rank, perfect code, marginal polytope


52B05, 60C05, 62E17


  1. S. Amari: Information geometry on hierarchical decomposition of stochastic interactions. IEEE Trans. Inform. Theory 47 (1999), 1701-1711.   CrossRef
  2. S. Amari and H. Nagaoka: Methods of information geometry, Vol. 191. Oxford University Press, 2000.} \newblock{Translations of mathematical monographs.   CrossRef
  3. N. Ay and A. Knauf: Maximizing multi-information. Kybernetika 42 (2006), 517-538.   CrossRef
  4. N. Ay, G. F. Montúfar and J. Rauh: Selection criteria for neuromanifolds of stochastic dynamics. In: Advances in Cognitive Neurodynamics (III). Springer, 2011.   CrossRef
  5. C. M. Bishop: Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, New York 2006.   CrossRef
  6. C. Bocci and L. Chiantini: On the identifiability of binary segre products. J. Algebraic Geom. 5 (2011).   CrossRef
  7. L. Brown: Fundamentals of Statistical Exponential Families: With Applications in Statistical Decision Theory. Institute of Mathematical Statistics, Hayworth 1986.   CrossRef
  8. M. V. Catalisano, A. V. Geramita and A. Gimigliano: Secant varieties of $\P^1\times\dots\times\P^1$ ($n$-times) are not defective for $n\geq5$. J. Algebraic Geom. 20 (2011), 295-327.   CrossRef
  9. P. Diaconis: Finite forms of de Finetti's theorem on exchangeability. Synthese 36 (1977), 271-281.   CrossRef
  10. B. Efron: The geometry of exponential families. Ann. Statist. 6 (1978), 2, 362-376.   CrossRef
  11. S. L. G. Cohen, I. Honkala and A. Lobstein: Covering Codes. Elsevier, 1997.   CrossRef
  12. D. Gale: Neighborly and cyclic polytopes. In: Convexity: Proc. Seventh Symposium in Pure Mathematics of the American Mathematical Society 1961, pp. 225-233.   CrossRef
  13. E. Gawrilow and M. Joswig: Polymake: a framework for analyzing convex polytopes. In: Polytopes - Combinatorics and Computation (G. Kalai and G. M. Ziegler, eds.), Birkhäuser 2000, pp. 43-74.   CrossRef
  14. D. Geiger, C. Meek and B. Sturmfels: On the toric algebra of graphical models. Ann. Statist. 34 (2006), 1463-1492.   CrossRef
  15. E. Gilbert: A comparison of signalling alphabets. Bell System Techn. J. 31 (1052), 504-522.   CrossRef
  16. Z. Gilula: Singular value decomposition of probability matrices: Probabilistic aspects of latent dichotomous variables. Biometrika 66 (1979), 2, 339-344.   CrossRef
  17. B. Gr{ü}nbaum: Convex Polytopes. Second edition. Springer-Verlag, New York 2003.   CrossRef
  18. M. Henk, J. Richter-Gebert and G. M. Ziegler: Basic Properties of Convex Polytopes. CRC Press, Boca Raton 1997.   CrossRef
  19. S. Ho{ş}ten and S. Sullivant: Gröbner bases and polyhedral geometry of reducible and cyclic models. J. Combin. Theory Ser. A 100 (2002), 2, 277-301.   CrossRef
  20. T. Kahle: Neighborliness of marginal polytopes. Contrib. Algebra Geometry 51 (2010), 45-56.   CrossRef
  21. T. Kahle and N. Ay: Support sets of distributions with given interaction structure. In: Proc. WUPES'06, 2006.   CrossRef
  22. T. Kahle, W. Wenzel and N. Ay: Hierarchical models, marginal polytopes, and linear codes. Kybernetika 45 (2009), 189-208.   CrossRef
  23. G. Kalai: Some aspects of the combinatorial theory of convex polytopes. 1993.   CrossRef
  24. J. F. C. Kingman: Uses of exchangeability. Ann. Probab. 6 (1978), 2, 183-197.   CrossRef
  25. B. G. Lindsay: Mixture models: theory, geometry, and applications. NSF-CBMS Regional Conference Series in Probability and Statistics. Institute of Mathematical Statistics, 1995.   CrossRef
  26. G. McLachlan and D. Peel: Finite Mixture Models. Wiley Series in Probability and Statistics: Applied Probability and Statistics. Wiley, 2000.   CrossRef
  27. G. F. Montúfar and N. Ay: Refinements of universal approximation results for deep belief networks and restricted Boltzmann machines. Neural Comput. 23 (2011), 5, 1306-1319.   CrossRef
  28. G. F. Montúfar, J. Rauh and N. Ay: Expressive power and approximation errors of restricted Boltzmann machines. In: Advances in Neural Information Processing Systems 24 (J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Weinberger, eds.), MIT Press, 2011, pp. 415-423.   CrossRef
  29. J. Rauh: Finding the Maximizers of the Information Divergence from an Exponential Family. Ph. D. Thesis, Universität Leipzig, 2011.   CrossRef
  30. J. Rauh, T. Kahle and N. Ay: Support sets of exponential families and oriented matroids. Internat. J. Approximate Reasoning 52 (2011), 5, 613-626.   CrossRef
  31. R. Settimi and J. Q. Smith: On the geometry of Bayesian graphical models with hidden variables. In: Proc. Fourteenth conference on Uncertainty in artificial intelligence, UAI'98, Morgan Kaufmann Publishers 1998, pp. 472-479.   CrossRef
  32. I. Shemer.: Neighborly polytopes. Israel J. Math. 43 (1982), 291-311.   CrossRef
  33. D. Titterington, A. F. M. Smith and U. E. Makov: Statistical Analysis of Finite Mixture Distributions. John Wiley and Sons, 1985.   CrossRef
  34. R. Varshamov: Estimate of the number of signals in error correcting codes. Dokl. Akad. Nauk SSSR 117 (1957), 739-741.   CrossRef