Kybernetika 56 no. 3, 459-499, 2020

Multi-variate correlation and mixtures of product measures

Tim Austin

DOI: 10.14736/kyb-2020-3-0459


Total correlation ("TC") and dual total correlation ("DTC") are two classical ways to quantify the correlation among an $n$-tuple of random variables. Both reduce to mutual information when $n=2$. The first part of this paper sets up the theory of TC and DTC for general random variables, not necessarily finite-valued; this level of generality has not been treated systematically in the literature before. The second part considers the structural implications when a joint distribution $\mu$ has small TC or DTC. If $\mathrm{TC}(\mu) = o(n)$, then $\mu$ is close to a product measure in a suitable transportation metric: this follows directly from Marton's classical transportation-entropy inequality. If $\mathrm{DTC}(\mu) = o(n)$, the structural consequence is more complicated: $\mu$ is a mixture of a controlled number of terms, most of them close to product measures in the transportation metric. This mixture decomposition is the main new result of the paper.
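For finite-valued tuples, the two quantities in the abstract have standard closed-form definitions: $\mathrm{TC} = \sum_i H(X_i) - H(X_1,\dots,X_n)$ and $\mathrm{DTC} = H(X_1,\dots,X_n) - \sum_i H(X_i \mid X_{[n]\setminus\{i\}})$. The following is a minimal sketch (not taken from the paper, which works with general random variables) that computes both from a joint pmf and illustrates the claim that they coincide with mutual information when $n = 2$; all function names are illustrative.

```python
# Sketch of TC and DTC for finite-valued random variables, using the
# standard finite-alphabet definitions (the paper's general-measure
# setup is not reproduced here).
from itertools import product
from math import log2

def entropy(pmf):
    """Shannon entropy (bits) of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def marginal(joint, keep):
    """Marginal of a joint pmf over tuple outcomes, keeping coordinates in `keep`."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def tc(joint, n):
    """Total correlation: sum_i H(X_i) - H(X_1, ..., X_n)."""
    return sum(entropy(marginal(joint, (i,))) for i in range(n)) - entropy(joint)

def dtc(joint, n):
    """Dual total correlation: H(X_1, ..., X_n) - sum_i H(X_i | X_others)."""
    h = entropy(joint)
    # Conditional entropy via the chain rule: H(X_i | rest) = H(joint) - H(rest).
    cond = sum(h - entropy(marginal(joint, tuple(j for j in range(n) if j != i)))
               for i in range(n))
    return h - cond

# For n = 2 both quantities reduce to the mutual information I(X; Y).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(abs(tc(joint, 2) - dtc(joint, 2)) < 1e-12)  # True
```

For $n \ge 3$ the two quantities generally differ, which is what makes the structural questions in the paper distinct: a product measure has $\mathrm{TC} = \mathrm{DTC} = 0$, but small DTC alone only forces the mixture-of-products structure described above.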


Keywords: total correlation, dual total correlation, transportation inequalities, mixtures of products


MSC: 60B99, 60G99, 62B10, 94A17


References

  1. S. A. Abdallah and M. D. Plumbley: Predictive Information, Multi-Information and Binding Information. Technical Report C4DM-TR-10-10, Queen Mary University of London, 2010.
  2. R. Ahlswede: An elementary proof of the strong converse theorem for the multiple-access channel. J. Combin. Inform. System Sci. 7 (1982), 3, 216-230.
  3. R. Ahlswede: The rate-distortion region for multiple descriptions without excess rate. IEEE Trans. Inform. Theory 31 (1985), 6, 721-726.   DOI:10.1109/tit.1985.1057102
  4. T. Austin: The structure of low-complexity Gibbs measures on product spaces. Ann. Probab. 47 (2019), 6, 4002-4023.   DOI:10.1214/19-aop1352
  5. T. Austin: Measure concentration and the weak Pinsker property. Publ. Math. Inst. Hautes Etudes Sci. 128 (2018), 1-119.   DOI:10.1007/s10240-018-0098-3
  6. N. Ay, E. Olbrich, N. Bertschinger and J. Jost: A unifying framework for complexity measures of finite systems. Working Paper 06-08-028, Santa Fe Institute, 2006.
  7. P. Balister and B. Bollobás: Projections, entropy and sumsets. Combinatorica 32 (2012), 2, 125-141.   DOI:10.1007/s00493-012-2453-1
  8. S. Chatterjee and A. Dembo: Nonlinear large deviations. Adv. Math. 299 (2016), 396-450.   DOI:10.1016/j.aim.2016.05.017
  9. F. R. K. Chung, R. L. Graham, P. Frankl and J. B. Shearer: Some intersection theorems for ordered sets and graphs. J. Combin. Theory Ser. A 43 (1986), 1, 23-37.   DOI:10.1016/0097-3165(86)90019-1
  10. A. Coja-Oghlan, F. Krzakala, W. Perkins and L. Zdeborová: Information-theoretic thresholds from the cavity method. Adv. Math. 333 (2018), 694-795.   DOI:10.1016/j.aim.2018.05.029
  11. A. Coja-Oghlan and W. Perkins: Bethe states of random factor graphs. Comm. Math. Phys. 366 (2019), 1, 173-201.   DOI:10.1007/s00220-019-03387-7
  12. T. M. Cover and J. A. Thomas: Elements of Information Theory. Second edition. Wiley-Interscience, John Wiley and Sons, Hoboken, NJ 2006.
  13. G. Crooks: On Measures of Entropy and Information. Technical note.
  14. I. Csiszár: Sanov property, generalized $I$-projection and a conditional limit theorem. Ann. Probab. 12 (1984), 3, 768-793.   DOI:10.1214/aop/1176993227
  15. I. Csiszár and P. Narayan: Secrecy capacities for multiple terminals. IEEE Trans. Inform. Theory 50 (2004), 12, 3047-3061.   DOI:10.1109/tit.2004.838380
  16. A. Dembo and O. Zeitouni: Large Deviations Techniques and Applications. Second edition. Springer-Verlag, Stochastic Modelling and Applied Probability 38, Berlin 2010.   DOI:10.1007/978-1-4612-5320-4
  17. R. L. Dobrušin: A general formulation of Shannon's fundamental theorem in the theory of information. Dokl. Akad. Nauk SSSR 126 (1959), 474-477.
  18. R. Dougherty, C. Freiling and K. Zeger: Networks, matroids, and non-Shannon information inequalities. IEEE Trans. Inform. Theory 53 (2007), 6, 1949-1969.   DOI:10.1109/tit.2007.896862
  19. R. M. Dudley: Real Analysis and Probability. Cambridge University Press, Cambridge Studies in Advanced Mathematics 74, Cambridge 2002.   DOI:10.1017/cbo9780511755347
  20. G. Dueck: The strong converse of the coding theorem for the multiple-access channel. J. Combin. Inform. System Sci. 6 (1981), 3, 187-196.
  21. R. Eldan: Gaussian-width gradient complexity, reverse log-Sobolev inequalities and nonlinear large deviations. Geom. Funct. Anal. 28 (2018), 6, 1548-1596.
  22. R. Eldan and R. Gross: Exponential random graphs behave like mixtures of stochastic block models. Preprint, arXiv:1707.01227.
  23. R. Eldan and R. Gross: Decomposition of mean-field Gibbs distributions into product measures. Electron. J. Probab. 23 (2018), paper no. 35, 24 pp.
  24. D. Ellis, E. Friedgut, G. Kindler and A. Yehudayoff: Geometric stability via information theory. Discrete Anal. 10 (2016), 28 pp.   DOI:10.19086/da.784
  25. T. Fritz and R. Chaves: Entropic inequalities and marginal problems. IEEE Trans. Inform. Theory 59 (2013), 2, 803-817.   DOI:10.1109/tit.2012.2222863
  26. S. Fujishige: Polymatroidal dependence structure of a set of random variables. Inform. Control 39 (1978), 1, 55-72.   DOI:10.1016/s0019-9958(78)91063-x
  27. I. Gelfand, A. Kolmogorov and I. Yaglom: On the general definition of the quantity of information. Doklady Akad. Nauk SSSR 111 (1956), 4, 745-748.
  28. T. S. Han: Linear dependence structure of the entropy space. Inform. Control 29 (1975), 4, 337-368.   DOI:10.1016/s0019-9958(75)80004-0
  29. T. S. Han: Nonnegative entropy measures of multivariate symmetric correlations. Inform. Control 36 (1978), 2, 133-156.   DOI:10.1016/s0019-9958(78)90275-9
  30. A. N. Kolmogorov: On the Shannon theory of information transmission in the case of continuous signals. IEEE Trans. Inform. Theory IT-2 (1956), 102-108.
  31. M. Ledoux: On Talagrand's deviation inequalities for product measures. ESAIM Probab. Statist. 1 (1995/97), 63-87.   DOI:10.1051/ps:1997103
  32. M. Madiman and P. Tetali: Information inequalities for joint distributions, with interpretations and applications. IEEE Trans. Inform. Theory 56 (2010), 6, 2699-2713.   DOI:10.1109/tit.2010.2046253
  33. K. Makarychev, Y. Makarychev, A. Romashchenko and N. Vereshchagin: A new class of non-Shannon-type inequalities for entropies. Commun. Inf. Syst. 2 (2002), 2, 147-165.   DOI:10.4310/cis.2002.v2.n2.a3
  34. K. Marton: A simple proof of the blowing-up lemma. IEEE Trans. Inform. Theory 32 (1986), 3, 445-446.   DOI:10.1109/tit.1986.1057176
  35. K. Marton: Bounding $\overline{d}$-distance by informational divergence: a method to prove measure concentration. Ann. Probab. 24 (1996), 2, 857-866.   DOI:10.1214/aop/1039639365
  36. F. Matúš: Two constructions on limits of entropy functions. IEEE Trans. Inform. Theory 53 (2007), 1, 320-330.   DOI:10.1109/tit.2006.887090
  37. C. McDiarmid: On the method of bounded differences. In: Surveys in Combinatorics, Norwich 1989, London Math. Soc. Lecture Note Ser. 141, Cambridge Univ. Press, Cambridge 1989, pp. 148-188.   DOI:10.1017/cbo9781107359949.008
  38. W. J. McGill: Multivariate information transmission. Trans. I.R.E. PGIT-4 (1954), 93-111.   DOI:10.1109/tit.1954.1057469
  39. J. Pearl and A. Paz: Graphoids: a graph-based logic for reasoning about relevance relations. In: Advances in Artificial Intelligence - II (B. Du Boulay, D. Hogg, and L. Steels, eds.), North Holland, Amsterdam 1987, pp. 357-363.
  40. A. Perez: Information theory with an abstract alphabet. Generalized forms of McMillan's limit theorem for the case of discrete and continuous times. Theory Probab. Appl. 4 (1959), 99-102.   DOI:10.1137/1104007
  41. A. Perez: $\epsilon$-admissible simplifications of the dependence structure of a set of random variables. Kybernetika 13 (1977), 6, 439-449.
  42. M. S. Pinsker: Information and Information Stability of Random Variables and Processes. Holden-Day, Inc., San Francisco 1964.
  43. J. Radhakrishnan: Entropy and counting. In: Computational Mathematics, Modelling and Applications (IIT Kharagpur, Golden Jubilee Volume) (J. Mishra, ed.), Narosa Publishers, 2001, pp. 146-168.
  44. E. Schneidman, S. Still, M. J. Berry and W. Bialek: Network information and connected correlations. Phys. Rev. Lett. 91 (2003), 238701.   DOI:10.1103/physrevlett.91.238701
  45. M. Studený and J. Vejnarová: The multiinformation function as a tool for measuring stochastic dependence. In: Proc. NATO Advanced Study Institute on Learning in Graphical Models, Kluwer Academic Publishers, Norwell 1998, pp. 261-297.   DOI:10.1007/978-94-011-5014-9_10
  46. N. Timme, W. Alford, B. Flecker and J. M. Beggs: Synergy, redundancy, and multivariate information measures: An experimentalist's perspective. J. Comput. Neurosci. 36 (2014), 2, 119-140.   DOI:10.1007/s10827-013-0458-4
  47. S. Watanabe: Information theoretical analysis of multivariate correlation. IBM J. Res. Develop. 4 (1960), 66-82.   DOI:10.1147/rd.41.0066
  48. J. Yan: Nonlinear large deviations: beyond the hypercube. Preprint, arXiv:1703.08887.
  49. Z. Zhang and R. W. Yeung: On characterization of entropy function via information inequalities. IEEE Trans. Inform. Theory 44 (1998), 4, 1440-1452.   DOI:10.1109/18.681320