Kybernetika 56 no. 3, 459-499, 2020

Multi-variate correlation and mixtures of product measures

Tim Austin
DOI: 10.14736/kyb-2020-3-0459

Abstract:

Total correlation ('TC') and dual total correlation ('DTC') are two classical ways to quantify the correlation among an $n$-tuple of random variables. They both reduce to mutual information when $n=2$. The first part of this paper sets up the theory of TC and DTC for general random variables, not necessarily finite-valued; this level of generality does not seem to have been treated carefully in the literature before. The second part considers the structural implications when a joint distribution $\mu$ has small TC or DTC. If $\mathrm{TC}(\mu) = o(n)$, then $\mu$ is close to a product measure in a suitable transportation metric: this follows directly from Marton's classical transportation-entropy inequality. If $\mathrm{DTC}(\mu) = o(n)$, then the structural consequence is more complicated: $\mu$ is a mixture of a controlled number of terms, most of which are close to product measures in the transportation metric. This is the main new result of the paper.
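
In the finite-valued case the two quantities can be written explicitly (TC goes back to Watanabe, DTC to Han; $H$ denotes Shannon entropy):

$$\mathrm{TC}(\mu) = \sum_{i=1}^{n} H(X_i) - H(X_1,\ldots,X_n), \qquad \mathrm{DTC}(\mu) = H(X_1,\ldots,X_n) - \sum_{i=1}^{n} H\bigl(X_i \,\big|\, (X_j)_{j \neq i}\bigr).$$

Setting $n = 2$, both right-hand sides collapse to the mutual information $I(X_1; X_2)$. Note also that $\mathrm{TC}(\mu)$ equals the Kullback-Leibler divergence $\mathrm{D}(\mu \,\|\, \mu_1 \otimes \cdots \otimes \mu_n)$ from the product of the marginals $\mu_i$, which is the form in which Marton's transportation-entropy inequality applies.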

Keywords:

total correlation, dual total correlation, transportation inequalities, mixtures of products

Classification:

60B99, 60G99, 62B10, 94A17

References:

  1. S. A. Abdallah and M. D. Plumbley: Predictive Information, Multi-Information and Binding Information. Technical Report C4DM-TR-10-10, Queen Mary University of London, 2010.
  2. R. Ahlswede: An elementary proof of the strong converse theorem for the multiple-access channel. J. Combin. Inform. System Sci. 7 (1982), 3, 216-230.
  3. R. Ahlswede: The rate-distortion region for multiple descriptions without excess rate. IEEE Trans. Inform. Theory 31 (1985), 6, 721-726.   DOI:10.1109/tit.1985.1057102
  4. T. Austin: The structure of low-complexity Gibbs measures on product spaces. Ann. Probab. 47 (2019), 6, 4002-4023.   DOI:10.1214/19-aop1352
  5. T. Austin: Measure concentration and the weak Pinsker property. Publ. Math. Inst. Hautes Études Sci. 128 (2018), 1-119.   DOI:10.1007/s10240-018-0098-3
  6. N. Ay, E. Olbrich, N. Bertschinger and J. Jost: A unifying framework for complexity measures of finite systems. Working Paper 06-08-028, Santa Fe Institute, 2006.
  7. P. Balister and B. Bollobás: Projections, entropy and sumsets. Combinatorica 32 (2012), 2, 125-141.   DOI:10.1007/s00493-012-2453-1
  8. S. Chatterjee and A. Dembo: Nonlinear large deviations. Adv. Math. 299 (2016), 396-450.   DOI:10.1016/j.aim.2016.05.017
  9. F. R. K. Chung, R. L. Graham, P. Frankl and J. B. Shearer: Some intersection theorems for ordered sets and graphs. J. Combin. Theory Ser. A 43 (1986), 1, 23-37.   DOI:10.1016/0097-3165(86)90019-1
  10. A. Coja-Oghlan, F. Krzakala, W. Perkins and L. Zdeborová: Information-theoretic thresholds from the cavity method. Adv. Math. 333 (2018), 694-795.   DOI:10.1016/j.aim.2018.05.029
  11. A. Coja-Oghlan and W. Perkins: Bethe states of random factor graphs. Comm. Math. Phys. 366 (2019), 1, 173-201.   DOI:10.1007/s00220-019-03387-7
  12. T. M. Cover and J. A. Thomas: Elements of Information Theory. Second edition. Wiley-Interscience, John Wiley and Sons, Hoboken, NJ 2006.
  13. G. Crooks: On Measures of Entropy and Information. Technical note.
  14. I. Csiszár: Sanov property, generalized $I$-projection and a conditional limit theorem. Ann. Probab. 12 (1984), 3, 768-793.   DOI:10.1214/aop/1176993227
  15. I. Csiszár and P. Narayan: Secrecy capacities for multiple terminals. IEEE Trans. Inform. Theory 50 (2004), 12, 3047-3061.   DOI:10.1109/tit.2004.838380
  16. A. Dembo and O. Zeitouni: Large Deviations Techniques and Applications. Second edition. Springer-Verlag, Stochastic Modelling and Applied Probability 38, Berlin 2010.   DOI:10.1007/978-1-4612-5320-4
  17. R. L. Dobrušin: A general formulation of Shannon's fundamental theorem in the theory of information. Dokl. Akad. Nauk SSSR 126 (1959), 474-477.
  18. R. Dougherty, C. Freiling and K. Zeger: Networks, matroids, and non-Shannon information inequalities. IEEE Trans. Inform. Theory 53 (2007), 6, 1949-1969.   DOI:10.1109/tit.2007.896862
  19. R. M. Dudley: Real Analysis and Probability. Cambridge University Press, Cambridge Studies in Advanced Mathematics 74, Cambridge 2002.   DOI:10.1017/cbo9780511755347
  20. G. Dueck: The strong converse of the coding theorem for the multiple-access channel. J. Combin. Inform. System Sci. 6 (1981), 3, 187-196.
  21. R. Eldan: Gaussian-width gradient complexity, reverse log-Sobolev inequalities and nonlinear large deviations. Geometr. Funct. Anal. 28 (2018), 6, 1548-1596.
  22. R. Eldan and R. Gross: Exponential random graphs behave like mixtures of stochastic block models. Preprint, arXiv:1707.01227.
  23. R. Eldan and R. Gross: Decomposition of mean-field Gibbs distributions into product measures. Electron. J. Probab. 23 (2018), paper no. 35, 24 pp.
  24. D. Ellis, E. Friedgut, G. Kindler and A. Yehudayoff: Geometric stability via information theory. Discrete Anal. 10 (2016), 28 pp.   DOI:10.19086/da.784
  25. T. Fritz and R. Chaves: Entropic inequalities and marginal problems. IEEE Trans. Inform. Theory 59 (2013), 2, 803-817.   DOI:10.1109/tit.2012.2222863
  26. S. Fujishige: Polymatroidal dependence structure of a set of random variables. Inform. Control 39 (1978), 1, 55-72.   DOI:10.1016/s0019-9958(78)91063-x
  27. I. Gelfand, A. Kolmogorov and I. Yaglom: On the general definition of the quantity of information. Dokl. Akad. Nauk SSSR 111 (1956), 4, 745-748.
  28. T. S. Han: Linear dependence structure of the entropy space. Inform. Control 29 (1975), 4, 337-368.   DOI:10.1016/s0019-9958(75)80004-0
  29. T. S. Han: Nonnegative entropy measures of multivariate symmetric correlations. Inform. Control 36 (1978), 2, 133-156.   DOI:10.1016/s0019-9958(78)90275-9
  30. A. N. Kolmogorov: On the Shannon theory of information transmission in the case of continuous signals. IRE Trans. Inform. Theory IT-2 (1956), 102-108.
  31. M. Ledoux: On {T}alagrand's deviation inequalities for product measures. ESAIM Probab. Statist. 1 (1995/97), 63-87.   DOI:10.1051/ps:1997103
  32. M. Madiman and P. Tetali: Information inequalities for joint distributions, with interpretations and applications. IEEE Trans. Inform. Theory 56 (2010), 6, 2699-2713.   DOI:10.1109/tit.2010.2046253
  33. K. Makarychev, Y. Makarychev, A. Romashchenko and N. Vereshchagin: A new class of non-Shannon-type inequalities for entropies. Commun. Inf. Syst. 2 (2002), 2, 147-165.   DOI:10.4310/cis.2002.v2.n2.a3
  34. K. Marton: A simple proof of the blowing-up lemma. IEEE Trans. Inform. Theory 32 (1986), 3, 445-446.   DOI:10.1109/tit.1986.1057176
  35. K. Marton: Bounding $\overline{d}$-distance by informational divergence: a method to prove measure concentration. Ann. Probab. 24 (1996), 2, 857-866.   DOI:10.1214/aop/1039639365
  36. F. Matúš: Two constructions on limits of entropy functions. IEEE Trans. Inform. Theory 53 (2007), 1, 320-330.   DOI:10.1109/tit.2006.887090
  37. C. McDiarmid: On the method of bounded differences. In: Surveys in Combinatorics, Norwich 1989, London Math. Soc. Lecture Note Ser. 141, Cambridge Univ. Press, Cambridge 1989, pp. 148-188.   DOI:10.1017/cbo9781107359949.008
  38. W. J. McGill: Multivariate information transmission. Trans. I.R.E. PGIT-4 (1954), 93-111.   DOI:10.1109/tit.1954.1057469
  39. J. Pearl and A. Paz: Graphoids: a graph-based logic for reasoning about relevance relations. In: Advances in Artificial Intelligence - II (B. Du Boulay, D. Hogg, and L. Steels, eds.), North Holland, Amsterdam 1987, pp. 357-363.
  40. A. Perez: Information theory with an abstract alphabet. Generalized forms of McMillan's limit theorem for the case of discrete and continuous times. Theory Probab. Appl. 4 (1959), 99-102.   DOI:10.1137/1104007
  41. A. Perez: $\epsilon$-admissible simplifications of the dependence structure of a set of random variables. Kybernetika 13 (1977), 6, 439-449.
  42. M. S. Pinsker: Information and Information Stability of Random Variables and Processes. Holden-Day, Inc., San Francisco 1964.
  43. J. Radhakrishnan: Entropy and counting. In: Computational Mathematics, Modelling and Applications (IIT Kharagpur, Golden Jubilee Volume) (J. Mishra, ed.), Narosa Publishers, 2001, pp. 146-168.
  44. E. Schneidman, S. Still, M. J. Berry and W. Bialek: Network information and connected correlations. Phys. Rev. Lett. 91 (2003), 238701.   DOI:10.1103/physrevlett.91.238701
  45. M. Studený and J. Vejnarová: The multiinformation function as a tool for measuring stochastic dependence. In: Proc. NATO Advanced Study Institute on Learning in Graphical Models, Kluwer Academic Publishers, Norwell 1998, pp. 261-297.   DOI:10.1007/978-94-011-5014-9_10
  46. N. Timme, W. Alford, B. Flecker and J. M. Beggs: Synergy, redundancy, and multivariate information measures: An experimentalist's perspective. J. Comput. Neurosci. 36 (2014), 2, 119-140.   DOI:10.1007/s10827-013-0458-4
  47. S. Watanabe: Information theoretical analysis of multivariate correlation. IBM J. Res. Develop. 4 (1960), 66-82.   DOI:10.1147/rd.41.0066
  48. J. Yan: Nonlinear large deviations: beyond the hypercube. Preprint, arXiv:1703.08887.
  49. Z. Zhang and R. W. Yeung: On characterization of entropy function via information inequalities. IEEE Trans. Inform. Theory 44 (1998), 4, 1440-1452.   DOI:10.1109/18.681320