Kybernetika 56 no. 5, 948-978, 2020

Factorized mutual information maximization

Thomas Merkh and Guido MontúfarDOI: 10.14736/kyb-2020-5-0948


We investigate the sets of joint probability distributions that maximize the average multi-information over a collection of margins. These functionals serve as proxies for maximizing the multi-information of a set of variables or the mutual information of two subsets of variables, at a lower computation and estimation complexity. We describe the maximizers and their relations to the maximizers of the multi-information and the mutual information.


multi-information, mutual information, divergence maximization, marginal specification problem, transportation polytope


94A17, 62B10


  1. A. Alemi, I. Fischer, J. Dillon and K. Murphy: Deep variational information bottleneck. In: {ICLR}, 2017.   CrossRef
  2. N. Ay: An information-geometric approach to a theory of pragmatic structuring. Ann. Probab. 30 (2002), 1, 416-436.   DOI:10.1214/aop/1020107773
  3. N. Ay: Locality of global stochastic interaction in directed acyclic networks. Neural Comput. 14 (2002), 12, 2959-2980.   DOI:10.1162/089976602760805368
  4. N. Ay, N. Bertschinger, R. Der, F. Güttler and E. Olbrich: Predictive information and explorative behavior of autonomous robots. Europ. Phys. J. B 63 (2008), 3, 329-339.   DOI:10.1140/epjb/e2008-00175-0
  5. N. Ay and A. Knauf: Maximizing multi-information. Kybernetika 42 (2006), 5, 517-538.   CrossRef
  6. G. Baldassarre and M. Mirolli: Intrinsically motivated learning systems: an overview. In: Intrinsically motivated learning in natural and artificial systems, Springer 2013, pp. 1-14.   DOI:10.1007/978-3-642-32375-1\_1
  7. P. Baudot, M. Tapia, D. Bennequin and J.-M. Goaillard: Topological information data analysis. Entropy 21 (2019), 9, 869.   DOI:10.3390/e21090869
  8. R. Bekkerman, M. Sahami and E. Learned-Miller: Combinatorial markov random fields. In: European Conference on Machine Learning, Springer 2006, pp. 30-41.   DOI:10.1007/11871842\_8
  9. M. I. Belghazi, A. Baratin, S. Rajeshwar, S. Ozair, Y. Bengio, A. Courville and D. Hjelm: Mutual information neural estimation. In: Proc. 35th International Conference on Machine Learning (J. Dy and A. Krause, eds.), Vol. 80 of {\em Proceedings of Machine Learning Research}, pp. 531-540, Stockholm 2018. PMLR.   CrossRef
  10. N. Bertschinger, J. Rauh, E. Olbrich, J. Jost and N. Ay: Quantifying unique information. Entropy 16 (2014), 4, 2161-2183.   DOI:10.3390/e16042161
  11. W. Bialek, I. Nemenman and N. Tishby: Predictability, complexity, and learning. Neural Comput. 13 (2001), 11, 2409-2463.   DOI:10.1162/089976601753195969
  12. Y. Burda, H. Edwards, D. Pathak, A. Storkey, T. Darrell and A. A. Efros: Large-scale study of curiosity-driven learning. In: ICLR, 2019.   CrossRef
  13. J. Buzzi and L. Zambotti: Approximate maximizers of intricacy functionals. Probab. Theory Related Fields 153 (2012), 3-4, 421-440.   DOI:10.1007/s00440-011-0350-y
  14. N. Chentanez, A. G. Barto and S. P. Singh: Intrinsically motivated reinforcement learning. In: Adv. Neural Inform. Process. Systems 2005, pp. 1281-1288.   DOI:10.21236/ada440280
  15. J. P. Crutchfield and D. P. Feldman: Synchronizing to the environment: Information-theoretic constraints on agent learning. Adv. Complex Systems 4 (2001), 02n03, 251-264.   DOI:10.1142/s021952590100019x
  16. J. de Loera: Transportation polytopes.
  17. N. Friedman, O. Mosenzon, N. Slonim and N. Tishby: Multivariate information bottleneck. In: Proc. Seventeenth conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc., 2001, pp. 152-161.   CrossRef
  18. M. Gabrié, A. Manoel, C. Luneau, j. Barbier, N. Macris, F. Krzakala and L. Zdeborová: Entropy and mutual information in models of deep neural networks. In: Advances in Neural Information Processing Systems 31 (S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, eds.), Curran Associates, Inc. 2018, pp. 1821-1831.   CrossRef
  19. S. Gao, G. Ver Steeg and A. Galstyan: Efficient estimation of mutual information for strongly dependent variables. In: Artificial Intelligence and Statistics 2015, pp. 277-286.   CrossRef
  20. R. D. Hjelm, A. Fedorov, S. Lavoie-Marchildon, K. Grewal, P. Bachman, A. Trischler, Y. Bengio. \newblock Learning deep representations by mutual information estimation, maximization. \newblock In {\em International Conference on Learning Representations} and 2019.:     CrossRef
  21. S. Hosten and S. Sullivant: Gröbner bases and polyhedral geometry of reducible and cyclic models. J. Comb. Theory Ser. A 100 (2002), 2, 277-301.   DOI:10.1006/jcta.2002.3301
  22. A. Jakulin and I. Bratko: Quantifying and visualizing attribute interactions: An approach based on entropy. 2003.   CrossRef
  23. A. S. Klyubin, D. Polani and C. L. Nehaniv: Empowerment: A universal agent-centric measure of control. In: 2005 IEEE Congress on Evolutionary Computation, Vol. 1, IEEE 2005, pp. 128-135.   CrossRef
  24. A. Kraskov, H./ Stögbauer and P. Grassberger: Estimating mutual information. Phys. Rev. E 69 (2004), 6, 066138.   DOI:10.1103/physreve.69.066138
  25. F. Matúš: Maximization of information divergences from binary i.i.d. sequences. In: Proc. IPMU 2004 2 (2004), pp. 1303-1306.   CrossRef
  26. F. Matúš: Divergence from factorizable distributions and matroid representations by partitions. IEEE Trans. Inf. Theor. 55 (2009), 12, 5375-5381.   DOI:10.1109/tit.2009.2032806
  27. F. Matúš and N. Ay: On maximization of the information divergence from an exponential family. In: Proc. 6th Workshop on Uncertainty Processing: Oeconomica 2003, Hejnice 2003, pp. 199-204.   CrossRef
  28. F. Matúš and J. Rauh: Maximization of the information divergence from an exponential family and criticality. In: 2011 IEEE International Symposium on Information Theory Proceedings 2011, pp. 903-907.   DOI:10.1109/isit.2011.6034269
  29. W. McGill: Multivariate information transmission. Trans. IRE Profess. Group Inform. Theory 4 (1054), 4, 93-111.   DOI:10.1109/tit.1954.1057469
  30. S. Mohamed and D. J. Rezende: Variational information maximisation for intrinsically motivated reinforcement learning. In: Advances in Neural Information Processing Systems 2015, 2125-2133, 2015.   CrossRef
  31. G. Montúfar: Universal approximation depth and errors of narrow belief networks with discrete units. Neural Comput. 26 (2014), 7, 1386-1407.   DOI:10.1162/neco\_a\_00601
  32. G. Montúfar, K. Ghazi-Zahedi and N. Ay: A theory of cheap control in embodied systems. PLOS Comput. Biology 11 (2015), 9, 1-22.   DOI:10.1371/journal.pcbi.1004427
  33. G. Montúfar, K. Ghazi-Zahedi and N. Ay: Information theoretically aided reinforcement learning for embodied agents. arXiv preprint arXiv:1605.09735, 2016.   CrossRef
  34. G. Montúfar, J. Rauh and N. Ay: Expressive power and approximation errors of restricted {B}oltzmann machines. In: Advances in Neural Information Processing Systems 2011, pp. 415-423.   CrossRef
  35. G. Montúfar, J. Rauh and N. Ay: Maximal information divergence from statistical models defined by neural networks. In: Geometric Science of Information GSI 2013 (F. Nielsen and F. Barbaresco, eds.), Lecture Notes in Computer Science 3085 Springer 2013, pp. 759-766.   DOI:10.1007/978-3-642-40020-9\_85
  36. J. Rauh: Finding the maximizers of the information divergence from an exponential family. IEEE Trans. Inform. Theory 57 (2011), 6, 3236-3247.   DOI:10.1109/tit.2011.2136230
  37. J. Rauh: Finding the Maximizers of the Information Divergence from an Exponential Family. PhD. Thesis, Universität Leipzig 2011.   CrossRef
  38. R. A. A. Ince, S. Panzeri, S. R. Schultz. \newblock {\em Summary of Information Theoretic Quantities}, pages 1-6. \newblock Springer New York, New York, NY and 2013.:     CrossRef
  39. M. S. Roulston: Estimating the errors on measured entropy and mutual information. Physica D: Nonlinear Phenomena 125 (1999), 3-4, 285-294.   DOI:10.1016/s0167-2789(98)00269-3
  40. J. Schossau, C. Adami and A. Hintze: Information-theoretic neuro-correlates boost evolution of cognitive systems. Entropy 18 (2015), 1, 6.   DOI:10.3390/e18010006
  41. N. Slonim, G. S. Atwal, G. Tkacik and W. Bialek: Estimating mutual information and multi-information in large networks. arXiv preprint cs/0502017, 2005.   CrossRef
  42. N. Slonim, N. Friedman and N. Tishby: Multivariate information bottleneck. Neural Comput. 18 (2006), 8, 1739-1789.   DOI:10.1162/neco.2006.18.8.1739
  43. S. Still and D. Precup: An information-theoretic approach to curiosity-driven reinforcement learning. Theory Biosci. 131 (2012), 3, 139-148.   DOI:10.1007/s12064-011-0142-z
  44. The Sage Developers: {S}ageMath, the {S}age {M}athematics {S}oftware {S}ystem ({V}ersion 8.7), 2019.   CrossRef
  45. N. Tishby, F. C. Pereira and W. Bialek: The information bottleneck method. In: Proc. 37th Annual Allerton Conference on Communication, Control and Computing 1999, pp. 368-377.   CrossRef
  46. J. R. Vergara and P. A. Estévez: A review of feature selection methods based on mutual information. Neural Comput. Appl. 24 (2014), 1, 175-186.   DOI:10.1007/s00521-013-1368-0
  47. S. Watanabe: Information theoretical analysis of multivariate correlation. IBM J. Res. Develop. 4 (1960), 1, 66-82.   DOI:10.1147/rd.41.0066
  48. H. S. Witsenhausen and A. D. Wyner: A conditional entropy bound for a pair of discrete random variables. IEEE Trans. Inform. Theory 21 (1075), 5, 493-501.   DOI:10.1109/tit.1975.1055437
  49. V. Yemelichev, M. Kovalev and M. Kravtsov: Polytopes, Graphs and Optimisation. Cambridge University Press, 1984.   CrossRef
  50. K. Zahedi, N. Ay and R. Der: Higher coordination with less control: A result of information maximization in the sensorimotor loop. Adaptive Behavior 18 (2010), 3-4, 338-355.   DOI:10.1177/1059712310375314
  51. K. Zahedi, G. Martius and N. Ay: Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis. Front. Psychol. (2013), 4, 801.   DOI:10.3389/fpsyg.2013.00801