Kybernetika 48 no. 5, 825-844, 2012

A backward selection procedure for approximating a discrete probability distribution by decomposable models

Abstract:

Decomposable (probabilistic) models are log-linear models generated by acyclic hypergraphs, and they are known to enjoy a number of nice properties. In many applications the following selection problem naturally arises: given a probability distribution $p$ over a finite set $V$ of $n$ discrete variables and a positive integer $k$, find a decomposable model with tree-width $k$ that best fits $p$. If $\mathcal{H}$ is the generating hypergraph of a decomposable model and $p_{\mathcal{H}}$ is the estimate of $p$ under the model, we can measure the closeness of $p_{\mathcal{H}}$ to $p$ by the information divergence $D(p: p_{\mathcal{H}})$, so that the problem above reads: given $p$ and $k$, find an acyclic, connected hypergraph ${\mathcal{H}}$ of tree-width $k$ such that $D(p: p_{\mathcal{H}})$ is minimum. It is well known that this problem is $NP$-hard. However, for $k = 1$ it was solved very efficiently by Chow and Liu; starting from an optimal Chow--Liu solution, a few forward-selection procedures have therefore been proposed with the aim of finding a `good' solution for an arbitrary $k$. We propose a backward-selection procedure which starts from the (trivial) optimal solution for $k=n-1$, and we show that, in a case study taken from the literature, our procedure succeeds in finding an optimal solution for every $k$.
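To make the objective concrete, here is a minimal Python sketch (not the backward-selection procedure of this paper) of two standard facts the abstract relies on. When the hyperedges $C_1, \dots, C_m$ of $\mathcal{H}$ are listed in a running-intersection order and $S_i = C_i \cap (C_1 \cup \dots \cup C_{i-1})$, the estimate $p_{\mathcal{H}}$ is the product of the marginals of $p$ on the $C_i$ divided by the product of its marginals on the $S_i$, so $D(p: p_{\mathcal{H}}) = \sum_i H(C_i) - \sum_i H(S_i) - H(V)$. For $k = 1$, the Chow--Liu solution is a maximum-weight spanning tree whose edge weights are the pairwise mutual informations. The representation of $p$ as a dict from value tuples to probabilities, the helper names `marginal_entropy`, `divergence` and `chow_liu_tree`, and the toy chain distribution are illustrative assumptions.

```python
import math
from collections import defaultdict, deque
from itertools import combinations


def marginal_entropy(p, variables):
    """Shannon entropy (nats) of the marginal of p on the given variable indices."""
    variables = list(variables)
    marg = defaultdict(float)
    for x, px in p.items():
        marg[tuple(x[i] for i in variables)] += px
    return -sum(q * math.log(q) for q in marg.values() if q > 0)


def divergence(p, cliques):
    """D(p : p_H) for the decomposable model generated by `cliques`.

    Assumption: `cliques` cover all variables and are listed in a
    running-intersection order, so that with S_i = C_i & (C_1 | ... | C_{i-1})
    D(p : p_H) = sum_i H(C_i) - sum_i H(S_i) - H(V).
    """
    n = len(next(iter(p)))
    d = -marginal_entropy(p, range(n))
    covered = set()
    for clique in cliques:
        separator = covered.intersection(clique)
        d += marginal_entropy(p, sorted(clique)) - marginal_entropy(p, sorted(separator))
        covered.update(clique)
    return d


def chow_liu_tree(p):
    """Chow-Liu (k = 1): maximum-weight spanning tree on pairwise mutual information."""
    n = len(next(iter(p)))
    h = [marginal_entropy(p, [i]) for i in range(n)]
    # I(X_i; X_j) = H(X_i) + H(X_j) - H(X_i, X_j), sorted in decreasing order.
    edges = sorted(
        ((h[i] + h[j] - marginal_entropy(p, [i, j]), i, j)
         for i, j in combinations(range(n), 2)),
        reverse=True,
    )
    parent = list(range(n))  # union-find forest for Kruskal's algorithm

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    adjacency = defaultdict(list)
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # keep the edge only if it joins two components
            parent[ri] = rj
            adjacency[i].append(j)
            adjacency[j].append(i)
    # Breadth-first order from vertex 0 is a running-intersection order:
    # each new edge shares exactly one vertex with the earlier ones.
    order, visited, queue = [], {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in visited:
                visited.add(v)
                order.append((u, v))
                queue.append(v)
    return order


# Toy chain X0 -> X1 -> X2 over binary variables (made-up numbers).
p = {
    (a, b, c): 0.5 * (0.9 if b == a else 0.1) * (0.8 if c == b else 0.2)
    for a in (0, 1) for b in (0, 1) for c in (0, 1)
}
tree = chow_liu_tree(p)
print(tree)                                # [(0, 1), (1, 2)]
print(divergence(p, tree))                 # close to 0: p factorizes along the chain
print(divergence(p, [(0, 1, 2)]))          # close to 0: the single hyperedge V (k = n - 1)
print(divergence(p, [(0,), (1,), (2,)]))   # positive: the independence model
```

On this toy chain the Chow--Liu tree recovers the true structure, so its divergence is numerically zero. The same holds for the trivial model whose only hyperedge is $V$, which is the optimal solution for $k=n-1$. The independence model has strictly positive divergence.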

Keywords:

decomposable model, acyclic hypergraph, backward selection, information divergence, $k$-hypertree

Classification:

05C65, 62-09, 68R10, 68T05


References:

R. Almond and A. Kong: Optimality Issues in Constructing a Markov Tree from Graphical Models. Research Report A-3, Dept. Statistics, Harvard University, 1991.
A. Altmüller and R. M. Haralick: Approximating high dimensional probability distributions. In: Proc. XVII Int. Conf. on Pattern Recognition, 2004.
A. Altmüller and R. M. Haralick: Practical aspects of efficient forward selection in decomposable graphical models. In: Proc. XVI IEEE Int. Conf. on Tools with Artificial Intelligence, 2004, pp. 710-715.
F. R. Bach and M. I. Jordan: Thin junction trees. Adv. in Neural Inform. Process. Systems 14 (2002), 569-572.
J.-H. Badsberg and F. M. Malvestuto: An implementation of the iterative proportional fitting procedure by propagation trees. Comput. Statist. Data Anal. 37 (2001), 297-322.
C. Beeri, R. Fagin, D. Maier and M. Yannakakis: On the desirability of acyclic database schemes. J. ACM 30 (1983), 479-513.
L. W. Beineke and R. E. Pippert: The enumeration of labelled 2-trees. J. Combin. Theory 6 (1969), 200-205.
Y. Bishop, S. E. Fienberg and P. W. Holland: Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge 1975.
D. T. Brown: A note on approximations to discrete probability distributions. Inform. and Control 2 (1959), 386-392.
D. Chickering: Learning Bayesian networks is NP-complete. In: Learning from Data, Lecture Notes in Statist. 112 (1996), 121-130.
C. K. Chow and C. N. Liu: Approximating discrete probability distributions with dependence trees. IEEE Trans. Inform. Theory 14 (1968), 462-467.
T. M. Cover: Elements of Information Theory. John Wiley and Sons, 1991.
I. Csiszár and J. Körner: Information Theory. Academic Press, 1981.
P. Dagum and M. Luby: Approximating probabilistic inference in belief networks is NP-hard. Artificial Intelligence 60 (1993), 141-153.
S. Dasgupta: Learning polytrees. In: Proc. XV Conference on Uncertainty in Artificial Intelligence, 1999, pp. 134-141.
A. Deshpande, M. Garofalakis and M. I. Jordan: Efficient stepwise selection in decomposable models. In: Proc. XVII Conf. on Uncertainty in Artificial Intelligence, 2001, pp. 128-135.
G. Ding, R. F. Lax, J. Chen, P. P. Chen and B. D. Marx: Comparison of greedy strategies for learning Markov networks of treewidth $k$. In: Proc. Int. Conf. on Machine Learning: Models, Technologies and Applications, 2007, pp. 294-301.
T. Havránek: On model search methods. In: Proc. IX Symp. on Computational Statistics, 1990, pp. 101-108.
T. Havránek: Simple formal systems in data analysis. In: Proc. Conf. on Symbolic-Numeric Data Analysis and Learning, 1991, pp. 373-381.
F. V. Jensen and F. Jensen: Optimal junction trees. In: Proc. X Conf. on Uncertainty in Artificial Intelligence (R. L. de Mantaras and D. Poole, eds.), 1994, pp. 360-366.
D. Karger and N. Srebro: Learning Markov networks: maximum bounded tree-width graphs. In: Proc. XII ACM-SIAM Symp. on Discrete Algorithms, 2001, pp. 392-401.
T. Kloks: Tree-width. LNCS 842, Springer Verlag, Berlin 1994.
T. Kocka: New algorithm for learning decomposable models. Unpublished manuscript, 2000.
E. Kovács and T. Szántai: Vine copulas as a mean for the construction of high dimensional probability distribution associated to a Markov network. arXiv:1105.1697v1, 2011.
H. H. Ku and S. Kullback: Approximating discrete probability distributions. IEEE Trans. Inform. Theory 15 (1969), 444-447.
S. L. Lauritzen: Graphical Models. Clarendon Press, Oxford 1996.
P. M. Lewis II: Approximating probability distributions to reduce storage requirements. Inform. and Control 2 (1959), 214-225.
F. M. Malvestuto: Operations research in the design of statistical databases (in Italian). In: Proc. AIRO Meeting on Operations Research and Computer Science, 1986, pp. 117-130.
F. M. Malvestuto: Approximating discrete probability distributions with decomposable models. IEEE Trans. Systems, Man and Cybernetics 21 (1991), 1287-1294.
F. M. Malvestuto: An axiomatization of loglinear models with an application to the model search. In: Learning from Data, Lecture Notes in Statist. 112 (1996), 175-184.
F. M. Malvestuto: Designing a probabilistic database from a given set of full conditional independences. In: Proc. Workshop on Conditional Independence Structures and Graphical Models, 1999.
F. M. Malvestuto: A hypergraph-theoretic analysis of collapsibility and decomposability for extended log-linear models. Statist. Comput. 11 (2001), 155-169.
F. M. Malvestuto: From conditional independences to factorization constraints with discrete random variables. Ann. Math. and Artificial Intelligence 35 (2002), 253-285.
F. M. Malvestuto: Tree and local computations in a cross-entropy minimization problem with marginal constraints. Kybernetika 46 (2010), 621-654.
C. Meek: Finding a path is harder than finding a tree. J. Artificial Intelligence Res. 15 (2001), 383-389.
M. Mezzini and M. Moscarini: Simple algorithms for minimal triangulation of a graph and backward selection of a decomposable Markov network. Theoret. Comput. Sci. 411 (2010), 958-966.
K. Nunez, J. Chen, P. Chen, G. Ding, R. F. Lax and B. Marx: Empirical comparison of greedy strategies for learning Markov networks of treewidth $k$. In: Proc. VII Int. Conf. on Machine Learning and Applications, 2008, pp. 106-113.
J. D. Rose: On simple characterizations of $k$-trees. Discrete Math. 7 (1974), 317-322.
T. Szántai and E. Kovács: Hypergraphs as a mean of discovering the dependence structure of a discrete multivariate probability distribution. In: Proc. Conf. on Applied Mathematical Programming and Modelling, 2008; also in Ann. Oper. Res. 193 (2012), 71-90.
T. Szántai and E. Kovács: Discovering a junction tree behind a Markov network by a greedy algorithm. arXiv:1104.2762v3, 2011.
R. E. Tarjan and M. Yannakakis: Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce hypergraphs. SIAM J. Comput. 13 (1984), 566-579.
N. Wermuth: Analogies between multiplicative models in contingency tables and covariance selection. Biometrics 32 (1976), 95-108.
N. Wermuth: Model search among multiplicative models. Biometrics 32 (1976), 253-256.
Y. Xiang, S. K. M. Wong and N. Cercone: A ``microscopic'' study of minimum entropy search in learning decomposable Markov networks. Mach. Learning 26 (1997), 65-72.
