A Construction of Bayesian Networks from Databases Based on an MDL Principle

Published 6 Mar 2013 in cs.AI | (1303.1486v1)

Abstract: This paper addresses learning stochastic rules especially on an inter-attribute relation based on a Minimum Description Length (MDL) principle with a finite number of examples, assuming an application to the design of intelligent relational database systems. The stochastic rule in this paper consists of a model giving the structure like the dependencies of a Bayesian Belief Network (BBN) and some stochastic parameters each indicating a conditional probability of an attribute value given the state determined by the other attributes' values in the same record. Especially, we propose the extended version of the algorithm of Chow and Liu in that our learning algorithm selects the model in the range where the dependencies among the attributes are represented by some general plural number of trees.


Citations (171)


Summary


The paper by Joe Suzuki proposes a method for constructing Bayesian Networks (BNs) from databases using the Minimum Description Length (MDL) principle. It targets the learning of stochastic rules that capture inter-attribute dependencies, with intelligent relational database systems as the intended application. One use of such rules is inferring a missing attribute value from the other attribute values in the same record.

A principal contribution of the paper is an extension of the algorithm of Chow and Liu. The model is selected from a broader class in which the dependencies among attributes are represented by several trees (a forest) rather than by a single spanning tree. This lets the learner leave some attributes unconnected when a finite set of examples does not support a dependency between them.

In the proposed framework, a stochastic rule is a probability distribution over the R-dimensional attribute vector. It consists of a model, which gives the dependency structure in the manner of a Bayesian belief network (BBN), and stochastic parameters, each of which is a conditional probability of an attribute value given the values of other attributes. The MDL principle selects the model that minimizes the total description length of the model and the examples, which trades model complexity against fit to the data. The approach does not place a prior probability distribution on the model parameters; instead, the code lengths are based on minimax redundancy.
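As an illustration of what such a rule looks like when the model is a tree or a forest, the joint distribution can be written as a product of one factor per attribute. The notation here is assumed for this sketch and is not taken from the paper: $\pi(i)$ denotes the single parent of attribute $i$ (if it has one), and $\alpha_i$ denotes the number of values attribute $i$ can take.

$P(x_1, \dots, x_R \mid g) = \prod_{i=1}^{R} P(x_i \mid x_{\pi(i)})$

For an attribute without a parent, the factor is simply $P(x_i)$. Under this notation, an attribute with a parent contributes $(\alpha_i - 1)\,\alpha_{\pi(i)}$ free parameters and an attribute without a parent contributes $\alpha_i - 1$, which is one plausible way to count the stochastic parameters of a model.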

The central formula is a simple expression for the description length, which is used to compare models under the MDL framework:

$l(x_R[n]) = H(x_R[n] \mid g) + \frac{k(g) \log n}{2}$

where $H(x_R[n] \mid g)$ is the empirical entropy of the examples given model $g$, $k(g)$ is the number of stochastic parameters in $g$, and $n$ is the number of examples.
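The formula can be evaluated directly from counts. The following is a minimal sketch, not the paper's implementation, under several assumptions: the model is a forest given as a hypothetical `parents` map (each attribute has at most one parent), the empirical entropy is taken as the total over all $n$ examples in nats, and $k(g)$ is counted as the number of free conditional probabilities. The function names are illustrative.

```python
import math
from collections import Counter


def total_conditional_entropy(records, child, parent=None):
    """Total empirical entropy of `child` given `parent` over all records (nats).

    Assumption: this is n times the per-record conditional entropy.
    """
    joint = Counter()     # counts of (parent value, child value)
    context = Counter()   # counts of parent value (None for a root attribute)
    for r in records:
        ctx = r[parent] if parent is not None else None
        joint[(ctx, r[child])] += 1
        context[ctx] += 1
    return -sum(c * math.log(c / context[ctx]) for (ctx, _), c in joint.items())


def description_length(records, parents, arity):
    """Empirical entropy term plus (k(g) / 2) * log n for a forest model.

    parents: dict attribute index -> parent index, or None for a root.
    arity:   number of distinct values each attribute can take.
    """
    n = len(records)
    fit = sum(total_conditional_entropy(records, i, p) for i, p in parents.items())
    # Assumed parameter count: (arity - 1) free probabilities per parent state.
    k = sum((arity[i] - 1) * (arity[p] if p is not None else 1)
            for i, p in parents.items())
    return fit + 0.5 * k * math.log(n)


# Toy data: two binary attributes that always agree.
records = [(0, 0), (0, 0), (1, 1), (1, 1)]
arity = [2, 2]
dl_independent = description_length(records, {0: None, 1: None}, arity)
dl_chain = description_length(records, {0: None, 1: 0}, arity)
```

On this toy data the model with an edge from attribute 0 to attribute 1 has the shorter description length: the extra parameter costs less than the entropy it removes.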

Beyond its methodological contribution, the paper has implications for database design and management. Learning stochastic inter-attribute dependencies automatically and integrating them into a relational database supports intelligent database systems that can infer values and reason about their data.

For Bayesian belief networks, the paper treats model selection both in general and in the case restricted to tree structures. For trees, the MDL-driven learning algorithm bases its selection criterion on mutual information and description length, and it does not require specifying prior probabilities. This is presented as an advantage over conventional Bayesian methods, which need such priors.
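One way to picture a tree learner driven by mutual information and description length is a Chow-Liu style greedy procedure with a penalized edge score. The sketch below is an assumption-laden illustration, not the paper's stated algorithm: it assumes that adding an edge between attributes $i$ and $j$ lowers the empirical entropy term by $n$ times their empirical mutual information and adds $(\alpha_i - 1)(\alpha_j - 1)$ parameters, where $\alpha_i$ is the number of values of attribute $i$ (notation assumed here). Edges with a non-positive net gain are skipped, so the result can be a forest rather than a single spanning tree. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(records, i, j):
    """Empirical mutual information between attributes i and j (nats)."""
    n = len(records)
    cij = Counter((r[i], r[j]) for r in records)
    ci = Counter(r[i] for r in records)
    cj = Counter(r[j] for r in records)
    return sum(c / n * math.log(c * n / (ci[a] * cj[b]))
               for (a, b), c in cij.items())


def learn_forest(records, arity):
    """Greedy forest construction with an MDL-penalized edge score.

    Assumed score: n * I(i; j) - ((arity_i - 1) * (arity_j - 1) / 2) * log n.
    Returns undirected edges; any attribute of each tree can serve as its root.
    """
    n = len(records)
    num_attrs = len(arity)
    scored = []
    for i, j in combinations(range(num_attrs), 2):
        penalty = 0.5 * (arity[i] - 1) * (arity[j] - 1) * math.log(n)
        gain = n * mutual_information(records, i, j) - penalty
        if gain > 0:  # keep only edges that shorten the description
            scored.append((gain, i, j))
    scored.sort(reverse=True)

    # Union-find keeps the selected edges acyclic (Kruskal-style).
    root = list(range(num_attrs))

    def find(u):
        while root[u] != u:
            root[u] = root[root[u]]
            u = root[u]
        return u

    edges = []
    for _, i, j in scored:
        ri, rj = find(i), find(j)
        if ri != rj:
            root[ri] = rj
            edges.append((i, j))
    return edges


# Attributes 0 and 1 always agree; attribute 2 is independent of both.
records = [(a, a, c) for a in (0, 1) for c in (0, 1)] * 10
forest = learn_forest(records, [2, 2, 2])
```

With these records only the edge between attributes 0 and 1 is kept, and attribute 2 stays unconnected. A plain maximum-weight spanning tree would have connected it regardless, which is the difference the penalty term makes.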

Directions for future research include applying the MDL principle in practical database designs and exploring parameter estimation strategies consistent with minimax redundancy. The work may inform how relational databases capture and use probabilistic dependencies, contributing to knowledge discovery and artificial intelligence more broadly.
