
Piecewise Training for Undirected Models (1207.1409v1)

Published 4 Jul 2012 in cs.LG and stat.ML

Abstract: For many large undirected models that arise in real-world applications, exact maximum-likelihood training is intractable, because it requires computing marginal distributions of the model. Conditional training is even more difficult, because the partition function depends not only on the parameters, but also on the observed input, requiring repeated inference over each training example. An appealing idea for such models is to independently train a local undirected classifier over each clique, afterwards combining the learned weights into a single global model. In this paper, we show that this piecewise method can be justified as minimizing a new family of upper bounds on the log partition function. On three natural-language data sets, piecewise training is more accurate than pseudolikelihood, and often performs comparably to global training using belief propagation.

Citations (195)

Summary

Piecewise Training for Undirected Models: An Evaluation and Justification

This paper addresses the challenge of parameter estimation in large undirected graphical models, such as Markov Random Fields (MRFs) and Conditional Random Fields (CRFs), used primarily in domains like NLP and computer vision. The paper introduces and evaluates a training approach known as "piecewise training," justifying it as a method that minimizes an upper bound on the log partition function. Through empirical evaluations, the authors compare piecewise training to the more traditional pseudolikelihood estimation and global training using belief propagation.

The crux of the problem tackled in the paper is that traditional maximum-likelihood estimation for these models is computationally infeasible due to the intractability of computing partition functions, especially in the presence of input-dependent features that appear in conditional models like CRFs. This issue becomes more pronounced in conditional training, where the partition function depends on both parameters and observed data, necessitating repeated inference during the training process.
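To make the difficulty concrete, consider the standard conditional form, written here in generic factor notation as an illustration rather than quoted from the paper:

\[
p(y \mid x; \theta) \;=\; \frac{1}{Z(x;\theta)} \prod_{a} \exp\!\big(\theta_a^{\top} f_a(y_a, x)\big),
\qquad
Z(x;\theta) \;=\; \sum_{y'} \prod_{a} \exp\!\big(\theta_a^{\top} f_a(y'_a, x)\big).
\]

The gradient of the conditional log-likelihood involves expectations of the features under p(y | x; θ), so every gradient step requires inference for every training input x, which is precisely what becomes intractable in loopy models.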

Piecewise training circumvents these computational difficulties by independently training local undirected classifiers over each clique in the model before integrating the learned weights into a single global model. The authors show that this procedure can be justified as minimizing a new family of variational upper bounds on the log partition function, giving it a sound theoretical interpretation.
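One standard way such a bound arises (sketched in the same generic notation; the paper derives a more general family) is that, because all exponentiated factor scores are nonnegative, summing each clique's factor over its own assignments independently covers every globally consistent assignment plus extra nonnegative terms:

\[
\log Z(x;\theta) \;\le\; \sum_{a} \log \sum_{y_a} \exp\!\big(\theta_a^{\top} f_a(y_a, x)\big).
\]

Substituting this bound for the true log partition function in the conditional log-likelihood yields the piecewise objective

\[
\ell_{\mathrm{PW}}(\theta) \;=\; \sum_{a} \Big[\, \theta_a^{\top} f_a(y_a, x) \;-\; \log \sum_{y'_a} \exp\!\big(\theta_a^{\top} f_a(y'_a, x)\big) \Big],
\]

which decomposes into a sum of independent local log-likelihoods, one per clique; maximizing it amounts to training a separate local classifier over each clique.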

The authors conducted experiments on three NLP tasks: named-entity recognition using a linear-chain CRF, part-of-speech tagging and noun-phrase segmentation using a factorial CRF, and information extraction from seminar announcements using a skip-chain CRF. The results indicate that piecewise training generally yields more accurate models compared to pseudolikelihood and in some cases performs comparably to global training methods such as belief propagation.

Key findings from the experiments include:

  • For the linear-chain CRF applied to named-entity recognition, piecewise training produced an overall F1 score of 91.2, surpassing both pseudolikelihood and per-edge pseudolikelihood while matching the accuracy of exact training.
  • In the two-level factorial CRF for joint part-of-speech tagging and noun-phrase segmentation, the noun-phrase F1 score for piecewise training was 88.1, outperforming both versions of pseudolikelihood and belief propagation.
  • In the skip-chain CRF model used for seminar announcements, both location and speaker extraction tasks showed improved scores with piecewise training over pseudolikelihood, although belief propagation showed slightly better results for the speaker task.

The theoretical justification and empirical evidence presented in the paper suggest that piecewise training offers a viable alternative to traditional methods, enabling efficient parameter estimation in models where exact conditional training is impractical. The authors argue that, when a local training method is required, piecewise training should be preferred over pseudolikelihood; the sketch below illustrates how the two local objectives differ. Nevertheless, characterizing the conditions under which piecewise training outperforms pseudolikelihood remains an open question for further research.
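The following is a minimal, hypothetical sketch of piecewise training on a toy linear-chain model (synthetic data and illustrative parameter names, not the paper's experimental setup). Each node and edge factor is trained as an independent, locally normalized classifier, and the learned weights are then combined and decoded globally with Viterbi; pseudolikelihood would instead normalize each variable's distribution conditioned on its neighbors' observed labels.

# Minimal, hypothetical sketch of piecewise training on a toy linear-chain model.
# All names and the synthetic data are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
V, K = 5, 3                                # observation vocabulary size, number of labels

def make_sequence(length=8):
    # Synthetic data: the label loosely follows x mod K but tends to repeat.
    x, y, prev = [], [], rng.integers(K)
    for _ in range(length):
        xt = rng.integers(V)
        yt = xt % K if rng.random() < 0.7 else prev
        x.append(xt); y.append(yt); prev = yt
    return np.array(x), np.array(y)

data = [make_sequence() for _ in range(200)]

W = np.zeros((K, V))                       # node-piece weights: score(y_t, x_t) = W[y_t, x_t]
T = np.zeros((K, K))                       # edge-piece weights: score(y_{t-1}, y_t) = T[y_{t-1}, y_t]

def piecewise_grads(W, T, x, y):
    """Gradients of the sum of per-piece negative log-likelihoods for one sequence."""
    gW, gT = np.zeros_like(W), np.zeros_like(T)
    # Node pieces: each normalized over the K labels at that position.
    for xt, yt in zip(x, y):
        p = np.exp(W[:, xt] - W[:, xt].max()); p /= p.sum()
        gW[:, xt] += p
        gW[yt, xt] -= 1.0
    # Edge pieces: each normalized over all K*K label pairs.
    # (Pseudolikelihood would instead condition on the neighboring observed label.)
    pT = np.exp(T - T.max()); pT /= pT.sum()
    for yp, yt in zip(y[:-1], y[1:]):
        gT += pT
        gT[yp, yt] -= 1.0
    return gW, gT

lr = 0.1                                   # plain gradient descent on the piecewise objective
for _ in range(50):
    gW, gT = np.zeros_like(W), np.zeros_like(T)
    for x, y in data:
        dW, dT = piecewise_grads(W, T, x, y)
        gW += dW; gT += dT
    W -= lr * gW / len(data)
    T -= lr * gT / len(data)

def viterbi(W, T, x):
    """Decode with the combined global model (node scores plus edge scores)."""
    n = len(x)
    delta = np.zeros((n, K)); back = np.zeros((n, K), dtype=int)
    delta[0] = W[:, x[0]]
    for t in range(1, n):
        cand = delta[t - 1][:, None] + T   # cand[i, j]: best score ending in label i, then j
        back[t] = cand.argmax(axis=0)
        delta[t] = cand.max(axis=0) + W[:, x[t]]
    path = [int(delta[-1].argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

x0, y0 = data[0]
print("gold:", y0.tolist())
print("pred:", viterbi(W, T, x0))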

Overall, the paper provides a thorough analysis and validation of piecewise training, arguing for its suitability for large undirected models that require efficient yet accurate training procedures. Future research could explore its application to more complex model structures and its adaptation to generative learning settings, where the underlying assumptions differ.
