Identifiability of Product of Experts Models (2310.09397v1)

Published 13 Oct 2023 in cs.LG, math.AG, math.ST, and stat.TH

Abstract: Products of experts (PoEs) are layered networks in which the value at each node is an AND (or product) of the values (possibly negated) at its inputs. They were introduced as a neural network architecture that can efficiently learn to generate high-dimensional data satisfying many low-dimensional constraints, thereby allowing each individual expert to perform a simple task. PoEs have found a variety of applications in learning. We study the identifiability of a product of experts model having a layer of binary latent variables and a layer of binary observables that are iid conditional on the latents. The previous best upper bound on the number of observables needed to identify the model was exponential in the number of parameters. We show: (a) when the latents are uniformly distributed, the model is identifiable with a number of observables equal to the number of parameters (and hence best possible); (b) in the more general case of arbitrarily distributed latents, the model is identifiable with a number of observables that is still linear in the number of parameters (and within a factor of two of best possible). The proofs rely on root interlacing phenomena for some special three-term recurrences.
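
To make the setup concrete, here is a minimal Python sketch of sampling from a two-layer model of this kind: binary latents in one layer, and observables that are iid Bernoulli conditional on the latent vector, with the success probability formed as a product over the active experts. The parameterization (one probability per expert, inactive experts contributing a factor of 1, independent latent marginals for simplicity) is an illustrative assumption, not necessarily the exact one used in the paper.

```python
import numpy as np

# Minimal sketch of a two-layer product-of-experts sampler.
# Assumed parameterization (illustrative, not necessarily the paper's):
# each expert j has a probability p[j]; an active expert (h[j] = 1)
# multiplies p[j] into the success probability of every observable,
# and an inactive expert contributes a factor of 1.

rng = np.random.default_rng(0)

k = 3   # number of binary latent variables (experts)
n = 8   # number of binary observables, iid conditional on the latents

pi = rng.uniform(size=k)  # latent marginals (independent, for simplicity)
p = rng.uniform(size=k)   # per-expert success probabilities

h = rng.random(k) < pi            # sample the latent layer
q = np.prod(np.where(h, p, 1.0))  # product over the active experts
x = rng.random(n) < q             # observables: iid Bernoulli(q) given h

print("latents:    ", h.astype(int))
print("observables:", x.astype(int))
```

Identifiability, in this setting, asks whether the model parameters can be recovered from the joint distribution of the observables; the paper's results bound how many observables are needed for this as a function of the number of parameters.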
