Pymc-learn: Practical Probabilistic Machine Learning in Python (1811.00542v1)

Published 31 Oct 2018 in stat.ML and cs.LG

Abstract: $\textit{Pymc-learn}$ is a Python package providing a variety of state-of-the-art probabilistic models for supervised and unsupervised machine learning. It is inspired by $\textit{scikit-learn}$ and focuses on bringing probabilistic machine learning to non-specialists. It uses a general-purpose high-level language that mimics $\textit{scikit-learn}$. Emphasis is put on ease of use, productivity, flexibility, performance, documentation, and an API consistent with $\textit{scikit-learn}$. It depends on $\textit{scikit-learn}$ and $\textit{pymc3}$ and is distributed under the new BSD-3 license, encouraging its use in both academia and industry. Source code, binaries, and documentation are available on http://github.com/pymc-learn/pymc-learn.

Summary

The paper introduces Pymc-learn, a user-friendly Python package that bridges scikit-learn’s interface with PyMC3’s advanced probabilistic models.
It demonstrates how the package enables rapid prototyping by simplifying model training, evaluation, and prediction for non-specialist users.
The paper highlights the use of advanced inference techniques like NUTS and ADVI to ensure scalability and flexibility in complex, high-dimensional settings.

An Examination of Pymc-learn: Practical Probabilistic Machine Learning in Python

The paper "Pymc-learn: Practical Probabilistic Machine Learning in Python" introduces Pymc-learn, a Python package that provides state-of-the-art probabilistic models for both supervised and unsupervised learning. Designed for non-specialists, the package draws inspiration from the widely used scikit-learn library and is built on PyMC3. It targets users who need probabilistic modeling tools but may lack specialized expertise in probabilistic programming or inference methods.

Context and Motivation

Probabilistic machine learning provides a principled approach to modeling uncertainty, which is crucial in a variety of domains ranging from natural sciences to engineering and the arts. There is a significant demand for transparent models capable of navigating uncertainty effectively. Probabilistic programming languages offer a framework to create highly expressive models but often require substantial domain knowledge in probability theory and inference methods, posing challenges for broader community adoption.

Pymc-learn seeks to alleviate these challenges by providing an accessible API that mimics the user-friendly interface of scikit-learn while leveraging PyMC3's robust probabilistic modeling features. The package emphasizes ease of use, productivity, and flexibility without sacrificing performance, thus facilitating the adoption of probabilistic models among non-specialist users across various fields.

Key Features and Design Principles

To enhance user experience, Pymc-learn follows several design principles:

Ease of Use: By emulating the scikit-learn API, Pymc-learn enables users familiar with scikit-learn to transition seamlessly, using a familiar syntax for model instantiation, training, and evaluation.
Productivity: The consistency of the API reduces the need for users to rewrite code from scratch, enhancing productivity and allowing rapid prototyping and experimentation.
Flexibility: Integration with PyMC3 provides users with the flexibility to build complex models that incorporate domain knowledge without the need for specialized probabilistic programming skills.
Performance: Pymc-learn employs advanced probabilistic inference algorithms such as the No-U-Turn Sampler (NUTS) and Automatic Differentiation Variational Inference (ADVI). The latter helps the package scale to high-dimensional and complex models through GPU-accelerated computation and mini-batches.
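As background for the Performance point, the objective that variational methods such as ADVI maximize can be written down briefly. This is the standard evidence lower bound, not a formula taken from the paper; the symbols $\theta$ (latent parameters), $\phi$ (variational parameters), $N$ (dataset size), and $B$ (mini-batch size) are notation assumed here for illustration.

```latex
\mathcal{L}(\phi)
  = \mathbb{E}_{q(\theta;\phi)}\big[\log p(x,\theta)\big]
  - \mathbb{E}_{q(\theta;\phi)}\big[\log q(\theta;\phi)\big]

% Mini-batch estimate of the joint log-density, assuming the observations
% x_1, ..., x_N are conditionally independent given \theta and
% \mathcal{B} is a random subset of B indices:
\log p(x,\theta) \approx \log p(\theta)
  + \frac{N}{B}\sum_{i\in\mathcal{B}} \log p(x_i \mid \theta)
```

Because the rescaled sum is an unbiased estimate of the full log-likelihood, each gradient step only needs to touch $B$ observations, which is roughly why mini-batching helps with large datasets.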

The package is open source and distributed under the BSD-3 license, with ongoing community-driven development and comprehensive documentation available online.

Illustrative Utility and Example

The paper illustrates Pymc-learn's utility through practical examples that show how closely its functionality tracks scikit-learn's. Specifically, it showcases model training with algorithms such as Gaussian Process Regression, highlighting the similarities in syntax and workflow. The package supports essential operations, such as model fitting, scoring, prediction, saving, and loading, which are critical for integrating probabilistic models into existing workflows.
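To make the fit, score, predict, save, and load workflow concrete, here is a minimal standard-library sketch of what a scikit-learn-style probabilistic estimator might look like. The class `ToyBayesianLinearRegression` and all of its parameters are hypothetical and are not part of Pymc-learn; it uses a closed-form conjugate posterior for a one-dimensional slope rather than NUTS or ADVI, and the data are synthetic.

```python
import json
import math
import os
import random
import tempfile


class ToyBayesianLinearRegression:
    """Hypothetical scikit-learn-style estimator (NOT the Pymc-learn API).

    Model: y = w * x + noise, with noise ~ Normal(0, noise_sd^2)
    and a conjugate prior w ~ Normal(0, prior_sd^2).
    """

    def __init__(self, prior_sd=10.0, noise_sd=1.0):
        self.prior_sd = prior_sd
        self.noise_sd = noise_sd
        self.w_mean_ = None  # posterior mean of the slope, set by fit()
        self.w_var_ = None   # posterior variance of the slope, set by fit()

    def fit(self, X, y):
        # Closed-form Gaussian posterior over the slope w.
        noise_var = self.noise_sd ** 2
        precision = 1.0 / self.prior_sd ** 2 + sum(x * x for x in X) / noise_var
        self.w_var_ = 1.0 / precision
        self.w_mean_ = self.w_var_ * sum(x * t for x, t in zip(X, y)) / noise_var
        return self

    def predict(self, X, return_std=False):
        mean = [self.w_mean_ * x for x in X]
        if not return_std:
            return mean
        # Predictive variance = slope uncertainty + observation noise.
        std = [math.sqrt(x * x * self.w_var_ + self.noise_sd ** 2) for x in X]
        return mean, std

    def score(self, X, y):
        # Coefficient of determination (R^2) of the posterior-mean prediction.
        pred = self.predict(X)
        y_bar = sum(y) / len(y)
        ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
        ss_tot = sum((t - y_bar) ** 2 for t in y)
        return 1.0 - ss_res / ss_tot

    def save(self, path):
        state = {
            "prior_sd": self.prior_sd,
            "noise_sd": self.noise_sd,
            "w_mean_": self.w_mean_,
            "w_var_": self.w_var_,
        }
        with open(path, "w") as f:
            json.dump(state, f)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            state = json.load(f)
        model = cls(prior_sd=state["prior_sd"], noise_sd=state["noise_sd"])
        model.w_mean_ = state["w_mean_"]
        model.w_var_ = state["w_var_"]
        return model


# Synthetic data with a true slope of 2.0 and unit-variance Gaussian noise.
rng = random.Random(0)
X = [i / 10 for i in range(1, 51)]
y = [2.0 * x + rng.gauss(0.0, 1.0) for x in X]

model = ToyBayesianLinearRegression().fit(X, y)
r2 = model.score(X, y)
mean, std = model.predict([1.0, 5.0], return_std=True)

# Round-trip the fitted model through a temporary file.
path = os.path.join(tempfile.mkdtemp(), "toy_model.json")
model.save(path)
restored = ToyBayesianLinearRegression.load(path)
```

The point of the sketch is the shape of the interface: `predict(..., return_std=True)` returns an uncertainty estimate alongside the point prediction, which is the main thing a probabilistic estimator adds over a conventional one while keeping the familiar method names.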

Implications and Future Directions

The introduction of Pymc-learn represents a strategic development in making probabilistic modeling accessible to a wider audience. The package's design prioritizes reducing the learning curve associated with probabilistic programming while retaining the flexibility and expressiveness required in advanced modeling tasks. The potential implications extend across various domains where uncertainty modeling is vital.

Looking ahead, the paper mentions planned extensions to the package, including the incorporation of additional probabilistic models such as Hidden Markov Models and Bayesian Neural Networks. This evolution could broaden the applicability of Pymc-learn, providing users with an even more comprehensive toolkit for probabilistic machine learning.

Conclusion

In summary, Pymc-learn serves as an important tool in democratizing access to advanced probabilistic machine learning techniques. By providing an interface that mirrors the simplicity of scikit-learn and harnessing the power of PyMC3, the package offers both non-specialists and experienced practitioners a resource to effectively engage with probabilistic modeling challenges. As the package evolves, it may further solidify its position as a valuable resource in the landscape of probabilistic programming and machine learning.
