Improving Password Guessing via Representation Learning (1910.04232v3)

Published 9 Oct 2019 in cs.CR

Abstract: Learning useful representations from unstructured data is one of the core challenges, as well as a driving force, of modern data-driven approaches. Deep learning has demonstrated the broad advantages of learning and harnessing such representations. In this paper, we introduce a deep generative model representation learning approach for password guessing. We show that an abstract password representation naturally offers compelling and versatile properties that can be used to open new directions in the extensively studied, and yet presently active, password guessing field. These properties can establish novel password generation techniques that are neither feasible nor practical with the existing probabilistic and non-probabilistic approaches. Based on these properties, we introduce: (1) a general framework for conditional password guessing that can generate passwords with arbitrary biases; and (2) an Expectation Maximization-inspired framework that can dynamically adapt the estimated password distribution to match the distribution of the attacked password set.

Summary

The paper introduces novel password guessing frameworks, CPG and DPG, that utilize representation learning from deep generative models like GANs and WAEs.
Conditional Password Guessing (CPG) exploits latent space locality using templates to guess passwords based on partial knowledge, outperforming traditional methods on biased sets.
Dynamic Password Guessing (DPG) reduces covariate shift by dynamically adapting its distribution using successfully guessed passwords, significantly improving performance against biased leaks.

Improving Password Guessing via Representation Learning

The paper "Improving Password Guessing via Representation Learning" applies representation learning with deep generative models to password guessing. The authors propose two frameworks, Conditional Password Guessing (CPG) and Dynamic Password Guessing (DPG), both built on properties of the latent space that these models learn.

Password Guessing and Representation Learning

The paper surveys the landscape of password guessing, where conventional methods such as dictionary attacks, mangling rules, and probabilistic techniques such as Markov models and neural networks (e.g., Fast, Lean, and Accurate (FLA)) have been foundational. These approaches estimate or directly model the password distribution at training time. The resulting model is static and can adapt poorly to diverse password sets because of covariate shift, that is, the discrepancy between the training set distribution and the real-world target distribution.

The authors move from distribution estimation to representation learning using Generative Adversarial Networks (GANs) and Wasserstein Autoencoders (WAEs). These models map passwords to abstract representations in a latent space that exhibits locality: passwords with similar structures or characteristics cluster together. The authors use this property to build guessing strategies that are not feasible with traditional approaches.

Conditional Password Guessing (CPG)

CPG uses the strong locality principle to generate passwords under specific biases, expressed as templates containing wildcards. This technique targets password classes based on partial knowledge, which is useful in scenarios such as recovering forgotten passwords or performing advanced side-channel attacks. The approach exploits the geometry of the latent space learned by GANs or WAEs: the template defines a pivot point in the latent space, and passwords generated from latent points close to that pivot tend to share the template's structure. Experimental results show that CPG matches more passwords in biased test sets than traditional methods such as OMEN and PCFG.
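The sampling step behind template-based guessing can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: `toy_encode` and `toy_decode` are hypothetical stand-ins for a trained encoder and generator (they treat each character index as one latent coordinate), the `*` wildcard symbol, the `sigma` value, and the final template filter are all assumptions made for the example.

```python
import numpy as np

# Toy alphabet (assumption): index 0 is the padding space, "*" is the wildcard.
ALPHABET = " abcdefghijklmnopqrstuvwxyz*0123456789"
WILDCARD = "*"
MAX_LEN = 10


def toy_encode(password):
    """Hypothetical encoder: one latent coordinate per (padded) character."""
    padded = password.ljust(MAX_LEN)
    return np.array([ALPHABET.index(c) for c in padded], dtype=float)


def toy_decode(z):
    """Hypothetical generator: round each coordinate to the nearest symbol."""
    idx = np.clip(np.rint(z), 0, len(ALPHABET) - 1).astype(int)
    return "".join(ALPHABET[i] for i in idx).rstrip()


def matches_template(password, template):
    """True if the password agrees with every non-wildcard template position."""
    return len(password) == len(template) and all(
        t == WILDCARD or t == p for t, p in zip(template, password)
    )


def conditional_guess(template, encode, decode, sigma=0.6, n_samples=5000, seed=0):
    """Sample latent points around the template's pivot and decode them."""
    rng = np.random.default_rng(seed)
    pivot = encode(template)  # latent representation of the template
    samples = pivot + sigma * rng.standard_normal((n_samples, pivot.size))
    guesses = []
    for z in samples:
        guess = decode(z)
        # Keep only new, wildcard-free guesses that respect the template.
        if (WILDCARD not in guess
                and matches_template(guess, template)
                and guess not in guesses):
            guesses.append(guess)
    return guesses


if __name__ == "__main__":
    print(conditional_guess("jimmy**", toy_encode, toy_decode)[:10])
```

With a trained model, the encoder and generator would replace the toy functions. In this sketch `sigma` controls how far samples stray from the pivot: small values keep the fixed characters intact, larger values explore more of the latent neighborhood at the cost of more rejected guesses.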

Dynamic Password Guessing (DPG)

DPG reduces covariate shift by adapting the modeled distribution during a guessing attack. It relies on weak locality, a more general form of semantic clustering in which passwords drawn from the same distribution tend to lie close together in the latent space. As new passwords are cracked, DPG updates the latent distribution to reflect the characteristics of the guessed passwords, which improves its alignment with the attacked set. Concretely, the latent prior is replaced by a conditional distribution modeled as a mixture of Gaussians centered on the latent points of successfully guessed passwords.
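The adaptation loop can be illustrated with a small sketch. It is a simplified illustration under stated assumptions, not the authors' code: `toy_generator` is a hypothetical stand-in for a trained generator, and the uniform mixture weights, the `sigma` value, and the `hot_start` threshold (the number of hits required before switching from the prior to the mixture) are choices made for this example.

```python
import numpy as np


def toy_generator(z):
    """Hypothetical generator: quantizes a 2-D latent point into a string."""
    i, j = (int(v) for v in np.floor(z * 4))
    return f"pw{i}_{j}"


def sample_latent(hits, n, dim, sigma, hot_start, rng):
    """Draw n latent points from the prior or, after enough hits, from a
    uniform mixture of Gaussians centered on the hit latent points.
    Assumes hot_start >= 1."""
    if len(hits) < hot_start:
        return rng.standard_normal((n, dim))  # standard normal prior
    centers = np.asarray(hits)[rng.integers(0, len(hits), size=n)]
    return centers + sigma * rng.standard_normal((n, dim))


def dynamic_attack(target, generator, rounds=50, batch=200, dim=2,
                   sigma=0.3, hot_start=5, seed=0):
    """Guess passwords, feeding each successful latent point back in."""
    rng = np.random.default_rng(seed)
    hits, cracked = [], set()
    for _ in range(rounds):
        for z in sample_latent(hits, batch, dim, sigma, hot_start, rng):
            guess = generator(z)
            if guess in target and guess not in cracked:
                cracked.add(guess)
                hits.append(z)  # new mixture component for later rounds
    return cracked


if __name__ == "__main__":
    # Toy target set whose latent region sits away from the prior's mode.
    target = {f"pw{i}_{j}" for i in range(6, 12) for j in range(6, 12)}
    found = dynamic_attack(target, toy_generator)
    print(f"cracked {len(found)} of {len(target)} toy passwords")
```

In this toy setup, every hit pulls later samples toward the region of the latent space where the target passwords live, which is the intuition behind adapting to the attacked set. Setting `hot_start` above the number of achievable hits recovers a static attack that only ever samples from the prior, which gives a simple baseline for comparison.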

Evaluation and Implications

The paper presents substantial empirical evidence showing that DPG can significantly enhance guessing performance against password leaks with pronounced biases or unique distributions (e.g., Zomato and Youku). Furthermore, it elucidates how dynamic adaptation allows rapid prediction of high-probability passwords within the target set that might not be efficiently guessed by conventional static models.

The practical implications extend beyond attack scenarios. The proposed frameworks could also support the enforcement of organizational password policies or password recovery systems. Moreover, because these methods adapt without supervision, they may be useful for bootstrapping or fine-tuning password security measures across heterogeneous environments.

Future Work

The authors note that representation learning with deep generative models is still an open and evolving area, which suggests that password guessing accuracy and efficiency can continue to improve. The password locality properties and dynamic adaptation mechanisms may also find broader applications and integrate with other areas of security, such as multi-factor authentication systems.

In conclusion, the paper shows how representation learning can change the way password guessing is approached, presenting frameworks that balance exploration and exploitation across highly variable password distributions.

GitHub

GitHub - pasquini-dario/PLR: Official repository for "Improving Password Guessing via Representation Learning" (54 stars)