
Introducing Orthogonal Constraint in Structural Probes

Published 30 Dec 2020 in cs.CL (arXiv:2012.15228v2)

Abstract: With the recent success of pre-trained models in NLP, a significant focus was put on interpreting their representations. One of the most prominent approaches is structural probing (Hewitt and Manning, 2019), where a linear projection of word embeddings is performed in order to approximate the topology of dependency structures. In this work, we introduce a new type of structural probing, where the linear projection is decomposed into 1. isomorphic space rotation; 2. linear scaling that identifies and scales the most relevant dimensions. In addition to syntactic dependency, we evaluate our method on novel tasks (lexical hypernymy and position in a sentence). We jointly train the probes for multiple tasks and experimentally show that lexical and syntactic information is separated in the representations. Moreover, the orthogonal constraint makes the Structural Probes less vulnerable to memorization.


Summary

  • The paper introduces an orthogonal constraint that decomposes structural probes into isomorphic rotation and linear scaling.
  • Multitask training with shared orthogonal transformations reduces overfitting and effectively captures distinct linguistic features.
  • Evaluation on BERT layers confirms competitive performance, reduced memorization, and enhanced interpretability of language structures.

Introducing Orthogonal Constraint in Structural Probes

The paper "Introducing Orthogonal Constraint in Structural Probes" presents a novel form of structural probing for NLP models that incorporates an orthogonal constraint. The research extends existing methodologies by introducing a decomposition approach that enhances interpretation and generalization in probing tasks.

Background

Structural Probing

Structural probes are used to analyze the latent space of pre-trained language models (PLMs) like BERT and ELMo by applying a linear transformation to the embeddings and optimizing it for linguistic tasks such as dependency distance and depth. This study builds on the work of Hewitt and Manning (2019), who introduced structural probes for syntactic analysis by modeling syntactic distances and depths using linear transformations and regression objectives.
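For concreteness, the distance variant of this probe learns a single matrix B and is trained so that squared L2 distances between projected word embeddings approximate pairwise distances in the gold dependency tree. Below is a minimal PyTorch sketch of that setup; the class and argument names are illustrative and not taken from the authors' code.

```python
import torch
import torch.nn as nn

class StructuralDistanceProbe(nn.Module):
    """Hewitt-and-Manning-style distance probe: a single learned linear map B."""

    def __init__(self, embedding_dim: int, probe_rank: int):
        super().__init__()
        self.B = nn.Parameter(torch.randn(probe_rank, embedding_dim) * 0.01)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (seq_len, embedding_dim) -> predicted squared distances (seq_len, seq_len)
        transformed = embeddings @ self.B.T                        # (seq_len, probe_rank)
        diffs = transformed.unsqueeze(1) - transformed.unsqueeze(0)
        return (diffs ** 2).sum(-1)                                # squared L2 for every word pair

# Training minimizes the gap between these predictions and tree distances
# from the gold parse (the original work uses an L1-style loss).
```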

Orthogonal Probes

The novel contribution of this research lies in replacing the linear transformation with an orthogonal transformation combined with a scaling vector, which allows decomposition of the probe into two distinct stages: isomorphic space rotation and linear scaling. This modification provides a more robust framework that can jointly train for multiple linguistic structures (e.g., syntactic dependencies, lexical hypernymy) with reduced memorization effects.
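Concretely, the single projection matrix is replaced by an orthogonal matrix V (the rotation) followed by element-wise multiplication with a scaling vector. The sketch below illustrates this decomposition; names are illustrative, and the orthogonality and sparsity regularizers discussed later are omitted here.

```python
import torch
import torch.nn as nn

class OrthogonalProbe(nn.Module):
    """Orthogonal rotation V followed by a task-specific scaling vector."""

    def __init__(self, dim: int):
        super().__init__()
        self.V = nn.Parameter(torch.eye(dim))        # kept (softly) orthogonal during training
        self.scale = nn.Parameter(torch.ones(dim))   # identifies and weights relevant dimensions

    def project(self, h: torch.Tensor) -> torch.Tensor:
        # Rotate the embedding space first, then scale individual dimensions.
        return (h @ self.V.T) * self.scale

    def squared_distance(self, h_i: torch.Tensor, h_j: torch.Tensor) -> torch.Tensor:
        d = self.project(h_i) - self.project(h_j)
        return (d ** 2).sum(-1)
```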

Methodology

Configurations

The orthogonal transformation is designed to maintain the structure of the original embedding space. The matrix V is kept (approximately) orthogonal, ensuring that the transformation preserves distances and angles between embeddings rather than distorting them, as an unconstrained linear transformation can.

Multitask Training

By sharing the orthogonal transformation across tasks and varying the task-specific scaling vector, the model can perform multi-task probing efficiently. This configuration reduces parameter count and prevents overfitting, leading to greater selectivity in capturing relevant linguistic features.
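A rough picture of this sharing, assuming a PyTorch-style implementation with illustrative task names: the orthogonal matrix V is a single shared parameter, while each probing objective owns only its scaling vector.

```python
import torch
import torch.nn as nn

class MultitaskOrthogonalProbe(nn.Module):
    def __init__(self, dim: int, tasks=("dep_distance", "dep_depth", "lex_distance", "position")):
        super().__init__()
        self.V = nn.Parameter(torch.eye(dim))  # one rotation shared by all tasks
        self.scales = nn.ParameterDict(
            {task: nn.Parameter(torch.ones(dim)) for task in tasks}
        )

    def project(self, h: torch.Tensor, task: str) -> torch.Tensor:
        # Shared rotation, task-specific scaling.
        return (h @ self.V.T) * self.scales[task]
```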

Regularization Techniques

The research employs double soft orthogonality (DSO) regularization to enforce the orthogonality constraint on V, and sparsity regularization on the scaling vector to reduce the number of dimensions that are effectively used. These mechanisms help identify the dimensions critical for encoding specific linguistic structures (see Figure 1).

Figure 1: Values of orthogonality penalty during joint training of Orthogonal Structural Probe on top of layers: 3 (green), 7 (yellow), 16 (gray), 24 (blue). Optimization steps on the x-axis.
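The two penalties could be sketched as follows. Treat this as an approximation: the exact weighting, and whether the Frobenius norms are squared, are details of the original formulation rather than guarantees of this snippet.

```python
import torch

def dso_penalty(V: torch.Tensor) -> torch.Tensor:
    # Double soft orthogonality: push both V V^T and V^T V toward the identity.
    I = torch.eye(V.shape[0], device=V.device)
    return torch.norm(V @ V.T - I) ** 2 + torch.norm(V.T @ V - I) ** 2

def sparsity_penalty(scale: torch.Tensor) -> torch.Tensor:
    # L1 norm of the scaling vector drives irrelevant dimensions toward zero.
    return scale.abs().sum()

# Illustrative total loss for one task (weights lambda_o, lambda_s are placeholders):
# loss = probing_loss + lambda_o * dso_penalty(probe.V) + lambda_s * sparsity_penalty(probe.scales[task])
```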

Evaluation

The Orthogonal Structural Probe was evaluated across different layers of a pre-trained BERT model. The introduction of an orthogonal constraint resulted in similar or superior performance compared to traditional structural probes, particularly in tasks evaluating syntactic and lexical structures.

Dimensionality Analysis

The dimensionality of the probed subspace was adjusted dynamically during training via the sparsity constraint, resulting in low-rank projections that maintained high performance. This reduction indicates that the relevant linguistic information is captured in relatively few dimensions, enhancing interpretability and efficiency (see Figure 2).

Figure 2: Values of sparsity penalty during separate training of Orthogonal Structural Probes with lambda = 0.05. Objectives from highest to lowest value: lexical distance (yellow), positional distance (green), dependency distance (gray), positional depth (violet), lexical depth (magenta), dependency depth (blue), random depth (orange). Optimization steps on the x-axis.
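One simple way to read off this effective dimensionality after training is to count the entries of the scaling vector whose magnitude stays above a small threshold; the threshold value below is an assumption for illustration, not a figure from the paper.

```python
import torch

def effective_dims(scale: torch.Tensor, threshold: float = 1e-3) -> int:
    # Dimensions whose scaling weight has not been driven to (near) zero by the sparsity penalty.
    return int((scale.abs() > threshold).sum().item())
```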

Results

The empirical results demonstrated that Orthogonal Structural Probes are less susceptible to overfitting due to their reduced memorization of random structures, as indicated by lower correlations in control tasks. Multitask training settings maintained competitive performance metrics and revealed clear separation in encoding different types of linguistic information, such as syntax and lexis, supporting the probes' efficacy in complex language understanding scenarios.

Conclusion

This paper introduces an innovative structural probing technique that leverages orthogonal transformations to separate and analyze linguistic information encoded in PLMs. The work advances probing methodology by enabling multitask learning with enhanced interpretability and a reduced parameter count. Future research may explore applying these techniques to other linguistic phenomena and other PLMs, further broadening their applicability and impact in NLP.

Overall, this research highlights the potential for orthogonal constraints in elucidating the relationships among different linguistic embeddings, offering promising directions for both theoretical exploration and practical applications in language technology.
