- The paper evaluates steering vectors as a method for controlling properties such as topical focus, sentiment, and readability in free-form summaries generated by large language models.
- Experiments show that while steering vectors can effectively control text properties, stronger steering leads to significant degradation in text quality, impacting metrics like perplexity and ROUGE scores.
- A comparative analysis indicates prompt engineering offers weaker control but better quality preservation, while a hybrid approach combining steering vectors and prompting provides the most balanced control and quality trade-off.
The paper "Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization" investigates the use of steering vectors to manipulate the text properties generated by LLMs beyond their traditional application in multiple-choice scenarios. The paper thoroughly evaluates steering vectors for adaptive summarization tasks—specifically focusing on controlling topical focus, sentiment, toxicity, and readability of summaries produced from the NEWTS dataset.
Methodological Framework
At its core, a steering vector is an activation engineering technique: a learned bias added to an LLM's activations during inference that nudges generation toward a desired property. The paper implements Contrastive Activation Addition (CAA), which computes each steering vector as the mean difference in activations between paired contrastive training examples. The resulting vector is then added at a specified layer of the LLM during generation, with varying steering strengths, to assess its impact on the generated summaries.
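As a rough sketch of the CAA recipe, the snippet below computes a steering vector as the mean activation difference between contrastive pairs and adds it back via a forward hook during generation. The model (gpt2), layer index, strength, and example pairs here are illustrative assumptions, not the paper's configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model; the paper's exact models may differ
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
layer = 6  # which transformer block to steer (illustrative choice)

def last_token_activation(text: str) -> torch.Tensor:
    """Residual-stream activation at the output of block `layer`, last token."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    # hidden_states[layer + 1] is the output of model.transformer.h[layer]
    return out.hidden_states[layer + 1][0, -1]

# Paired contrastive examples that do / do not exhibit the target property
# (here, positive sentiment); real pairs would come from labeled data.
positives = ["The outlook is bright and the team is thrilled.",
             "Readers loved the uplifting, hopeful coverage."]
negatives = ["The outlook is bleak and the team is devastated.",
             "Readers hated the grim, hopeless coverage."]

steering_vector = (
    torch.stack([last_token_activation(t) for t in positives]).mean(0)
    - torch.stack([last_token_activation(t) for t in negatives]).mean(0)
)

def steering_hook(strength: float):
    def hook(module, inputs, output):
        # Bias the block's hidden states at every position.
        hidden = output[0] + strength * steering_vector
        return (hidden,) + output[1:]
    return hook

# Apply the vector during generation; higher strengths steer harder but,
# as the paper reports, degrade fluency.
handle = model.transformer.h[layer].register_forward_hook(steering_hook(4.0))
prompt = "Summarize the article: ..."
out_ids = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=60)
handle.remove()
print(tok.decode(out_ids[0], skip_special_tokens=True))
```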
Experiments and Results
The experiments reveal several key insights:
- Efficacy in Controlling Text Properties: Steering vectors can effectively guide generated text, achieving significant control over topical focus, sentiment, and readability. Toxicity, by contrast, proves much harder to introduce or amplify in models that have undergone safety and alignment post-training.
- Quality Trade-offs: A central finding is the pronounced trade-off between control strength and text quality. High steering magnitudes consistently degrade the generated text, worsening intrinsic metrics such as perplexity and bigram repetition as well as extrinsic metrics such as ROUGE and BERTScore (a sketch of one such intrinsic metric follows this list). The trade-off requires careful calibration in practical applications where text quality is paramount.
- Comparative Analysis with Prompt Engineering: Compared with steering vectors, prompt engineering provides weaker attribute control but preserves both intrinsic and extrinsic text quality more effectively, making it a viable alternative when strong attribute control is not required.
- Hybrid Approach: Combining steering vectors with prompt engineering yields robust control over text properties at moderate steering strengths, achieving the most balanced quality-efficacy trade-off across experiments.
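To make the quality side of this trade-off concrete, here is a minimal sketch of one intrinsic degradation signal of the kind the paper tracks: the bigram repetition rate. The function name and whitespace tokenization are simplifying assumptions, not the paper's exact implementation.

```python
from collections import Counter

def bigram_repetition_rate(text: str) -> float:
    """Fraction of bigram occurrences that repeat an earlier bigram."""
    tokens = text.split()
    bigrams = list(zip(tokens, tokens[1:]))
    if not bigrams:
        return 0.0
    counts = Counter(bigrams)
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(bigrams)

# Heavily steered generations often fall into loops, so this rate climbs
# with steering strength alongside perplexity.
print(bigram_repetition_rate("the news the news the news"))  # 0.6
```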
Implications and Future Directions
This research makes a notable contribution to controlled text generation in NLP. Practical applications range from personalized content delivery and tailored educational material to adaptive summarization that caters to specific audience preferences.
Future work might explore strategies for multi-attribute steering, developing methods that can simultaneously manipulate several text dimensions without incurring substantial quality degradation. Additionally, exploring dynamic steering strengths or more generalized activation engineering techniques could enhance the reliability and applicability of steering vectors across diverse NLP tasks.
In conclusion, while steering vectors offer a promising mechanism for controlling the properties of LLM output, the efficacy-quality balance remains a critical consideration, warranting continued exploration of hybrid methods and adaptive steering strategies.