Composer Vector: Style-steering Symbolic Music Generation in a Latent Space

Published 3 Apr 2026 in cs.SD and cs.AI | (2604.03333v1)

Abstract: Symbolic music generation has made significant progress, yet achieving fine-grained and flexible control over composer style remains challenging. Existing training-based methods for composer style conditioning depend on large labeled datasets. Besides, these methods typically support only single-composer generation at a time, limiting their applicability to more creative or blended scenarios. In this work, we propose Composer Vector, an inference-time steering method that operates directly in the model's latent space to control composer style without retraining. Through experiments on multiple symbolic music generation models, we show that Composer Vector effectively guides generations toward target composer styles, enabling smooth and interpretable control through a continuous steering coefficient. It also enables seamless fusion of multiple styles within a unified latent space framework. Overall, our work demonstrates that simple latent space steering provides a practical and general mechanism for controllable symbolic music generation, enabling more flexible and interactive creative workflows. Code and Demo are available here: https://github.com/JiangXunyi/Composer-Vector and https://jiangxunyi.github.io/composervector.github.io/


Authors (4)

Summary

The paper introduces an inference-time method that injects composer style vectors, derived from the model's own latent activations, into transformer-based models to steer music generation without retraining.
It demonstrates robust control: classifier probability for the target composer rises near-linearly with the steering coefficient, and the CLAP and CLaMP similarity metrics confirm that steered outputs move toward the target style.
The approach enables real-time multi-composer style fusion and suppression, paving the way for interactive, creative generative music systems.

Composer Vector: Fine-Grained Latent Space Style Steering for Symbolic Music Generation

Introduction

A persistent challenge in symbolic music generation is achieving fine-grained, compositional control at inference time over high-level style attributes, particularly those associated with specific composers. "Composer Vector: Style-steering Symbolic Music Generation in a Latent Space" (2604.03333) presents an activation-based, inference-time method for controlling composer style in symbolic music generation models without retraining or additional labeled data. The method, Composer Vector, captures the stylistic direction of a composer in the model’s internal latent space and injects it during generation, enabling steerable and composable style control.

Figure 1: The Composer Vector method operates by extracting latent directions capturing composer styles and injecting them into the residual stream of transformer-based music LMs at inference time for continuous control, fusion, and suppression of stylistic attributes.

Latent Representation of Composer Style

Building on the observation that transformer language models encode disentangled semantic features in their deeper layers, the paper hypothesizes, and empirically validates, that classical composer styles manifest as structured, linearly separable directions in the latent space of symbolic music models (e.g., NotaGen, ChatMusician).

Layer-wise evaluation with linear probing and unsupervised clustering metrics (kNN purity and the Davies-Bouldin index), complemented by t-SNE visualizations, shows that deep layers explicitly localize composer identity. In the t-SNE projections, embeddings from the highest layers form distinct, compact clusters by composer, which supports the view that latent composer representations are both interpretable and manipulable.
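A minimal NumPy sketch of the two clustering metrics named above, kNN purity and the Davies-Bouldin index, plus a helper that picks the most separable layer. The function names, the use of Euclidean distance, k = 5, and selecting by kNN purity alone are assumptions for illustration; the paper's exact settings may differ. Probe accuracy would be computed analogously by training any linear classifier on each layer's embeddings.

```python
import numpy as np


def knn_purity(X: np.ndarray, y: np.ndarray, k: int = 5) -> float:
    """Mean fraction of each piece's k nearest neighbours (excluding itself)
    that share its composer label. Higher is better."""
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                   # a piece is not its own neighbour
    nn = np.argsort(dist, axis=1)[:, :k]             # indices of the k nearest pieces
    return float(np.mean(y[nn] == y[:, None]))


def davies_bouldin(X: np.ndarray, y: np.ndarray) -> float:
    """Davies-Bouldin index: mean over clusters of the worst-case ratio of
    summed within-cluster scatter to centroid distance. Lower is better."""
    labels = np.unique(y)
    cents = np.stack([X[y == c].mean(axis=0) for c in labels])
    # Average distance of each cluster's points to its own centroid.
    scatter = np.array([np.linalg.norm(X[y == c] - cents[i], axis=1).mean()
                        for i, c in enumerate(labels)])
    sep = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)                    # exclude i == j from the max
    ratio = (scatter[:, None] + scatter[None, :]) / sep
    return float(np.mean(ratio.max(axis=1)))


def pick_layer(layer_embeddings, y: np.ndarray, k: int = 5) -> int:
    """Index of the layer whose embeddings give the highest kNN purity.
    layer_embeddings: list of (n_pieces, d) arrays, one per layer (hypothetical input)."""
    scores = [knn_purity(X, y, k) for X in layer_embeddings]
    return int(np.argmax(scores))
```

The inputs are assumed to be one pooled embedding per piece per layer, with `y` holding integer composer labels as a NumPy array.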

Figure 2: t-SNE visualization reveals that deep-layer embeddings of symbolic pieces cluster according to composer; canonical composers (e.g., Bach, Liszt) form well-separated, compact manifolds.

Figure 3: Layer-wise evaluation with linear probing and clustering metrics identifies late transformer layers as optimal for composer style extraction, peaking at 94% probe accuracy in NotaGen.

Methodology: Construction and Application of Composer Vectors

Composer Vectors are computed as the mean latent representation of canonical pieces (prompted with composer descriptors and ABC notation) at the layer with maximal style separability. To steer toward a target composer, the corresponding vector is added to the residual stream during decoding, scaled by a user-chosen scalar coefficient $\alpha$. The method supports:

Continuous intensity modulation: Varying $\alpha$ gives fine-grained control over stylistic influence, since classifier probability for the target composer increases monotonically with $\alpha$.
Linear style fusion and suppression: Multiple composer vectors can be linearly combined with positive and negative weights to blend or suppress stylistic identities in generated music.
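The construction and injection steps can be summarized in equations. The notation is ours: $\mathcal{D}_c$ is the set of prompted pieces for composer $c$, $\mathbf{h}_{\ell,t}(x)$ is the residual-stream activation at layer $\ell$ and position $t$ for piece $x$ of length $T_x$, and averaging over token positions to obtain a piece-level representation is an assumption, since the pooling is not specified above.

$$
\bar{\mathbf{h}}_\ell(x) = \frac{1}{T_x}\sum_{t=1}^{T_x} \mathbf{h}_{\ell,t}(x), \qquad
\mathbf{v}_c = \frac{1}{|\mathcal{D}_c|}\sum_{x \in \mathcal{D}_c} \bar{\mathbf{h}}_\ell(x)
$$

$$
\tilde{\mathbf{h}}_{\ell,t} = \mathbf{h}_{\ell,t} + \sum_{i=1}^{K} \alpha_i\, \mathbf{v}_{c_i}
$$

Single-composer steering is the case $K = 1$; fusion uses several positive $\alpha_i$, and a negative $\alpha_i$ suppresses composer $c_i$.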

Steering is applied only to musically relevant tokens, preserving score format integrity.
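A minimal NumPy sketch of masked steering at one layer, assuming hidden states are available as a `(T, d)` array for a single sequence. `steer_hidden` is a hypothetical helper, and how the mask of musically relevant tokens is built (for example, excluding ABC header fields) is left as an assumption because the criterion is not specified here.

```python
import numpy as np


def steer_hidden(h, vectors, coeffs, token_mask):
    """Add a weighted sum of composer vectors to the hidden states of steered tokens.

    h          : (T, d) hidden states at the chosen layer for one sequence.
    vectors    : (K, d) composer vectors, one row per composer.
    coeffs     : (K,) steering coefficients; negative values suppress a style.
    token_mask : (T,) booleans, True for musically relevant tokens (assumed given).
    """
    h = np.asarray(h, dtype=float)
    # Combined steering offset: sum_i alpha_i * v_i, shape (d,).
    offset = np.asarray(coeffs, dtype=float) @ np.asarray(vectors, dtype=float)
    # Broadcast the mask over the hidden dimension so masked-out tokens are untouched.
    mask = np.asarray(token_mask, dtype=float)[:, None]
    return h + mask * offset
```

In a PyTorch model, the same addition would typically be applied inside a forward hook registered on the chosen layer with `register_forward_hook`; the exact module path depends on the model and is not given here.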

Experimental Evaluation: Effectiveness and Expressivity

Single-Composer Steering and Style Control

Similarity-based evaluations using the CLAP and CLaMP metrics show that Composer Vector steering consistently increases the similarity between generated music and the target composer's latent style distribution, for both NotaGen and ChatMusician. Classification-based evaluation with a CLaMP3-trained classifier shows a systematic increase in the predicted probability of the steered composer, consistently exceeding both the baseline and prompt-based conditioning. In challenging cases where the prompt names a different composer from the steering target, the Composer Vector dominates the prompt context, pushing the target composer's predicted probability above 50%.

Figure 4: For ChatMusician, latent similarity to the target composer, measured by CLAP and CLaMP, improves substantially after steering, indicating a controllable trajectory in style space.

Figure 5: Steering improvement heatmap for NotaGen, highlighting the probability increments for the target composer (green); in over 97% of cases, steering increases the classifier probability for the target composer.
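The two evaluation views above can be sketched in NumPy: a cosine similarity between each generated piece's embedding and the centroid of the target composer's reference embeddings (a stand-in for the CLAP/CLaMP similarity, whose exact computation is not specified here), and the per-case change in classifier probability that a steering-improvement heatmap would display. The helper names are hypothetical, and the embeddings and probabilities are assumed to come from whichever encoder and classifier are used.

```python
import numpy as np


def cosine_to_centroid(gen_emb, ref_emb):
    """Cosine similarity of each generated-piece embedding (rows of gen_emb)
    to the mean embedding of the target composer's reference pieces."""
    gen_emb = np.asarray(gen_emb, dtype=float)
    centroid = np.asarray(ref_emb, dtype=float).mean(axis=0)
    num = gen_emb @ centroid
    den = np.linalg.norm(gen_emb, axis=1) * np.linalg.norm(centroid)
    return num / den


def steering_improvement(p_base, p_steer):
    """Per-case change in target-composer probability, and the share of cases improved.
    Inputs can be flat arrays or (prompt composer x target composer) grids."""
    delta = np.asarray(p_steer, dtype=float) - np.asarray(p_base, dtype=float)
    return delta, float(np.mean(delta > 0))
```

Comparing `cosine_to_centroid` before and after steering gives the similarity gain, while the second return value of `steering_improvement` is the fraction of cases with increased alignment.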

Scalar Control of Stylistic Intensity

The experiments reveal a near-linear, positive correlation between the steering coefficient $\alpha$ and classifier probability for the target composer, which supports the interpretability and continuity of the control. The effect holds across stylistically divergent prompts and across models, although transfer strength depends on the composer pair: gains are larger when the prompt and steering composers are stylistically close.

Figure 6: Regression of steering coefficient $\alpha$ against classifier probability for Beethoven, Chopin, and Rachmaninoff; the curves confirm monotonic, amplifiable control of style intensity.
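To quantify this relationship, one can fit a least-squares line of classifier probability on $\alpha$ and report its slope alongside the Pearson correlation, plus a simple monotonicity check. A minimal NumPy sketch; `alpha_response` and `is_monotonic_increasing` are hypothetical helpers, and the inputs are assumed to be one probability per $\alpha$ setting (for example, averaged over samples).

```python
import numpy as np


def alpha_response(alphas, probs):
    """Fit probs ~ slope * alpha + intercept; return (slope, intercept, pearson_r)."""
    alphas = np.asarray(alphas, dtype=float)
    probs = np.asarray(probs, dtype=float)
    slope, intercept = np.polyfit(alphas, probs, deg=1)  # highest degree first
    r = np.corrcoef(alphas, probs)[0, 1]                  # Pearson correlation
    return float(slope), float(intercept), float(r)


def is_monotonic_increasing(alphas, probs):
    """True if probability never decreases once values are sorted by alpha."""
    order = np.argsort(alphas)
    return bool(np.all(np.diff(np.asarray(probs, dtype=float)[order]) >= 0))
```

A positive slope with $r$ close to 1 corresponds to the near-linear, monotonic control described above; comparing slopes across prompt and target composer pairs would expose the pair-dependent transfer strength.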

Multi-Composer Style Fusion

By linearly interpolating between two composers' vectors, the method blends stylistic attributes smoothly and interpretably. Classifier probability for each composer responds proportionally to its own coefficient, and the regression slopes suggest that the two composers' style directions act largely independently of each other. At the level of individual samples, the composer with the highest classifier probability is systematically the one with the dominant steering coefficient.

Figure 7: Fusion of two composer vectors results in monotonic and opposing changes in classifier probabilities, illustrating controllable stylistic interpolation in the latent space.
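The fusion analysis can be sketched as a weighted sum of two composer vectors plus a two-variable least-squares fit of one composer's classifier probability on both coefficients; a large positive own-coefficient slope and a small or negative cross-coefficient slope would correspond to the opposing, proportional responses described above. The helper names are hypothetical, and this is an illustration under those assumptions rather than the paper's analysis code.

```python
import numpy as np


def fused_vector(v_a, v_b, alpha_a, alpha_b):
    """Weighted combination of two composer vectors (negative weights suppress)."""
    return alpha_a * np.asarray(v_a, dtype=float) + alpha_b * np.asarray(v_b, dtype=float)


def fit_two_coeff(alpha_a, alpha_b, probs):
    """Least-squares fit probs ~ b0 + b_a * alpha_a + b_b * alpha_b.
    Returns the array [b0, b_a, b_b]."""
    A = np.column_stack([np.ones(len(probs)), alpha_a, alpha_b])
    coef, *_ = np.linalg.lstsq(A, np.asarray(probs, dtype=float), rcond=None)
    return coef
```

One design caveat: if a sweep holds $\alpha_A + \alpha_B$ fixed (pure interpolation), the two coefficient columns are collinear with the intercept and the individual slopes are not identifiable, so the fit needs a grid in which the two coefficients vary independently.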

Implications and Future Perspectives

Practically, Composer Vector enables training-free, real-time artistic control over symbolic music generation, supporting interactive workflows and creative exploration. Theoretically, the work reframes composer style control as a linear manipulation task in the high-dimensional activation space, aligning with recent advancements in mechanistic interpretability and representation engineering. The demonstrated linearity and compositionality suggest that high-level musical features (analogous to conceptual steering directions in LLMs) are accessible in music LMs, independent of explicit label conditioning.

This opens promising directions including real-time, user-interactive generative systems, zero-shot domain adaptation via activation editing, and the systematic study of style disentanglement and feature universality in music LMs. Potential limitations include the dependence on the training data’s style coverage and the linearity assumption for arbitrary feature directions; future work may explore nonlinear fusion, more granular attribute vectors (e.g., texture, harmony), and downstream applications in music information retrieval and performance analysis.

Conclusion

Composer Vector establishes a robust mechanism for inference-time, latent space steering of composer style in symbolic music generation models, requiring no retraining or additional supervision. The method enables fine-grained, interpretable, and composable control of high-level stylistic attributes across contemporary symbolic music LMs, and demonstrates strong metric-based and classifier-based alignment improvements over baseline conditioning. These findings substantiate the viability of latent activation engineering for flexible, creative AI control in music, paralleling and complementing analogous trends in natural language generation.