- The paper introduces an inference-time method that injects learned latent composer style vectors into transformer-based models to steer music generation without retraining.
- It demonstrates robust control in which the steering coefficient correlates near-linearly with the target composer's classifier probability, with the stylistic shift validated by CLAP- and CLaMP-based similarity metrics.
- The approach enables real-time multi-composer style fusion and suppression, paving the way for interactive, creative generative music systems.
Composer Vector: Fine-Grained Latent Space Style Steering for Symbolic Music Generation
Introduction
The persistent challenge in symbolic music generation is achieving fine-grained, compositional, and inference-time controllability over high-level style attributes, particularly those associated with specific composers. "Composer Vector: Style-steering Symbolic Music Generation in a Latent Space" (2604.03333) presents an activation-based inference-time method for controlling composer style in symbolic music generation models without retraining or additional labeled data. The method, Composer Vector, encapsulates the stylistic direction of a composer within the model's internal latent space and injects this information during generation, enabling steerable and composable style control.
Figure 1: The Composer Vector method operates by extracting latent directions capturing composer styles and injecting them into the residual stream of transformer-based music LMs at inference time for continuous control, fusion, and suppression of stylistic attributes.
Latent Representation of Composer Style
Building on the observation that transformer LLMs encode disentangled semantic features in their deep representation layers, the paper hypothesizes and empirically validates that classical composer styles are manifested as structured, linearly separable directions in the latent space of symbolic music models (e.g., NotaGen, ChatMusician).
Layer-wise evaluation via linear probing and unsupervised clustering (using kNN purity, Davies-Bouldin index, and t-SNE visualizations) confirms that deep layers explicitly localize composer identity. Notably, t-SNE visualizations demonstrate that embeddings from the highest layers form distinct, compact clusters by composer, reinforcing the interpretability and manipulability of latent composer representations.
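As a rough illustration of the layer-wise clustering analysis, the sketch below computes a kNN purity score (the mean fraction of each piece's k nearest neighbours, by cosine similarity, that share its composer label) for every layer and picks the highest-scoring layer. It assumes pooled per-piece embeddings have already been extracted for each layer; the function names, k = 5, and the use of cosine similarity are illustrative assumptions, not details taken from the paper. Linear probing and the Davies-Bouldin index would be computed over the same per-layer arrays.

```python
import numpy as np


def knn_purity(embeddings: np.ndarray, labels: np.ndarray, k: int = 5) -> float:
    """Mean fraction of each point's k nearest neighbours (cosine) sharing its label.

    embeddings: (n_pieces, d) pooled activations from one layer (assumed input)
    labels:     (n_pieces,) integer composer ids
    """
    # L2-normalise rows so the dot product equals cosine similarity.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = x @ x.T
    # Exclude each point from its own neighbour list.
    np.fill_diagonal(sim, -np.inf)
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    same_label = labels[neighbours] == labels[:, None]
    return float(same_label.mean())


def best_layer(layer_embeddings: list[np.ndarray], labels: np.ndarray, k: int = 5):
    """Score every layer and return (index of the purest layer, all scores)."""
    scores = [knn_purity(e, labels, k) for e in layer_embeddings]
    return int(np.argmax(scores)), scores


# Toy usage with synthetic data: layer 0 is pure noise,
# layer 1 has one tight cluster per hypothetical "composer".
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 20)
noise_layer = rng.normal(size=(60, 8))
clustered_layer = 10 * np.eye(8)[labels] + rng.normal(scale=0.1, size=(60, 8))
layer_idx, scores = best_layer([noise_layer, clustered_layer], labels)
```

On this synthetic data the clustered layer is selected with a purity of 1.0, mirroring the paper's finding that late layers separate composers best.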
Figure 2: t-SNE visualization reveals that deep-layer embeddings of symbolic pieces cluster according to composer; canonical composers (e.g., Bach, Liszt) form well-separated, compact manifolds.
Figure 3: Layer-wise evaluation with linear probing and clustering metrics identifies late transformer layers as optimal for composer style extraction, peaking at 94% probe accuracy in NotaGen.
Methodology: Construction and Application of Composer Vectors
Composer Vectors are computed as the mean latent representation of a composer's canonical pieces (each prompted with a composer descriptor and encoded in ABC notation), taken at the layer with maximal style separability. To steer toward a target, the user injects the corresponding style vector, scaled by a user-chosen scalar coefficient α, into the model's activations during decoding. The method supports:
- Continuous intensity modulation: Varying α provides fine-grained control over stylistic influence, as the classifier probability for the target composer increases monotonically with α.
- Linear style fusion and suppression: Multiple composer vectors can be linearly combined with positive and negative weights to blend or suppress stylistic identities in generated music.
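The construction and combination steps can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: each piece's hidden states at the chosen layer are mean-pooled over token positions (the pooling scheme is an assumption), a composer vector v_c is the mean of those pooled states, and a steering vector is the weighted sum Σ_c α_c v_c, where a negative α_c suppresses a style. The helper names and the composer dictionary are hypothetical.

```python
import numpy as np


def pool_piece(hidden_states: np.ndarray) -> np.ndarray:
    """Mean-pool one piece's (seq_len, d_model) hidden states to (d_model,).

    Token averaging is an assumption made for this sketch.
    """
    return hidden_states.mean(axis=0)


def composer_vector(pieces: list[np.ndarray]) -> np.ndarray:
    """Composer vector = mean of the pooled representations of that composer's pieces."""
    return np.mean([pool_piece(h) for h in pieces], axis=0)


def steering_vector(vectors: dict[str, np.ndarray], alphas: dict[str, float]) -> np.ndarray:
    """Linear combination sum_c alpha_c * v_c.

    Positive alphas blend styles in; negative alphas push away from a style.
    """
    d_model = next(iter(vectors.values())).shape[0]
    combined = np.zeros(d_model)
    for name, alpha in alphas.items():
        combined += alpha * vectors[name]
    return combined


# Hypothetical toy activations with d_model = 4.
vectors = {
    "Bach": composer_vector([np.ones((3, 4)), 3 * np.ones((5, 4))]),  # every entry 2.0
    "Chopin": composer_vector([np.eye(4)]),                           # every entry 0.25
}
fusion = steering_vector(vectors, {"Bach": 0.5, "Chopin": 0.5})      # blend two styles
suppress = steering_vector(vectors, {"Chopin": 1.0, "Bach": -0.25})  # Chopin, minus some Bach
```

Some activation-steering work subtracts a global mean from each v_c so that the vector encodes only what distinguishes one composer from the rest; the description above uses plain means, so the sketch keeps the vectors uncentred.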
Steering is applied only to musically relevant tokens, preserving score format integrity.
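One way to restrict injection to musical content is to add the scaled steering vector only at masked positions of the residual stream. The sketch below assumes one ABC line per position and treats lines that begin with a header field (such as X:, T:, M:, L:, or K:) as non-musical; real models tokenize at a finer grain (subwords or patches), so both the masking rule and the helper names are illustrative assumptions. In a PyTorch model the same addition would typically live in a forward hook (`torch.nn.Module.register_forward_hook`) on the chosen layer.

```python
import re

import numpy as np

# Hypothetical masking rule: ABC header fields look like "K:C" or "T:Title" at line start.
HEADER_FIELD = re.compile(r"^[A-Za-z]:")


def music_line_mask(abc_lines: list[str]) -> np.ndarray:
    """True for lines treated as musical content, False for header/format lines."""
    return np.array([HEADER_FIELD.match(line) is None for line in abc_lines], dtype=bool)


def inject(hidden: np.ndarray, steer: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Add an already-scaled steer (d_model,) to hidden (seq_len, d_model) where mask is True."""
    return hidden + mask[:, None].astype(hidden.dtype) * steer


abc_lines = ["X:1", "T:Example", "K:C", "CDEF GABc|", "c2 B2 A2 G2|"]
mask = music_line_mask(abc_lines)
hidden = np.zeros((len(abc_lines), 3))
steered = inject(hidden, steer=np.array([1.0, -1.0, 0.5]), mask=mask)
```

Here the three header lines are left untouched and only the two note lines receive the steering offset, which is the property that keeps the score format intact.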
Experimental Evaluation: Effectiveness and Expressivity
Single-Composer Steering and Style Control
Similarity-based evaluations using CLAP and CLaMP metrics quantitatively confirm that Composer Vector steering consistently increases the similarity between generated music and the latent style distribution of the target composer, across both NotaGen and ChatMusician. Classification-based evaluation, using a CLaMP3-trained classifier, shows a systematic increase in prediction probability for the steered composer that consistently exceeds both the unsteered baseline and prompt-driven conditioning. In challenging cases, the Composer Vector dominates the prompt context: prediction probabilities for the steering target exceed 50% even when the prompt is mismatched.
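The two evaluation views can be illustrated with small helpers. The sketch assumes that generated and reference pieces have already been embedded by a CLAP- or CLaMP-style encoder (not reproduced here) and uses cosine similarity to the centroid of the target composer's reference embeddings as the similarity score; the paper's exact metric definition may differ. The second helper counts how often steering raises the classifier's probability for the target composer over the unsteered baseline. All function names and toy numbers are hypothetical.

```python
import numpy as np


def style_similarity(generated: np.ndarray, references: np.ndarray) -> float:
    """Cosine similarity between a generated piece's embedding (d,) and the
    centroid of the target composer's reference embeddings (n_refs, d)."""
    centroid = references.mean(axis=0)
    return float(generated @ centroid / (np.linalg.norm(generated) * np.linalg.norm(centroid)))


def improvement_rate(p_baseline, p_steered) -> float:
    """Fraction of cases where the steered target-composer probability beats the baseline."""
    return float(np.mean(np.asarray(p_steered) > np.asarray(p_baseline)))


# Toy 2-d "embeddings" and made-up classifier probabilities.
refs = np.array([[1.0, 0.0], [0.8, 0.2]])
before = style_similarity(np.array([0.0, 1.0]), refs)
after = style_similarity(np.array([1.0, 0.1]), refs)
rate = improvement_rate([0.10, 0.20, 0.30, 0.40], [0.25, 0.15, 0.55, 0.60])
```

With these values, steering moves the toy embedding toward the composer centroid, and three of the four cases improve, so the rate is 0.75.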
Figure 4: For ChatMusician, latent similarity to the target composer, measured by CLAP and CLaMP, improves substantially after steering, indicating a controllable trajectory in style space.
Figure 5: Steering improvement heatmap for NotaGen, highlighting probability increments for the target composer (green); over 97% of cases show increased classifier alignment.
Scalar Control of Stylistic Intensity
The experiments reveal a near-linear, positive correlation between the steering coefficient α and the classifier probability for the target composer, supporting the interpretability and continuity of control. The effect is robust across stylistically divergent prompts and models, although transfer strength depends on the composer pair, with larger gains when the prompt and steering composers are stylistically close.
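The near-linear relation can be quantified with an ordinary least-squares fit of classifier probability against α, together with a Pearson correlation coefficient. The sketch below is illustrative only; the α grid and probabilities in the usage line are made-up values, not results from the paper.

```python
import numpy as np


def alpha_response(alphas, probs):
    """Fit probs approx. slope * alpha + intercept and report Pearson r.

    alphas: steering coefficients used for generation
    probs:  mean classifier probability for the target composer at each alpha
    """
    a = np.asarray(alphas, dtype=float)
    p = np.asarray(probs, dtype=float)
    slope, intercept = np.polyfit(a, p, deg=1)  # coefficients, highest degree first
    r = np.corrcoef(a, p)[0, 1]
    return float(slope), float(intercept), float(r)


# Made-up, perfectly linear toy values.
slope, intercept, r = alpha_response([0.0, 1.0, 2.0, 3.0], [0.10, 0.20, 0.30, 0.40])
```

A positive slope with r close to 1 corresponds to monotonic, amplifiable control; comparing slopes across prompt/target pairs is one way to express the composer-pair sensitivity noted above.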
Figure 6: Regression of classifier probability on steering coefficient α for Beethoven, Chopin, and Rachmaninoff; the curves confirm monotonic, amplifiable control of style intensity.
Multi-Composer Style Fusion
By linearly interpolating between two composers' vectors, the method produces smooth, interpretable blending of stylistic attributes. The classifier probability for each composer responds in proportion to its coefficient, and the regression slopes suggest that the two stylistic directions act largely independently. Sample-wise probability peaks for each composer consistently coincide with settings where that composer's coefficient dominates.
Figure 7: Fusion of two composer vectors results in monotonic and opposing changes in classifier probabilities, illustrating controllable stylistic interpolation in the latent space.
Implications and Future Perspectives
Practically, Composer Vector enables training-free, real-time artistic control over symbolic music generation, supporting interactive workflows and creative exploration. Theoretically, the work reframes composer style control as a linear manipulation task in the high-dimensional activation space, aligning with recent advancements in mechanistic interpretability and representation engineering. The demonstrated linearity and composition properties suggest that high-level musical features (analogous to LLM conceptual steering directions) are accessible in music LMs, independent of explicit label conditioning.
This opens promising directions including real-time, user-interactive generative systems, zero-shot domain adaptation via activation editing, and the systematic study of style disentanglement and feature universality in music LMs. Potential limitations include the dependency on the training data's style coverage and the linearity assumption for arbitrary feature directions; future work may explore nonlinear fusion, more granular attribute vectors (e.g., texture, harmony), and downstream applications in music IR and performance analysis.
Conclusion
Composer Vector establishes a robust mechanism for inference-time, latent space steering of composer style in symbolic music generation models, requiring no retraining or additional supervision. The method enables fine-grained, interpretable, and composable control of high-level stylistic attributes across contemporary symbolic music LMs, and demonstrates strong metric-based and classifier-based alignment improvements over baseline conditioning. These findings substantiate the viability of latent activation engineering for flexible, creative AI control in music, paralleling and complementing analogous trends in natural language generation.