- The paper introduces agentic parameter reasoning that emulates professional colorists to output interpretable ASC-CDL parameters and globally consistent 3D LUTs.
- It employs a dual-stream perception network and a hierarchical Tree of Thoughts search to navigate complex, context-aware color adjustments.
- Iterative reflection based on user feedback ensures targeted refinements, preserving temporal consistency and achieving cinematographic quality.
LumiVideo: Agentic Parameter Reasoning for Automated Video Color Grading
Introduction and Motivation
LumiVideo recasts automated video color grading as an agentic, parameter-driven workflow. Traditional generative approaches, including diffusion-based and image-to-image translation models, are ill-suited to professional color grading: because they treat grading as direct pixel manipulation, their results are opaque, hard to interpret, temporally inconsistent, and incompatible with standard non-linear editing (NLE) pipelines. LumiVideo addresses these limitations by emulating the cognitive workflow of professional colorists (Perception, Reasoning, Execution, and Reflection) and by outputting mathematically interpretable, industry-standard ASC-CDL parameters and 3D LUTs.
System Architecture and Workflow
LumiVideo's architecture is composed of four distinct agentic stages:
- Perception: Extracts both the physical properties and semantic context from log-encoded footage using a dual-stream approach. The physical stream applies deterministic color-space transforms for objective exposure profiling, while the semantic stream leverages VLMs to parse scene content and protected tone regions.
- Reasoning: Uses an LLM-based agent, paired with a domain-specific Retrieval-Augmented Generation (RAG) database, to explore the non-linear color parameter space through a Tree of Thoughts (ToT) search. This stage navigates ASC-CDL parameters using both learned cinematic heuristics and explicit scene constraints.
- Execution: Translates optimized parameters into deterministic ASC-CDL configurations and compiles a globally consistent 3D LUT. Key refinements, such as adaptive lift and highlight roll-off, ensure artifact-free rendering in diverse dynamic range scenes.
- Reflection: Enables iterative, language-driven refinement, allowing selective, state-aware manipulation of grading parameters based on user feedback, with structural locking of irrelevant parameters for stable, predictable convergence.
Figure 1: Architecture of LumiVideo, illustrating the agentic loop of Perception, Reasoning, Execution, and Reflection for iterative grading refinement.
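To make the Execution stage concrete, the following is a minimal sketch of how one set of ASC-CDL parameters can be applied to a pixel and baked into a 3D LUT. It uses the standard ASC-CDL slope/offset/power formula followed by Rec.709-weighted saturation. The names `CDL`, `apply_cdl`, and `bake_lut` are illustrative assumptions, not identifiers from the paper, and the adaptive lift and highlight roll-off refinements mentioned above are not modeled here.

```python
from dataclasses import dataclass
from typing import List, Tuple

RGB = Tuple[float, float, float]

# Rec.709 luma weights used by the ASC-CDL saturation step.
LUMA = (0.2126, 0.7152, 0.0722)


@dataclass(frozen=True)
class CDL:
    """Hypothetical container for one ASC-CDL grade (defaults are identity)."""
    slope: RGB = (1.0, 1.0, 1.0)
    offset: RGB = (0.0, 0.0, 0.0)
    power: RGB = (1.0, 1.0, 1.0)
    saturation: float = 1.0


def clamp01(x: float) -> float:
    return min(1.0, max(0.0, x))


def apply_cdl(rgb: RGB, cdl: CDL) -> RGB:
    # Per channel: out = clamp(in * slope + offset) ** power
    sop = [
        clamp01(c * s + o) ** p
        for c, s, o, p in zip(rgb, cdl.slope, cdl.offset, cdl.power)
    ]
    # Saturation scales each channel's distance from the pixel's luma.
    luma = sum(w * c for w, c in zip(LUMA, sop))
    return tuple(clamp01(luma + cdl.saturation * (c - luma)) for c in sop)


def bake_lut(cdl: CDL, size: int = 17) -> List[RGB]:
    # Sample the grade on a regular size^3 grid. Red varies fastest,
    # which is the entry order used by the common .cube text format.
    step = 1.0 / (size - 1)
    return [
        apply_cdl((r * step, g * step, b * step), cdl)
        for b in range(size)
        for g in range(size)
        for r in range(size)
    ]
```

Because the LUT is computed once from a single parameter set and then applied to every frame, the same mapping is used across the whole clip, which is the sense in which a parameter-based grade is globally consistent.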
Structured Reasoning and Iterative Control
The core innovation in LumiVideo's Reasoning module lies in hierarchical parametric exploration. The Tree of Thoughts search, with RAG anchoring, allows systematic expansion and pruning over candidate grading strategies. Each candidate node integrates RAG-based cinematic heuristics, is instantiated with explicit ASC-CDL parameters, and is quantitatively evaluated by a VLM-based critic for both cinematic intent and preservation of protected tones. This explicit search and evaluation loop circumvents the limitations of single-pass CoT prompting and enables robust optimization under complex scene semantics.
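The expand, evaluate, and prune loop described above can be sketched as a small beam search. This is an illustration under simplifying assumptions, not the paper's implementation: each parameter is a single scalar rather than a per-channel triple, children are generated by fixed numeric nudges instead of RAG-informed LLM proposals, and `critic` is any caller-supplied scoring function standing in for the VLM-based critic. All names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]  # e.g. slope, offset, power, saturation (scalars here)


def expand(node: Params, step: float) -> List[Params]:
    # Propose children by nudging one parameter at a time, up or down.
    children = []
    for key in node:
        for delta in (-step, step):
            child = dict(node)
            child[key] = round(child[key] + delta, 6)
            children.append(child)
    return children


def tree_search(
    root: Params,
    critic: Callable[[Params], float],
    depth: int = 3,
    beam: int = 2,
    step: float = 0.1,
) -> Tuple[float, Params]:
    frontier = [(critic(root), root)]
    best = frontier[0]
    for level in range(depth):
        # Expand every surviving node; shrink the step at deeper levels.
        candidates = [
            (critic(child), child)
            for _, node in frontier
            for child in expand(node, step / (level + 1))
        ]
        # Prune: keep only the `beam` highest-scoring candidates.
        candidates.sort(key=lambda scored: scored[0], reverse=True)
        frontier = candidates[:beam]
        if frontier[0][0] > best[0]:
            best = frontier[0]
    return best


# Toy critic (assumption): closeness to a hypothetical target look.
TARGET = {"slope": 1.2, "offset": 0.0, "power": 1.0, "saturation": 1.0}


def toy_critic(p: Params) -> float:
    return -sum((p[k] - TARGET[k]) ** 2 for k in TARGET)


start = {"slope": 1.0, "offset": 0.0, "power": 1.0, "saturation": 1.0}
best_score, best_params = tree_search(start, toy_critic)
```

Unlike a single-pass prompt, the search keeps several candidates alive per level and discards the weakest, so an early poor guess does not determine the final grade.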
The Reflection stage provides fine-grained, human-in-the-loop iterative control. A user directive in natural language is parsed into the relevant grading parameter and the intended magnitude of the adjustment. Parameters not named by the directive are structurally locked, so each iteration changes only what was targeted, which allows fast convergence to the desired look with minimal trial and error.
Figure 2: Reflection-driven iteration: a user directive leads to targeted refinement of ASC-CDL parameters while maintaining consistency across all other aspects of the grade.
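A minimal sketch of the locking behavior follows. It assumes the directive has a very simple form and replaces the language model with keyword lookup tables; the phrases, step sizes, and the names `parse_directive` and `reflect` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

Params = Dict[str, float]

# Hypothetical lookup tables standing in for an LLM-based directive parser.
TARGETS = {
    "more saturated": ("saturation", +1.0),
    "less saturated": ("saturation", -1.0),
    "brighter": ("offset", +1.0),
    "darker": ("offset", -1.0),
}
MAGNITUDES = {"slightly": 0.02, "much": 0.10}
DEFAULT_MAGNITUDE = 0.05


def parse_directive(text: str) -> Tuple[str, float]:
    """Map a directive to (target parameter, signed adjustment)."""
    lowered = text.lower()
    for phrase, (param, sign) in TARGETS.items():
        if phrase in lowered:
            size = next(
                (m for word, m in MAGNITUDES.items() if word in lowered),
                DEFAULT_MAGNITUDE,
            )
            return param, sign * size
    raise ValueError(f"no known adjustment in directive: {text!r}")


def reflect(params: Params, directive: str) -> Params:
    target, delta = parse_directive(directive)
    # Every parameter except the target is copied through unchanged (locked).
    return {k: (v + delta if k == target else v) for k, v in params.items()}


grade = {"slope": 1.1, "offset": 0.0, "power": 0.95, "saturation": 1.0}
refined = reflect(grade, "Make it slightly more saturated")
```

Here `refined` differs from `grade` only in `saturation`; slope, offset, and power carry over exactly, which is what makes each round of feedback predictable.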
Benchmarking: LumiGrade Dataset
Progress in automated color grading has been historically impeded by the lack of authentic, log-encoded benchmarks. LumiGrade fills this gap by providing more than 100 clips from multiple real-world camera log formats, along with detailed metadata and expert-graded references. Each reference includes both ASC-CDL parameter exports and rendered Rec.709 outputs, enabling rigorous evaluation in both pixel and parameter spaces.
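As one illustration of what evaluation in parameter space could look like, the sketch below compares a predicted grade with a reference grade parameter by parameter. The choice of absolute error, the scalar parameters, and the function names are assumptions made for this example; the summary does not specify the metric LumiGrade uses.

```python
from typing import Dict

Params = Dict[str, float]


def parameter_errors(pred: Params, ref: Params) -> Dict[str, float]:
    # Absolute difference for each parameter present in the reference.
    return {k: abs(pred[k] - ref[k]) for k in ref}


def mean_parameter_error(pred: Params, ref: Params) -> float:
    errors = parameter_errors(pred, ref)
    return sum(errors.values()) / len(errors)


# Hypothetical values, for illustration only.
reference = {"slope": 1.0, "offset": 0.0, "power": 1.0, "saturation": 1.0}
predicted = {"slope": 1.1, "offset": 0.0, "power": 1.0, "saturation": 0.9}
score = mean_parameter_error(predicted, reference)
```

A per-parameter breakdown of this kind shows which control a method gets wrong, something a pixel-space metric on the rendered Rec.709 output cannot isolate.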
Experimental Results
Qualitative Outcomes
LumiVideo demonstrates domain-robust and cinematographically consistent results across varied scenes and camera formats. Pixel-based generative models (e.g., GPT-5.3, Gemini 3.1) often misinterpret log-encoded input, introducing severe artifacts and semantic inconsistencies. In contrast, LumiVideo's output:
Quantitative Analysis
LumiVideo achieves the highest scores across all objective and subjective grading metrics, surpassing generalist and LUT-based methods. Notably, it matches or exceeds the performance of human experts on technical, aesthetic, and grading-specific (LLM-Judge) criteria. The method's industry-compliant outputs are rated highly usable by professional colorists, confirming the practical value of its parameter-centric design.
Ablative Insights
Ablation studies validate the necessity of each system component:
- Removing Tree of Thoughts markedly degrades both creative and technical quality.
- Excluding RAG heuristics attenuates cinematic alignment and aesthetic appeal.
- Omitting protected tone constraints compromises critical hue integrity, especially for skin and important environmental tones.
- Single-shot (non-reflective) operation consistently underperforms iterative refinement.
Implications and Future Directions
LumiVideo establishes a new standard for automated video color grading, synthesizing agentic reasoning with semantically grounded, interpretable parameters. Its framing of grading as an interactive, iterative decision process paves the way for AI-powered tools that integrate seamlessly within existing production pipelines, allowing both automation and fine-grained creative control.
Theoretically, this work illustrates the computational tractability of operating over continuous, domain-standard parameter spaces with agent-based models, rather than ill-conditioned pixel spaces. Practically, it sets a precedent for the migration of high-level creative intent from human experts to intelligent agents, directly interfacing with professional editing platforms.
The primary current limitation is the absence of spatially varying parameterization: the method applies only global LUT adjustments. Extending it to local, subject-aware grading through joint action over region masks or secondary qualifiers is a logical next step and could narrow the gap with expert colorists. Integrating multimodal intent, such as narrative, audio, or shot metadata, could also improve creative alignment.
Conclusion
LumiVideo advances automated video color grading by aligning intelligent agentic reasoning with industry standards. It achieves both high quantitative and qualitative performance, operationalizes iterative user-steerable refinement, and is grounded in a robust benchmark representative of real-world professional challenges. The method's confluence of transparency, control, and technical rigor situates it as an instructive model for broader visual AI applications that demand interpretability and creative autonomy.