LumiVideo: An Intelligent Agentic System for Video Color Grading

Published 2 Apr 2026 in cs.CV and cs.AI | (2604.02409v1)

Abstract: Video color grading is a critical post-production process that transforms flat, log-encoded raw footage into emotionally resonant cinematic visuals. Existing automated methods act as static, black-box executors that directly output edited pixels, lacking both interpretability and the iterative control required by professionals. We introduce LumiVideo, an agentic system that mimics the cognitive workflow of professional colorists through four stages: Perception, Reasoning, Execution, and Reflection. Given only raw log video, LumiVideo autonomously produces a cinematic base grade by analyzing the scene's physical lighting and semantic content. Its Reasoning engine synergizes an LLM's internalized cinematic knowledge with a Retrieval-Augmented Generation (RAG) framework via a Tree of Thoughts (ToT) search to navigate the non-linear color parameter space. Rather than generating pixels, the system compiles the deduced parameters into industry-standard ASC-CDL configurations and a globally consistent 3D LUT, analytically guaranteeing temporal consistency. An optional Reflection loop then allows creators to refine the result via natural language feedback. We further introduce LumiGrade, the first log-encoded video benchmark for evaluating automated grading. Experiments show that LumiVideo approaches human expert quality in fully automatic mode while enabling precise iterative control when directed.


Authors (5)

Summary

The paper introduces agentic parameter reasoning that emulates professional colorists to output interpretable ASC-CDL parameters and globally consistent 3D LUTs.
It employs a dual-stream perception network and a hierarchical Tree of Thoughts search to navigate complex, context-aware color adjustments.
Iterative reflection based on user feedback ensures targeted refinements, preserving temporal consistency and achieving cinematographic quality.

LumiVideo: Agentic Parameter Reasoning for Automated Video Color Grading

Introduction and Motivation

LumiVideo recasts automated video color grading as an agentic, parameter-driven workflow. Existing generative approaches, including diffusion-based and image-to-image translation models, are ill-suited to professional color grading: they treat grading as direct pixel manipulation, which leads to opacity, poor interpretability, temporal inconsistency, and incompatibility with standard non-linear editing (NLE) pipelines. LumiVideo addresses these limitations by emulating the cognitive workflow of professional colorists (Perception, Reasoning, Execution, and Reflection) and by outputting mathematically interpretable, industry-standard ASC-CDL parameters and 3D LUTs.

System Architecture and Workflow

LumiVideo's architecture is composed of four distinct agentic stages:

Perception: Extracts both the physical properties and semantic context from log-encoded footage using a dual-stream approach. The physical stream applies deterministic color-space transforms for objective exposure profiling, while the semantic stream leverages VLMs to parse scene content and protected tone regions.
Reasoning: Utilizes an LLM-based agent, synergized with a domain-specific Retrieval-Augmented Generation (RAG) database, and explores the non-linear color parameter space through a Tree of Thoughts (ToT) search. This stage navigates ASC-CDL parameters with both learned cinematic heuristics and explicit scene constraints.
Execution: Translates optimized parameters into deterministic ASC-CDL configurations and compiles a globally consistent 3D LUT. Key refinements, such as adaptive lift and highlight roll-off, ensure artifact-free rendering in diverse dynamic range scenes.
Reflection: Enables iterative, language-driven refinement, allowing selective, state-aware manipulation of grading parameters based on user feedback, with structural locking of irrelevant parameters for stable, predictable convergence.
Figure 1: Architecture of LumiVideo, illustrating the agentic loop of Perception, Reasoning, Execution, and Reflection for iterative grading refinement.
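
To make the Execution stage concrete, the following is a minimal sketch of how a set of ASC-CDL parameters can be applied to a pixel and baked into a 3D LUT. The slope/offset/power and saturation formulas follow the public ASC-CDL definition; the names `CDL`, `apply_cdl`, and `bake_lut` are illustrative assumptions, and the paper's adaptive lift and highlight roll-off refinements are not modeled here.

```python
from dataclasses import dataclass
from typing import List, Tuple

RGB = Tuple[float, float, float]

# Rec.709 luma weights used by the ASC-CDL saturation step.
LUMA = (0.2126, 0.7152, 0.0722)


@dataclass(frozen=True)
class CDL:
    """Hypothetical container for one ASC-CDL grade (identity by default)."""
    slope: RGB = (1.0, 1.0, 1.0)
    offset: RGB = (0.0, 0.0, 0.0)
    power: RGB = (1.0, 1.0, 1.0)
    saturation: float = 1.0


def clamp01(x: float) -> float:
    return min(1.0, max(0.0, x))


def apply_cdl(rgb: RGB, cdl: CDL) -> RGB:
    # Per-channel slope, offset, power; clamp before the power so the base is non-negative.
    sop = [
        clamp01(v * s + o) ** p
        for v, s, o, p in zip(rgb, cdl.slope, cdl.offset, cdl.power)
    ]
    # Saturation scales each channel's distance from the pixel's luma.
    luma = sum(w * c for w, c in zip(LUMA, sop))
    return tuple(clamp01(luma + cdl.saturation * (c - luma)) for c in sop)


def bake_lut(cdl: CDL, size: int = 17) -> List[RGB]:
    # Sample the transform on a size^3 lattice, red varying fastest (the .cube ordering).
    step = 1.0 / (size - 1)
    return [
        apply_cdl((r * step, g * step, b * step), cdl)
        for b in range(size)
        for g in range(size)
        for r in range(size)
    ]
```

Because the baked table is a fixed function of the input color, applying the same LUT to every frame maps identical input pixels to identical outputs, which is the sense in which a global LUT avoids frame-to-frame flicker.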

Structured Reasoning and Iterative Control

The core innovation in LumiVideo's Reasoning module is hierarchical parametric exploration. The Tree of Thoughts search, anchored by RAG, systematically expands and prunes candidate grading strategies. Each candidate node integrates RAG-based cinematic heuristics, is instantiated with explicit ASC-CDL parameters, and is quantitatively evaluated by a VLM-based critic for both cinematic intent and preservation of protected tones. This explicit search-and-evaluation loop avoids the limitations of single-pass chain-of-thought (CoT) prompting and enables robust optimization under complex scene semantics.
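
The expand-score-prune loop described above can be illustrated with a small beam-style tree search over grading parameters. This is a sketch under stated assumptions, not the paper's implementation: the `critic` is a plain numeric callable standing in for the VLM-based critic, `seeds` stands in for RAG-suggested starting strategies, and the flat `Params` dictionary, the step-halving schedule, and the function names are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical flat view of a grade, e.g. {"slope": 1.0, "offset": 0.0}.
Params = Dict[str, float]


def expand(node: Params, step: float) -> List[Params]:
    # Propose children by nudging one parameter up or down by `step`.
    children = []
    for key in node:
        for delta in (-step, step):
            child = dict(node)
            child[key] = round(child[key] + delta, 6)
            children.append(child)
    return children


def tree_search(
    seeds: List[Params],
    critic: Callable[[Params], float],
    depth: int = 3,
    beam: int = 2,
    step: float = 0.1,
) -> Params:
    # Start from the best-scoring seed strategies.
    frontier = sorted(seeds, key=critic, reverse=True)[:beam]
    best = frontier[0]
    for _ in range(depth):
        candidates = [c for node in frontier for c in expand(node, step)]
        # Prune: keep only the highest-scoring branches for the next level.
        frontier = sorted(candidates, key=critic, reverse=True)[:beam]
        if critic(frontier[0]) > critic(best):
            best = frontier[0]
        step /= 2  # coarse-to-fine refinement at deeper levels
    return best
```

In a real system each call to the critic would be expensive (a rendered preview scored by a VLM), so the beam width and depth bound how many evaluations the search spends.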

The Reflection stage provides fine-grained, human-in-the-loop iterative control. Each natural-language directive is parsed into the relevant grading parameter and the intended magnitude of the adjustment. Uninvolved parameters are structurally locked, so only the targeted change occurs in each iteration, which supports fast convergence to the desired look with minimal trial and error.

Figure 2: Reflection-driven iteration: a user directive leads to targeted refinement of ASC-CDL parameters while maintaining consistency across all other aspects of the grade.
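
The locking behavior can be sketched as a state update that touches exactly one parameter. The example assumes the directive has already been parsed (by the LLM, in the paper's design) into a target parameter name and a signed magnitude; the `Grade` dictionary and the `refine` function are hypothetical names used only for illustration.

```python
from typing import Dict

# Hypothetical scalar view of the current grade state.
Grade = Dict[str, float]


def refine(grade: Grade, target: str, delta: float) -> Grade:
    """Return a new grade with only `target` changed; all other keys stay locked."""
    if target not in grade:
        raise KeyError(f"unknown grading parameter: {target}")
    updated = dict(grade)  # copy so the previous state remains available for undo
    updated[target] = grade[target] + delta
    return updated


# Example: a directive such as "make it a bit more saturated",
# assumed to be parsed as ("saturation", +0.15).
current = {"slope": 1.1, "offset": -0.02, "power": 0.95, "saturation": 1.0}
revised = refine(current, "saturation", 0.15)
```

Returning a new state rather than mutating the old one keeps every iteration comparable with its predecessor, which matches the state-aware refinement the Reflection stage describes.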

Benchmarking: LumiGrade Dataset

Progress in automated color grading has been historically impeded by the lack of authentic, log-encoded benchmarks. LumiGrade fills this gap by providing more than 100 clips from multiple real-world camera log formats, along with detailed metadata and expert-graded references. Each reference includes both ASC-CDL parameter exports and rendered Rec.709 outputs, enabling rigorous evaluation in both pixel and parameter spaces.

Experimental Results

Qualitative Outcomes

LumiVideo demonstrates domain-robust and cinematographically consistent results across varied scenes and camera formats. Pixel-based generative models (e.g., GPT-5.3, Gemini 3.1) often misinterpret log-encoded input, introducing severe artifacts and semantic inconsistencies. In contrast, LumiVideo's output:

Preserves highlight detail and natural skin tones, adapted contextually to scene content.
Produces artifact-free grading with strong temporal and spatial consistency.
Responds predictably to iterative user feedback, mirroring professional grading workflows.
Figure 3: Qualitative comparison showing superior consistency and cinematic fidelity of LumiVideo relative to state-of-the-art generative and LUT-based baselines.

Quantitative Analysis

LumiVideo achieves the highest scores across all objective and subjective grading metrics, surpassing generalist and LUT-based methods. Notably, it matches or exceeds the performance of human experts on technical, aesthetic, and grading-specific (LLM-Judge) criteria. The method's industry-compliant outputs are rated highly usable by professional colorists, confirming the practical value of its parameter-centric design.

Ablative Insights

Ablation studies validate the necessity of each system component:

Removing Tree of Thoughts markedly degrades both creative and technical quality.
Excluding RAG heuristics attenuates cinematic alignment and aesthetic appeal.
Omitting protected tone constraints compromises critical hue integrity, especially for skin and important environmental tones.
Single-shot (non-reflective) operation is consistently weaker than iterative refinement.

Implications and Future Directions

LumiVideo establishes a new standard for automated video color grading, synthesizing agentic reasoning with semantically grounded, interpretable parameters. Its framing of grading as an interactive, iterative decision process paves the way for AI-powered tools that integrate seamlessly within existing production pipelines, allowing both automation and fine-grained creative control.

Theoretically, this work illustrates the computational tractability of operating over continuous, domain-standard parameter spaces with agent-based models, rather than ill-conditioned pixel spaces. Practically, it sets a precedent for the migration of high-level creative intent from human experts to intelligent agents, directly interfacing with professional editing platforms.

The main current limitation is the absence of spatially varying parameterization: the method applies a single global LUT adjustment. Extending it to local, subject-aware grading through region masks or secondary qualifiers is a logical next step and could narrow the gap with expert colorists. Integrating multimodal intent (combining narrative, audio, or shot metadata) could also improve creative alignment.

Conclusion

LumiVideo advances automated video color grading by aligning intelligent agentic reasoning with industry standards. It achieves both high quantitative and qualitative performance, operationalizes iterative user-steerable refinement, and is grounded in a robust benchmark representative of real-world professional challenges. The method's confluence of transparency, control, and technical rigor situates it as an instructive model for broader visual AI applications that demand interpretability and creative autonomy.
