LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics
Abstract: Comprehensive understanding of time series remains a significant challenge for LLMs. Current research is hindered by fragmented task definitions and benchmarks with inherent ambiguities, precluding rigorous evaluation and the development of unified Time Series Reasoning Models (TSRMs). To bridge this gap, we formalize Time Series Reasoning (TSR) via a four-level taxonomy of increasing cognitive complexity. We introduce HiTSR, a hierarchical time series reasoning dataset comprising 83k samples with diverse task combinations and verified Chain-of-Thought (CoT) trajectories. Leveraging HiTSR, we propose LLaTiSA, a strong TSRM that integrates visualized patterns with precision-calibrated numerical tables to enhance the temporal perception of Vision-LLMs (VLMs). Through a multi-stage curriculum fine-tuning strategy, LLaTiSA achieves superior performance and exhibits robust out-of-distribution generalization across diverse TSR tasks and real-world scenarios. Our code is available at https://github.com/RainingNovember/LLaTiSA.
Explain it Like I'm 14
What this paper is about (big picture)
This paper is about teaching AI to understand and reason about “time series” — data that changes over time, like heartbeats on an ECG, stock prices each day, or temperature by the hour. The authors say current AI models don’t handle time series well because tasks and tests are messy and unclear. They fix this by:
- Defining clear “levels” of difficulty for time series reasoning.
- Building a large, carefully checked dataset to train and test these skills.
- Creating a new AI model, called LLaTiSA, that looks at both graphs and numbers to make better, more reliable decisions about time-based data.
What questions the paper tries to answer
The paper asks simple but important questions:
- How should we break down time series reasoning into clear, learnable steps?
- Can we build a high-quality dataset that trains and tests these steps without confusion?
- Can an AI that “looks” at both the picture of a time series and a neat table of numbers reason better than models that only read numbers or only see pictures?
- Will this new approach still work well on new, different datasets (not just what it trained on), and in real-world tasks like reading ECGs?
How the researchers approached the problem (methods, in everyday terms)
Think of learning time series like leveling up in a video game. The authors define four levels:
- L1: Numerical read-out — find exact values (like “What’s the highest point and when did it happen?”).
- L2: Pattern perception — spot shapes and trends (like “Is this line rising, spiky, or stable?”).
- L3: Semantic reasoning — mix the data with real-world meaning (like “Given these heart signal patterns, which condition is most likely?”).
- L4: Predictive inference — make future predictions (this paper focuses on L1–L3).
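To make the first two levels above concrete, here is a minimal sketch in plain Python. The function names, the toy temperature series, and the slope threshold `flat_tol` are all illustrative assumptions, not anything taken from the paper or from HiTSR. It only shows the kind of answer an L1 question ("highest point and when?") and an L2 question ("rising, falling, or stable?") expects.

```python
from typing import List, Tuple


def read_max(series: List[float]) -> Tuple[int, float]:
    """L1-style read-out: return (index, value) of the highest point."""
    idx = max(range(len(series)), key=lambda i: series[i])
    return idx, series[idx]


def classify_trend(series: List[float], flat_tol: float = 0.05) -> str:
    """L2-style label from the least-squares slope (illustrative rule only).

    Needs at least two points; flat_tol is an assumed threshold in
    value units per time step.
    """
    n = len(series)
    mean_t = (n - 1) / 2
    mean_y = sum(series) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(series))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    if abs(slope) <= flat_tol:
        return "stable"
    return "rising" if slope > 0 else "falling"


# Toy hourly temperatures (made-up numbers).
hourly_temp = [12.0, 12.5, 14.1, 17.3, 19.8, 18.9, 16.2]
peak_index, peak_value = read_max(hourly_temp)  # (4, 19.8)
trend = classify_trend(hourly_temp)             # "rising"
```

An L3 question would then combine answers like these with outside context (for example, what a given peak means for a patient or a machine), which is not something a few lines of arithmetic can capture.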
To support these levels, they built a dataset called HiTSR with about 83,000 examples:
- L1 and L2 use lots of synthetic (computer-generated) time series so they can control difficulty and variety.
- L3 uses real-world time series (from areas like health and industry) with added context.
- Many questions are multiple-choice, and all answers and “reasoning steps” (the Chain-of-Thought, or CoT) are checked by both AI and humans to avoid ambiguity.
They then designed a new model: LLaTiSA.
- Instead of just reading numbers or just seeing a graph, LLaTiSA looks at two images at once:
- A time series plot (the line graph) to understand the overall shape.
- A clean index–value table (like a screenshot of a spreadsheet) to check exact numbers.
- This “dual-view” input helps the model combine big-picture intuition (from the plot) with number-precise evidence (from the table).
- They trained the model in stages to match the levels: first L1, then L2, then L3. This is like a curriculum that builds skills step by step.
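The table half of the dual-view input described above can be sketched as follows. This is an assumption-laden illustration rather than the authors' pipeline: the helper names, the two-decimal precision, and the text layout are hypothetical, and the final step of drawing the grid (and the matching line plot) as images is left to whatever charting library is in use.

```python
from typing import List, Sequence


def index_value_rows(values: Sequence[float], decimals: int = 2) -> List[List[str]]:
    """Pair each position with its value, formatted to a fixed precision."""
    return [[str(i), f"{v:.{decimals}f}"] for i, v in enumerate(values)]


def render_text_table(rows: List[List[str]],
                      header: Sequence[str] = ("index", "value")) -> str:
    """Lay the rows out as an aligned grid; an image renderer would draw this."""
    table = [list(header)] + rows
    widths = [max(len(r[c]) for r in table) for c in range(len(header))]
    lines = [" | ".join(cell.rjust(w) for cell, w in zip(r, widths)) for r in table]
    return "\n".join(lines)


# Toy series (made-up numbers); the same values would also feed the line plot.
series = [0.5, 1.25, 3.0, 2.75]
rows = index_value_rows(series)
table_text = render_text_table(rows)
print(table_text)
```

Fixing the number of decimals keeps every cell the same width, so a value read from the table corresponds to exactly one index. That is the property the plot alone cannot offer.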
What they found and why it matters
The authors tested LLaTiSA on datasets it wasn’t trained on (to see if it generalizes). Here are the key takeaways:
- Stronger at basics: LLaTiSA was much better at L1 tasks (finding exact values at the right times) than models that used only text or only images. The table view helped reduce mistakes where the AI “guesses” numbers from the plot.
- Better pattern reading: For L2 tasks (spotting spikes, trends, or shapes), LLaTiSA beat other models, especially those that didn’t use both plot and table together.
- More reliable reasoning: Including step-by-step “thinking” examples (Chain-of-Thought) during training helped the model explain itself and improved performance on new, unfamiliar tests.
- Curriculum works: Training in stages (L1→L2→L3) led to better results than mixing all tasks at once, especially on harder, real-world reasoning.
- Real-world gains: When adapted to read ECGs, LLaTiSA assessed individual leads more completely and accurately than a strong baseline of similar size, despite using far less training data. The paper reports gains of 18.14% in lead assessment coverage and 14.22% in accuracy over GEM (LLaVA) in the in-distribution evaluation. This shows it’s data-efficient and practical.
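The staged-versus-mixed comparison in the findings above comes down to how training data is ordered. A minimal sketch, assuming a hypothetical `fine_tune` callback and toy records (neither comes from the paper's code), could look like this:

```python
import random
from typing import Callable, Dict, List, Sequence

Sample = Dict[str, str]  # hypothetical record, e.g. {"level": "L1", "question": "..."}


def staged_schedule(data: Dict[str, List[Sample]],
                    order: Sequence[str] = ("L1", "L2", "L3")) -> List[List[Sample]]:
    """Curriculum: one training stage per level, easiest first."""
    return [list(data[level]) for level in order]


def mixed_schedule(data: Dict[str, List[Sample]], seed: int = 0) -> List[List[Sample]]:
    """Baseline: pool every level into a single shuffled stage."""
    pooled = [s for level in sorted(data) for s in data[level]]
    random.Random(seed).shuffle(pooled)
    return [pooled]


def run(schedule: List[List[Sample]],
        fine_tune: Callable[[List[Sample]], None]) -> None:
    """Call the (hypothetical) fine-tuning step once per stage, in order."""
    for stage in schedule:
        fine_tune(stage)


# Toy data standing in for the three HiTSR levels.
toy = {
    "L1": [{"level": "L1", "question": "max value?"},
           {"level": "L1", "question": "value at t=3?"}],
    "L2": [{"level": "L2", "question": "rising or falling?"}],
    "L3": [{"level": "L3", "question": "most likely condition?"}],
}

# Record which levels each stage contains instead of training a real model.
stage_log: List[List[str]] = []
run(staged_schedule(toy), lambda stage: stage_log.append([s["level"] for s in stage]))
```

With the staged schedule the model sees all L1 samples before any L2 sample; with the mixed schedule it sees one shuffled pool. Everything else (model, optimizer, data) would be held fixed when comparing the two.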
Why this matters: Time series power important decisions (health, finance, industry). A model that is both visually intuitive and numerically precise is more trustworthy and useful. The study shows a clear path to building such models: define levels, create clean training data, and teach skills step by step.
What this could change going forward
- Better tools for experts: Doctors, engineers, and analysts could use AI that not only spots patterns but also backs them up with exact numbers and clear reasoning.
- Clearer progress for research: The four-level framework and the HiTSR dataset give the community a shared way to train and compare time series models fairly.
- Safer decisions: Models that verify numbers (not just “eyeball” graphs) reduce risky mistakes.
- Next steps: The authors plan to tackle L4 (prediction) more directly and explore reinforcement learning to further refine how the model reasons across different difficulty levels.
In short, this paper shows how to teach AI to “see” time-based data like a careful student: first get the numbers right, then learn the patterns, then understand the meaning — and always check your work.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
The paper advances a taxonomy, dataset (HiTSR), and a VLM-based model (LLaTiSA) for time series reasoning, but it leaves several concrete issues unresolved. Future work could address the following gaps:
- L4 predictive inference remains unaddressed: no dataset, tasks, or modeling strategies are provided for the forecasting level, nor is the interplay between reasoning and prediction evaluated.
- Reliance on synthetic data for L1–L2: foundational skills are mostly trained on synthetic series, leaving uncertainty about robustness to diverse real-world noise, irregular sampling, missingness, and domain-specific artifacts.
- Limited coverage of multivariate/high-dimensional series at L1–L2: the dataset and tasks do not clearly stress multivariate structure, cross-variable dependencies, or asynchronous sampling at foundational levels.
- Visual rendering sensitivity: robustness to plot styles (axes scales, ticks, gridlines, colors), charting libraries, compression artifacts, occlusion, and clutter is not systematically assessed.
- Numeric extraction via image tables: using an image-based index-value table may introduce OCR-like errors; a direct comparison to structured numeric inputs (e.g., CSV tokens or specialized numeric encoders) under identical supervision is missing.
- Faithfulness and use of CoT rationales: while CoT improves OOD accuracy, the paper does not measure rationale faithfulness (e.g., causal influence of steps), consistency checks, or whether the model’s reasoning is necessary/sufficient for answers.
- Uncertainty quantification and calibration: no confidence measures, calibrated probabilities, or error bounds are reported, especially critical for L3 semantic judgments and eventual L4 forecasting.
- Evaluation breadth and statistical rigor: OOD tests use small samples (e.g., 100–500 items); there is no analysis of variance, confidence intervals, or significance testing to support claims of generalization.
- Difficulty calibration of the taxonomy: the proposed L1–L4 levels are not psychometrically validated (e.g., item response theory, human baselines) to confirm progressive cognitive complexity and consistent task difficulty.
- Generalization beyond the tested domains: transfer is shown for ECG, but not for other high-stakes areas (finance, industrial monitoring, climate) with domain-specific semantics and failure modes.
- Long-context and streaming reasoning: the approach is not evaluated on very long time series, streaming inputs, or online decision-making where memory and latency constraints are central.
- Handling irregularities: tasks do not explicitly benchmark robustness to missing values, non-stationarity, heterogeneous sampling rates, calendar effects, or covariate shift common in real data.
- Robustness to adversarial or spurious cues: there is no stress testing against perturbed axes, misleading annotations, spurious correlations, or adversarial distractors in plots/text.
- Model scalability and efficiency: memory, compute, and latency trade-offs of dual-image inputs versus textual encodings or specialized TS encoders are not quantified.
- Comparative ablations on input strategies: while several encodings are compared, fine-grained ablations (e.g., single-plot plus numeric text; learned visual-number tokenizers; specialized numerical modules) are limited.
- Alternative curricula: curriculum order and pacing are not explored (e.g., interleaving, self-paced learning, task weighting), leaving open how different schedules impact generalization.
- Reward design for RL fine-tuning: the paper notes RL challenges but does not propose concrete, testable reward formulations to supervise both low-level precision and high-level semantics across L1–L4.
- Data annotation provenance and reproducibility: dependence on closed-source LLMs (e.g., “GPT-5”) for annotation/verification raises reproducibility and bias concerns; transparent alternatives and inter-annotator reliability are not reported.
- Bias and artifact analysis in HiTSR: there is no audit for dataset artifacts (template biases, lexical cues, distractor selection biases) that models might exploit, nor controls to mitigate them.
- Multiple-choice framing limitations: L2–L3 tasks are MCQ-based, which may inflate accuracy via recognition or test-taking strategies; open-ended reasoning and generative evaluation are not assessed.
- Error granularity for numerical tasks: accuracy and success rate are reported, but fine-grained numeric error (e.g., MAE of read-outs) and tolerance thresholds are not analyzed.
- Faithful multi-series alignment: L3 “series comparison” assumes correct cross-series alignment and metadata quality; the impact of misalignment or inconsistent context is not studied.
- Cross-lingual and multilingual robustness: dataset and evaluations are monolingual; performance on multilingual instructions and labels is unknown.
- Safety and reliability in real-world use: no formal evaluation of failure modes, hallucinations, or guardrails is provided for high-stakes domains (e.g., medicine), nor are human-in-the-loop protocols defined.
- Interpretability beyond CoT: other interpretable artifacts (saliency on plots/tables, temporal evidence highlighting) are not explored to strengthen trust and error analysis.
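Two of the gaps listed above, the missing confidence intervals on small OOD test sets and the missing fine-grained numeric error, are cheap to illustrate. The sketch below uses the standard Wilson score interval for a proportion and a plain mean absolute error with a tolerance check. The accuracy figures, predictions, and tolerance are made-up inputs chosen only to show the arithmetic; they are not results from the paper.

```python
import math
from typing import Sequence, Tuple


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for an accuracy of correct/total (z=1.96 ~ 95%)."""
    p = correct / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - half, center + half


def readout_errors(pred: Sequence[float], truth: Sequence[float],
                   tol: float) -> Tuple[float, float]:
    """Return (mean absolute error, share of read-outs within tol of the truth)."""
    abs_err = [abs(p - t) for p, t in zip(pred, truth)]
    mae = sum(abs_err) / len(abs_err)
    within = sum(e <= tol for e in abs_err) / len(abs_err)
    return mae, within


# The same hypothetical 80% accuracy on 100 versus 500 test items.
lo_100, hi_100 = wilson_interval(80, 100)
lo_500, hi_500 = wilson_interval(400, 500)
print(f"n=100: [{lo_100:.3f}, {hi_100:.3f}]  n=500: [{lo_500:.3f}, {hi_500:.3f}]")

# Hypothetical L1 read-outs compared with ground truth.
mae, within = readout_errors([1.00, 2.50, 3.10], [1.00, 2.52, 3.00], tol=0.05)
```

At 100 items the interval spans roughly fifteen percentage points, so two models a few points apart cannot be separated. Reporting the interval alongside the accuracy makes that visible.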
Practical Applications
Immediate Applications
Below are actionable use cases that can be deployed now by leveraging the paper’s taxonomy (L1–L4), the HiTSR dataset (83k verified samples with Chains-of-Thought), and the LLaTiSA model (dual-view plot+table input with curriculum fine-tuning).
- Evidence‑grounded analytics copilot for dashboards
- Sectors: software (BI/analytics), finance, retail, operations
- What it does: Lets users ask questions of charts (e.g., “When is the max, by how much did metric X change between T1–T2, which segment spiked?”), returning precise timestamps/values and short pattern narratives.
- Enabled by: LLaTiSA’s dual‑view numerical grounding (L1) + pattern perception (L2).
- Tools/products/workflows: Add a “TSR Q&A” widget to BI tools (Power BI/Tableau/Looker); an API accepting a plot image + index‑value table image; auto‑generated report snippets.
- Assumptions/dependencies: Consistent plot rendering and table image generation; authentication/PII handling; minor domain fine‑tuning for in‑house metrics.
- Alert triage and incident postmortems for time‑series operations
- Sectors: AIOps/SRE, manufacturing, IoT/telemetry, logistics
- What it does: Explains alerts with exact evidence (max/min localization, spike timing, step changes), prioritizes anomalies, and summarizes local/global patterns for on‑call triage.
- Enabled by: LLaTiSA’s superior L1/L2 generalization across OOD benchmarks.
- Tools/products/workflows: “Explain alert” button in observability tools; ops runbook generators using HiTSR‑tuned models.
- Assumptions/dependencies: Access to metric images or auto‑rendered plots+tables; integration with ticketing; noise/outlier policy configuration.
- Evidence‑based ECG interpretation assistant
- Sectors: healthcare
- What it does: Produces per‑lead assessments and diagnostic summaries grounded in waveform evidence; improves lead coverage/accuracy in ID/OOD settings.
- Enabled by: LLaTiSA fine‑tuned on ECG‑Grounding; demonstrated lead‑wise gains.
- Tools/products/workflows: PACS/EHR plugin; cardiology triage dashboard; QA checker for AI ECG outputs.
- Assumptions/dependencies: Clinical validation and governance; device/domain fine‑tuning; HIPAA/GDPR compliance.
- Financial reporting and compliance narratives grounded in time series
- Sectors: finance, fintech, accounting
- What it does: Generates audited, numerically‑grounded text for KPIs (P&L, risk, liquidity) with precise time/value references; reduces hallucination in report generators.
- Enabled by: Dual‑view numeric precision (L1) and pattern differentiation (L2).
- Tools/products/workflows: Report co‑authoring; SOX/ESG dashboards with “explain this trend” functions.
- Assumptions/dependencies: Consistent index alignment (fiscal calendars); audit trails; human review loops.
- Energy and utility monitoring assistant
- Sectors: energy, utilities, smart grid, HVAC
- What it does: Identifies load spikes, curtailment windows, and consumption anomalies with timestamped evidence; supports demand‑response briefings.
- Enabled by: L2 local/global pattern perception with numerical grounding.
- Tools/products/workflows: Operator consoles with explainers; customer‑facing energy usage summaries.
- Assumptions/dependencies: Stable telemetry feeds; seasonal/context metadata for better L3 reasoning if desired.
- Marketing and customer analytics trend explainer
- Sectors: marketing analytics, e‑commerce, media
- What it does: Summarizes campaign lift, seasonality, or cohort divergences with precise figures and timing; supports weekly business reviews.
- Enabled by: L2 pattern description + L1 precise measurement.
- Tools/products/workflows: Slide and memo generation; alert explainers.
- Assumptions/dependencies: Clear segment indexing; ability to render numerical tables.
- Plot‑to‑numbers extractor for legacy charts
- Sectors: research, competitive intelligence, journalism
- What it does: Extracts index/value pairs from published plots (when raw data are unavailable) for precise citations and comparisons.
- Enabled by: The numeric grid image + vision token processing approach.
- Tools/products/workflows: “Plot2Table” microservice deployed as a browser extension or data ingestion tool.
- Assumptions/dependencies: Plot quality and axis readability; licensing/usage rights for extracted data.
- Internal benchmarking and training harness for TSR
- Sectors: academia, industry R&D, model vendors
- What it does: Uses the L1–L3 taxonomy and HiTSR to benchmark, diagnose, and improve models’ numerical read‑out, pattern perception, and semantic reasoning.
- Enabled by: HiTSR’s verified CoT and unambiguous tasks; curriculum design.
- Tools/products/workflows: Continuous evaluation suite; “TSR Eval Badge” for procurement/readiness checks.
- Assumptions/dependencies: Access to HiTSR (license/compliance); standardized task templates.
- Courseware and tutoring for time‑series literacy
- Sectors: education, corporate training
- What it does: Generates question sets and worked solutions aligned to L1–L3; supports students in reading plots, detecting patterns, and articulating evidence.
- Enabled by: HiTSR’s difficulty‑stratified tasks with verified CoT.
- Tools/products/workflows: LMS modules; auto‑graded exercises; formative feedback bots.
- Assumptions/dependencies: Curriculum alignment; institution policy for AI assessment.
- Civic and policy dashboards with grounded explanations
- Sectors: government, public health, economics
- What it does: Adds numerically‑grounded, plain‑language explanations to public dashboards (e.g., unemployment, influenza trends) to reduce misinterpretation.
- Enabled by: L1/L2 evidence binding; L3 semantic alignment via domain metadata.
- Tools/products/workflows: Open‑data portal plugins; explanation audit logs.
- Assumptions/dependencies: Accessibility standards; editorial review; robust metadata for context.
Long‑Term Applications
These require additional research, scaling, domain adaptation, or methodological advances (especially toward L4 predictive inference and RL fine‑tuning).
- Predictive reasoning and decision support (L4)
- Sectors: energy, supply chain, finance, healthcare
- What it could do: Couple forecasts with evidence‑grounded reasoning and action recommendations (“shed load at T+2 due to spike risk”).
- Dependencies: Extension of the taxonomy to L4; RL fine‑tuning with well‑shaped rewards; evaluation protocols for forecast+reason synergy.
- Regulatory‑grade, auditable AI narratives
- Sectors: finance, healthcare, critical infrastructure
- What it could do: Produce explanations with verifiable links to indices/values for audits and compliance filings.
- Dependencies: Standardized attestations; traceability tooling; alignment with legal frameworks.
- Autonomous multimodal agents for real‑time systems
- Sectors: industrial automation, IoT, mobility
- What it could do: Fuse time series with images/text to monitor, diagnose, and act in closed loop.
- Dependencies: Low‑latency inference; streaming interfaces; safety guardrails; domain simulators.
- Root‑cause analysis with causal/time‑aware reasoning
- Sectors: AIOps, manufacturing, telecom
- What it could do: Move beyond description to causal hypotheses and structured counterfactuals grounded in multivariate series.
- Dependencies: Causal modeling modules; richer L3/L4 datasets; intervention validation.
- Scientific assistants for experimental time series
- Sectors: materials, chemistry, neuroscience, climate
- What it could do: Summarize experiments, flag anomalies, and connect patterns to literature with line‑by‑line evidence.
- Dependencies: Domain knowledge integration (RAG); high‑fidelity plots and metadata.
- Robotics and control via sensor time‑series reasoning
- Sectors: robotics, autonomous systems
- What it could do: Interpret multi‑sensor streams to justify control decisions with traceable evidence.
- Dependencies: Tight integration with control stacks; real‑time constraints; safety certification.
- Grid and market operations co‑pilot
- Sectors: energy markets, utilities
- What it could do: Jointly reason about demand, prices, outages; explain forecasts and suggest interventions.
- Dependencies: Market/regulatory data; L4 forecasting aligned to operations; policy‑compliant logs.
- Financial risk and trading assistants with calibrated reasoning
- Sectors: asset management, treasury, banking
- What it could do: Evidence‑grounded alerts on VaR breaches, liquidity squeezes, regime shifts; scenario reasoning over time series.
- Dependencies: Calibration and backtesting harness; guardrails for decision automation.
- Multimodal clinical reasoning beyond ECG
- Sectors: healthcare
- What it could do: Extend to EEG, PPG, ICU vitals; generate evidence‑linked differential diagnoses.
- Dependencies: Curated L3 datasets; clinical trials; bias/fairness audits.
- Standardized procurement benchmarks for public AI systems
- Sectors: government, NGOs
- What it could do: Use L1–L3 (and future L4) metrics to certify models’ numerical grounding and reasoning for public deployments.
- Dependencies: Policy adoption; open test suites; red‑team guidance.
- Federated and privacy‑preserving TSR
- Sectors: healthcare, finance, telco
- What it could do: Train/evaluate TSRMs across siloed time series without raw data sharing, retaining evidence‑grounded behavior.
- Dependencies: Federated/DP tooling; secure rendering of dual‑view inputs.
- Adaptive tutoring with mastery‑based progression
- Sectors: education
- What it could do: Personalized progression from L1 to L3/L4, with CoT‑based feedback and skill diagnostics.
- Dependencies: Longitudinal learner models; item‑generation quality control.
Cross‑cutting assumptions and dependencies
- Data and model availability: Access to HiTSR and LLaTiSA code; a capable VLM backbone (e.g., Qwen3‑VL‑8B or equivalent).
- Input preparation: Reliable rendering of both plot and index‑value table images; consistent time indexing.
- Domain adaptation: L3 and specialized tasks benefit from fine‑tuning on domain‑specific data and metadata.
- Governance: Privacy/security (especially in healthcare/finance), audit trails for evidence grounding, and human‑in‑the‑loop review for high‑stakes use.
- Compute and latency: GPU/accelerator access and optimization for real‑time or batch settings.
- Generalization risks: OOD robustness is improved but not guaranteed; monitor for formatting drift, noise, and novel regimes.
Glossary
- Chain-of-Thought (CoT): An explicit intermediate reasoning trace used to make model decisions interpretable and verifiable. "verified Chain-of-Thought (CoT) trajectories."
- Curriculum fine-tuning: A staged training regimen that orders tasks by difficulty to progressively build capabilities. "Through a multi-stage curriculum fine-tuning strategy, LLaTiSA achieves superior performance and exhibits robust out-of-distribution generalization across diverse TSR tasks and real-world scenarios."
- Difficulty-stratified taxonomy: A framework that organizes tasks by increasing cognitive complexity to diagnose and develop reasoning skills. "To address these limitations, we introduce a difficulty-stratified taxonomy that organizes TSR into progressively increasing levels of complexity."
- Dual-view input framework: A modeling setup that ingests both a plot and a structured numeric rendering of the same time series to combine perception with precise grounding. "we propose LLaTiSA (\Cref{fig:2}.b), a dual-view input framework that pairs standard time series visualizations with a secondary image rendering the data as a structured index-value table."
- ECG: Electrocardiogram; a clinical time series modality used for cardiac diagnosis and reasoning tasks. "we further perform Supervised Fine-Tuning (SFT) on the ECG-Grounding 30k dataset."
- Evidence-Based Reasoning: A metric and approach emphasizing claims supported by explicit signal features or measurements. "Evi. Reas. represents Evidence-Based Reasoning."
- In-distribution (ID): Data drawn from the same distribution as training, used to assess within-domain performance. "under both in-distribution (ID) and OOD settings"
- Index-value table: A structured image of timestamps (indices) and corresponding values to enable precise numeric reference. "a secondary image rendering the data as a structured index-value table."
- Lead assessment coverage: In ECG analysis, the proportion of leads for which the model provides assessments. "Specifically, LLaTiSA achieves remarkable gains in lead assessment coverage and accuracy, outperforming GEM (LLaVA) by 18.14% and 14.22% in the ID evaluation, respectively."
- Lead-wise evaluation: Assessing ECG performance per individual lead to mirror clinical diagnostic practice. "LLaTiSA exhibits a distinct advantage in lead-wise evaluation, which directly reflects its adherence to the structured, 12-lead diagnostic procedure employed by professional clinicians."
- Numerical grounding: Tying reasoning steps to concrete numeric evidence from the signal to avoid ambiguity. "To empower VLMs with precise numerical grounding, we propose LLaTiSA (\Cref{fig:2}.b), a dual-view input framework..."
- Numerical hallucinations: Model-produced numeric statements that are not supported by the data. "thereby significantly mitigating numerical hallucinations and improving performance on numerical-sensitive tasks."
- Numerical Read-out: The basic ability to retrieve exact values at specified times from a series. "L1: Numerical Read-out. Establish time-aware indexing and point-level numerical retrieval."
- Out-of-distribution (OOD): Data that differs from the training distribution, used to test generalization. "we report results exclusively on out-of-distribution (OOD) datasets across levels L1-L3 (see \Cref{tab:zeroshot})."
- Pattern Differentiation: Distinguishing among local or global temporal patterns across series. "which focus on local and global pattern differentiation, respectively."
- Pattern Perception: Recognizing and characterizing temporal patterns beyond point estimates. "L2: Pattern Perception. Identify and differentiate multi-scale temporal patterns using quantitative evidence."
- Predictive Inference: Generating forecasts of future time-series values with high fidelity. "L4: Predictive Inference. Generate high-fidelity time-series predictions."
- Q-former: A query-centric vision-language module architecture used to encode inputs for downstream reasoning. "and a Q-former \cite{blip} style time series encoder to perform multivariate TSR tasks, respectively."
- Reinforcement Learning Fine-Tuning (RFT): Optimizing models with RL signals to refine reasoning policies beyond supervised objectives. "leaving the exploration of Reinforcement Learning Fine-Tuning (RFT) on HiTSR as a future direction."
- Semantic Reasoning: Integrating signal evidence with contextual knowledge to reach domain-specific conclusions. "L3: Semantic Reasoning. Integrate time series observations with contextual knowledge to perform domain-specific reasoning."
- Series-level perception: Understanding higher-level shapes and structures across time rather than isolated points. "transitioning from point-level numerical grounding to series-level perception, facilitating high-level semantic interpretation, and ultimately enabling context-aware generation."
- Success Rate (SR): A validity metric for whether the model’s answer is well-formed and correctly references indices/values. "and 'SR' denotes whether the model provides valid answers or correctly maps target values with correct indices."
- Supervised Fine-Tuning (SFT): Training with labeled examples to adapt a pre-trained model to specific tasks. "we perform sequential Supervised Fine-Tuning (SFT) on HiTSR-L1 and HiTSR-L2 to consolidate the model's numerical read-out precision and pattern perception capabilities."
- Time-aware indexing: Mapping observations to specific timestamps to enable accurate retrieval and alignment. "Establish time-aware indexing and point-level numerical retrieval."
- Time series encoder: A neural module specialized for representing raw time-series signals for downstream tasks. "incorporate an MLP-based and a Q-former style time series encoder to perform multivariate TSR tasks, respectively."
- Time Series Reasoning (TSR): End-to-end understanding of time series grounded in numeric evidence, patterns, and context. "We formalize Time Series Reasoning (TSR) via a four-level taxonomy of increasing cognitive complexity."
- Time Series Reasoning Model (TSRM): A model designed specifically to perform TSR tasks across levels of complexity. "the development of unified Time Series Reasoning Models (TSRMs)."
- Time-Series Multimodal LLM (TS-MLLM): An LLM that integrates time-series encoders with other modalities for joint reasoning. "the integration of dedicated time-series encoders to construct Time-Series Multimodal LLMs (TS-MLLMs)."
- Visual tokens: Compact visual representations used to encode text or structured data in vision models. "which utilizes visual tokens to represent textual information efficiently"
- Vision-LLM (VLM): A model jointly processing images and text for multimodal reasoning. "Vision LLMs (VLMs) can excel in basic TSR tasks by relying exclusively on time series visualizations"