REMSA: Remote Sensing Model Selection Agent
- REMSA is a language model-based agent that automates the selection of remote sensing foundation models using natural language queries.
- It integrates a rigorously structured Remote-Sensing Foundation Model Database with semantic retrieval built on Sentence-BERT embeddings and FAISS indexing.
- The infrastructure enforces data integrity through schema validation, confidence thresholds, and rule-based constraints to ensure transparent model selection.
REMSA refers to an LLM-based agent and associated infrastructure for foundation model selection in remote sensing, introduced as part of a unified framework that combines the Remote-Sensing Foundation Model Database (RS-FMD) with advanced retrieval and reasoning components. REMSA addresses the acute need for transparent, automated, and constraint-aware selection of Remote Sensing Foundation Models (RSFMs) in a domain characterized by heterogeneous modality support, varied metadata formats, and task-specific performance requirements (Chen et al., 21 Nov 2025).
1. Scope and Definition
REMSA (Remote-sensing foundation-model Selection Agent) is the first LLM-based agent system constructed for automated RSFM selection via natural language queries. It is built upon the RS-FMD, a curated, schema-verified database documenting over 150 publicly released foundation models within the remote sensing (RS) domain. RS-FMD encompasses models spanning multiple data modalities (optical, multispectral, hyperspectral, SAR, LiDAR, and image-text), resolutions, and pretraining paradigms. REMSA interprets unstructured user instructions, resolves missing or ambiguous constraints, retrieves candidate models, ranks them with transparent justification, and outputs rationales for each selection (Chen et al., 21 Nov 2025).
2. RS-FMD: Structure, Schema, and Coverage
RS-FMD (“Remote-Sensing Foundation Model Database”) provides the structured backbone for REMSA. Each entry is represented as a JSON object conforming to a rigorously defined schema (type-checked via pydantic and detailed in Appendix A of (Chen et al., 21 Nov 2025)). The schema consists of high-granularity top-level fields and two nested structures for pretraining phases and evaluation benchmarks.
Key Top-Level Fields:
- model_id, model_name, version, release_date, last_updated
- short_description, paper_link, citations, repository, weights
- backbone, num_layers, num_parameters
- pretext_training_type, masking_strategy, pretraining
- domain_knowledge, backbone_modifications, supported_sensors
- modality_integration_type, modalities
- spectral_alignment, temporal_alignment
- spatial_resolution, temporal_resolution, bands
Nested Structures (abbreviated):
- PretrainingPhase (list): dataset, regions_coverage, time_range, num_images, token_size, image_resolution, epochs, batch_size, learning_rate, augmentations, processing, sampling, processing_level, cloud_cover, missing_data, masking_ratio
- Benchmark (list): task, application, dataset, metrics, metrics_value, sensor, regions, original_samples, num_samples, sampling_percentage, num_classes, classes, image_resolution, spatial_resolution, bands_used, augmentations, optimizer, batch_size, learning_rate, epochs, loss_function, split_ratio
LaTeX-formatted schema tables (see Appendix A in (Chen et al., 21 Nov 2025)) formalize field types and constraints for data integrity.
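To make the schema's shape concrete, here is a minimal pydantic sketch covering a handful of the fields above. The field names follow the schema, but the types, optionality, and defaults are assumptions rather than the published definition (see Appendix A for the authoritative version):

```python
# Minimal, illustrative sketch of an RS-FMD record using pydantic v2.
from typing import Optional
from pydantic import BaseModel, ConfigDict

class Benchmark(BaseModel):
    task: str
    dataset: str
    metrics: list[str]
    metrics_value: list[float]  # parallel to `metrics`

class PretrainingPhase(BaseModel):
    dataset: str
    num_images: Optional[int] = None
    epochs: Optional[int] = None

class RSFMRecord(BaseModel):
    # Allow field names starting with "model_" (pydantic v2 reserves
    # that namespace by default).
    model_config = ConfigDict(protected_namespaces=())

    model_id: str
    model_name: str
    modalities: list[str]
    supported_sensors: list[str] = []
    pretraining: list[PretrainingPhase] = []
    benchmarks: list[Benchmark] = []
```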
Coverage:
- ≃150 RSFMs
- Modalities: Optical (RGB), Multispectral, Hyperspectral, SAR, LiDAR, Vision–Language
- Supported tasks (as documented in nested Benchmarks): Semantic segmentation, classification, change detection, visual question answering
3. Model Categorization and Metrics
RSFMs in RS-FMD are stratified by both learning paradigm and application task:
Learning Paradigms:
- Supervised (fine-tuned on labeled RS data)
- Self-supervised/masked modeling (MAE-style pretraining)
- Contrastive Vision–Language (e.g., CLIP-style on image–text pairs)
- Multimodal fusion architectures integrating heterogeneous modalities
Relevant schema fields: pretext_training_type, masking_strategy, modality_integration_type.
Tasks and Evaluation:
- Each Benchmark object in the schema captures a unique combination of task (e.g., change detection), dataset, sensor, metric(s), and result value(s).
- Key metrics include Intersection-over-Union (IoU):

$$\mathrm{IoU} = \frac{TP}{TP + FP + FN}$$

and F1-score:

$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

- Parallel arrays (metrics, metrics_value) associate each metric with its measured value for each task.
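As a reference implementation of the two formulas (not REMSA code), both metrics follow directly from confusion-matrix counts:

```python
def iou(tp: int, fp: int, fn: int) -> float:
    # Intersection-over-Union: overlap divided by the union of
    # prediction and ground truth (assumes tp + fp + fn > 0).
    return tp / (tp + fp + fn)

def f1(tp: int, fp: int, fn: int) -> float:
    # Harmonic mean of precision and recall.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```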
4. Query Processing and Retrieval Pipeline
REMSA operationalizes model selection through a multi-stage information processing pipeline:
Data Structures and Indexing:
- Records are stored in JSONL, validated via pydantic, and versioned with DVC.
- Each model record is embedded using Sentence-BERT, incorporating specialized prefix tokens (e.g., [MODALITY], [APPLICATION]) to enhance semantic retrieval.
- Embeddings are indexed in FAISS for efficient approximate nearest-neighbor search based on cosine similarity.
- Explicit rule-based filters enforce hard constraints (e.g., required modality, minimum spatial resolution, performance thresholds).
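The sketch below illustrates this indexing step under stated assumptions: it uses the sentence-transformers and faiss packages, a stand-in encoder name, and an invented record; the prefix scheme mirrors the idea above but is not REMSA's exact configuration.

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

records = [
    {"model_name": "ExampleFM", "modalities": ["SAR"],
     "application": "change detection"},  # hypothetical RS-FMD record
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in encoder

def record_to_text(rec: dict) -> str:
    # Prefix tokens ([MODALITY], [APPLICATION]) tag metadata fields,
    # mirroring the strategy of sharpening semantic retrieval.
    return (f"[MODALITY] {', '.join(rec['modalities'])} "
            f"[APPLICATION] {rec['application']} {rec['model_name']}")

# Normalized embeddings make inner product equal to cosine similarity.
vecs = encoder.encode([record_to_text(r) for r in records],
                      normalize_embeddings=True)
index = faiss.IndexFlatIP(vecs.shape[1])
index.add(np.asarray(vecs, dtype="float32"))

# Hard constraints run as rule-based filters before ranking,
# e.g. dropping candidates that lack the required modality.
def hard_filter(recs: list[dict], required_modality: str) -> list[dict]:
    return [r for r in recs if required_modality in r["modalities"]]
```

A query is encoded the same way and matched with index.search before the rule-based filters and ranking stages run.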
Semantic Parsing and Retrieval:
- Free-text user queries are parsed and normalized into a structured constraint dictionary (JSON) with fields such as application, modality, sensor, spatial_resolution, priority_metrics, and min_performance.
- Example structured query:
```json
{
  "application": "change detection",
  "modality": "SAR",
  "sensor": ["Sentinel-1"],
  "spatial_resolution": 10,
  "priority_metrics": ["F1"],
  "min_performance": {
    "metric": ["F1"],
    "value": [0.80]
  }
}
```

- The retrieval phase applies hard filters, then ranks the remaining candidates with a hybrid similarity and LLM-based scoring function:

$$\mathrm{score}(m) = \lambda \cdot s_{\mathrm{emb}}(m) + (1 - \lambda) \cdot s_{\mathrm{LLM}}(m)$$

where $s_{\mathrm{emb}}(m)$ is the embedding similarity from FAISS, $s_{\mathrm{LLM}}(m)$ is the ordinal ranking from in-context LLM prompts, and $\lambda$ is a task-tunable balance factor.
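A one-line sketch of this combination, assuming both scores are normalized to [0, 1]; the default value of lambda here is illustrative, not the paper's tuned setting:

```python
def hybrid_score(sim_emb: float, rank_llm: float, lam: float = 0.5) -> float:
    # lam is the task-tunable balance factor from the formula above;
    # rank_llm is assumed to be an ordinal rank rescaled to [0, 1].
    return lam * sim_emb + (1 - lam) * rank_llm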
Explanation, Clarification, and Output:
- If rule-based filtering yields a candidate set that is too large or too small, REMSA's Clarification Generator poses up to three disambiguation questions to the user.
- The top-k results are returned through an Explanation Generator, which produces a JSON array listing each model, its rationale, and documentation links.
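An illustrative shape for this output, in the spirit of the structured query example above; the field names and values here are assumptions, not the exact output schema:

```json
[
  {
    "model_name": "ExampleFM",
    "rationale": "Matches the SAR modality and Sentinel-1 sensor constraints and reports F1 = 0.83 on a change-detection benchmark.",
    "paper_link": "https://example.org/paper",
    "repository": "https://example.org/repo"
  }
]
```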
5. Agent Integration, Ranking, and Confidence
Pipeline Overview:
- Interpreter: Converts natural-language query to structured schema.
- Retriever: FAISS+Sentence-BERT produces similarity-based shortlist.
- Ranker: LLM ranks candidates using in-context learning and structured metadata.
- Explanation system: Provides reasoned output, with model metadata and links.
Confidence Mechanism:
- During database population, each field $f$ receives an extraction-confidence score $c_f \in [0, 1]$, with a mandatory threshold $\tau$ for automatic acceptance.
- Only fields with $c_f \geq \tau$ are admitted automatically; all others are flagged for human review at the field level.
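A minimal sketch of this gating step; the threshold value is an assumption, since the paper's exact $\tau$ is not quoted here:

```python
TAU = 0.9  # assumed acceptance threshold, for illustration only

def triage_fields(confidences: dict[str, float], tau: float = TAU):
    # Fields at or above tau are auto-accepted; the rest are flagged
    # for field-level human review, mirroring the described mechanism.
    accepted = {f: c for f, c in confidences.items() if c >= tau}
    flagged = {f: c for f, c in confidences.items() if c < tau}
    return accepted, flagged
```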
Consistency constraints:
- Arrays like benchmarks.metrics and benchmarks.metrics_value are enforced to be of equal length.
- modalities must be a subset of the enumerated valid types: {Optical, Multispectral, Hyperspectral, SAR, LiDAR, Text}.
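These two constraints translate naturally into pydantic v2 validators; the enum contents follow the text, while the validator wiring below is an illustrative sketch rather than REMSA's actual code:

```python
from pydantic import BaseModel, model_validator

VALID_MODALITIES = {"Optical", "Multispectral", "Hyperspectral",
                    "SAR", "LiDAR", "Text"}

class BenchmarkEntry(BaseModel):
    metrics: list[str]
    metrics_value: list[float]

    @model_validator(mode="after")
    def metrics_aligned(self):
        # Parallel arrays must stay in one-to-one correspondence.
        if len(self.metrics) != len(self.metrics_value):
            raise ValueError("metrics and metrics_value must have equal length")
        return self

class ModelEntry(BaseModel):
    modalities: list[str]

    @model_validator(mode="after")
    def modalities_valid(self):
        # modalities must be a subset of the enumerated valid types.
        unknown = set(self.modalities) - VALID_MODALITIES
        if unknown:
            raise ValueError(f"unknown modalities: {unknown}")
        return self
```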
6. Extension, Maintenance, and Guarantees
Update and Addition Pipeline:
- Automated metadata extraction: LLM-based (OneKE-inspired) methods parse papers, model cards, and repositories to generate candidate JSON schemas.
- Community submission interface: Authors may upload new documentation, triggering auto-extraction and verification UI for schema validation.
- Versioning: All records managed with DVC for robust audit tracking.
Quality Guarantees:
- Schema validation is mandatory for any new record.
- Confidence thresholding and type-checking are implemented systematically.
- Consistency and domain-specific constraints (e.g., modalities, metrics alignment) are verified at ingestion time.
7. Significance and Benchmarking
By linking a schema-rich, auto-updating RSFM database with LLM-driven retrieval, REMSA enables fully automated, transparent model selection for use cases such as environmental monitoring, change detection, disaster assessment, and semantic land-cover mapping. REMSA is benchmarked on 75 expert-verified RS query scenarios, yielding 900 query-model configurations. Empirically, it outperformed naive selection agents, dense retrieval methods, and unstructured RAG-based LLMs (Chen et al., 21 Nov 2025). The infrastructure operates exclusively on public metadata, ensuring compatibility with open-science mandates and avoiding any handling of sensitive data. This approach provides a scalable, extensible template for foundation model selection agents across data-driven scientific domains.