Agentic AI framework for End-to-End Medical Data Inference (2507.18115v1)

Published 24 Jul 2025 in cs.AI, cs.CL, cs.CY, cs.ET, and cs.LG

Abstract: Building and deploying machine learning solutions in healthcare remains expensive and labor-intensive due to fragmented preprocessing workflows, model compatibility issues, and stringent data privacy constraints. In this work, we introduce an Agentic AI framework that automates the entire clinical data pipeline, from ingestion to inference, through a system of modular, task-specific agents. These agents handle both structured and unstructured data, enabling automatic feature selection, model selection, and preprocessing recommendation without manual intervention. We evaluate the system on publicly available datasets from geriatrics, palliative care, and colonoscopy imaging. For example, in the case of structured data (anxiety data) and unstructured data (colonoscopy polyps data), the pipeline begins with file-type detection by the Ingestion Identifier Agent, followed by the Data Anonymizer Agent ensuring privacy compliance, where we first identify the data type and then anonymize it. The Feature Extraction Agent identifies features using an embedding-based approach for tabular data, extracting all column names, and a multi-stage MedGemma-based approach for image data, which infers modality and disease name. These features guide the Model-Data Feature Matcher Agent in selecting the best-fit model from a curated repository. The Preprocessing Recommender Agent and Preprocessing Implementor Agent then apply tailored preprocessing based on data type and model requirements. Finally, the Model Inference Agent runs the selected model on the uploaded data and generates interpretable outputs using tools like SHAP, LIME, and DETR attention maps. By automating these high-friction stages of the ML lifecycle, the proposed framework reduces the need for repeated expert intervention, offering a scalable, cost-efficient pathway for operationalizing AI in clinical environments.

Summary

The paper introduces an Agentic AI framework that automates medical data pipelines from ingestion and anonymization to model inference.
It features specialized agents for tasks including data feature extraction and autonomous model selection using cosine similarity and SapBERT embeddings.
The framework enhances operational efficiency, reduces manual preprocessing, and ensures privacy compliance across both structured and unstructured data.

Agentic AI Framework for End-to-End Medical Data Inference

The paper "Agentic AI framework for End-to-End Medical Data Inference" (2507.18115) proposes an agentic AI framework that automates the clinical data pipeline from ingestion to model inference. This summary covers the framework's architecture, the healthcare AI challenges it targets, and its implications for medical data interpretation.

Framework Architecture

The Agentic AI framework streamlines medical data processing through a set of modular, task-specific agents. Each agent handles one stage: data ingestion, anonymization, feature extraction, feature matching, preprocessing, or model inference. The goal is to automate the labor-intensive steps of current clinical pipelines while keeping data handling privacy-compliant.

Figure 1: Complete architecture.

The framework employs several key agents:

Ingestion Identifier Agent: Detects the file type of the uploaded data and routes it to the matching downstream processing path.
Data Anonymizer Agent: Redacts personally identifiable information once the data type is known, so that later stages operate on anonymized data.
Feature Extraction Agent: Extracts semantic features according to the data modality. For tabular data it uses an embedding-based approach over the column names; for image data it uses a multi-stage MedGemma-based approach that infers the imaging modality and disease name.

This staged design is intended to reduce manual preprocessing effort and to build privacy compliance into the pipeline from the first step.
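As a rough illustration of the first two agents, the sketch below routes a file by its extension and redacts two obvious identifier patterns from free text. The extension sets, the regular expressions, and the function names are assumptions made for this example; the paper does not specify them at this level, and real de-identification needs far broader coverage than two patterns.

```python
import re
from pathlib import Path

# Hypothetical extension tables; the actual agent may rely on other signals.
TABULAR_EXTS = {".csv", ".tsv", ".xlsx"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".dcm"}


def identify_ingestion_type(filename: str) -> str:
    """Return 'tabular', 'image', or 'unknown' based on the file extension."""
    ext = Path(filename).suffix.lower()
    if ext in TABULAR_EXTS:
        return "tabular"
    if ext in IMAGE_EXTS:
        return "image"
    return "unknown"


# Illustrative patterns only: e-mail addresses and 3-3-4 digit phone numbers.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact_text(text: str) -> str:
    """Replace matched identifiers with fixed placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def ingest(filename: str, cells: list[str]) -> dict:
    """Identify the data type first, then anonymize any text cells."""
    kind = identify_ingestion_type(filename)
    cleaned = [redact_text(c) for c in cells] if kind == "tabular" else cells
    return {"type": kind, "cells": cleaned}
```

Calling `ingest("anxiety.csv", ["reach me at a@b.org"])` would tag the file as tabular and return the cell with the address replaced by `[EMAIL]`.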

Feature Matching and Model Selection

A central part of the framework is autonomous feature matching and model selection, which targets the problem of aligning a dataset with a compatible model. The Model-Data Feature Matcher Agent compares the features extracted from a dataset against those expected by each model in a repository of pretrained models, scoring each pairing with cosine similarity. The comparison is performed on SapBERT embeddings rather than raw strings, which makes model selection robust to minor variations in feature naming conventions.

Figure 2: Ingestion Selector framework.

Figure 3: Ingestion Feature Matcher framework.

Through this approach, the system dynamically selects the most appropriate machine learning models, minimizing mismatches and enhancing prediction reliability.
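The matching step can be sketched as a cosine-similarity search over feature-name embeddings. To stay self-contained, the example below does not load SapBERT; it substitutes a character-trigram count vector as a stand-in embedding. The repository contents, the scoring rule (mean of each data feature's best match), and all function names are assumptions for illustration, not the paper's implementation.

```python
import math
from collections import Counter


def embed(name: str) -> Counter:
    """Stand-in embedding: character trigram counts (NOT SapBERT)."""
    s = f"  {name.lower().replace('_', ' ')} "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_score(data_features: list[str], model_features: list[str]) -> float:
    """Average, over data features, of the best similarity to any model feature."""
    if not data_features or not model_features:
        return 0.0
    model_vecs = [embed(f) for f in model_features]
    best = [max(cosine(embed(d), m) for m in model_vecs) for d in data_features]
    return sum(best) / len(best)


def select_model(data_features: list[str], repository: dict[str, list[str]]):
    """Return the best-scoring model name and the full score table."""
    scores = {name: match_score(data_features, feats)
              for name, feats in repository.items()}
    return max(scores, key=scores.get), scores


# Hypothetical repository: model name -> feature names it was trained on.
REPOSITORY = {
    "anxiety_classifier": ["age", "sleep_hours", "heart_rate"],
    "polyp_detector": ["image", "modality", "polyp"],
}
```

With this toy repository, `select_model(["Age", "Sleep Hours", "HeartRate"], REPOSITORY)` picks the anxiety classifier even though the column names differ in case and spacing. A learned biomedical embedding would additionally handle synonyms, which trigram overlap cannot.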

Preprocessing and Model Inference

Preprocessing is tailored to both the input data's characteristics and the selected model's requirements. The Preprocessing Recommender Agent classifies the data type and recommends specific operations, such as scaling and normalization for tabular data, or image preprocessing that matches the conditions under which the model was originally trained. The Preprocessing Implementor Agent then applies the recommended operations.
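A minimal rule-based sketch of the recommend-then-implement split for tabular data follows. The source mentions scaling and normalization; the imputation and one-hot encoding steps, the step names, and the table layout (column name mapped to a list of values) are assumptions added for this example.

```python
from numbers import Number


def classify_column(values: list) -> str:
    """Label a column 'numeric' if every non-missing value is a number."""
    present = [v for v in values if v is not None]
    is_numeric = bool(present) and all(
        isinstance(v, Number) and not isinstance(v, bool) for v in present
    )
    return "numeric" if is_numeric else "categorical"


def recommend_preprocessing(table: dict[str, list]) -> dict[str, list[str]]:
    """Recommender: map each column to an ordered list of step names."""
    plan = {}
    for name, values in table.items():
        steps = []
        if any(v is None for v in values):
            steps.append("impute_missing")  # assumed step, not from the paper
        if classify_column(values) == "numeric":
            steps.append("min_max_scale")
        else:
            steps.append("one_hot_encode")  # assumed step, not from the paper
        plan[name] = steps
    return plan


def min_max_scale(values: list[float]) -> list[float]:
    """Implementor for one step: rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Keeping the plan as data, separate from the functions that execute it, mirrors the split between the recommender and implementor agents and lets the plan be inspected before anything is modified.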

Figure 4: Preprocessing framework.

Once preprocessing is complete, the Model Inference Agent runs the selected model on the uploaded data and produces interpretable outputs using tools such as SHAP, LIME, and DETR attention maps.
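To give a sense of what a SHAP-style attribution reports, the sketch below computes attributions for a linear model in closed form: when features are treated as independent, the contribution of feature i is w_i * (x_i - mean_i), where mean_i is the feature's average over a background set. This example does not call the SHAP library, and the weights, bias, and data are made up; the models in the paper's repository are not assumed to be linear.

```python
def predict(weights: list[float], bias: float, x: list[float]) -> float:
    """Linear model: f(x) = w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def linear_attributions(weights: list[float], x: list[float],
                        background: list[list[float]]) -> list[float]:
    """Per-feature contribution w_i * (x_i - mean_i) relative to the background."""
    n = len(background)
    means = [sum(row[i] for row in background) / n for i in range(len(weights))]
    return [w * (xi - m) for w, xi, m in zip(weights, x, means)]


# Made-up example values.
WEIGHTS = [2.0, -1.0]
BIAS = 0.5
BACKGROUND = [[1.0, 2.0], [3.0, 4.0]]
X = [4.0, 1.0]

phi = linear_attributions(WEIGHTS, X, BACKGROUND)
baseline = sum(predict(WEIGHTS, BIAS, row) for row in BACKGROUND) / len(BACKGROUND)
```

The attributions sum to the gap between the prediction for `X` and the average prediction over the background set, which is the additivity property that makes this kind of output readable as a per-feature breakdown.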

Performance and Practical Implications

The framework was evaluated on publicly available datasets from geriatrics, palliative care, and colonoscopy imaging. The modular agents handled both structured data (e.g., anxiety data) and unstructured data (e.g., colonoscopy images), which suggests the approach carries across different medical applications.

Because the pipeline runs autonomously, it reduces the need for repeated expert intervention in the early stages of data processing, which lowers operational cost and makes the approach easier to scale. Anonymization is also built into the pipeline rather than added afterwards, so privacy compliance is part of the architecture, which may help build trust in AI-driven clinical environments.

Ethical Considerations and Future Directions

Despite its thorough design, the framework raises ethical concerns about data sovereignty and about accountability for autonomous decisions. Because data anonymization relies on cloud-based services, deployments must account for regional data protection regulations, which argues for eventual support for edge-based anonymization.

Looking forward, improving the feature extraction algorithms and adding adaptive learning mechanisms to the preprocessing recommendations could make the system more adaptable to evolving datasets and changing clinical contexts.

Figure 5: An example of the complete workflow for both image/tabular data.

Conclusion

This paper introduces a comprehensive framework to address the fragmentation in medical data processing workflows. The use of specialized agents for data handling ensures efficient, flexible, and privacy-compliant data pipelines. By leveraging an Agentic AI architecture, the framework stands as a promising advancement for operationalizing AI in clinical settings, provided it continues to evolve in parallel with emerging data governance standards and ethical considerations.