Jetson Nano-R: Embedded EHR Retrieval
- Jetson Nano-R is a specialized embedded device that processes and retrieves clinical EHR data offline, preserving data privacy and supporting regulatory compliance.
- It segments unstructured electronic health records into coherent sections using heuristic rules and encodes them with the BGE-M3 model for fast similarity search.
- The system integrates with dedicated summarization modules to reduce latency and efficiently support emergency clinical decision-making workflows.
Jetson Nano-R (Retrieve) is designated as a specialized embedded retrieval device in dual-stage, privacy-preserving summarization architectures for electronic health records (EHRs). Its primary function is to operate as a local retrieval engine that processes and distills relevant information from lengthy, unstructured clinical documents in a fully offline manner, supporting subsequent downstream summarization and decision-making workflows for emergency physicians. In this architecture, Jetson Nano-R is explicitly separated from the summarization device (Nano-S/Summarize) to maximize computational throughput and minimize overall latency, while rigorously maintaining data privacy by ensuring that all patient data and intermediate representations remain on-premises.
1. System Role and Hardware Basis
Jetson Nano-R is implemented on an embedded NVIDIA Jetson Orin Nano platform with 8 GB of memory shared between its ARM CPU and CUDA-capable GPU. The system hosts and runs all data ingestion, parsing, indexing, and retrieval tasks for clinical EHRs. All patient data resides locally on the device, and the architecture requires no external cloud service, remote API, or internet access. As a result, all retrieval, similarity search, and data pre-processing are performed under strict privacy constraints, which facilitates compliance with regulatory requirements in healthcare deployments (Wu et al., 5 Oct 2025).
2. Record Segmentation and Section Extraction
The retrieval stage commences by parsing the stored EHRs, which typically comprise long, heterogeneous clinical narratives containing various data types and section formats. Jetson Nano-R scans the record for known section headers (e.g., “History,” “Medication,” “Lab results”) and applies heuristic segmentation rules such as splitting the text at double-newline delimiters or fixed token-count thresholds in the absence of clear paragraph boundaries. Each chunk or section resulting from this segmentation is intended to be a semantically coherent unit of clinical relevance (Wu et al., 5 Oct 2025).
The segmentation process is crucial: by converting an unstructured record into logically delimited sections, the device enables fine-grained matching between the physician’s information need (expressed as a query) and discrete EHR content. This helps ensure that only the most relevant content is forwarded for summarization.
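The segmentation heuristics described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the system's actual code: the header vocabulary in `SECTION_HEADERS`, the `max_tokens` threshold of 256, and the use of whitespace-separated words as a proxy for model tokens are all hypothetical choices. The record is first cut at known header lines, then each section is split at blank lines, and any paragraph that is still too long is cut into fixed-size token windows.

```python
import re

# Hypothetical header vocabulary; the headers actually recognised by the
# system are only given by example ("History", "Medication", "Lab results").
SECTION_HEADERS = ("History", "Medication", "Lab results")

# A header line holds only a known header name, optionally followed by a colon.
_HEADER_RE = re.compile(
    r"^[ \t]*(?:" + "|".join(re.escape(h) for h in SECTION_HEADERS) + r")[ \t]*:?[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def split_by_headers(text: str) -> list[str]:
    """Cut the record at every line consisting of a known section header."""
    starts = [m.start() for m in _HEADER_RE.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text that precedes the first header
    bounds = starts + [len(text)]
    pieces = (text[a:b].strip() for a, b in zip(bounds, bounds[1:]))
    return [p for p in pieces if p]


def split_long(section: str, max_tokens: int) -> list[str]:
    """Split at blank lines; cut paragraphs that are still too long into fixed windows."""
    chunks = []
    for para in re.split(r"\n[ \t]*\n", section):
        tokens = para.split()  # whitespace words as a rough proxy for model tokens
        for i in range(0, len(tokens), max_tokens):
            chunks.append(" ".join(tokens[i:i + max_tokens]))
    return chunks


def segment_record(text: str, max_tokens: int = 256) -> list[str]:
    """Header-based sectioning followed by blank-line and token-window splitting."""
    return [chunk for sec in split_by_headers(text) for chunk in split_long(sec, max_tokens)]
```

For example, `segment_record("History:\nChest pain since this morning.\n\nMedication:\nAspirin 100 mg daily.")` yields two chunks, one per header. Joining tokens with single spaces discards line breaks inside a chunk, which is harmless for embedding but would need revisiting if the original layout had to be shown to clinicians.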
3. Embedding Generation and Local Vector Index
Every segmented section is independently encoded into a vector embedding using the BGE-M3 embedding model. This model is executed entirely on the Jetson Nano-R hardware. The collection of these vector representations—each corresponding to a distinct document section—forms a local dense vector index. The vector index is implemented using FAISS, an efficient similarity search library that supports fast, scalable nearest-neighbor queries in high-dimensional vector spaces (Wu et al., 5 Oct 2025).
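Mathematically, let the EHR document $D$ yield $N$ sections $d_1, \dots, d_N$, each with embedding $\mathbf{e}_i = f(d_i)$ for $i = 1, \dots, N$, where $f$ denotes the BGE-M3 encoder. The index stores $\{\mathbf{e}_i\}_{i=1}^{N}$ for similarity queries. This design allows real-time, on-device retrieval against potentially large EHRs without exceeding the device’s compute or memory budgets.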
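The encode-then-index step can be illustrated without a GPU or external packages. In this sketch, which rests on assumptions throughout, `toy_embed` is a hypothetical hashed bag-of-words stand-in for the BGE-M3 encoder $f$, `DIM = 64` is a toy size (BGE-M3 dense vectors are 1024-dimensional), and `FlatIPIndex` is a hypothetical brute-force store that mirrors the role of a FAISS flat inner-product index. In a real deployment one would instead encode sections with BGE-M3, L2-normalize the float32 matrix with `faiss.normalize_L2`, and add it to a `faiss.IndexFlatIP`; which FAISS index type the system actually uses is not stated here.

```python
import hashlib
import math
import re

DIM = 64  # toy size; BGE-M3 dense embeddings are 1024-dimensional


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Hypothetical stand-in for the BGE-M3 encoder f: hashed bag of words.

    md5 is used instead of hash() so vectors are identical across processes.
    The result is L2-normalised so that inner product equals cosine similarity.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


class FlatIPIndex:
    """Exact (brute-force) vector store, similar in spirit to faiss.IndexFlatIP."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: list[list[float]] = []

    def add(self, vectors: list[list[float]]) -> None:
        for v in vectors:
            if len(v) != self.dim:
                raise ValueError(f"expected {self.dim}-d vector, got {len(v)}")
            self.vectors.append(list(v))

    @property
    def ntotal(self) -> int:
        return len(self.vectors)


def build_index(sections: list[str]) -> FlatIPIndex:
    """Encode every section independently and store e_1..e_N in the index."""
    index = FlatIPIndex(DIM)
    index.add([toy_embed(s) for s in sections])
    return index
```

Because each section is encoded independently, the index can be rebuilt or extended section by section as new notes arrive, without re-encoding the rest of the record.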
4. Query-Driven Retrieval via Similarity Search
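Upon receiving a user query $Q$ (typically from an emergency physician’s interface, e.g., “chest pain”), Jetson Nano-R first encodes the query text into a vector with the same BGE-M3 encoder $f$: $\mathbf{q} = f(Q)$. The FAISS engine then scores every section embedding against the query vector, usually by cosine similarity or inner product:

$$s_i = \frac{\mathbf{q}^\top \mathbf{e}_i}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{e}_i \rVert}, \qquad i = 1, \dots, N,$$

where $s_i$ is the similarity score for section $d_i$. For L2-normalized embeddings both norms equal 1, so cosine similarity reduces to the inner product $\mathbf{q}^\top \mathbf{e}_i$. The system retrieves the top-$k$ sections with the highest similarity scores. The reported benchmarks (Wu et al., 5 Oct 2025) use a fixed $k$, but this parameter is configurable.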
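The scoring rule $s_i$ and the top-$k$ selection can be sketched as an exhaustive cosine-similarity scan, which is what an exact flat index computes. The function name `top_k_sections` and the three-dimensional vectors in the test are hypothetical toy values, not BGE-M3 outputs. With FAISS, the analogous call on a normalized `IndexFlatIP` is `index.search(q, k)`, which returns the top-$k$ scores and the corresponding section ids.

```python
import heapq
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity s_i between query vector u and section embedding v."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k_sections(query_vec: list[float], section_vecs: list[list[float]], k: int = 3):
    """Return (section index, score) pairs for the k highest-scoring sections."""
    scores = ((i, cosine(query_vec, e)) for i, e in enumerate(section_vecs))
    # nlargest keeps only k candidates in memory while scanning all N sections.
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])
```

An exhaustive scan costs $O(Nd)$ per query for $N$ sections of dimension $d$; since a single record yields a modest number of sections, exact search is a reasonable default before considering approximate FAISS indexes.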
A summary of the retrieval steps:
| Step | Operation | Output |
|---|---|---|
| 1. Sectioning | Split EHR into semantically coherent chunks | Sections $d_1, \dots, d_N$ |
| 2. Embedding | Encode each section with the BGE-M3 embedding model | Embeddings $\mathbf{e}_1, \dots, \mathbf{e}_N$ |
| 3. Indexing | Store embeddings in a FAISS vector index | Vector index for search |
| 4. Query embedding | Encode query $Q$ to $\mathbf{q}$ | Query embedding $\mathbf{q}$ |
| 5. Similarity search | Retrieve top-$k$ sections by highest $s_i$ | Top-$k$ relevant sections |
This approach reduces the volume of text passed to summarization by as much as 90–99% compared with processing full records, which lets the downstream summarization device operate within strict memory and time constraints.
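The reduction figure is simply the share of the record that is not forwarded. As a worked example with hypothetical numbers (not taken from the reported benchmarks): a 20,000-token record from which five sections of about 300 tokens each are retrieved forwards 1,500 tokens, a reduction of $1 - 1500/20000 = 92.5\%$. The helper name below is illustrative.

```python
def reduction_percent(full_tokens: int, retrieved_tokens: int) -> float:
    """Percentage of the full record that is NOT forwarded to the summarizer."""
    if full_tokens <= 0:
        raise ValueError("full_tokens must be positive")
    if not 0 <= retrieved_tokens <= full_tokens:
        raise ValueError("retrieved_tokens must lie in [0, full_tokens]")
    return 100.0 * (1 - retrieved_tokens / full_tokens)
```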
5. Communication and Integration with Summarization Stage
After retrieving the most relevant sections, Jetson Nano-R packages and transmits this minimal subset of EHR content to the Jetson Nano-S device. The devices are linked via a lightweight socket connection (TCP/IP), which is designed to minimize network overhead within closed clinical networks. The architecture allows summarization models on Nano-S to operate with a reduced memory footprint by only loading the most pertinent information, enabling larger model sizes or batching without exceeding RAM constraints (Wu et al., 5 Oct 2025).
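The source specifies only a lightweight TCP/IP socket between the two devices, not a wire format. The sketch below assumes one plausible framing: a 4-byte big-endian length prefix followed by a UTF-8 JSON payload holding the query and the retrieved sections. The function names and message fields are hypothetical. Length-prefixing matters because TCP is a byte stream, so the receiver must know where one message ends; `_recv_exact` loops because a single `recv` may return fewer bytes than requested.

```python
import json
import socket
import struct

HEADER = struct.Struct("!I")  # assumed framing: 4-byte big-endian payload length


def send_sections(sock: socket.socket, query: str, sections: list[str]) -> None:
    """Nano-R side: serialize the query and top-k sections and send one framed message."""
    payload = json.dumps({"query": query, "sections": sections}).encode("utf-8")
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return partial data."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def recv_sections(sock: socket.socket) -> dict:
    """Nano-S side: read one framed message and decode it back into a dict."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))
```

On real hardware, Nano-R would open the connection with `socket.create_connection((host, port))` and Nano-S would listen with `socket.create_server((host, port))`; the host and port are deployment-specific. For local testing, `socket.socketpair()` provides two connected endpoints without any network.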
Empirically, ablation experiments in the referenced work show that this dual-device separation almost halves end-to-end summarization latency compared with single-device configurations that run retrieval and summarization together. The division also lets Nano-S keep a large summarization model “hot loaded” in GPU memory, while Nano-R never needs to hold the summarizer’s weights and can devote its resources to query embedding, vector search, and data packaging.
6. Technical and Clinical Impact
The Jetson Nano-R’s contribution is significant for both system engineering and clinical workflow:
- Privacy and Security: Because all patient data and vector indices remain on local hardware, the risk of information leakage is minimized and regulatory compliance is simplified.
- Efficiency: By restricting summarization to only the top-k most relevant sections, physician-facing summaries are produced more quickly and with lower hardware overhead.
- Clinician Experience: Emergency physicians receive concise summaries focused on their specific query, computed within 30 seconds and consisting of a structured list of findings and a narrative.
- Extensibility: The retrieval approach can be adapted to variable clinical note lengths, EHR formats, and query templates without retraining the downstream small language model (SLM).
7. Evaluation Methods and Formulas
Retrieval efficacy is assessed indirectly, through the factual accuracy and relevance of the downstream summaries. Jetson Nano-R is not evaluated directly on summary output; instead, the work defines a factual-accuracy score for generated summaries and combines the resulting scores by weighted aggregation (Wu et al., 5 Oct 2025).
While these formulas apply to summary outputs, the underlying implication is that by narrowing the context to the most relevant, system-identified EHR sections on Jetson Nano-R, risks of factual hallucination and irrelevance in clinical summarization are minimized (Wu et al., 5 Oct 2025).
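To make the two ideas concrete, the sketch below uses a generic formulation assumed for illustration; the paper’s exact definitions may differ. Factual accuracy is taken as the fraction of summary claims judged to be supported by the source EHR, and weighted aggregation as a weighted mean of per-criterion scores. The criterion names and weights in the test are hypothetical.

```python
def factual_accuracy(claims_supported: list[bool]) -> float:
    """Assumed definition: fraction of summary claims supported by the source EHR."""
    if not claims_supported:
        raise ValueError("need at least one claim to score")
    return sum(claims_supported) / len(claims_supported)


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-criterion scores; weights need not sum to 1."""
    total = sum(weights[name] for name in scores)
    if total == 0:
        raise ValueError("weights must not all be zero")
    return sum(weights[name] * s for name, s in scores.items()) / total
```

Dividing by the weight total keeps the aggregate on the same 0 to 1 scale as the individual scores regardless of how the weights are chosen.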
In summary, Jetson Nano-R (Retrieve) is a dedicated local device for embedding-based retrieval in dual-stage EHR summarization systems. Its segmented processing, vector indexing, and fast query-driven section selection—executed entirely offline and on-premises—enable confidential, timely, and efficient summarization of large-scale clinical records for emergency medicine workflows. The approach offers a robust framework for privacy-preserving, real-time clinical decision support in resource-constrained environments.