AI PB: Generative Investment Insights

Updated 25 October 2025
  • AI PB is a production-scale generative agent that delivers personalized investment insights by integrating multimodal data retrieval and rigorous compliance with financial regulations.
  • The system employs a hybrid evidence retrieval pipeline that reduces hallucination rates by over 30%, ensuring outputs are strictly grounded in enterprise data.
  • Its multi-stage recommendation engine and on-premises deployment enhance engagement and operational reliability while meeting stringent national financial standards.

AI PB denotes a production-scale generative agent for personalized investment insights in retail finance, as introduced in "AI PB: A Grounded Generative Agent for Personalized Investment Insights" (Park et al., 23 Oct 2025). Unlike reactive chatbots, AI PB proactively supplies grounded, compliant, and personalized insights using a multimodal architecture integrating strict data governance, layered recommender logic, and scalable on-premises deployment. The system is designed for robust operation under stringent national financial regulations, particularly emphasizing factual grounding and user-centric recommendation in high-stakes environments.

1. System Architecture and Orchestration

AI PB is constructed around a component-based orchestration layer. Each component defines a workflow (such as portfolio analysis or disclosure summary) realized via a determined sequence of modular steps including data retrieval, transformation, summarization, and compliance checks. The orchestrator deterministically routes components to internal or external LLMs based on assessed data sensitivity. For requests involving personally identifiable information (PII), execution proceeds exclusively on internal models to enforce compliance; non-sensitive requests may utilize external LLMs (e.g., GPT-4o) for enhanced stylistic expressiveness. This explicit routing mechanism supports granular auditing, traceability, and regulatory transparency in all model invocations.
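The routing rule above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `Component` dataclass, the backend names, and the boolean sensitivity flag are all hypothetical stand-ins for whatever PII assessment the production orchestrator performs.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Component:
    name: str            # e.g. "portfolio_analysis" or "disclosure_summary"
    steps: List[str]     # ordered modular steps in the workflow
    contains_pii: bool   # assessed data sensitivity (illustrative flag)

def route(component: Component) -> str:
    """Deterministically choose a model backend from data sensitivity."""
    if component.contains_pii:
        return "internal-llm"   # PII never leaves the regulated perimeter
    return "external-llm"       # e.g. GPT-4o, for stylistic expressiveness

def run(component: Component, audit_log: list) -> str:
    """Execute a component; every invocation is logged for auditing."""
    backend = route(component)
    audit_log.append((component.name, backend, tuple(component.steps)))
    return backend

log: list = []
pii_job = Component(
    "portfolio_analysis",
    ["retrieve", "transform", "summarize", "compliance_check"],
    contains_pii=True,
)
run(pii_job, log)   # routed to the internal model and recorded in the log
```

The point of making routing a pure function of declared sensitivity, rather than a model decision, is that every invocation can be replayed and audited after the fact.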

2. Evidence-Retrieval and Grounded Generation

The system’s outputs are strictly anchored in enterprise data by a hybrid retrieval pipeline. Initial candidate document selection is accomplished through OpenSearch (symbolic keyword-based sparse retrieval). These candidates are subsequently reranked using the NMIXX finance-domain embedding model, which applies dense retrieval techniques specialized for financial semantics. Extracted passages are placed into structured evidence templates, which are then fed into the generative model. The templates are validated for the correct presence of reference tokens. This pipeline decreases hallucination rates by over 30% relative to naïve prompting.
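The three-stage pipeline can be sketched as follows. Keyword overlap here is a toy stand-in for OpenSearch's sparse retrieval, and cosine similarity over hand-written vectors stands in for the NMIXX dense reranker; the document corpus, token format, and all function names are illustrative.

```python
import math

# Toy corpus: doc_id -> (text, stand-in embedding vector)
DOCS = {
    "doc1": ("Quarterly disclosure for Acme Corp", [0.9, 0.1]),
    "doc2": ("Broad market summary, tech sector", [0.2, 0.8]),
    "doc3": ("Acme Corp earnings call transcript", [0.8, 0.3]),
}

def sparse_retrieve(query: str, k: int = 2):
    """Stage 1: keyword-based candidate selection (OpenSearch stand-in)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(text.lower().split())), doc_id)
              for doc_id, (text, _) in DOCS.items()]
    return [d for s, d in sorted(scored, reverse=True)[:k] if s > 0]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def rerank(candidates, query_vec):
    """Stage 2: dense semantic rerank (NMIXX stand-in)."""
    return sorted(candidates,
                  key=lambda d: cosine(DOCS[d][1], query_vec), reverse=True)

def build_evidence(doc_ids):
    """Stage 3: structured evidence template with reference tokens."""
    return "\n".join(f"[REF:{d}] {DOCS[d][0]}" for d in doc_ids)

def validate(template: str, doc_ids) -> bool:
    """Check every expected reference token is present before generation."""
    return all(f"[REF:{d}]" in template for d in doc_ids)

hits = rerank(sparse_retrieve("Acme Corp disclosure"), query_vec=[0.9, 0.2])
evidence = build_evidence(hits)
assert validate(evidence, hits)   # only then is the template passed to the LLM
```

Validating the template before generation, rather than inspecting output afterward, is what keeps the generative model from being asked to produce claims it has no evidence for.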

3. Multi-Stage Recommendation Engine

AI PB deploys a three-layered recommendation architecture to deliver personalized insights and maximize engagement:

  • The first (rule-based) layer enforces deterministic business logic, prioritizing relevance to the user’s portfolio, suppressing repeated content, and imposing time-based freshness constraints.
  • The second layer implements sequential recommendation using models that account for temporal dependencies in user content consumption, such as transitions from broad market summaries to firm-specific analyses.
  • The third layer utilizes a contextual multi-armed bandit approach, adjusting rankings in real time based on behavioral cues including clicks and dwell time, thereby balancing exploration and exploitation for diversified feed content.

Each day, 22 insight types are pre-generated per user, ranked, and delivered to the front-end dashboard, enhancing both the topical relevance and the novelty of the investment guidance.

4. On-Premises Deployment and Infrastructure

The entire system is deployed on-premises at Shinhan Securities, meeting strict Korean financial privacy and data sovereignty regulations. Microservices are orchestrated with Docker Swarm for scalable, failure-tolerant operation through an overlay network. Inference is distributed across 24 NVIDIA H100 GPUs. Eight GPUs are allocated for guard, embedding, and reranking computations. The remaining sixteen GPUs handle generative model workloads using a Qwen3-32B LLM, aligned via LoRA+ORPO, and efficiently served through vLLM—which provides key–value cache optimization and continuous batching.

5. Compliance, Safety, and Quality Control

AI PB mandates that all generated content is verifiable against validated enterprise data using reference tokens. Sensitive workflow segmentation, coupled with explicit routing, provides robust guarantees that PII never leaves the regulated perimeter. The output of each generative step is filtered through Shinhan-Guard—a domain-adapted, fine-tuned derivative of Llama Guard 3—covering toxicity, prompt injection, and PII benchmarks. Shinhan-Guard attains F1 scores ≥ 0.9 across key safety metrics. Guard rejection rates average 1.8%. System p95 latency remains <13.9 seconds for interactive requests, and pre-generation averages 5.9 seconds. A custom component classification metric quantifies routing accuracy: with balanced weighting (α = β = 0.5), the average routing metric score is 510.24, reflecting both completeness and precision in orchestration.
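The two output gates described above can be sketched together. The reference-token format and the keyword-based guard below are hypothetical stand-ins: the real system validates tokens against enterprise evidence stores and screens output with Shinhan-Guard, a fine-tuned Llama Guard 3 derivative, not a pattern list.

```python
import re

VALID_REFS = {"doc1", "doc3"}              # validated enterprise evidence IDs
REF_TOKEN = re.compile(r"\[REF:(\w+)\]")   # illustrative token format

def references_verifiable(text: str) -> bool:
    """Reject output that cites no evidence or unknown reference tokens."""
    refs = REF_TOKEN.findall(text)
    return bool(refs) and all(r in VALID_REFS for r in refs)

def guard_pass(text: str) -> bool:
    """Stand-in safety filter (toxicity / prompt injection / PII)."""
    blocked = ("ignore previous instructions", "ssn:")
    return not any(p in text.lower() for p in blocked)

def release(text: str) -> bool:
    """A generation is released only if both gates pass."""
    return references_verifiable(text) and guard_pass(text)

assert release("Acme revenue rose 12% [REF:doc1].")
assert not release("Acme revenue rose 12%.")    # ungrounded: no token
assert not release("Buy now [REF:doc9].")       # unknown reference
```

Chaining the gates this way means an output can be audited on rejection: the log records whether grounding or safety was the failing condition.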

6. Evaluation and Practical Impact

System efficacy is validated through expert human QA (n > 300, assessed by professional reviewers) across axes of factuality (~91.2%), safety (~98.4%), and alignment (~85.7%). In production, AI PB’s “Today Feed” dashboard delivers disclosure digests, sector narratives, and personalized diagnostics, while the “Dialogue View” supports multi-turn interactive analysis. Empirical metrics show an 18% increase in daily feed engagement and a 23% reduction in content repetition compared to rule-based baselines, underscoring the value users derive from tailored, proactive insights.

7. Significance and Broader Applications

AI PB exemplifies a rigorously engineered framework for trustworthy AI in regulated finance, integrating:

  • Deterministic, auditable routing for regulatory compliance
  • Hybrid symbolic–dense retrieval for factual grounding and hallucination suppression
  • Multi-layered personalized recommendation balancing business logic and user behavior
  • Containerized, on-premises GPU acceleration for scalable, efficient deployment
  • Layered safety mechanisms via adapted domain guard models

The system sets a standard for deploying reliable, explainable, and compliant generative AI in high-stakes domains. A plausible implication is that this architecture serves as a template for secure, modularized AI deployments across similarly regulated industries where privacy, auditability, and factual veracity are mission-critical (Park et al., 23 Oct 2025).
