KYU Agent: Adaptive Compliance in DPDP
- KYU Agent is a software component that models user trustworthiness using supervised classification and multi-agent orchestration for DPDP compliance.
- It employs a random forest classifier, trained on synthesized user-feature data, achieving 98% accuracy in trust score estimation.
- The agent integrates domain-specific anonymization strategies via a policy engine to adjust data access based on computed trust and sensitivity.
The KYU (Know-Your-User) Agent is a software component within an agentic framework for data governance under the Indian Digital Personal Data Protection (DPDP) Act. It models user trustworthiness and builds a semantic understanding of legal text to enable explainable, adaptive compliance in data-driven systems. Through a formally defined trust-scoring model, integration with a Compliance Agent, and participation in a goal-driven, multi-agent decision pipeline, the KYU Agent informs access strategies, ranging from raw data sharing to domain-aware anonymization, across domains such as healthcare, education, and e-commerce (Kulkarni et al., 3 Jan 2026).
1. Formal Specification
The KYU Agent is built on supervised classification and trust-level discretization. For each requesting user $u$, a feature vector $x_u$ is extracted, in which:
- one component encodes email-domain trust (0: personal, 1: organizational)
- one component encodes purpose trust (0: external, 1: organizational, and an intermediate value for self-use)
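The feature encoding above can be sketched as follows. This is a minimal illustration, not the source's implementation: the `UserProfile` type, the `PERSONAL_DOMAINS` set, and the value 0.5 for self-use are all assumptions introduced here.

```python
from dataclasses import dataclass

# Hypothetical list of personal mail providers; the source does not give one.
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com"}

# 0 (external) and 1 (organizational) come from the text.
# The 0.5 for self-use is an ASSUMPTION, chosen only for illustration.
PURPOSE_TRUST = {"external": 0.0, "self-use": 0.5, "organizational": 1.0}


@dataclass(frozen=True)
class UserProfile:
    email: str
    purpose: str


def extract_features(profile: UserProfile) -> tuple[float, float]:
    """Return (email-domain trust, purpose trust) for one request."""
    domain = profile.email.rsplit("@", 1)[-1].lower()
    email_trust = 0.0 if domain in PERSONAL_DOMAINS else 1.0
    purpose_trust = PURPOSE_TRUST[profile.purpose]
    return (email_trust, purpose_trust)
```

Treating every domain outside the personal list as organizational is a simplification; a deployed system would more plausibly use an allow-list of verified organizations.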
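The agent is defined as a tuple:

$$\mathrm{KYU} = (\mathcal{X}, \theta, T, \tau)$$

where:
- $\mathcal{X}$ is the user-feature space,
- $\theta$ are the parameters of a random-forest classifier $f_\theta$,
- $T$ is the trust-score function, mapping a feature vector in $\mathcal{X}$ to a scalar trust score,
- $\tau = (\tau_1, \tau_2)$ with $\tau_1 < \tau_2$ defines the trust label cutoffs.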
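Given a trust score $T$ and cutoffs $\tau_1 < \tau_2$, trust labels are partitioned as:
- “low” if $T \le \tau_1$,
- “moderate” if $\tau_1 < T \le \tau_2$,
- “high” if $T > \tau_2$.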
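The discretization into three labels can be sketched as a small function. The boundary handling mirrors the comparisons used in the `DecideAccess` routine in Section 4; the function name and the example cutoffs in the usage note are hypothetical.

```python
def trust_label(score: float, tau_1: float, tau_2: float) -> str:
    """Map a scalar trust score to 'low', 'moderate', or 'high'."""
    if not tau_1 < tau_2:
        raise ValueError("cutoffs must satisfy tau_1 < tau_2")
    if score <= tau_1:
        return "low"
    if score <= tau_2:
        return "moderate"
    return "high"
```

For example, with illustrative cutoffs `tau_1=0.4` and `tau_2=0.7` (not the source's values), `trust_label(0.5, 0.4, 0.7)` returns `"moderate"`.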
The random forest used a maximum depth of 10 and was trained with 5-fold cross-validation on 5000 synthesized samples, achieving 98% accuracy. The thresholds $\tau_1$ and $\tau_2$ were set empirically (Kulkarni et al., 3 Jan 2026).
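To make the evaluation protocol concrete, the sketch below shows how 5-fold cross-validation partitions 5000 samples into training and validation indices. It covers only the splitting step; the classifier itself is omitted, and the function name, shuffling, and seed are assumptions for illustration.

```python
import random


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives k disjoint folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# 5000 samples and k=5 give 4000 training and 1000 validation indices per fold.
splits = list(k_fold_indices(5000, k=5))
```

Each sample appears in exactly one validation fold, so the reported accuracy is averaged over predictions made on data the model did not see during training.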
2. Multi-Layer Architecture and Workflow
The KYU Agent is situated within a four-layer, multi-agent pipeline, described as follows:
- Perception Layer:
  - Metadata Extractor processes CSV schemas into a repository containing owner, domains, and attribute types.
  - Compliance Pipeline segments the DPDP Act legal text and applies NER (fine-tuned BERT-CRF) to extract rule tuples.
  - Clustering Pipeline groups roughly 60 domain documents using TF-IDF features, K-means, and LDA for domain and topic modeling.
- Reasoning Layer:
- Orchestration Layer:
  - Policy Engine implements conditional strategies:
    - “raw-share” if trust is high and sensitivity is low,
    - “full-anonymize” if trust is low and sensitivity is high,
    - “partial-mask” or “generalize” otherwise.
- Action Layer:
  - Retrieval Engine issues data queries.
  - Compliance Engine applies transformation strategies (masking, pseudonymization, generalization) with legal justifications.
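The three transformation strategies named for the Compliance Engine can be illustrated with simple helpers. This is a sketch under stated assumptions: the function names, the salted SHA-256 pseudonym, and the 5-year age bucket are illustrative choices, not the source's implementation.

```python
import hashlib


def mask(value: str, visible: int = 2) -> str:
    """Masking: keep the first `visible` characters, replace the rest with '*'."""
    return value[:visible] + "*" * max(len(value) - visible, 0)


def pseudonymize(value: str, salt: str) -> str:
    """Pseudonymization: replace a value with a stable, salted hash token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def generalize_age(age: int, width: int = 5) -> str:
    """Generalization: replace an exact age with a range of `width` years."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"
```

For instance, `mask("STU12345")` returns `"ST******"` and `generalize_age(13)` returns `"10-14"`. Pseudonymization keeps records linkable across queries while hiding the raw identifier, which is why it sits between masking and full anonymization in strength.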
A finite-state machine formalizes the workflow transitions, and structured inter-agent messages connect the KYU Agent with the Compliance Agent and the Action Layer (Kulkarni et al., 3 Jan 2026).
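One way to picture such a workflow is a small finite-state machine driven by message types. The state names and message labels below are hypothetical, chosen to follow the described order (trust scoring, sensitivity assessment, strategy decision, data release); they are not taken from the source's formal definition.

```python
from enum import Enum, auto


class State(Enum):
    REQUEST_RECEIVED = auto()
    TRUST_ASSESSED = auto()
    SENSITIVITY_ASSESSED = auto()
    STRATEGY_DECIDED = auto()
    DATA_RELEASED = auto()


# (current state, message type) -> next state. All labels are assumptions.
TRANSITIONS = {
    (State.REQUEST_RECEIVED, "trust_score"): State.TRUST_ASSESSED,
    (State.TRUST_ASSESSED, "sensitivity"): State.SENSITIVITY_ASSESSED,
    (State.SENSITIVITY_ASSESSED, "strategy"): State.STRATEGY_DECIDED,
    (State.STRATEGY_DECIDED, "transformed_data"): State.DATA_RELEASED,
}


def step(state: State, message: str) -> State:
    """Apply one message; reject messages that are invalid in this state."""
    try:
        return TRANSITIONS[(state, message)]
    except KeyError:
        raise ValueError(f"no transition from {state.name} on {message!r}") from None


def run(messages: list[str]) -> State:
    """Drive the machine from the initial state through a message sequence."""
    state = State.REQUEST_RECEIVED
    for message in messages:
        state = step(state, message)
    return state
```

Rejecting out-of-order messages makes it impossible, in this sketch, to release data before both trust and sensitivity have been assessed.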
3. Semantic Understanding and Trust Modeling
The KYU Agent leverages a suite of NLP and ML algorithms for legal text processing and user trust inference:
- NER maps legal-text tokens to entity labels, and sections are represented with 300-dimensional Word2Vec embeddings.
- Clustering via K-means and topic modeling via LDA are employed for domain-specific knowledge extraction.
- Trustworthiness is computed from $x_u$ using a random forest classifier with Gini-based node splits, reported as trained to minimize cross-entropy loss.

At inference, the classifier outputs class probabilities, from which the scalar trust score $T$ is computed and normalized.
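For reference, the two quantities named above have standard textbook forms, given here as a reminder rather than as the source's exact equations. The notation is an assumption: $N$ training samples, class set $\mathcal{C}$, one-hot labels $y_{i,c}$, predicted probabilities $p_\theta(c \mid x_i)$, and class proportions $p_{t,c}$ at tree node $t$.

$$
\mathcal{L}(\theta) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c \in \mathcal{C}} y_{i,c}\,\log p_\theta(c \mid x_i),
\qquad
G(t) = 1 - \sum_{c \in \mathcal{C}} p_{t,c}^{2}
$$

The cross-entropy $\mathcal{L}$ scores the predicted probabilities against the true labels, while the Gini impurity $G(t)$ is the criterion each tree uses to choose splits, with lower impurity indicating a purer node.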
4. Core Algorithms and Interaction Protocols
Three primary routines with time complexity annotations are specified:
```python
# Helper routines (ExtractFeatures, RF_Infer, SectionSegment, NER_Model,
# RuleExtractor) and the cutoffs tau_1 < tau_2 are assumed to be provided
# by the surrounding framework.

def AssessTrustworthiness(userProfile):
    x_u = ExtractFeatures(userProfile)
    T = RF_Infer(x_u)                      # O(100 * log 5000)
    return T


def GenerateSemanticRepresentation(textBlock):
    segments = SectionSegment(textBlock)   # O(|text|)
    ents = NER_Model(segments)             # O(|text| * d)
    tuples = RuleExtractor(ents)           # O(#ents * ruleCost)
    return tuples


def DecideAccess(T, S):
    if T > tau_2 and S == "low":
        return "RAW"
    if T <= tau_1 and S == "high":
        return "FULL_ANON"
    return "PARTIAL_MASK"
```
The KYU Agent and Compliance Agent interact via channel-labeled message transitions. The process begins with user feature extraction and trust computation, proceeds to data sensitivity assessment by the Compliance Agent, and culminates in a policy-based access strategy communicated to action modules (Kulkarni et al., 3 Jan 2026).
5. Domain-Specific Case Studies
Empirical validation is reported across ten domains. Selected examples:
Education (Child):
- Request: [email protected], Purpose=self-use, Attributes = {studentID, Age_Years, SchoolType}
- KYU Agent: trust label “high”
- Compliance Agent: S = “moderate”
- Applied strategy: PARTIAL_MASK
- Transformation: studentID masked, Age_Years generalized, SchoolType partially masked
- Domain D-metric (AnonymizationScore): 0.54
E-commerce:
- Request: [email protected], Purpose=external, Attributes = {purchaseHistory, creditCard}
- KYU Agent: trust label “low”
- Compliance Agent: S = “high”
- Applied strategy: FULL_ANON
- Transformation: creditCard encrypted, purchaseHistory aggregated
- AnonymizationScore: 0.63
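The two case studies can be checked against the policy rules with a short sketch. It works on trust labels rather than numeric scores, and the function name `decide_strategy` and the `CASES` table are introduced here for illustration only.

```python
def decide_strategy(trust: str, sensitivity: str) -> str:
    """Label-based version of the policy rules from Section 2."""
    if trust == "high" and sensitivity == "low":
        return "RAW"
    if trust == "low" and sensitivity == "high":
        return "FULL_ANON"
    return "PARTIAL_MASK"


# (trust label, sensitivity label, strategy reported in the case study)
CASES = {
    "Education (Child)": ("high", "moderate", "PARTIAL_MASK"),
    "E-commerce": ("low", "high", "FULL_ANON"),
}

results = {
    name: decide_strategy(trust, sensitivity)
    for name, (trust, sensitivity, _) in CASES.items()
}
```

Both computed strategies match the reported ones. The education request is a useful reminder that a high trust label alone does not unlock raw data; sensitivity must also be low.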
Mean AnonymizationScore by domain:
| Domain | Mean AnonymizationScore |
|---|---|
| E-commerce | 0.63 |
| SocialMedia | 0.63 |
| Telecom | 0.63 |
| Healthcare | 0.62 |
| Education | 0.54 |
| Finance | 0.49 |
| Startups | 0.44 |
| Travel | 0.40 |
| Employment | 0.37 |
| Government | 0.35 |
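As a quick summary of the table, the sketch below computes the unweighted mean and the spread of the ten domain scores. The values are copied from the table; treating all domains as equally weighted is an assumption.

```python
from statistics import mean

SCORES = {
    "E-commerce": 0.63, "SocialMedia": 0.63, "Telecom": 0.63,
    "Healthcare": 0.62, "Education": 0.54, "Finance": 0.49,
    "Startups": 0.44, "Travel": 0.40, "Employment": 0.37,
    "Government": 0.35,
}

overall_mean = round(mean(SCORES.values()), 2)                     # 0.51
spread = round(max(SCORES.values()) - min(SCORES.values()), 2)     # 0.28
```

The 0.28 gap between the highest and lowest domains shows how strongly the applied anonymization varies with domain.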
6. Hyperparameters, Evaluation Metrics, and Latency
- Random Forest: max_depth=10, 5-fold cross-validation, accuracy=0.98.
- Clustering: K-means, max iterations=300.
- Topic modeling: LDA.
- Embeddings: Word2Vec, dimension=300.
- Latency: end-to-end request latency of 120 ms (mean over 1,000 requests).
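A mean latency figure of this kind is typically obtained by timing each request and averaging. The sketch below shows the measurement pattern only; `handle_request` is a hypothetical stand-in for the full pipeline, so the number it produces says nothing about the reported 120 ms.

```python
import time
from statistics import mean


def handle_request(request: dict) -> str:
    """Hypothetical stand-in for the pipeline; always returns one strategy."""
    return "PARTIAL_MASK"


def mean_latency_ms(handler, requests) -> float:
    """Time each call with a monotonic clock and return the mean in ms."""
    samples = []
    for request in requests:
        start = time.perf_counter()
        handler(request)
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


latency = mean_latency_ms(handle_request, [{"id": i} for i in range(1000)])
```

A mean hides tail behavior, so percentile latencies would be a natural complement when comparing deployments.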
Taken together, the reported results indicate that the framework applies domain-specific compliance and anonymization strategies aligned with DPDP requirements (Kulkarni et al., 3 Jan 2026).
7. Significance within Data Governance Frameworks
The KYU Agent’s formalized approach to user trustworthiness and semantic legal understanding addresses critical limitations in traditional compliance tools: static rule encoding, lack of transparency, and inflexibility to policy evolution. By integrating learning-based user modeling and agentic interaction protocols, the framework enhances transparency, adaptability, and compliance traceability central to responsible AI data governance. Its deployment demonstrates scalable, transparent, and regulation-aware data policy orchestration in cross-domain applications under the Indian DPDP Act (Kulkarni et al., 3 Jan 2026).