
KYU Agent: Adaptive Compliance in DPDP

Updated 10 January 2026
  • KYU Agent is a software component that models user trustworthiness using supervised classification and multi-agent orchestration for DPDP compliance.
  • It employs a random forest classifier, trained on synthesized user-feature data, achieving 98% accuracy in trust score estimation.
  • The agent integrates domain-specific anonymization strategies via a policy engine to adjust data access based on computed trust and sensitivity.

The KYU (Know-Your-User) Agent is a software component within an agentic framework for data governance under the Indian Digital Personal Data Protection (DPDP) Act. The KYU Agent is specifically designed for user trustworthiness modeling and semantic understanding to enable explainable, adaptive compliance in data-driven systems. Through a formally defined trust-scoring model, integration with a Compliance Agent, and participation in a goal-driven, multi-agent decision pipeline, the KYU Agent supports compliance with DPDP by informing access strategies—ranging from raw data sharing to domain-aware anonymization—across domains such as healthcare, education, and e-commerce (Kulkarni et al., 3 Jan 2026).

1. Formal Specification

The KYU Agent operates on a mathematical foundation anchored in supervised classification and trust-level discretization. For each requesting user $u$, a feature vector $x_u = [x_1, x_2]^T$ is extracted, where:

  • $x_1 \in \{0, 1\}$ encodes email-domain trust (0: personal, 1: organizational)
  • $x_2 \in \{0, \frac{1}{2}, 1\}$ encodes purpose trust (0: external, $\frac{1}{2}$: self-use, 1: organizational)
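
The encoding above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the source does not say how an email domain is classified as personal or organizational, so the `PERSONAL_DOMAINS` set and the `extract_features` name are assumptions made for the example.

```python
from typing import Tuple

# Hypothetical list of personal mail providers; the source does not specify one.
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com"}

# Purpose-trust values as given in the text.
PURPOSE_TRUST = {"external": 0.0, "self-use": 0.5, "organizational": 1.0}


def extract_features(email: str, purpose: str) -> Tuple[int, float]:
    """Return x_u = (x_1, x_2) for a requesting user."""
    domain = email.rsplit("@", 1)[-1].lower()
    x1 = 0 if domain in PERSONAL_DOMAINS else 1  # 0: personal, 1: organizational
    x2 = PURPOSE_TRUST[purpose]                  # raises KeyError on an unknown purpose
    return x1, x2
```

Under these assumptions, a request from a school address with purpose "self-use" maps to $(1, 0.5)$, and a request from a personal address with purpose "external" maps to $(0, 0)$.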

The agent is defined as a tuple:

$$\text{KYU} = (X, \Theta, T, \tau)$$

where:

  • $X \subset \mathbb{R}^2$ is the user-feature space,
  • $\Theta$ are parameters of a random-forest classifier $\text{RF}(\cdot; \Theta)$,
  • $T: X \to [0,1]$ is the trust-score function $T(u) = \text{RF}(x_u; \Theta)$,
  • $\tau = \{\tau_1, \tau_2\}$ with $0 < \tau_1 < \tau_2 < 1$ defines the trust label cutoffs.

Trust labels are partitioned as:

  • “low” if $T(u) \leq \tau_1$,
  • “moderate” if $\tau_1 < T(u) \leq \tau_2$,
  • “high” if $T(u) > \tau_2$.

The classifier was configured with $n_\text{estimators} = 100$ trees and maximum depth $D = 10$, and trained with 5-fold cross-validation on $\sim 5000$ synthesized samples, achieving 98% accuracy and $F_1 \approx 0.96$. Thresholds were empirically set to $\tau_1 = 0.33$ and $\tau_2 = 0.66$ (Kulkarni et al., 3 Jan 2026).
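
The discretization step can be written as a small function. This is a sketch using the reported cutoffs; the function name `trust_label` and the range check are assumptions added for illustration.

```python
TAU_1 = 0.33  # low / moderate cutoff (from the text)
TAU_2 = 0.66  # moderate / high cutoff (from the text)


def trust_label(score: float, tau_1: float = TAU_1, tau_2: float = TAU_2) -> str:
    """Map a trust score in [0, 1] to 'low', 'moderate', or 'high'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"trust score out of range: {score}")
    if score <= tau_1:
        return "low"
    if score <= tau_2:
        return "moderate"
    return "high"
```

Because both inequalities are non-strict on the upper side, a score exactly equal to a cutoff falls into the lower band: `trust_label(0.33)` is "low" and `trust_label(0.66)` is "moderate".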

2. Multi-Layer Architecture and Workflow

The KYU Agent is situated within a four-layer, multi-agent pipeline, described as follows:

  • Perception Layer:
    • Metadata Extractor processes CSV schemas into a repository containing owner, domains, and attribute types.
    • Compliance Pipeline segments DPDP Act legal text and applies NER (fine-tuned BERT-CRF) for rule tuple extraction.
    • Clustering Pipeline groups ~60 domain documents using TF-IDF features, K-means ($k = 10$), and LDA ($n_\text{topics} = 20$) for domain and topic modeling.
  • Reasoning Layer:
    • Request Interpreter parses $(\text{userProfile}, \text{intent}, \text{attributes}, \text{purpose})$ from incoming API calls.
    • Data Mapper tags requested resources by domain and owner.
    • KYU Agent applies $T(u) = \text{RF}(x_u; \Theta)$ to compute user trust.
    • Compliance Agent assigns a data sensitivity score $S \in \{\text{low}, \text{moderate}, \text{high}\}$ using an LLM (LLaMA with RAG) and human verification.
  • Orchestration Layer:
    • Policy Engine implements conditional strategies:
      • “raw-share” if $\text{KYU}_\text{level} = \text{high}$ and $S = \text{low}$,
      • “full-anonymize” if $\text{KYU}_\text{level} = \text{low}$ and $S = \text{high}$,
      • “partial-mask” or “generalize” otherwise.
  • Action Layer:
    • Retrieval Engine issues data queries.
    • Compliance Engine applies transformation strategies (masking, pseudonymization, generalization) with legal justifications.
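
To see how the Policy Engine rules partition the space of inputs, the sketch below enumerates all nine (trust level, sensitivity) pairs. The function and variable names are illustrative assumptions. The text does not say when “generalize” is chosen over “partial-mask”, so the fallback is reported as a single combined label rather than guessed.

```python
from itertools import product

LEVELS = ("low", "moderate", "high")


def choose_strategy(kyu_level: str, sensitivity: str) -> str:
    """Apply the three Policy Engine rules to a (trust level, sensitivity) pair."""
    if kyu_level not in LEVELS or sensitivity not in LEVELS:
        raise ValueError(f"unknown level: {kyu_level!r}, {sensitivity!r}")
    if kyu_level == "high" and sensitivity == "low":
        return "raw-share"
    if kyu_level == "low" and sensitivity == "high":
        return "full-anonymize"
    # Assumption: the source leaves the choice between the two fallback
    # strategies open, so both are reported together.
    return "partial-mask-or-generalize"


# Enumerate every combination to build the full decision table.
decision_table = {
    (t, s): choose_strategy(t, s) for t, s in product(LEVELS, LEVELS)
}
```

Only two of the nine cells map to the extreme strategies; the remaining seven fall through to the fallback, which is where any domain-specific choice between masking and generalization would have to be made.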

A finite-state machine formalizes workflow transitions:

$$\text{WAIT\_REQUEST} \rightarrow \text{COMPUTE\_KYU} \rightarrow \text{COMPUTE\_SENSITIVITY} \rightarrow \text{ORCHESTRATE} \rightarrow \text{EXECUTE} \rightarrow \text{DONE}$$

Inter-agent messages are structured as:

  • $\langle \texttt{toKYU} : \{\text{userProfile}\} \rangle$
  • $\langle \texttt{fromKYU} : \{\text{trustScore}\} \rangle$
  • further exchanges with Compliance Agent and Action layers (Kulkarni et al., 3 Jan 2026).
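
The workflow can be sketched as a linear state machine. The state names come from the text; the `State` enum, the `NEXT_STATE` table, and `run_workflow` are illustrative assumptions, and error or retry transitions are omitted because the source does not describe any.

```python
from enum import Enum, auto


class State(Enum):
    WAIT_REQUEST = auto()
    COMPUTE_KYU = auto()
    COMPUTE_SENSITIVITY = auto()
    ORCHESTRATE = auto()
    EXECUTE = auto()
    DONE = auto()


# Linear transition table taken from the state sequence in the text.
NEXT_STATE = {
    State.WAIT_REQUEST: State.COMPUTE_KYU,
    State.COMPUTE_KYU: State.COMPUTE_SENSITIVITY,
    State.COMPUTE_SENSITIVITY: State.ORCHESTRATE,
    State.ORCHESTRATE: State.EXECUTE,
    State.EXECUTE: State.DONE,
}


def run_workflow(start: State = State.WAIT_REQUEST) -> list:
    """Walk the state machine from `start` to DONE and return the visited states."""
    trace = [start]
    while trace[-1] is not State.DONE:
        trace.append(NEXT_STATE[trace[-1]])
    return trace
```

In a fuller implementation each transition would be triggered by the corresponding message, for example leaving `COMPUTE_KYU` only once a `fromKYU` message carrying the trust score has arrived.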

3. Semantic Understanding and Trust Modeling

The KYU Agent leverages a suite of NLP and ML algorithms for legal text processing and user trust inference:

  • NER maps legal text tokens to entity labels, producing 300-dimensional Word2Vec embeddings for sections.
  • Clustering via K-means ($O(N \cdot k \cdot d \cdot \text{iterations})$) and topic modeling via LDA are employed for domain-specific knowledge extraction.
  • Trustworthiness is computed from xux_u using a random forest classifier trained to minimize cross-entropy loss:

$$L(\Theta) = -\frac{1}{M} \sum_i \sum_c y_{i,c} \log \hat{p}_{i,c}$$

with Gini-based node splits.

At inference, the classifier outputs class probabilities $\hat{p}$, and the scalar trust score is computed as:

$$T(u) = \hat{p}(\text{high}) \times 1 + \hat{p}(\text{moderate}) \times 0.5 + \hat{p}(\text{low}) \times 0$$

and normalized to $[0,1]$.
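
As a worked illustration of this weighted sum, the sketch below computes the score from a dictionary of class probabilities. The function name and the example probabilities are assumptions chosen for illustration, not values from the source. Since the weights lie in $[0, 1]$ and the probabilities sum to 1, the result is already within $[0, 1]$.

```python
# Class weights from the formula in the text.
CLASS_WEIGHTS = {"high": 1.0, "moderate": 0.5, "low": 0.0}


def trust_score(probs: dict) -> float:
    """Expected trust value given class probabilities for low/moderate/high."""
    total = sum(probs.values())
    if set(probs) != set(CLASS_WEIGHTS) or abs(total - 1.0) > 1e-9:
        raise ValueError("expected probabilities for low, moderate, high summing to 1")
    return sum(CLASS_WEIGHTS[label] * p for label, p in probs.items())


# Illustrative probabilities (not taken from the source).
example = {"high": 0.6, "moderate": 0.3, "low": 0.1}
score = trust_score(example)  # 0.6 * 1 + 0.3 * 0.5 + 0.1 * 0 = 0.75
```

With $\tau_2 = 0.66$, a score of 0.75 would be labeled “high”, even though the classifier assigns only 60% probability to that class; the moderate-class mass contributes half its weight.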

4. Core Algorithms and Interaction Protocols

Three primary routines with time complexity annotations are specified:

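tau_1, tau_2 = 0.33, 0.66  # trust-label cutoffs from Section 1

def AssessTrustworthiness(userProfile):
    # ExtractFeatures and RF_Infer stand in for the feature encoder and the trained forest.
    x_u = ExtractFeatures(userProfile)
    T = RF_Infer(x_u)  # O(100 * log 5000)
    return T

def GenerateSemanticRepresentation(textBlock):
    # SectionSegment, NER_Model, and RuleExtractor stand in for the Compliance Pipeline stages.
    segments = SectionSegment(textBlock)      # O(|text|)
    ents = NER_Model(segments)                # O(|text| * d)
    tuples = RuleExtractor(ents)              # O(#ents * ruleCost)
    return tuples

def DecideAccess(T, S):
    # T is the scalar trust score; S is the sensitivity label from the Compliance Agent.
    if T > tau_2 and S == "low":
        return "RAW"
    if T <= tau_1 and S == "high":
        return "FULL_ANON"
    return "PARTIAL_MASK"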

The KYU Agent and Compliance Agent interact via channel-labeled message transitions. The process begins with user feature extraction and trust computation, proceeds to data sensitivity assessment by the Compliance Agent, and culminates in a policy-based access strategy communicated to action modules (Kulkarni et al., 3 Jan 2026).

5. Domain-Specific Case Studies

Empirical validation is reported across ten domains. Selected examples:

Education (Child):

  • Request: [email protected], Purpose=self-use, Attributes = {studentID, Age_Years, SchoolType}
  • $x_u = (1, 0.5)$ yields $T = 0.75$ (“high”)
  • Compliance Agent: S = “moderate”
  • Applied strategy: PARTIAL_MASK
  • Transformation: studentID masked, Age_Years generalized, SchoolType partially masked
  • Domain D-metric (AnonymizationScore): 0.54

E-commerce:

  • Request: [email protected], Purpose=external, Attributes = {purchaseHistory, creditCard}
  • $x_u = (0, 0)$ yields $T = 0.12$ (“low”)
  • Compliance Agent: S = “high”
  • Applied strategy: FULL_ANON
  • Transformation: creditCard encrypted, purchaseHistory aggregated
  • AnonymizationScore: 0.63

Mean Anonymisation Scores for all ten domains:

| Domain | Anonymisation Score |
|---|---|
| E-commerce | 0.63 |
| SocialMedia | 0.63 |
| Telecom | 0.63 |
| Healthcare | 0.62 |
| Education | 0.54 |
| Finance | 0.49 |
| Startups | 0.44 |
| Travel | 0.40 |
| Employment | 0.37 |
| Government | 0.35 |

6. Hyperparameters, Evaluation Metrics, and Latency

  • Random Forest: $n_\text{estimators} = 100$, max_depth=10, 5-fold cross-validation, accuracy=0.98, $F_1 = 0.96$.
  • Clustering: K-means, $k = 10$, max iterations=300.
  • LDA topic modeling: $n_\text{topics} = 20$.
  • Embeddings: Word2Vec, dimension=300.
  • Latency: End-to-end request latency $\approx 120$ ms (mean over 1,000 requests).

The cumulative results indicate effective, domain-specific compliance and anonymization strategies aligned with DPDP requirements (Kulkarni et al., 3 Jan 2026).

7. Significance within Data Governance Frameworks

The KYU Agent’s formalized approach to user trustworthiness and semantic legal understanding addresses critical limitations in traditional compliance tools: static rule encoding, lack of transparency, and inflexibility to policy evolution. By integrating learning-based user modeling and agentic interaction protocols, the framework enhances transparency, adaptability, and compliance traceability central to responsible AI data governance. Its deployment demonstrates scalable, transparent, and regulation-aware data policy orchestration in cross-domain applications under the Indian DPDP Act (Kulkarni et al., 3 Jan 2026).
