Vault Credit Score Overview

Updated 4 July 2026

Vault Credit Score is a risk metric that maps state variables to default probability and loss estimates across consumer and DeFi settings.
Methodologies include binary classification with logit transformations, quantile mapping, and divergence-optimized scorecards, emphasizing model flexibility.
The score construction integrates protocol-specific features, exposure metrics, and governance factors to enable effective threshold-based risk management.

Vault Credit Score refers, across the cited literature, to several related but non-identical credit-risk constructs rather than a single standardized metric. In consumer-credit settings it can denote a binary estimate of default risk for a cardholder; in Aave-specific decentralized-finance settings it can denote the probability that a position becomes delinquent through a Health Factor breach; in wallet-level on-chain risk models it is a probability-like estimate of liquidation or adverse credit outcomes over a horizon; and in ERC‑4626 vault analysis it can denote a composite score aggregating mechanical loss channels, governance quality, code integrity, concentration, liquidity, and transparency. The common structure is a mapping from state variables to a risk measure used for ranking, threshold-based decisioning, score transformation, capital allocation, or governance monitoring (Arram et al., 2023, Wolf et al., 2022, Ghosh et al., 2024, Zbandut et al., 19 Apr 2026, Zbandut et al., 12 Dec 2025).

1. Problem definitions and target variables

In the credit-card setting, the target is explicitly the probability that a cardholder will default, with binary label $\mathrm{Target}\in\{0,1\}$ where $1=\mathrm{default}$ and $0=\mathrm{non\text{-}default}$. The task is binary classification producing a risk score $\hat y=P(y=1\mid x)$ suitable for ranking, threshold-based decisions, and scorecard transformation. The cited study does not describe a proprietary “Vault” score; rather, it introduces a new credit card default dataset and a model-comparison pipeline that can underpin or benchmark a Vault-branded credit card risk score (Arram et al., 2023).

In Aave v2, the target is not conventional loan default but “position delinquency,” defined through the protocol’s Health Factor. With collateral values $\mathrm{Collateral}_i^{ETH}$, liquidation thresholds $LT_i$, and total borrows $\mathrm{Total\ Borrows}^{ETH}$, the Health Factor is

$HF=\frac{\sum_i \mathrm{Collateral}_i^{ETH}\times LT_i}{\mathrm{Total\ Borrows}^{ETH}}.$

The binary label is defined over the first 90 days after a position is opened:

$y=\max\big(\{HF_{t\mid P_i}<1:t\mid P_i\le 90\}\big)\in\{0,1\}.$

Thus $y=1$ means that the position becomes eligible for liquidation at least once within 90 days. A recurrent misconception is that this target measures realized liquidation transactions; the paper instead models eligibility for liquidation, that is, the condition $HF<1$ (Wolf et al., 2022).
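The Health Factor and the 90-day label can be sketched as follows. This is a minimal illustration rather than the paper's pipeline: the function names, the `(day, hf)` observation format, and the treatment of zero debt as infinite Health Factor are assumptions made here.

```python
from typing import Sequence, Tuple


def health_factor(collateral_eth: Sequence[float],
                  liquidation_thresholds: Sequence[float],
                  total_borrows_eth: float) -> float:
    """HF = sum_i(Collateral_i^ETH * LT_i) / Total Borrows^ETH."""
    if total_borrows_eth <= 0:
        # Assumption: a position with no debt cannot be liquidated.
        return float("inf")
    weighted = sum(c * lt for c, lt in zip(collateral_eth, liquidation_thresholds))
    return weighted / total_borrows_eth


def delinquency_label(hf_observations: Sequence[Tuple[float, float]],
                      horizon_days: float = 90.0) -> int:
    """Return 1 if any HF observation within the horizon is below 1, else 0.

    hf_observations holds (days_since_open, health_factor) pairs.
    """
    return int(any(hf < 1.0 for day, hf in hf_observations if day <= horizon_days))
```

A breach observed after day 90 does not set the label, which matches the fixed-horizon definition above.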

In the wallet-level DeFi setting, the On-Chain Credit Risk Score is a probability in $[0,1]$ interpreted as a wallet’s likelihood of default or liquidation within a chosen horizon $T$:

$\mathrm{OCCR}(w)=P(\text{wallet } w \text{ defaults or is liquidated within } T).$

This is framed as a Probability of Default proxy that can be combined with Loss Given Default and Exposure At Default through

$\mathrm{EL}=\mathrm{PD}\times\mathrm{LGD}\times\mathrm{EAD}.$

The same work proposes exposure-weighted aggregation from wallet-level risk to vault-level risk (Ghosh et al., 2024).
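A small sketch of the expected-loss product and an exposure-weighted roll-up from wallets to a vault. The weighting by Exposure At Default and the `(pd, ead)` pair format are assumptions made for illustration; the cited work may define the aggregation differently.

```python
from typing import Iterable, Tuple


def expected_loss(pd: float, lgd: float, ead: float) -> float:
    """Expected loss as the product PD * LGD * EAD."""
    return pd * lgd * ead


def vault_pd(wallets: Iterable[Tuple[float, float]]) -> float:
    """Exposure-weighted PD: sum(EAD_w * PD_w) / sum(EAD_w).

    wallets holds (pd, ead) pairs; an empty or zero-exposure vault returns 0.0.
    """
    pairs = list(wallets)
    total_exposure = sum(ead for _, ead in pairs)
    if total_exposure == 0:
        return 0.0
    return sum(pd * ead for pd, ead in pairs) / total_exposure
```

For two wallets with PDs of 2% and 10% and exposures of 1,000 and 3,000, the vault-level PD is 8%, pulled toward the larger exposure.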

In the depositor-centric ERC‑4626 vault formulation, a DeFi lending vault is treated as a credit instrument pooling depositor liabilities $L$ and allocating them to collateralized loans. Depositor loss arises when realized liquidation-value assets $A$ are insufficient to meet book claims at redemption. The shortfall and normalized loss rate are

$S=\max(L-A,\,0),\qquad \ell=\frac{S}{L}.$

This formalization moves the target from borrower delinquency to depositor loss under protocol-specific execution frictions (Zbandut et al., 19 Apr 2026).

A further extension appears in modular decentralized credit, where the score is attached not only to a borrower or vault but also to the curator layer that determines underwriting and leverage decisions. There the composite score combines utilization, concentration, leverage, liquidity coverage, tail dependence, expected loss, transparency, and fee alignment in order to compare vaults on a money-market–style basis (Zbandut et al., 12 Dec 2025).

2. Consumer-credit implementations

The most explicit consumer-credit implementation in the cited material uses a new dataset from an American bank containing 500 accounts collected over the most recent 12 months before extraction, with 477 usable rows after quality filtering, monthly granularity, and 36 features total: 12 monthly payroll variables and 24 anonymized engineered variables derived from payroll histories. The split is 80% train and 20% hold-out test, and ANOVA is used to assess significance across model performances (Arram et al., 2023).

The preprocessing pipeline is operationally specific. Missing values are imputed to $0$; for the monthly payroll variables, “missing” is interpreted as no payroll in that month, and engineered variables derived from payrolls are also set to $0$. Outliers are treated by per-feature winsorization, clipping values beyond the bounds. The paper states that scaling or normalization is not required by tree-based models, while for neural networks standardization is recommended in production even though it was not required for correctness in the reported experiments. Feature extraction from the 12-month payroll sequence includes windowed aggregates, rolling statistics, trends, recency features, and lagged variables; the bank’s 24 anonymized variables are described as encapsulating some of these transformations (Arram et al., 2023).

Class imbalance is handled explicitly. The study reports that the default class is larger than the non-default class in this dataset, describes this as an atypical but real imbalance for the sample, and evaluates SMOTE, KMeansSMOTE, BorderlineSMOTE, SVMSMOTE, and RandomOverSampler. KMeansSMOTE yields the strongest validation results and is selected, producing an approximately balanced training distribution while leaving the hold-out test set at natural class proportions (Arram et al., 2023).

The algorithms compared are Logistic Regression, Decision Tree, Random Forest, XGBoost, LightGBM, and Multi-Layer Perceptron. Logistic regression is written as

$\hat y=\sigma(w^\top x+b),\qquad \sigma(z)=\frac{1}{1+e^{-z}},$

with binary cross-entropy loss, while the MLP uses hidden-layer transformations

$h^{(l)}=\phi\big(W^{(l)}h^{(l-1)}+b^{(l)}\big),\qquad h^{(0)}=x,$

and output probability

$\hat y=\sigma\big(w^\top h^{(L)}+b\big).$

For a production-oriented build, the paper recommends tuning regularization, depth, ensemble size, learning-rate, and early-stopping hyperparameters appropriate to each model family (Arram et al., 2023).

Reported results on the hold-out test set identify the MLP as the best recall-oriented model. Comparative findings cover logistic regression and random forest (accuracy, precision, and recall) and decision tree, XGBoost, and LightGBM (accuracy, AUC, precision, and recall). The stated explanation for the MLP’s superior recall is its ability to capture nonlinear interactions and temporal patterns embedded in the monthly payroll sequence and engineered variables (Arram et al., 2023).

This consumer-credit line suggests a conventional Vault Credit Score architecture in which the primary statistical object is $\hat y=P(y=1\mid x)$ and the principal design choice is whether the operating objective is discrimination, recall, calibration, or downstream cost minimization. The cited study emphasizes minimizing false negatives and therefore places threshold policy after model fitting rather than equating the highest-accuracy model with the most useful underwriting model (Arram et al., 2023).

3. DeFi account-level and wallet-level scoring

The Aave-specific scoring framework begins from protocol semantics rather than from borrower demographics or repayment histories. A position $P_i$ is defined as the union of borrowed and collateral asset counters, positions are re-indexed when materially changed over contiguous time, and features are computed at the position’s opening block. The training dataset contains roughly 34,000 rows, built from a Health Factor dataset sampled at 15-minute intervals using historical token prices, aToken reserve specifications, and liquidity index references. Positions with duration less than 10 days are excluded because they largely resemble flash-loan-like, smart-contract arbitrage rather than genuine borrowing behavior (Wolf et al., 2022).

The feature space is described as a static snapshot plus historical aggregations. Examples explicitly mentioned are account age, aggregations of the time series of historical Health Factor up to the opening of the position, interactions with the Aave protocol, and the types of assets borrowed and used as collateral. The evaluation is forward-looking: the most recent 2,500 examples by block timestamp are held out, split into five chunks of 500, and predicted in an incremental out-of-fold scheme. ROC curves and AUC are the reported metrics. The tree-based classifier is described as a strong predictor that outperforms baselines overall, but one baseline is itself highly competitive in AUC, underscoring the predictive concentration in historical Health Factor dynamics (Wolf et al., 2022).

The final Aave score is not a direct linear transformation of delinquency probability. Instead, predicted probabilities are mapped to integer scores through a quantile transform into a skew-normal target distribution designed to resemble empirical FICO scores. With $F$ denoting the empirical CDF of predicted probabilities and $G$ the fitted skew-normal target CDF, each prediction is first converted to its quantile under $F$ and then to the score at the corresponding quantile of $G$.

The paper also notes a temporary linear mapping from predicted probability to score for prototyping, but identifies the quantile transform as the preferred approach (Wolf et al., 2022).
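A quantile mapping of this kind can be sketched with the standard library alone. Several choices below are assumptions for illustration and are not taken from the paper: a plain normal target (`NormalDist`) stands in for the fitted skew-normal, the target parameters and the 300 to 850 clipping range are hypothetical, and higher predicted risk is mapped to a lower score.

```python
from bisect import bisect_right
from statistics import NormalDist
from typing import Sequence


def quantile_score(p_new: float,
                   p_reference: Sequence[float],
                   target: NormalDist = NormalDist(mu=690.0, sigma=70.0),
                   lo: int = 300,
                   hi: int = 850) -> int:
    """Map a predicted probability to an integer score by quantile matching."""
    ref = sorted(p_reference)
    # Empirical CDF rank, kept strictly inside (0, 1) so inv_cdf is defined.
    rank = (bisect_right(ref, p_new) + 0.5) / (len(ref) + 1)
    # Assumption: riskier predictions get lower scores, so use 1 - rank.
    raw = target.inv_cdf(1.0 - rank)
    return int(min(hi, max(lo, round(raw))))
```

Because the mapping depends only on ranks within the reference predictions, the score distribution follows the target shape regardless of how the model's raw probabilities are distributed.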

OCCR broadens the DeFi problem from position-specific Health Factor breach to wallet-level credit risk. Its score is a weighted linear combination of five subscores.

The components are a historical subscore, a current subscore based on simulated Liquidation-at-Risk, a credit-utilization subscore, an on-chain transaction subscore, and a new-credit subscore. The paper contrasts this with heuristic wallet scores by arguing that the subscores are estimators with derived expectations and variances and that the combined OCCR is asymptotically normal (Ghosh et al., 2024).

The wallet-level formulation is directly tied to policy functions. Dynamic loan-to-value policies may be defined as a function of the estimated $\mathrm{PD}$ or through piecewise tiers in which higher estimated $\mathrm{PD}$ implies tighter $\mathrm{LTV}$ and liquidation-threshold settings. This differs from the Aave position-delinquency model in that the score is explicitly integrated with expected-loss budgeting,

$\mathrm{EL}=\mathrm{PD}\times\mathrm{LGD}\times\mathrm{EAD},$

and with exposure-weighted aggregation of wallet-level risk to the vault level (Ghosh et al., 2024).
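A piecewise-tier policy can be sketched as a lookup keyed on the PD estimate. The tier boundaries and the LTV and liquidation-threshold values below are hypothetical placeholders chosen only to show the monotone structure; they are not parameters from the cited work.

```python
from bisect import bisect_right
from typing import Tuple

# Hypothetical tier boundaries on PD (exclusive upper bounds of each tier).
PD_BOUNDS = [0.02, 0.05, 0.10]
# Hypothetical (max LTV, liquidation threshold) per tier, tightening as PD rises.
TIERS = [(0.80, 0.85), (0.70, 0.78), (0.55, 0.65), (0.35, 0.45)]


def ltv_policy(pd: float) -> Tuple[float, float]:
    """Return (max LTV, liquidation threshold) for an estimated PD."""
    if not 0.0 <= pd <= 1.0:
        raise ValueError("pd must lie in [0, 1]")
    return TIERS[bisect_right(PD_BOUNDS, pd)]
```

Keeping the policy keyed on PD rather than on a displayed score means a change in score presentation does not silently change borrowing limits.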

These two DeFi lines share a probabilistic interpretation but differ in ontology. The Aave score is attached to a position and defined by protocol-native solvency state over 90 days; OCCR is attached to a wallet and integrates transaction flow, utilization, simulation-based stress, and historical behavior into a broader PD-like quantity. This suggests that “Vault Credit Score” in DeFi is context-sensitive: it may refer to a score for a position, a wallet, or an aggregated vault exposure, depending on what entity is being underwritten (Wolf et al., 2022, Ghosh et al., 2024).

4. Vault-level depositor-risk and curator-layer scoring

The depositor-centric vault formulation in decentralized finance departs more sharply from TradFi analogies. The cited work argues that six structural features of on-chain execution break canonical TradFi analogies: oracle execution divergence, endogenous recovery, full information run dynamics, timelock-constrained governance, oracle manipulation and latency, and congestion-driven liquidation failure. These are formalized as five tractable Level 1 metrics and then aggregated into a Vault Credit Score (Zbandut et al., 19 Apr 2026).

The first metric, stress-adjusted asset coverage, begins from the vault asset coverage ratio and the collateral-weighted execution deviation between oracle-marked and realized execution values. Effective coverage adjusts the coverage ratio for this deviation, so shortfall can arise even when oracle-marked coverage appears adequate if effective coverage is insufficient to meet depositor claims.

The second metric is a conditional expected shortfall driven by liquidation mass and market depth; the third is the probability of hitting the utilization boundary within a horizon; the fourth is an oracle integrity score based on expected oracle-induced shortfall; and the fifth is an execution-viability rate under stress (Zbandut et al., 19 Apr 2026).

Normalization maps each raw metric to a bounded score on which higher means lower risk. Two score operators are proposed, one of them multiplicative. The multiplicative form is explicitly conservative, with weakest-link dominance: if any normalized metric is zero, the aggregate score is zero. The paper recommends letter-grade mapping for the conservative multiplicative score, from A for the highest band down to D for the lowest (Zbandut et al., 19 Apr 2026).
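The contrast between a compensatory and a weakest-link aggregate can be illustrated with a weighted arithmetic mean and a weighted geometric mean. Both forms, the assumption that metrics are normalized to $[0,1]$, and the equal default weights are assumptions for this sketch, not the paper's operators.

```python
import math
from typing import Optional, Sequence


def additive_score(scores: Sequence[float],
                   weights: Optional[Sequence[float]] = None) -> float:
    """Weighted arithmetic mean: a strong metric can offset a weak one."""
    w = list(weights) if weights is not None else [1.0] * len(scores)
    return sum(wi * si for si, wi in zip(scores, w)) / sum(w)


def multiplicative_score(scores: Sequence[float],
                         weights: Optional[Sequence[float]] = None) -> float:
    """Weighted geometric mean: any zero metric forces the aggregate to zero."""
    w = list(weights) if weights is not None else [1.0] * len(scores)
    if any(s <= 0.0 for s in scores):
        return 0.0  # weakest-link dominance
    return math.exp(sum(wi * math.log(si) for si, wi in zip(scores, w)) / sum(w))
```

With one metric at zero, the arithmetic mean still reports a moderate score while the geometric form collapses to zero, which is the conservative behavior described above.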

The same work introduces Level 2 governance quality and Level 3 smart-contract code integrity as separate layers. Governance metrics include timelock duration relative to a critical response window, quorum, voting-power dispersion, emergency powers, upgradeability, and incident-response history. Code integrity is summarized through node failure probabilities over a horizon and an aggregate failure lower bound, and the final depositor view is capped by these additional layers. This makes explicit that vault credit quality is not reducible to collateralization ratios alone (Zbandut et al., 19 Apr 2026).

A related but distinct framework scores the curator layer in modular decentralized credit. There, the ERC‑4626 standard is described as separating shared market infrastructure from risk management, while curators determine eligibility lists, collateral caps, haircuts, oracles, rebalancing rules, and cross-chain routing. The proposed score combines utilization, concentration, leverage, liquidity coverage, tail dependence, expected loss, transparency, and fee alignment on a single bounded scale (Zbandut et al., 12 Dec 2025).

The empirical motivation for curator-level scoring is concentration and co-movement. Managed TVL is concentrated among eight curators, with Gauntlet at approximately 27.6%, Steakhouse at approximately 17.8%, MEV Capital at approximately 12.6%, and K3 Capital at approximately 6.6%. Drawdown and lower-tail TVL correlations reveal a tightly coupled core, including the Block Analitica–Gauntlet, B Protocol–Gauntlet, and B Protocol–Block Analitica pairs, while fee capture varies from roughly 16% for R7 and 14% for Block Analitica to below 3% for Steakhouse and Yearn (Zbandut et al., 12 Dec 2025).

Taken together, these vault-level and curator-level formulations shift the meaning of credit score from borrower-centric default modeling to system-centric loss propagation. The score becomes a compact view of liquidity transformation, execution fragility, governance responsiveness, dependency risk, and concentration externalities rather than only a prediction of whether a single obligor will miss payment (Zbandut et al., 19 Apr 2026, Zbandut et al., 12 Dec 2025).

5. Score construction, calibration, and model forms

Across the cited literature, score construction follows several distinct mathematical routes. In binary consumer-risk models, the basic output is a probability that can be transformed into a scorecard by the logit mapping

$\mathrm{Score}=\mathrm{Offset}+\mathrm{Factor}\cdot\ln\!\left(\frac{1-\hat y}{\hat y}\right),$

with

$\mathrm{Factor}=\frac{\mathrm{PDO}}{\ln 2}.$

Threshold selection is then framed through either Youden’s $J=\mathrm{TPR}-\mathrm{FPR}$ or cost-sensitive minimization of

$C=c_{FN}\cdot FN+c_{FP}\cdot FP,$

with $c_{FN}>c_{FP}$ as the typical default-risk regime (Arram et al., 2023).
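The scaling and threshold steps can be sketched as follows. The points-to-double-odds convention, the anchor values (600 points at 50:1 good-to-bad odds, 20 points to double the odds), and the 5:1 cost ratio are hypothetical defaults used only for illustration; none are taken from the cited study.

```python
import math
from typing import Sequence, Tuple


def scorecard_points(p_default: float, base_score: float = 600.0,
                     base_odds: float = 50.0, pdo: float = 20.0) -> float:
    """Score = offset + factor * ln(odds of non-default), factor = PDO / ln 2."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds_good = (1.0 - p_default) / p_default
    return offset + factor * math.log(odds_good)


def best_threshold(y_true: Sequence[int], p_hat: Sequence[float],
                   cost_fn: float = 5.0, cost_fp: float = 1.0) -> Tuple[float, float]:
    """Pick the threshold minimising cost_fn * FN + cost_fp * FP.

    A case is flagged as default when p_hat >= threshold; ties keep the
    lowest threshold.
    """
    best = None
    for t in sorted(set(p_hat)):
        fn = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p < t)
        fp = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p >= t)
        cost = cost_fn * fn + cost_fp * fp
        if best is None or cost < best[1]:
            best = (t, cost)
    return best
```

When missed defaults cost more than false alarms, the selected threshold moves lower than an accuracy-maximizing cut-off would, trading precision for recall.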

OCCR proposes a different outward presentation. After estimating a PD-like quantity, the score may be normalized to a conventional credit-score range through a transformation of the PD estimate or, in a simpler variant, a linear rescaling. This preserves monotonicity between default risk and displayed credit score while allowing dynamic LTV and LT policies to operate directly on the underlying PD estimate rather than on the transformed score (Ghosh et al., 2024).

Liquid Scorecards provide a scorecard-theoretic alternative to black-box models. Traditional scorecards are generalized additive models with step functions, whereas Liquid Scorecards replace some step functions with second-, third-, or fourth-order B-spline bases, typically cubic splines. With the score computed as a linear function of the design matrix, divergence is

$D=\frac{(\mu_G-\mu_B)^2}{\tfrac{1}{2}\left(\sigma_G^2+\sigma_B^2\right)},$

where $\mu_G,\mu_B$ and $\sigma_G^2,\sigma_B^2$ are the score means and variances of the good and bad populations. The optimization is cast as a quadratic program subject to linear equality and inequality constraints implementing in-weighting, cross restrictions, centering, and monotonicity. In the reported case study, development and validation divergence for the Liquid Scorecard are reported alongside the best traditional validation divergence (Hoadley, 2020).
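Divergence between two score samples can be computed directly. The sketch assumes the common scorecard definition, squared mean difference over the average of the two group variances, and uses population variances; the paper's exact normalization may differ.

```python
from statistics import fmean, pvariance
from typing import Sequence


def divergence(scores_good: Sequence[float], scores_bad: Sequence[float]) -> float:
    """(mu_G - mu_B)^2 / ((var_G + var_B) / 2) for two score samples."""
    mu_g, mu_b = fmean(scores_good), fmean(scores_bad)
    var_g, var_b = pvariance(scores_good), pvariance(scores_bad)
    pooled = (var_g + var_b) / 2.0
    if pooled == 0:
        raise ValueError("both groups have zero score variance")
    return (mu_g - mu_b) ** 2 / pooled
```

Comparing this statistic on development and validation samples gives a quick check on whether the separation learned by a scorecard holds out of sample.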

A multiclass alternative appears in IQNN-CS, where credit scoring is cast as Low, Average, and High risk classification rather than binary default prediction. The architecture is a hybrid classical–quantum pipeline: standardized numeric features and one-hot encoded categorical variables are reduced by PCA to 6 components; a classical preprocessing block feeds a variational quantum circuit with 6 qubits and 4 StronglyEntanglingLayers; Pauli-Z expectation values are concatenated and passed to a classical postprocessing head producing softmax probabilities over the three classes. Training uses class-weighted cross-entropy consistent with negative log-likelihood, the AdamW optimizer, batch size 16, 50 epochs, and early stopping (Khan et al., 16 Oct 2025).

The reported IQNN-CS performance is bifurcated. On Dataset 1, accuracy reaches 100% and per-class precision, recall, and F1 are all 1.00. On Dataset 2, overall accuracy is 77.3%, with class-wise performance reported as Low: precision 0.64, recall 0.97, F1 0.77; Average: precision 0.73, recall 0.84, F1 0.78; High: precision 0.95, recall 0.67, F1 0.79; macro F1 is approximately 0.78. The paper emphasizes interpretability and feasibility rather than head-to-head superiority over classical baselines (Khan et al., 16 Oct 2025).

These constructions show that “Vault Credit Score” does not prescribe a single scoring algebra. Depending on the application, the score may be a logit-transformed PD, a quantile-mapped integer score, a divergence-maximizing additive scorecard, a composite function of vault risk channels, or a multiclass risk-tier classifier. The practical implication is that comparability exists only after the target, calibration reference, and policy interface are made explicit (Arram et al., 2023, Wolf et al., 2022, Ghosh et al., 2024, Hoadley, 2020, Khan et al., 16 Oct 2025).

6. Interpretability, privacy, and governance

Interpretability is treated as a first-order design requirement in several of the cited works. In the consumer-credit default study, tree or gradient-boosting importance is proposed for temporal aggregates and engineered variables, and post-hoc explanation for the MLP is recommended through SHAP, including feature-level and month-level attributions aggregated into top-driver lists and reason codes. The same paper recommends fairness checks such as equal opportunity, demographic parity where applicable, and monitoring disparities in precision, recall, AUC, and calibration error, along with stability metrics such as PSI and rolling-window backtesting (Arram et al., 2023).

IQNN-CS develops a much denser interpretability stack for high-stakes credit risk classification. It combines gradient-based saliency, gradient $\times$ input, integrated gradients, SmoothGrad, example-based prototype matching in quantum feature space using cosine similarity, occlusion analysis, softmax entropy, t-SNE of quantum activations, and a new metric called Inter-Class Attribution Alignment (ICAA), a class-by-class matrix of attribution similarity.

Low off-diagonal ICAA values indicate distinct class-specific reasoning, while high values suggest overlapping attribution logic. In the reported experiments, Dataset 1 exhibits low inter-class attribution similarity and clean separation, whereas Dataset 2 shows substantial overlap associated with misclassification and unstable training (Khan et al., 16 Oct 2025).

Privacy-preserving scoring introduces a different governance problem: securing data-in-use. The functional-encryption approach frames Vault Credit Score as a service that computes creditworthiness over encrypted borrower data while revealing only the score or decision required by the lender. The canonical syntax comprises setup, key generation for a function $f$, encryption of an input $x$, and decryption, with correctness requiring that decrypting an encryption of $x$ under the key for $f$ returns $f(x)$. The specific scoring model is a degree-2 polynomial neural network shaped to fit quadratic functional encryption, followed by a softmax over the two outputs (Andolfo et al., 2021).

The performance evaluation of the functional-encryption system is implementation-oriented. On a Windows 10 machine with AMD Ryzen 3600X, 6 cores at 3.80 GHz, and 16 GB RAM, encryption takes 14.3 ms for 5 attributes and 52.3 ms for 50 attributes. Reported scoring time reaches 17.3 s for 20 attributes and 200 borrowers, and about 170 s for 50 attributes and 1000 borrowers; the workload is described as highly parallelizable across borrowers and labels (Andolfo et al., 2021).

The privacy claim is deliberately bounded. Functional encryption confines evaluator-side disclosure to the intended function output under adaptive IND-CPA security based on bilinear pairings, but output leakage remains. The paper therefore recommends rate limiting, access control, minimal-output function keys, model-version binding, and key rotation or revocation. This directly counters a common misconception that cryptographic scoring makes governance unnecessary; the cited treatment makes governance more, not less, central because the score itself still reveals a function of sensitive inputs (Andolfo et al., 2021).

Governance is equally explicit in vault-level DeFi work. Minimum transparency standards include machine-readable disclosure of oracle specifications, governance timelocks, emergency controls, parameter histories, liquidation configurations, completion logs with realized execution prices, on-chain depth snapshots for main collateral pairs, gas and MEV data sources, audit artifacts, and dependency graphs. Non-disclosure loosens identified sets and forces conservative worst-case scoring (Zbandut et al., 19 Apr 2026). At the curator layer, proposed on-chain disclosures include asset eligibility and issuer concentration, liquidity coverage ratio under standardized stress assumptions, attestation cadence and signer quality, parameter reactivity, rehypothecation maps, fairness and access metrics where scores or whitelists are used, execution-layer footprint, and oracle sources and update rules (Zbandut et al., 12 Dec 2025).

7. Limitations, controversies, and future directions

A first limitation is semantic. The term “Vault Credit Score” is not standardized across the cited literature. One paper explicitly states that it does not describe a proprietary “Vault” score, while others attach the term to Aave positions, wallets, ERC‑4626 vaults, or curator portfolios. This suggests that any deployment using the label requires a precise statement of the scored entity, event definition, horizon, and calibration regime (Arram et al., 2023, Wolf et al., 2022, Zbandut et al., 19 Apr 2026).

Dataset scope is a recurring restriction. The credit-card study uses a small dataset from a single bank with 500 rows, 477 effective rows, and a 12-month horizon; it reports no external validation, no calibration metrics, and no fairness metrics. The Aave v2 study focuses on a specific protocol, excludes positions under 10 days, does not model realized liquidation transactions, and does not address Aave v3. The OCCR framework depends on identity-linking heuristics and sybil resistance that are explicitly described as limited by false positives, false negatives, privacy trade-offs, and intentional wallet segmentation (Arram et al., 2023, Wolf et al., 2022, Ghosh et al., 2024).

Model-form limitations also differ by line of work. IQNN-CS does not report head-to-head comparisons with classical baselines and shows training instability on the more complex dataset; fairness analyses are not included. Functional encryption constrains the model architecture to degree-2 polynomials, does not report model-utility metrics such as AUC or calibration, and still leaks output information despite protecting plaintext inputs. Liquid Scorecards optimize divergence rather than PD calibration and are sensitive to knot selection; the paper explicitly notes that percentile-based knots underperformed and that smoothness penalties were left for future work (Khan et al., 16 Oct 2025, Andolfo et al., 2021, Hoadley, 2020).

The DeFi vault literature emphasizes structural rather than sample-size limitations. Linear-impact and GBM-style latency approximations may fail in crisis; non-stationarity after upgrades requires immediate re-estimation; extreme tail data are scarce; and cross-protocol dependency can make Level 3 code risk dominate Level 1 credit risk. The curator-layer literature adds that RWA valuations and PDs may depend on off-chain attestations, cross-chain integrity assumptions vary by bridge and chain, oracle design can conceal losses, and black-box whitelisting or score distributions can affect access and systemic liquidity behavior (Zbandut et al., 19 Apr 2026, Zbandut et al., 12 Dec 2025).

Future work in the cited sources is correspondingly heterogeneous. Consumer-credit improvements include external validation, calibration, uncertainty quantification, larger multi-institution datasets, and temporal models such as RNNs, Temporal CNNs, or Transformers. OCCR suggests hazard modeling, scenario simulation, and better calibration. Functional-encryption work points toward public-key quadratic FE, batching, and hybrid FE+ZK or FE+TEE systems. IQNN-CS suggests broader robustness and governance validation. Liquid Scorecards suggest optimized knot selection, transformed characteristics, and quadratic smoothness penalties. For DeFi vaults and curators, the dominant future requirement is standardized, on-chain, machine-readable disclosure sufficient to make risk scores comparable and auditable across protocols and vault managers (Arram et al., 2023, Ghosh et al., 2024, Andolfo et al., 2021, Khan et al., 16 Oct 2025, Hoadley, 2020, Zbandut et al., 19 Apr 2026, Zbandut et al., 12 Dec 2025).