VIDB: Value Intensity DataBase Overview
- VIDB is a repository of open-ended texts annotated with continuous value-intensity scores mapped to hierarchical human value theories.
- It employs ranking-based aggregation and normalization techniques to robustly estimate latent utilities and ensure direction balance.
- VIDB underpins steerable alignment frameworks, enabling extraction, evaluation, and control of LLM outputs along pluralistic value dimensions.
The Value Intensity DataBase (VIDB) is a large-scale, theory-grounded repository of open-ended texts annotated with calibrated, continuous value-intensity scores. It forms a central component of the VALUEFLOW alignment framework, providing infrastructure for the extraction, evaluation, and steerable control of LLMs along human value axes. VIDB was specifically constructed to address limitations in existing value datasets, including lack of hierarchical coverage, insufficient intensity calibration, and inadequate support for pluralistic value alignment scenarios (Kim et al., 3 Feb 2026).
1. Dataset Construction and Scope
VIDB is constructed from value-annotated corpora originally leveraged in the development of the HiVES value embedding space. Data sources include Denevil, MFRC, and Social Chemistry (for Moral Foundations Theory/MFT); ValueEval and ValueNet (for Schwartz Value Theory/SVT); and ValuePrism (for Duties & Rights). All texts are mapped to a theory-specific value hierarchy and assigned a direction label: supports (+1), opposes (–1), or neutral (0). The dataset consists primarily of English-language texts, with pipelines for Chinese, Korean, and Arabic documented as proof-of-concept.
The preprocessing and filtering pipeline applies exact-string deduplication, value-targeted sampling (up to 10,000 unique texts per value for 32 mid-level values), and direction balance enforcement. For each text, k–1 opponent samples are drawn from the same value pool to form a ranking window. Each text participates in multiple such windows (m·k in total), which supports robust latent utility estimation. Quality is maintained by automatic LLM plausibility checks: a 7-model LLM panel flags an item for human adjudication if at least two panel members deem its rating implausible or extreme.
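The deduplication, capping, and window-sampling steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `build_value_pool` and `make_ranking_windows`, the seed handling, and the choice of `m = 3` are hypothetical, and direction balancing is omitted.

```python
import random

def build_value_pool(texts, cap=10_000, seed=0):
    """Exact-string deduplication followed by a per-value size cap."""
    # dict.fromkeys drops exact duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(texts))
    if len(unique) > cap:
        unique = random.Random(seed).sample(unique, cap)
    return unique

def make_ranking_windows(pool, k=2, m=3, seed=0):
    """For each text, draw m windows: the text plus k-1 opponents
    sampled without replacement from the same value pool."""
    rng = random.Random(seed)
    indices = list(range(len(pool)))
    windows = []
    for i in indices:
        # Rebuilding this list per text is O(n^2) overall; fine for a sketch.
        others = [j for j in indices if j != i]
        for _ in range(m):
            windows.append([i] + rng.sample(others, k - 1))
    return windows

pool = build_value_pool(["a", "b", "a", "c", "d"])
windows = make_ranking_windows(pool, k=2, m=3)
print(len(pool), len(windows))  # 4 unique texts, 4 * 3 = 12 windows
```

With k = 2 each window is a single pairwise comparison, which matches the binary setting used for intensity estimation.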
Table: Key Dataset Characteristics
| Corpus Origin | Value Theory | Example Value (mid-level) | Language | Max Texts per Value |
|---|---|---|---|---|
| Denevil, MFRC, Social Chemistry | MFT | Fairness | English (other languages as proof-of-concept) | 10,000 |
| ValueEval (PVQ), ValueNet | SVT | Benevolence | English (other languages as proof-of-concept) | 10,000 |
| ValuePrism | Duties & Rights | Justice | English (other languages as proof-of-concept) | 10,000 |
The total size of VIDB is approximately 320,000 texts (32 values × 10,000), with strict equal coverage of values and 50% direction label balance.
2. Labeling Schema and Taxonomic Structure
VIDB covers four canonical value theories: Schwartz Theory of Basic Human Values (12 mid-level values), Moral Foundations Theory (6 foundations), Ross’s Prima Facie Duties (7 duties), and Three Generations of Human Rights (7 domains). Each theory is encoded as a tree (three levels for SVT, two for MFT, one for Duties, and three for Rights), capturing both hierarchy and compositionality.
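One way to picture the tree encoding is as nested dictionaries, with each text's position expressed as a root-to-leaf path. The fragment below is a hypothetical sketch: only the node names that appear elsewhere in this article are used, and `VALUE_TREES`, `is_valid_path`, and `ancestors` are illustrative names rather than part of any published VIDB tooling.

```python
# Tiny illustrative fragment of the hierarchies, not the full taxonomy.
VALUE_TREES = {
    "SVT": {"Self-transcendence": {"Benevolence": {"Caring": {}}}},
    "MFT": {"Fairness": {}},
}

def is_valid_path(theory, value_path):
    """Check that value_path walks from the theory root down existing nodes."""
    node = VALUE_TREES.get(theory)
    if node is None or not value_path:
        return False
    for name in value_path:
        if name not in node:
            return False
        node = node[name]
    return True

def ancestors(value_path):
    """Return every proper prefix of a path, i.e. its higher-level values."""
    return [value_path[:i] for i in range(1, len(value_path))]

path = ["Self-transcendence", "Benevolence", "Caring"]
print(is_valid_path("SVT", path))   # True
print(ancestors(path))
```

Keeping the full path on each record makes it possible to aggregate a leaf-level annotation at any higher level of the hierarchy.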
Annotation follows a defined workflow: at each node, a 7-model LLM panel votes on the best category; cases lacking a strong majority default to a “Neutral” category or escalate to human adjudication. Leaf-level direction is determined by analogous voting (“supports,” “opposes,” “not related”), mapped to +1/–1/0 labels. Prompts and annotation guidelines are documented in Appendix sections of the source (Kim et al., 3 Feb 2026).
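The voting step can be illustrated with a short sketch. The source does not state what counts as a "strong majority", so the `min_share=0.6` threshold below is an assumption, as are the function names; the sketch escalates weak-majority cases to human review rather than defaulting to "Neutral".

```python
from collections import Counter

# Mapping from leaf-level vote labels to direction codes, as described above.
DIRECTION_CODES = {"supports": 1, "opposes": -1, "not related": 0}

def panel_vote(votes, min_share=0.6):
    """Return (winning_label, needs_human). min_share is an assumed threshold."""
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_share:
        return label, False
    return None, True

def direction_label(votes, min_share=0.6):
    """Map a panel's direction votes to +1 / -1 / 0, or escalate."""
    label, needs_human = panel_vote(votes, min_share)
    if needs_human:
        return None, True
    return DIRECTION_CODES[label], False

strong = ["supports"] * 5 + ["opposes", "not related"]
split = ["supports"] * 3 + ["opposes"] * 2 + ["not related"] * 2
print(direction_label(strong))  # (1, False): 5 of 7 votes agree
print(direction_label(split))   # (None, True): no strong majority
```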
3. Intensity Estimation and Normalization Methodology
VIDB employs a ranking-based aggregation approach for estimating value intensity. Each ranking window presents binary comparisons (k = 2), where an LLM judge, given two texts and the value definition, selects the one that more strongly supports the value. This process is repeated m times per text across sampled windows.
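Latent utilities for each text are computed via maximum likelihood estimation under the Plackett–Luce (PL) model. In its standard form, with $u_i$ denoting the latent utility of text $i$, the PL model assigns a ranking $\pi$ of $k$ texts the probability

$$P(\pi \mid u) = \prod_{j=1}^{k} \frac{\exp(u_{\pi(j)})}{\sum_{l=j}^{k} \exp(u_{\pi(l)})},$$

which for $k = 2$ reduces to the Bradley–Terry form $P(i \succ j) = \exp(u_i) / (\exp(u_i) + \exp(u_j))$.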
Utility vectors are gradient-optimized per ranking, then normalized to an interpretable continuous scale. The normalization scheme defaults to z-score with max-abs clipping but also supports min–max and quantile-Gaussianization alternatives. Z-score normalization yielded the most stable value-intensity estimates across sampling runs.
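For the binary case (k = 2), PL estimation reduces to a Bradley–Terry fit, which can be sketched with plain gradient ascent. The code below is an illustration under assumptions, not the authors' implementation: the learning rate, step count, and small L2 penalty (added so utilities stay finite when a text wins every comparison) are hypothetical, and so are the `clip_z=3.0` cutoff and the ±10 output scale used in the normalization step.

```python
import math

def fit_pairwise_pl(n_items, comparisons, lr=0.1, steps=2000, l2=1e-3):
    """Maximum-likelihood utilities from (winner, loser) index pairs."""
    u = [0.0] * n_items
    for _ in range(steps):
        grad = [-l2 * ui for ui in u]  # gradient of the assumed L2 penalty
        for winner, loser in comparisons:
            p_win = 1.0 / (1.0 + math.exp(u[loser] - u[winner]))
            grad[winner] += 1.0 - p_win
            grad[loser] -= 1.0 - p_win
        u = [ui + lr * g for ui, g in zip(u, grad)]
        mean = sum(u) / n_items
        u = [ui - mean for ui in u]  # fix the additive gauge: mean utility 0
    return u

def zscore_clip(u, clip_z=3.0, scale=10.0):
    """Z-score, clip at +/-clip_z, then rescale to [-scale, scale]."""
    mean = sum(u) / len(u)
    std = math.sqrt(sum((x - mean) ** 2 for x in u) / len(u)) or 1.0
    return [max(-clip_z, min(clip_z, (x - mean) / std)) * scale / clip_z
            for x in u]

# Text 0 mostly beats text 1, which mostly beats text 2.
comparisons = ([(0, 1)] * 3 + [(1, 2)] * 3 + [(0, 2)] * 2
               + [(1, 0)] + [(2, 1)])
utilities = fit_pairwise_pl(3, comparisons)
intensities = zscore_clip(utilities)
print([round(x, 2) for x in utilities])
print([round(x, 2) for x in intensities])
```

Because PL utilities are only identified up to an additive constant, the sketch recenters them after every step; any other gauge choice would give the same normalized intensities.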
After aggregation, a 7-model LLM panel flags problematic ratings, and two or more flags trigger human review. For flagged cases, the final intensity is computed as a weighted blend of the PL-based score and the mean human rating.
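The flag-then-blend rule can be sketched as below. The blending weight is not given in the text above, so `weight=0.5` is purely a placeholder, and the convex-combination form and function names are likewise assumptions.

```python
def needs_human_review(panel_flags, min_flags=2):
    """True when at least min_flags panel members flagged the rating."""
    return sum(bool(f) for f in panel_flags) >= min_flags

def blend_intensity(pl_score, human_scores, weight=0.5):
    """Convex blend of the PL-based score and the mean human rating.
    weight is a hypothetical placeholder for the human share."""
    human_mean = sum(human_scores) / len(human_scores)
    return (1.0 - weight) * pl_score + weight * human_mean

flags = [True, False, True, False, False, False, False]
if needs_human_review(flags):
    final = blend_intensity(8.0, [6.0, 4.0], weight=0.5)
    print(final)  # 6.5
```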
4. Data Structure and Accessibility
Each VIDB entry is stored as a JSON-lines or Parquet record, encompassing:
- text: the open-ended text string
- theory: value theory label ("SVT", "MFT", "Duty", "Right")
- value_path: array indicating the hierarchy node path
- direction: integer in {+1, 0, –1}
- raw_utility: pre-normalization utility score
- intensity: normalized value-support intensity
- flagged: Boolean indicating human adjustment
- human_adjustment: optional float for the blended human-model score
Example:
{
"text": "Rescuing people from concentration camps.",
"theory": "SVT",
"value_path": ["Self-transcendence", "Benevolence", "Caring"],
"direction": 1,
"raw_utility": 1.87e0,
"intensity": 8.5,
"flagged": false
}
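Records can be retrieved by theory and value through a paginated endpoint (e.g., /api/vidb/SVT/Benevolence?page=1). Licensing is Apache-2.0 for code and models; VIDB itself carries a non-commercial, research-only license due to constraints inherited from the original data sources.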
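A consumer of the JSON-lines format might parse and sanity-check each record along the following lines. This is a sketch only: `parse_vidb_line` is a hypothetical helper, the checks simply mirror the field descriptions in this section, and the sample line encodes the direction as a plain `1` because JSON numbers cannot carry a leading plus sign.

```python
import json

ALLOWED_THEORIES = {"SVT", "MFT", "Duty", "Right"}

def parse_vidb_line(line):
    """Parse one JSON-lines record and validate the documented fields."""
    rec = json.loads(line)
    if not isinstance(rec["text"], str):
        raise ValueError("text must be a string")
    if rec["theory"] not in ALLOWED_THEORIES:
        raise ValueError(f"unknown theory: {rec['theory']!r}")
    if not isinstance(rec["value_path"], list) or not rec["value_path"]:
        raise ValueError("value_path must be a non-empty array")
    if rec["direction"] not in (-1, 0, 1):
        raise ValueError("direction must be -1, 0, or 1")
    for key in ("raw_utility", "intensity"):
        value = rec[key]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{key} must be numeric")
    if not isinstance(rec["flagged"], bool):
        raise ValueError("flagged must be a Boolean")
    rec.setdefault("human_adjustment", None)  # optional field
    return rec

line = ('{"text": "Rescuing people from concentration camps.", '
        '"theory": "SVT", '
        '"value_path": ["Self-transcendence", "Benevolence", "Caring"], '
        '"direction": 1, "raw_utility": 1.87, "intensity": 8.5, '
        '"flagged": false}')
record = parse_vidb_line(line)
print(record["value_path"][-1], record["intensity"])  # Caring 8.5
```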
5. Statistical Properties and Evaluation Metrics
Each value pool in VIDB contains approximately 10,000 texts. Intensity scores display a roughly continuous distribution over the normalized scale, with a slight central bulge (neutral) and, by design, sparser coverage of extreme values. Across all 320,000 entries:
- Mean intensity: ≈ 0.02
- Standard deviation: ≈ 5.8
- 10th/50th/90th percentiles: –8.7 / 0.0 / +8.9
Human-model ranking instability is quantified with statistics such as mean variance (2.1), maximum range (2.8), and sign-flip rate (29%). In stability assessment, human scalar ratings varied by 1.4 points, compared to 2.1–4.2 for rating-based LLM judges. Pairwise human consistency reached 85.3%, with windowed (6-bin) exact match at 60.8% and ±1-bin match at 86.7% (mean deviation 0.46 windows).
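The windowed agreement figures (exact match, ±1-bin match, mean deviation in windows) can be computed as in the sketch below. The equal-width binning and the [-10, 10] score range are assumptions made for illustration; the scores in the example are invented and are not VIDB data.

```python
def to_bin(score, lo=-10.0, hi=10.0, n_bins=6):
    """Map a score to one of n_bins equal-width windows (assumed layout)."""
    score = min(max(score, lo), hi)
    idx = int((score - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the top edge belongs to the last bin

def bin_agreement(a_scores, b_scores, **bin_kwargs):
    """Exact-match rate, within-one-bin rate, and mean bin deviation."""
    diffs = [abs(to_bin(a, **bin_kwargs) - to_bin(b, **bin_kwargs))
             for a, b in zip(a_scores, b_scores)]
    n = len(diffs)
    return {
        "exact": sum(d == 0 for d in diffs) / n,
        "within_one": sum(d <= 1 for d in diffs) / n,
        "mean_dev": sum(diffs) / n,
    }

# Toy scores for illustration only.
human = [-9.0, -2.0, 0.5, 4.0, 9.5]
model = [-8.0, 1.0, 0.0, 8.0, 9.0]
stats = bin_agreement(human, model)
print(stats)  # exact 0.6, within_one 1.0, mean_dev 0.4
```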
6. Evaluation, Steering, and Integration Practices
VIDB underpins the anchor-based evaluator used in VALUEFLOW. Evaluation proceeds by batching new responses with anchor texts sampled from the VIDB pool for the target value, ranking the resulting ensembles, and estimating only the utility of the new response (anchor utilities remain fixed). The per-value calibration then maps raw scores to the normalized intensity scale, with local clamping as needed.
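Because the anchors are held fixed, scoring a new response becomes a one-dimensional estimation problem. The sketch below assumes pairwise (k = 2) outcomes against each anchor and solves for the response utility by bisection on the log-likelihood gradient; the search bounds of ±6 and the function name are hypothetical, and the bounds double as the clamp when the response wins or loses every comparison.

```python
import math

def estimate_utility(anchor_utils, wins, lo=-6.0, hi=6.0, iters=60):
    """1-D maximum likelihood for a new response against fixed anchors.
    wins[i] is 1 if the response beat anchor i, else 0."""
    def grad(u):
        # d/du of sum_i [y_i * log s(u - a_i) + (1 - y_i) * log s(a_i - u)]
        return sum(y - 1.0 / (1.0 + math.exp(a - u))
                   for a, y in zip(anchor_utils, wins))
    # The gradient is strictly decreasing in u, so bisection applies.
    if grad(lo) <= 0:
        return lo  # clamp: the optimum lies at or below the lower bound
    if grad(hi) >= 0:
        return hi  # clamp: the optimum lies at or above the upper bound
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if grad(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

anchors = [-2.0, -1.0, 0.0, 1.0, 2.0]
u_new = estimate_utility(anchors, wins=[1, 1, 1, 0, 0])
print(round(u_new, 3))  # lands between the anchors at 0.0 and 1.0
```

The resulting raw utility would then pass through the same per-value calibration used for the anchors so that scores stay comparable.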
Steerability experiments rely on differences in measured intensity to quantify how far LLM outputs can be moved along a value axis. Anchor prompts and user-text prompts both reference calibrated panels drawn from VIDB to define and control value intensity targets. VIDB also enables demographic profiling by aggregating intensities over candidate texts, weighted by similarity in the HiVES embedding space.
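The similarity-weighted aggregation used for profiling can be pictured as a weighted mean. The sketch below assumes cosine similarity with negative similarities clipped to zero; the two-dimensional vectors are toy stand-ins for HiVES embeddings, and `profile_intensity` is a hypothetical name.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def profile_intensity(query_vec, candidates):
    """Weighted mean intensity over (embedding, intensity) pairs,
    using non-negative cosine similarity to the query as the weight."""
    weights = [max(0.0, cosine(query_vec, emb)) for emb, _ in candidates]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no similar candidate: fall back to neutral
    return sum(w * s for w, (_, s) in zip(weights, candidates)) / total

candidates = [([1.0, 0.0], 8.0), ([0.0, 1.0], -4.0), ([1.0, 1.0], 2.0)]
score = profile_intensity([1.0, 0.0], candidates)
print(round(score, 3))
```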
The binary ranking and PL inference protocol supports two workflows. Construction runs prompt (Box 4) → PL solve → normalization → VIDB entry; evaluation runs Alg. 4 → ranking vs. anchors → one-dimensional PL inference → calibrated score.
7. Limitations and Prospective Developments
Current coverage includes 32 mid-level values; extending to all 19 refined SVT values or higher-level anchors would improve granularity. Initial pipelines for additional languages and domain-specific systems (e.g., Buddhist ethics) are demonstrated, but require further human adjudication for production use. Long-horizon consistency poses challenges, as multi-turn dialogue can dilute injected value intensities. Future directions include richer user modeling via preference traces and extending steerable value control to tasks such as planning, summarization, and code generation (Kim et al., 3 Feb 2026).
Overall, VIDB provides foundational infrastructure for pluralistic and steerable value-based alignment research, supporting stable model evaluation and fine-grained control of value expression in LLMs.