VIDB: Value Intensity DataBase Overview

Updated 10 February 2026

VIDB is a repository of open-ended texts annotated with continuous value-intensity scores mapped to hierarchical human value theories.
It employs ranking-based aggregation and normalization techniques to robustly estimate latent utilities and ensure direction balance.
VIDB underpins steerable alignment frameworks, enabling extraction, evaluation, and control of LLM outputs along pluralistic value dimensions.

The Value Intensity DataBase (VIDB) is a large-scale, theory-grounded repository of open-ended texts annotated with calibrated, continuous value-intensity scores. It forms a central component of the VALUEFLOW alignment framework, providing infrastructure for the extraction, evaluation, and steerable control of LLMs along human value axes. VIDB was specifically constructed to address limitations in existing value datasets, including lack of hierarchical coverage, insufficient intensity calibration, and inadequate support for pluralistic value alignment scenarios (Kim et al., 3 Feb 2026).

1. Dataset Construction and Scope

VIDB is constructed from value-annotated corpora originally leveraged in the development of the HiVES value embedding space. Data sources include Denevil, MFRC, and Social Chemistry (for Moral Foundations Theory/MFT); ValueEval and ValueNet (for Schwartz Value Theory/SVT); and ValuePrism (for Duties & Rights). All texts are mapped to a theory-specific value hierarchy and assigned a direction label: supports (+1), opposes (–1), or neutral (0). The dataset consists primarily of English-language texts, with pipelines for Chinese, Korean, and Arabic documented as proof-of-concept.

The preprocessing and filtering pipeline applies exact-string deduplication, value-targeted sampling (up to 10,000 unique texts per value for 32 mid-level values), and direction balance enforcement. For each text, k–1 opponent samples are drawn from the same value pool to form a ranking window. Each text participates in multiple such rankings (m·k in total), which makes the latent utility estimates more robust. Quality is maintained by automatic LLM plausibility checks: a 7-model LLM panel flags an item for human adjudication if at least two models deem its rating implausible or extreme.
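A minimal sketch of the deduplication and window-sampling steps described above, under stated assumptions: the function name `build_ranking_windows`, the fixed seed, and the choice to give every text exactly m windows in which it is the focal item are illustrative and are not taken from the source. Direction balancing and value-targeted sampling are omitted.

```python
import random

def build_ranking_windows(texts, k=2, m=3, seed=0):
    """Deduplicate texts, then give each text m windows of k-1 sampled opponents."""
    rng = random.Random(seed)
    pool = list(dict.fromkeys(texts))  # exact-string dedup, first occurrence kept
    n = len(pool)
    if n < k:
        raise ValueError("need at least k unique texts in the value pool")
    windows = []
    for i in range(n):
        for _ in range(m):
            # Sample k-1 distinct indices from the n-1 other texts, skipping i.
            picks = rng.sample(range(n - 1), k - 1)
            opponents = [j if j < i else j + 1 for j in picks]
            windows.append((i, *opponents))
    return pool, windows

pool, windows = build_ranking_windows(["a", "b", "b", "c", "d"], k=2, m=3)
print(len(pool), len(windows))  # 4 unique texts, 4 * 3 windows
```

Each window is a tuple of indices into the deduplicated pool, with the focal text first. A text can also appear as an opponent in other texts' windows, so its total number of comparisons is at least m.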

Table: Key Dataset Characteristics

Corpus Origin	Value Theory	Example Value (mid-level)	Language	Max Texts per Value
Denevil, MFRC, Social Chemistry	MFT	Fairness	English (other languages: proof-of-concept)	10,000
ValueEval (PVQ), ValueNet	SVT	Benevolence	English (other languages: proof-of-concept)	10,000
ValuePrism	Duties & Rights	Justice	English (other languages: proof-of-concept)	10,000

The total size of VIDB is approximately 320,000 texts (32 values × up to 10,000 texts each), with equal coverage across values and a 50% direction label balance.

2. Labeling Schema and Taxonomic Structure

VIDB covers four canonical value theories: Schwartz Theory of Basic Human Values (12 mid-level values), Moral Foundations Theory (6 foundations), Ross’s Prima Facie Duties (7 duties), and Three Generations of Human Rights (7 domains). Each theory is encoded as a tree (three levels for SVT, two for MFT, one for Duties, and three for Rights), capturing both hierarchy and compositionality.

Annotation follows a defined workflow: at each node, a 7-model LLM panel votes on the best category; cases lacking a strong majority default to a “Neutral” category or escalate to human adjudication. Leaf-level direction is determined by analogous voting (“supports,” “opposes,” “not related”), mapped to +1/–1/0 labels. Prompts and annotation guidelines are documented in Appendix sections of the source (Kim et al., 3 Feb 2026).
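The voting rule can be illustrated with a short sketch. The threshold `min_votes=5` is a hypothetical stand-in for the "strong majority" criterion, which the text does not quantify, and `panel_decision` is an illustrative name rather than part of any released code.

```python
from collections import Counter

# Mapping from leaf-level direction votes to numeric labels, as described above.
DIRECTION = {"supports": 1, "opposes": -1, "not related": 0}

def panel_decision(votes, min_votes=5):
    """Return the winning label, or None when no label reaches min_votes.

    None stands for the fallback path: default to "Neutral" or escalate
    to human adjudication. min_votes=5 of 7 is an assumed threshold.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_votes else None

votes = ["supports"] * 5 + ["opposes"] * 2
winner = panel_decision(votes)
print(winner, DIRECTION.get(winner))
```

The same function can serve both category selection at an internal node and direction voting at a leaf, since both reduce to counting panel votes.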

3. Intensity Estimation and Normalization Methodology

VIDB employs a ranking-based aggregation approach for estimating value intensity. Each ranking window presents binary comparisons (k = 2), where an LLM judge, given two texts and the value definition, selects the one that more strongly supports the value. This process is repeated m times per text across sampled windows.

Latent utilities $\theta_i$ for each text are computed via maximum likelihood estimation under the Plackett–Luce (PL) model:

$P(\pi | \theta) = \prod_{j=1}^{k} \frac{\exp(\theta_{\pi_j})}{\sum_{\ell=j}^k \exp(\theta_{\pi_\ell})}$
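As a rough illustration of maximum likelihood estimation under this model, the sketch below runs plain gradient ascent on the PL log-likelihood. The optimizer, learning rate, step count, small L2 penalty, and mean-centering are all assumptions made for the example; the source does not specify them. Rankings are tuples of item indices listed best-first, and for k = 2 the model reduces to pairwise Bradley–Terry comparisons.

```python
import numpy as np

def pl_log_likelihood(theta, rankings):
    """Sum of log P(pi | theta) over rankings, each listed best-first."""
    total = 0.0
    for r in rankings:
        s = theta[list(r)]
        for j in range(len(r) - 1):  # the last position contributes log(1) = 0
            tail = s[j:]
            total += s[j] - (tail.max() + np.log(np.exp(tail - tail.max()).sum()))
    return total

def fit_pl(n_items, rankings, lr=0.1, steps=500, l2=1e-3):
    """Gradient ascent on the PL log-likelihood with a small L2 penalty."""
    theta = np.zeros(n_items)
    for _ in range(steps):
        grad = -l2 * theta
        for r in rankings:
            idx = np.asarray(r)
            s = theta[idx]
            for j in range(len(idx) - 1):
                tail = s[j:]
                p = np.exp(tail - tail.max())
                p /= p.sum()                  # softmax over items still unranked
                grad[idx[j]] += 1.0           # chosen item at position j
                np.add.at(grad, idx[j:], -p)  # minus its expected indicator
        theta += lr * grad
        theta -= theta.mean()                 # fix the additive degree of freedom
    return theta

# Toy data: item 0 mostly beats 1, and 1 mostly beats 2.
rankings = [(0, 1), (0, 1), (1, 0), (1, 2), (1, 2), (2, 1), (0, 2), (0, 2)]
theta = fit_pl(3, rankings)
print(np.round(theta, 2))
```

PL utilities are identified only up to an additive constant, which is why the sketch re-centers them after every step. The fixed learning rate suits this toy problem and would likely need tuning, or replacement by a standard optimizer, at the scale of a full value pool.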

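Utility vectors are gradient-optimized per ranking, then normalized to an interpretable continuous scale $[-10, 10]$. The normalization scheme defaults to z-score standardization with max-abs clipping (the standardized scores are divided by the largest absolute z-score and multiplied by 10),

$z_i = (s_i - \mu)/\sigma,\quad \hat s_i = 10 \cdot \frac{z_i}{\max_j |z_j|},$

where $s_i$ is the raw utility of text $i$ and $\mu$, $\sigma$ are the mean and standard deviation of the raw utilities. Min–max and quantile-Gaussianization alternatives are also supported, but z-score normalization yielded the most stable value-intensity estimates across sampling runs.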
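The default normalization can be written in a few lines. This is a sketch of the formula above, not the released implementation; the function name `normalize_intensity` and the handling of a zero-variance pool are assumptions.

```python
import numpy as np

def normalize_intensity(raw, scale=10.0):
    """Z-score the raw utilities, then rescale so the largest |z| maps to scale."""
    raw = np.asarray(raw, dtype=float)
    sigma = raw.std()
    if sigma == 0:
        # Assumed convention: a pool with identical utilities maps to all zeros.
        return np.zeros_like(raw)
    z = (raw - raw.mean()) / sigma
    return scale * z / np.abs(z).max()

scores = normalize_intensity([-1.0, 0.0, 3.0])
print(np.round(scores, 3))
```

Because the z-scores are divided by their own maximum absolute value, the factor $\sigma$ cancels: the result depends only on the centered utilities, and the item farthest from the mean always lands on exactly $\pm 10$.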

Post-aggregation, a 7-model LLM panel flags problematic ratings (≥2 flags trigger human review). For flagged cases, the final intensity is computed by blending the PL-based score and the human mean with $\lambda=0.5$.
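A small sketch of the flag-and-blend step, assuming the blend is a convex combination of the two scores. The text does not say which term $\lambda$ multiplies; with $\lambda = 0.5$ the two readings coincide. The function names are illustrative.

```python
def needs_human_review(flags, min_flags=2):
    """True when at least min_flags panel models flag the rating."""
    return sum(bool(f) for f in flags) >= min_flags

def blend(pl_score, human_scores, lam=0.5):
    """Assumed form: lam * PL-based score + (1 - lam) * mean human score."""
    human_mean = sum(human_scores) / len(human_scores)
    return lam * pl_score + (1.0 - lam) * human_mean

flags = [True, False, True, False, False, False, False]  # 2 of 7 models flag
final = blend(8.0, [6.0, 4.0]) if needs_human_review(flags) else 8.0
print(final)  # 0.5 * 8.0 + 0.5 * 5.0 = 6.5
```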

4. Data Structure and Accessibility

Each VIDB entry is stored as a JSON-lines or Parquet record, encompassing:

text: the open-ended text string
theory: value theory label ("SVT", "MFT", "Duty", "Right")
value_path: array indicating hierarchy node path
direction: integer in {+1, 0, –1}
raw_utility: pre-normalization utility score ($\theta_i$)
intensity: normalized value support in $[-10, 10]$
flagged: Boolean indicating human adjustment
human_adjustment: optional float for blended human-model score

Example:

{
  "text": "Rescuing people from concentration camps.",
  "theory": "SVT",
  "value_path": ["Self-transcendence", "Benevolence", "Caring"],
  "direction": 1,
  "raw_utility": 1.87,
  "intensity": 8.5,
  "flagged": false
}
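For readers loading the JSON-lines form, the following sketch parses one record into a typed structure and checks the ranges stated in the field list. The class name `VidbRecord` and the function `parse_record` are hypothetical; only the field names and allowed values come from the description above. Note that standard JSON parsers reject a leading plus sign on numbers, so the direction is written as 1 rather than +1.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class VidbRecord:
    text: str
    theory: str
    value_path: List[str]
    direction: int
    raw_utility: float
    intensity: float
    flagged: bool
    human_adjustment: Optional[float] = None  # present only for adjusted items

def parse_record(line: str) -> VidbRecord:
    """Parse one JSON-lines row and validate it against the documented ranges."""
    rec = VidbRecord(**json.loads(line))
    if rec.theory not in {"SVT", "MFT", "Duty", "Right"}:
        raise ValueError(f"unknown theory: {rec.theory!r}")
    if rec.direction not in (-1, 0, 1):
        raise ValueError(f"direction must be -1, 0, or 1, got {rec.direction!r}")
    if not -10.0 <= rec.intensity <= 10.0:
        raise ValueError(f"intensity out of range: {rec.intensity!r}")
    return rec

line = (
    '{"text": "Rescuing people from concentration camps.", "theory": "SVT", '
    '"value_path": ["Self-transcendence", "Benevolence", "Caring"], '
    '"direction": 1, "raw_utility": 1.87, "intensity": 8.5, "flagged": false}'
)
record = parse_record(line)
print(record.value_path[-1], record.intensity)
```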

The VIDB is released as part of the VALUEFLOW repository, with API endpoints for programmatic access (e.g., /api/vidb/SVT/Benevolence?page=1). Licensing is Apache-2.0 for code/models; VIDB carries a non-commercial, research-only license due to original data source constraints.

5. Statistical Properties and Evaluation Metrics

Each value pool in VIDB contains approximately 10,000 texts. Intensity scores display a roughly continuous distribution over $[-10,10]$, with a slight central bulge (neutral) and sparser coverage of extreme values, by design. Across all 320,000 entries:

Mean intensity: ≈ 0.02
Standard deviation: ≈ 5.8
10th/50th/90th percentiles: –8.7 / 0.0 / +8.9

Human-model ranking instability is quantified with statistics such as mean variance (2.1), maximum range (2.8), and sign-flip rate (29%). In stability assessment, human scalar ratings varied by 1.4 points, compared to 2.1–4.2 for rating-based LLM judges. Pairwise human consistency reached 85.3%, with windowed (6-bin) exact match at 60.8% and ±1-bin match at 86.7% (mean deviation 0.46 windows).
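To make the windowed agreement figures concrete, here is one way such metrics could be computed. The binning is an assumption: six equal-width bins over $[-10, 10]$, which the text does not confirm, and the function names are illustrative.

```python
import numpy as np

def to_bin(x, n_bins=6, lo=-10.0, hi=10.0):
    """Map intensities to equal-width bin indices 0..n_bins-1 (assumed binning)."""
    x = np.asarray(x, dtype=float)
    idx = np.floor((x - lo) / (hi - lo) * n_bins).astype(int)
    return np.clip(idx, 0, n_bins - 1)  # the upper edge hi falls in the last bin

def bin_agreement(a, b, n_bins=6):
    """Exact-match rate, within-one-bin rate, and mean bin deviation."""
    d = np.abs(to_bin(a, n_bins) - to_bin(b, n_bins))
    return {
        "exact": float((d == 0).mean()),
        "within_one": float((d <= 1).mean()),
        "mean_dev": float(d.mean()),
    }

human = [-10.0, -5.0, 0.0, 9.0]
model = [-9.0, -1.0, 4.0, 10.0]
stats = bin_agreement(human, model)
print(stats)
```

In this toy case the two raters share a bin on two of four items and differ by one bin on the other two.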

6. Evaluation, Steering, and Integration Practices

VIDB underpins the anchor-based evaluator used in VALUEFLOW. Evaluation proceeds by batching a new response $x$ with $k-1$ anchor texts sampled from the VIDB pool for value $v$, ranking the resulting ensembles, and estimating only the utility $u(x)$ while the anchor utilities remain fixed. The per-value calibration $g_v$ then maps raw scores to $I_v(x) \in [-10,10]$, with local clamping as needed.
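With anchors held fixed, the inference problem has a single unknown. The sketch below handles the pairwise case (k = 2): it finds the utility that maximizes the Bradley–Terry likelihood of the observed wins and losses against the anchors, using bisection on the derivative of the log-likelihood. The search interval, the function name `estimate_utility`, and the use of bisection are assumptions; the calibration $g_v$ is not specified in the text and is not modeled here.

```python
import math

def estimate_utility(anchor_utils, wins, lo=-10.0, hi=10.0, iters=60):
    """One-dimensional MLE of u(x) against fixed anchor utilities.

    wins[i] is 1 if the response was ranked above anchor i, else 0.
    The result is clamped to [lo, hi], which also covers the degenerate
    cases where the response wins or loses every comparison.
    """
    def score(u):
        # Derivative of the log-likelihood; strictly decreasing in u.
        return sum(w - 1.0 / (1.0 + math.exp(-(u - a)))
                   for a, w in zip(anchor_utils, wins))

    if score(lo) <= 0:
        return lo
    if score(hi) >= 0:
        return hi
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Beats the weaker anchor, loses to the stronger one: estimate sits between them.
u = estimate_utility([-1.0, 1.0], [1, 0])
print(round(u, 6))
```

Because the log-likelihood is concave in the single unknown, any one-dimensional root finder would do; bisection is used here only because it needs no step size.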

Steerability experiments rely on the difference $\Delta = I_{\mathrm{steered}} - I_{\mathrm{default}}$ to quantify how much LLM outputs can be manipulated along a value axis. Anchor prompts and user-text prompts both reference calibrated panels drawn from VIDB to define and control value intensity targets. VIDB also enables demographic profiling by aggregating intensities over candidate texts, weighted by similarity in the HiVES embedding space.
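The two quantities in this paragraph are simple to compute once intensities are available. In the sketch below the steering difference follows the definition of $\Delta$ directly, while the profiling function is only one plausible reading of "weighted by similarity": a weighted mean with negative similarities clipped to zero. That weighting scheme and both function names are assumptions.

```python
import numpy as np

def steering_delta(steered_intensity, default_intensity):
    """Delta = I_steered - I_default on the [-10, 10] intensity scale."""
    return steered_intensity - default_intensity

def similarity_weighted_intensity(intensities, similarities):
    """Weighted mean of intensities; weights are similarities clipped at zero."""
    values = np.asarray(intensities, dtype=float)
    weights = np.clip(np.asarray(similarities, dtype=float), 0.0, None)
    if weights.sum() == 0:
        raise ValueError("at least one similarity must be positive")
    return float((weights * values).sum() / weights.sum())

delta = steering_delta(6.5, 1.5)
profile = similarity_weighted_intensity([10.0, -10.0], [0.75, 0.25])
print(delta, profile)
```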

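The binary ranking and PL inference protocol supports two workflows, both described in the source (Kim et al., 3 Feb 2026). Database construction runs from the ranking prompt (Box 4 of the source) through the PL solve to normalization, yielding a VIDB entry. Evaluation (Alg. 4 of the source) runs from ranking a response against anchors through one-dimensional PL inference to a calibrated score.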

7. Limitations and Prospective Developments

Current coverage includes 32 mid-level values; extending to all 19 refined SVT values or higher-level anchors would improve granularity. Initial pipelines for additional languages and domain-specific systems (e.g., Buddhist ethics) are demonstrated, but require further human adjudication for production use. Long-horizon consistency poses challenges, as multi-turn dialogue can dilute injected value intensities. Future directions include richer user modeling via preference traces and extending steerable value control to tasks such as planning, summarization, and code generation (Kim et al., 3 Feb 2026).

Overall, VIDB provides foundational infrastructure for pluralistic and steerable value-based alignment research, supporting stable model evaluation and fine-grained control of value expression in LLMs.


References

Kim et al. (2026). VALUEFLOW: Toward Pluralistic and Steerable Value-based Alignment in Large Language Models.
