Young Adult Recidivism Prediction

Updated 5 February 2026

Young adult recidivism prediction involves ML systems targeting individuals aged 18–25 using detailed LA case data to forecast reoffending.
The pipeline uses probabilistic record linkage, a grid of model architectures, and evaluation by precision@150 and AUC (0.78 in cross-validation).
The approach integrates fairness auditing and threshold adjustments to balance predictive accuracy with equitable resource allocation across demographic subgroups.

Young adult recidivism prediction leverages machine learning systems to identify individuals, specifically ages 18–25, at highest risk of subsequent interactions with the criminal justice system following misdemeanor charges. The objective is to enable resource-targeted, individually-tailored social service interventions while rigorously balancing predictive efficiency with fairness across demographic subgroups. The Los Angeles City Attorney’s Recidivism Reduction and Drug Diversion (R2D2) program provides a comprehensive operational framework, dataset construction methodology, and auditing protocol for such predictive systems, with an explicit focus on subgroup equity (Rodolfa et al., 2020).

1. Cohort Definition and Data Pipeline

The foundational dataset comprises records from the Los Angeles City Attorney’s case-management system for misdemeanor charges, spanning 1995–2017. Individual-level identifiers are probabilistically assigned using the “pgdedupe” system, which links case-level records by matching on name, date of birth, address, and California “CII” number, yielding 1.53 million unique persons across 2.46 million cases. For the young adult cohort, individuals are filtered to those aged 18–25 at prediction date with at least one prior misdemeanor booking or case in the preceding five years. Prediction dates are semiannual, e.g., Jan 1, 2012; Jul 1, 2012; …; Jan 1, 2017.
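The cohort rule above can be sketched as follows. This is a minimal illustration, not the R2D2 implementation: the function names, the per-person inputs (a date of birth and a list of prior event dates), and the approximation of five years as 5 × 365 days are all assumptions.

```python
from datetime import date, timedelta

def semiannual_dates(first_year, last_year):
    """Jan 1 and Jul 1 of each year, ending at Jan 1 of last_year."""
    dates = []
    for year in range(first_year, last_year + 1):
        dates.append(date(year, 1, 1))
        if year < last_year:
            dates.append(date(year, 7, 1))
    return dates

def age_on(dob, as_of):
    """Age in whole years on the as_of date."""
    had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
    return as_of.year - dob.year - (0 if had_birthday else 1)

def in_cohort(dob, event_dates, prediction_date, lookback_days=5 * 365):
    """True if aged 18-25 at the prediction date with a prior event in the lookback window.

    lookback_days approximates "the preceding five years" (an assumption).
    """
    window_start = prediction_date - timedelta(days=lookback_days)
    has_prior = any(window_start <= d < prediction_date for d in event_dates)
    return 18 <= age_on(dob, prediction_date) <= 25 and has_prior

# Jan 1, 2012; Jul 1, 2012; ...; Jan 1, 2017
prediction_dates = semiannual_dates(2012, 2017)
```

Only events strictly before the prediction date count as priors, so the filter never uses information from the outcome window.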

Outcome labels are binary: $Y=1$ if the individual experiences any new booking or City Attorney case within the next 180 days, $Y=0$ otherwise. Baseline six-month recidivism rate in the original (all-ages) cohort is 4.4%. Key input features comprise demographics (age at first arrest, current age, gender, race/ethnicity), prior offense history (total and charge-type-specific counts, arrest recency), and, where available, social factors (homelessness, mental-health diversion, substance-abuse treatment). Continuous features remain untransformed until normalization; counts enter as numeric; categorical fields are one-hot–encoded.
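The label definition can be illustrated with a short sketch. The function name and the boundary convention (events strictly after the prediction date, up to and including day 180) are assumptions; the source specifies only "within the next 180 days".

```python
from datetime import date, timedelta

def recidivism_label(prediction_date, event_dates, horizon_days=180):
    """Return 1 if any booking or case falls in (prediction_date, prediction_date + horizon]."""
    horizon_end = prediction_date + timedelta(days=horizon_days)
    return int(any(prediction_date < d <= horizon_end for d in event_dates))
```

For example, an individual scored on Jan 1, 2017 with a new case on Mar 1, 2017 gets $Y=1$, while one whose next case falls on Aug 1, 2017 gets $Y=0$ for that prediction date.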

2. Model Training and Selection

Data preprocessing consists of (1) median imputation for missing continuous variables and assignment of “missing” indicators for unobserved categorical values (e.g., unknown race), (2) one-hot encoding of categorical predictors, and (3) standardization (zero mean, unit variance) of continuous features. A grid of four model families is trained through a uniform pipeline:
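The three preprocessing steps can be sketched in plain Python. This is illustrative only: a production pipeline would more likely use a library's imputer, scaler, and encoder, and the helper names here are hypothetical. Fitting the statistics on training data and reusing them at scoring time, and mapping unseen categories to the "missing" level, are assumptions rather than details from the source.

```python
from statistics import median, mean, pstdev

def fit_continuous(values):
    """Learn the median (for imputation) and mean/std (for scaling) from training values."""
    observed = [v for v in values if v is not None]
    med = median(observed)
    filled = [med if v is None else v for v in values]
    sigma = pstdev(filled)
    # Guard against zero variance so the transform never divides by zero.
    return {"median": med, "mean": mean(filled), "std": sigma or 1.0}

def transform_continuous(values, params):
    """Impute with the training median, then standardize to zero mean, unit variance."""
    filled = [params["median"] if v is None else v for v in values]
    return [(v - params["mean"]) / params["std"] for v in filled]

def fit_categories(values):
    """Sorted category list with an explicit 'missing' level for unobserved values."""
    return sorted({"missing" if v is None else v for v in values} | {"missing"})

def one_hot(values, categories):
    """One indicator column per category; None and unseen values map to 'missing'."""
    rows = []
    for v in values:
        key = "missing" if v is None or v not in categories else v
        rows.append([1 if key == c else 0 for c in categories])
    return rows
```

Keeping the fit and transform steps separate matters under temporal validation: statistics learned on the training window are applied unchanged to the evaluation window.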

Logistic regression (L1/L2 penalty, $C\in\{0.01, 0.1, 1, 10\}$ )
Decision tree (max depth $\in\{5,10,20,50\}$ , min-samples-split $\in\{10,100\}$ )
Random forest (n_estimators $\in\{100, 500, 1000\}$ , max_depth $\in\{10,50\}$ , min_samples_split $\in\{10,100\}$ , criterion $\in\{\text{gini},\text{entropy}\}$ )
Gradient-boosted trees (learning rate $\in\{0.01, 0.1\}$ , n_estimators $\in\{100, 500\}$ , max_depth $\in\{3, 5\}$ )
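The grid above can be written down and enumerated as a sketch. The dictionary keys follow the parameter names in the list where given; `penalty`, `C`, and `learning_rate` as key names, and reading "L1/L2 penalty" as two separate settings, are assumptions.

```python
from itertools import product

GRID = {
    "logistic_regression": {"penalty": ["l1", "l2"], "C": [0.01, 0.1, 1, 10]},
    "decision_tree": {"max_depth": [5, 10, 20, 50], "min_samples_split": [10, 100]},
    "random_forest": {
        "n_estimators": [100, 500, 1000],
        "max_depth": [10, 50],
        "min_samples_split": [10, 100],
        "criterion": ["gini", "entropy"],
    },
    "gradient_boosting": {
        "learning_rate": [0.01, 0.1],
        "n_estimators": [100, 500],
        "max_depth": [3, 5],
    },
}

def expand(grid):
    """Yield (model_name, params) for every hyperparameter combination in the grid."""
    for name, space in grid.items():
        keys = list(space)
        for combo in product(*(space[k] for k in keys)):
            yield name, dict(zip(keys, combo))

configs = list(expand(GRID))
```

Under this reading the grid contains 8 + 8 + 24 + 8 = 48 configurations, each of which is refit at every temporal split.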

Intertemporal cross-validation uses only historical data for training, with the subsequent six months as the evaluation window. The primary operational metric—motivated by resource constraints—is precision@150: the fraction of the 150 highest-scoring candidates who recidivate in the six months post-prediction.
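A minimal sketch of the temporal splitting and the precision@k metric follows. The function names are hypothetical, and the expanding-window scheme (train on all earlier prediction dates) is one reasonable reading of "only historical data", not a confirmed detail.

```python
from datetime import date

def temporal_splits(prediction_dates):
    """Pair each evaluation date with all strictly earlier dates as training dates."""
    ordered = sorted(prediction_dates)
    return [(ordered[:i], ordered[i]) for i in range(1, len(ordered))]

def precision_at_k(scores, labels, k=150):
    """Fraction of the k highest-scoring individuals whose label is 1."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)
```

Because consecutive semiannual dates are at least 181 days apart, the 180-day label window for each training date has fully elapsed before the next evaluation date, so no outcome information leaks forward.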

The empirically selected final model is a random forest with 1000 estimators, maximum depth 50, minimum samples per split 100, the Gini criterion, and $\text{max\_features}=\text{sqrt}$ . Model selection prioritizes both accuracy (precision@150) and temporal stability of recall.

3. Performance Metrics and Empirical Results

Performance assessment is anchored in classical classification metrics:

Precision: $\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}$
Recall: $\mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$
False Positive Rate: $\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP}+\mathrm{TN}}$
False Negative Rate: $\mathrm{FNR} = 1-\mathrm{Recall}$
AUC: $P(\hat S_i > \hat S_j\,|\,Y_i=1,\,Y_j=0)$ , with $\hat S_i$ the risk score.
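The definitions above translate directly into code. This sketch uses hypothetical function names; counting tied scores as one half in the AUC is the usual convention and a small extension of the strict inequality in the definition. The pairwise loop is quadratic and meant for illustration only.

```python
def rates(tp, fp, tn, fn):
    """Precision, recall, FPR, and FNR from confusion-matrix counts."""
    recall = tp / (tp + fn)
    return {
        "precision": tp / (tp + fp),
        "recall": recall,
        "fpr": fp / (fp + tn),
        "fnr": 1 - recall,
    }

def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```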

At Jan 1, 2017, using the selected random forest model (on $N=415{,}614$ ), baseline six-month recidivism was 4.4%, precision@150 was 0.73 (109/150), and held-out July–December 2017 precision@150 was 0.69 (104/150). Cross-validated AUC was 0.78, typical for tree-based architectures in this domain. The most influential features included recency of last arrest, number of priors in the last six months, and both age at first arrest and current age.
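As a rough worked check on these figures, assuming the 4.4% base rate applies to the full $N=415{,}614$ scored at that date (an assumption; the positive count is not stated directly):

$\mathrm{precision@150} = \frac{109}{150} \approx 0.727, \qquad \text{lift over base rate} \approx \frac{0.727}{0.044} \approx 16.5$

$\text{positives} \approx 0.044 \times 415{,}614 \approx 18{,}287, \qquad \mathrm{Recall@150} \approx \frac{109}{18{,}287} \approx 0.6\%$

A list of 150 can therefore reach only a small fraction of all eventual recidivists, which is why recall values in this setting fall below 1%.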

4. Measuring and Addressing Predictive Fairness

Given the R2D2 program’s fixed intervention capacity (150 plans per six months), the central equity concern is subgroup under-service. Equity audit focuses on recall parity (“equality of opportunity” [Hardt et al. 2016]) across protected attributes $A\in\{\text{race/ethnicity},\text{gender}\}$ . For a group $A=a$ , recall is

$\mathrm{Recall}(A=a) = \frac{\mathrm{TP}(A=a)}{\mathrm{TP}(A=a)+\mathrm{FN}(A=a)}$

with inter-group disparity $\Delta_{\mathrm{Rec}} = \big|\mathrm{Recall}(A=a) - \mathrm{Recall}(A=b)\big|$ .
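A short sketch of the audit computation, with hypothetical function names and inputs (parallel lists of selection flags, outcome labels, and group memberships):

```python
from collections import defaultdict

def recall_by_group(selected, labels, groups):
    """Recall(A=a) = TP(A=a) / (TP(A=a) + FN(A=a)) for each group with at least one positive."""
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    for sel, y, g in zip(selected, labels, groups):
        if y == 1:
            positives[g] += 1
            true_pos[g] += int(sel)
    return {g: true_pos[g] / positives[g] for g in positives}

def recall_gap(recalls, a, b):
    """Absolute inter-group recall disparity."""
    return abs(recalls[a] - recalls[b])
```

Groups with no observed positives are omitted from the result, since their recall is undefined.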

Additional threshold-dependent metrics (demographic parity gap, equalized odds) may be informative but are not operationally prioritized in this restricted-capacity setting. Empirical recall by race for the baseline top 150 is: White 0.66%, Black 0.74%, Hispanic 0.47%. Hispanic recall is therefore about 29% lower than White recall in relative terms (0.47/0.66 ≈ 0.71); equivalently, White recall is about 40% higher than Hispanic recall.

5. Efficiency–Equity Trade-offs and Threshold Adjustment

To mitigate unfairness, two principal levers are available: (a) inclusion of a fairness penalty in the training objective (e.g., log-loss $+$ $\lambda\,\Delta_{\mathrm{Rec}}$ ), or (b) post-processing via group-specific risk-score thresholds. The latter involves computing “rolling” recall $R_g(k)$ for the top $k$ in each group $g$ ; group-specific cutoffs $k_g$ are chosen to (i) equalize minimum recall (equal recall regime), or (ii) allocate slots proportional to baseline prevalence, under the constraint $\sum_g k_g=150$ .
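One simple way to realize the equal-recall regime is a greedy fill: give each successive slot to the group whose rolling recall is currently lowest. The sketch below is an illustrative heuristic under that assumption, not necessarily the procedure used by R2D2; the function name and input layout are hypothetical. It needs outcome labels, so cutoffs would be set on the most recent cohort whose outcomes are already known.

```python
def equalize_recall(ranked_labels, total_slots=150):
    """Choose group cutoffs k_g that keep rolling recall as even as possible.

    ranked_labels: {group: [0/1 labels sorted by descending risk score]}
    Returns {group: k_g} with sum(k_g) <= total_slots.
    """
    k = {g: 0 for g in ranked_labels}
    tp = {g: 0 for g in ranked_labels}
    positives = {g: sum(labels) for g, labels in ranked_labels.items()}

    def rolling_recall(g):
        # Groups with no positives are treated as fully served.
        return tp[g] / positives[g] if positives[g] else 1.0

    for _ in range(total_slots):
        open_groups = [g for g in ranked_labels if k[g] < len(ranked_labels[g])]
        if not open_groups:
            break
        g = min(open_groups, key=rolling_recall)  # ties go to the first-listed group
        tp[g] += ranked_labels[g][k[g]]
        k[g] += 1
    return k
```

For instance, `equalize_recall({"a": [1, 1, 0, 0], "b": [0, 1, 0, 1]}, total_slots=4)` assigns two slots to each group, leaving both at a recall of 1.0 and 0.5 respectively only if the labels differ; here it yields `{"a": 2, "b": 2}`.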

Each cell gives the number of slots allocated to the group, with the group's recall in parentheses.

Scenario	White	Black	Hispanic	Other	Unknown	Total	Precision@150
Top 150 (baseline)	52 (0.66%)	52 (0.74%)	52 (0.47%)	27 (0.81%)	17 (0.53%)	150	0.727
Equalized recall (150 slots)	39 (0.81%)	39 (0.81%)	39 (0.81%)	18 (0.81%)	15 (0.81%)	150	0.707
Proportional (150 slots)	49 (0.66%)	49 (1.04%)	49 (0.80%)	25 (1.13%)	18 (0.57%)	150	0.709

Equalizing recall across groups slightly reduces overall precision (70.7% vs. 72.7%) but eliminates under-coverage of Hispanic defendants (raising recall from 0.47% to 0.81%). The “proportional” regime allocates more intervention slots to higher-base-rate groups.
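The precision cost can be restated in expected true positives, a rough conversion that assumes all 150 slots are filled in each scenario:

$0.727 \times 150 \approx 109, \qquad 0.707 \times 150 \approx 106$

so the equal-recall regime gives up roughly three correctly targeted individuals per six-month cycle in exchange for closing the recall gap.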

6. Intervention Prioritization and Resource Allocation

Following threshold adjustment, each individual receives a calibrated risk score and, for those selected under the group-specific cutoffs, R2D2 pre-builds intervention plans (e.g., diversion, conditional plea, service referral). Only about 150 advance plans are feasible per six-month period, and the group-level quotas $k_g$ divide this fixed capacity. When a selected individual is identified or appears in court, these plans enable social-services staff to enact customized diversions immediately.

A plausible implication is that such quota-based assignment helps avoid reinforcing historical disparities by setting each protected group's share of intervention slots explicitly rather than leaving it to the raw score ranking. Ongoing evaluation of both overall precision@k and subgroup recall allows dynamic readjustment as demographic or system patterns shift.

7. Extending to Young Adult Populations

The entire R2D2 pipeline generalizes to young adults by (a) filtering the cohort to ages 18–25 and (b) incorporating youth-specific predictors, such as school enrollment, juvenile-justice history, and employment status, with categorical fields one-hot encoded and counts entered as numeric. Recidivism dynamics in this age band show distinct patterns, peaking near ages 20–22; time-dependent features such as “count of juvenile charges” should be computed as of each prediction date so that they use only information available at that time.

Fairness assessment is expanded to include parity not only by race/ethnicity but also by intragroup age bands (e.g., 18–20 vs. 21–25). Threshold-adjustment algorithms extend directly to multidimensional protected groups by allocating slots ( $k_g$ ) for each intersectional slice. The R2D2 unit’s operational deployment includes re-scoring all eligible defendants bi-weekly, highlighting the equity-adjusted top $k$ , and routing these candidates to social-services staff. Systematic tracking of both precision and recall by subgroup ensures sustained efficiency and fairness as the system matures (Rodolfa et al., 2020).
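Slot allocation over intersectional slices can be sketched with a largest-remainder rule, which keeps the total at exactly $k$ slots. The function names, the two age bands, and the choice to allocate in proportion to each slice's count of historical positives are assumptions for illustration.

```python
def age_band(age):
    """Map an age in 18-25 to one of two illustrative bands."""
    return "18-20" if age <= 20 else "21-25"

def proportional_slots(group_positives, total_slots=150):
    """Largest-remainder allocation of slots in proportion to each slice's positive count.

    group_positives: {slice_key: number of historical positives}, e.g. keyed by
    (race_ethnicity, age_band) tuples.
    """
    total = sum(group_positives.values())
    exact = {g: total_slots * n / total for g, n in group_positives.items()}
    slots = {g: int(x) for g, x in exact.items()}  # floor of each exact share
    leftover = total_slots - sum(slots.values())
    # Hand remaining slots to the slices with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda g: exact[g] - slots[g], reverse=True)
    for g in by_remainder[:leftover]:
        slots[g] += 1
    return slots
```

With many intersectional slices, some will have very few positives and receive zero or one slot, so recall estimates for those slices will be noisy and may warrant pooling.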


References

Rodolfa et al. (2020). Case Study: Predictive Fairness to Reduce Misdemeanor Recidivism Through Social Service Interventions.
