
Zorro Benchmark: Data, Astronomy & Neural Nets

Updated 19 November 2025
  • Zorro Benchmark is a model-agnostic framework defining the Value of Data using matrix estimation techniques in digital advertising.
  • It sets standards for high-resolution imaging with the Zorro speckle interferometer, ensuring diffraction-limited performance and precise astrometry.
  • Zorro activation functions introduce C1-smooth, tunable nonlinearities that improve convergence and accuracy in deep neural network training.

The term "Zorro Benchmark" appears in three distinct and prominent research lines: (1) the measurement and pricing of the Value of Data (VoD) in digital advertising markets, (2) the performance characterization of the Zorro speckle interferometer for high-resolution optical astronomy, and (3) benchmarking the Zorro family of activation functions for deep neural networks. This article addresses each in turn, providing precise definitions, technical details, and benchmark results, with explicit citation to primary literature.

1. Zorro Benchmark in Data Valuation and Consumer Data Markets

Zorro, introduced by Agarwal et al. (2019), is a model-agnostic system for pricing consumer data in online advertising. It addresses two inefficiencies: users cannot control or monetize their personal data, and advertisers transact data without granular valuation. The central benchmark is an operational, model-free framework for quantifying the Value of Data (VoD) at the user-advertiser-query level (Agarwal et al., 2019).

Formal Definition of Value of Data (VoD)

Let $M \in [0,1]^{m \times n}$ denote the unknown, true click-through rate (CTR) matrix for $m$ users and $n$ advertisers, with $M_{ij} = \mathrm{CTR}_{ij} = f(\theta_i, \omega_j)$, where $\theta_i \in \mathbb{R}^{d_1}$ and $\omega_j \in \mathbb{R}^{d_2}$ are unobserved latent factors. The absolute definition of VoD for user $i$ and advertiser $j$ is:

$$\mathrm{VoD}_{ij} = \left| M_{ij} - \frac{1}{m} \sum_{k=1}^{m} M_{kj} \right|$$

This is the absolute deviation of an individual's CTR from the population mean for that advertiser.
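As a toy numeric illustration of this definition (hypothetical CTR values, not drawn from the paper):

```python
import numpy as np

# Hypothetical CTR matrix: 3 users x 2 advertisers (illustrative values only)
M = np.array([
    [0.10, 0.30],
    [0.02, 0.30],
    [0.03, 0.30],
])

# VoD_ij = |M_ij - column mean|: each user's deviation from the
# population-average CTR for that advertiser
vod = np.abs(M - M.mean(axis=0, keepdims=True))
```

For the second advertiser every user has the same CTR, so individual data carries no value there (VoD = 0); for the first, user 1's above-average CTR yields a VoD of 0.05.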

Model-Agnostic VoD Estimation

The empirical data is sparse and noisy: $X \in \{0,1,?\}^{m \times n}$, with $X_{ij}$ observed only if user $i$ saw ads from advertiser $j$. The goal is to estimate $M$, without access to advertiser ranking functions, using non-intrusive matrix estimation:

  • Nuclear-norm minimization (singular value thresholding) or low-rank factorization with alternating least squares (ALS), solving:

$$\min_{\widehat{M}} \|\mathcal{P}_\Omega(X - \widehat{M})\|_F^2 + \lambda \|\widehat{M}\|_*$$

or

$$\widehat{M} = U V^T, \quad \min_{U,V} \|\mathcal{P}_\Omega(X - U V^T)\|_F^2 + \lambda\left(\|U\|_F^2 + \|V\|_F^2\right)$$

where $\mathcal{P}_\Omega$ zeroes out unobserved entries.

The estimated value of data is then $\widehat{\mathrm{VoD}}_{ij} = \left|\widehat{M}_{ij} - \frac{1}{m}\sum_{k=1}^{m} \widehat{M}_{kj}\right|$.
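A minimal sketch of the nuclear-norm route on synthetic data, using the standard Soft-Impute iteration (iteratively soft-thresholded SVD); this illustrates the objective above, not the paper's production solver:

```python
import numpy as np

def soft_impute(X, mask, lam=0.1, iters=100):
    """Nuclear-norm matrix completion via iteratively soft-thresholded SVD
    (the Soft-Impute scheme); a sketch, not the paper's exact solver."""
    M_hat = np.zeros_like(X)
    for _ in range(iters):
        filled = np.where(mask, X, M_hat)          # impute missing entries
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)               # soft-threshold singular values
        M_hat = (U * s) @ Vt
    return M_hat

# Synthetic rank-2 "true CTR" matrix with roughly half the entries observed
rng = np.random.default_rng(0)
M = rng.uniform(0, 1, (50, 2)) @ rng.uniform(0, 1, (30, 2)).T / 2.0
mask = rng.uniform(size=M.shape) < 0.5
X = np.where(mask, M, 0.0)

M_hat = soft_impute(X, mask)
vod_hat = np.abs(M_hat - M_hat.mean(axis=0, keepdims=True))  # estimated VoD
```

Because the ground truth is low-rank, the thresholded SVD recovers the unobserved CTRs well enough to make the per-user, per-advertiser VoD estimate meaningful.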

Empirical Benchmark and Results

  • Dataset: 190 million ad impressions, 1.15 million clicks (Avito), users binned geographically, 31 ad categories.
  • ALS (rank 2) achieves out-of-sample $R^2 \approx 0.58$, explaining roughly 58% of CTR variance, which is state-of-the-art for large-scale recommendation.
  • Empirical normalized VoD (category average) $\nu_j$ ranges from 30% to 69% across ad categories, with a lower bound of $16 billion USD/year in unrealized value if user data is withheld (based on $54 billion in global display-ad spend).

Extension to Explicit User Intent

When explicit intent signals are provided, CTR estimation generalizes to tensor completion (a third-order tensor $T_{ijl}$), with joint estimation over user, advertiser, and intent dimensions yielding an out-of-sample $R^2 \approx 0.53$ (versus 0.17 when intent slices are estimated independently).

This framework supports real-time, per-query valuation of user data, robust to proprietary downstream machine learning pipelines and applicable to privacy-respecting, value-based data market architectures (Agarwal et al., 2019).

2. Zorro Benchmark in High-Resolution Speckle Interferometry

Zorro, as a dual-channel optical speckle camera installed on the Gemini South 8.1 m telescope, serves as a benchmark for diffraction-limited imaging, particularly for binary star and exoplanetary companion characterization (Howell et al., 13 Mar 2025, Mendez et al., 11 Mar 2025).

Instrumental Architecture and Calibration

Zorro features two Andor iXon Ultra 888 EMCCDs (1024×1024, effectively zero read noise), splitting the f/16 beam into blue (350–650 nm) and red (650–1000 nm) arms via a dichroic. Filter wheels, field-switching for speckle (6.7″×6.7″) and wide-field (60″×60″), and high-precision plate scale calibration (residual distortion < 2 mas, plate scale stability < 0.1%) are standard.

Routine calibration includes:

  • Bias, dark, and flat-frame correction (residual gain variation < 1%)
  • Bracketing science targets with PSF standards for speckle transfer function (STF) measurement
  • Astrometric scale and orientation ties via well-calibrated binaries (systematic errors $<0.5\%$ in scale, $<0.25^\circ$ in zero-point)

Angular Resolution and Sensitivity Metrics

  • Diffraction-limited resolution at wavelength $\lambda$:

$$\theta_{\mathrm{theory}} = 1.22\,\lambda/D$$

For $D = 8.1$ m:

  • 11 mas at 350 nm
  • 18 mas at 600 nm (routine inner working angle, IWA $= 0.02''$)
  • 31 mas at 1000 nm

Realized IWA matches $\lambda/D$ to within a few percent. Zorro achieves up to 4-fold better resolution than IR AO systems on comparable apertures.
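The quoted resolutions follow directly from the Rayleigh criterion; a quick check:

```python
import math

def diffraction_limit_mas(wavelength_nm: float, aperture_m: float = 8.1) -> float:
    """Rayleigh criterion theta = 1.22 * lambda / D, in milliarcseconds."""
    theta_rad = 1.22 * wavelength_nm * 1e-9 / aperture_m
    return math.degrees(theta_rad) * 3600.0 * 1000.0  # rad -> deg -> arcsec -> mas

for wl_nm in (350, 600, 1000):
    print(f"{wl_nm} nm -> {diffraction_limit_mas(wl_nm):.1f} mas")
```

This reproduces the 11–31 mas range quoted above to within rounding.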

  • $5\sigma$ detection: $V = 12$ in 5 min (562 nm); $R = 19$ in 50 min (832 nm)
  • Peak dynamic range: $\Delta m \sim 8$ mag ($\sim 10^3$ in flux) at $1''$, $\Delta m \sim 2$ mag at the resolution limit

Astrometric and Photometric Precision

Comprehensive astrometric characterization (Mendez et al., 11 Mar 2025):

  • $<1$ mas precision for $\rho < 0.4''$, $\sigma_\theta < 0.2^\circ$, $\Delta m < 0.1$ mag at separations of 15–400 mas.
  • Inter-channel alignment (red/blue): $\sim 0.4$ mas repeatability, $<0.2^\circ$ in $\theta$.

Smallest reliably detected companions: 15 mas (red channel); highest measured contrast: $\Delta m_{832} = 6$ mag.

Data Reduction and Survey Results

Zorro's pipeline includes bias subtraction, power-spectrum accumulation, autocorrelation-function (ACF) modeling, and nonlinear least-squares extraction of binary parameters. Over 2019–2023, the survey resolved 70 binaries (including 11 new discoveries) down to 15 mas and $\Delta m = 6$ mag (Mendez et al., 11 Mar 2025).
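The power-spectrum accumulation step can be sketched as follows; this is a schematic of the standard speckle approach on synthetic frames, not the actual Zorro pipeline code:

```python
import numpy as np

def mean_power_spectrum(frames: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Accumulate the mean spatial power spectrum over short-exposure frames.
    Averaging |FFT|^2 preserves diffraction-limited structure that averaging
    the frames themselves would smear out."""
    acc = np.zeros(frames.shape[1:])
    for frame in frames:
        ft = np.fft.fft2(frame - bias)   # bias-subtract, then transform
        acc += np.abs(ft) ** 2           # accumulate power, not amplitude
    return acc / len(frames)

# Downstream, the speckle transfer function is removed by dividing the
# target's mean power spectrum by that of a bracketing PSF-standard star.
```

Averaging in the power-spectrum domain is what makes the method insensitive to the random frame-to-frame image motion that atmospheric seeing introduces.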

Summary Table: Zorro Speckle Camera Benchmarks

Parameter          | Value Range
-------------------|------------------------------------------
Diffraction limit  | 11–31 mas (350–1000 nm)
IWA (practical)    | 0.02″ at 600 nm
Contrast (5σ)      | Δm = 2 mag at 0.02″; Δm = 8 mag at 1″
Astrometry         | <1 mas (ρ < 0.4″), 0.2° in position angle
Limiting magnitude | V = 12 (5 min), R = 19 (50 min)

Limitations

  • No atmospheric dispersion corrector (ADC) at Gemini South; the blue channel is degraded at zenith distances $>20^\circ$.
  • Formal resolution accuracy degrades below 20 mas; extreme-contrast detection limits depend on atmospheric dispersion and SNR.

3. Zorro Benchmark in Neural Network Activation Functions

The Zorro activation family constitutes a parametric, continuously differentiable set of activation functions encompassing and extending ReLU, GELU, Swish, and related nonlinearities (Roodschild et al., 28 Sep 2024). The benchmark covers both theoretical expressivity and empirical performance in deep networks.

Mathematical Structure

Each Zorro variant is specified by composite application of piecewise-linear and generalized sigmoid mappings, with tunable parameters for center, slope, and tail "hump" shape. The family includes symmetric (Zorro_sym), asymmetric (Zorro_asym), sigmoid-like, tanh-like, and sloped variants, with the core property that:

  • The function is $C^1$ (continuous first derivative) everywhere.
  • The central region is linear (matching ReLU on $[0,1]$), with sigmoidal tails that prevent unbounded outputs.
  • Parameters $(a, b, m)$ allow tuning between classical activations and new configurations.
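The published formulas are not reproduced here, but a minimal construction with the same core properties (C1 everywhere, identity on [0, 1], bounded sigmoidal tails; the specific parameterization (a, b) is this sketch's own, not the paper's) looks like:

```python
import numpy as np

def zorro_like(x, a=0.4, b=0.4):
    """Illustrative C1 activation: identity on [0, 1], tanh tails matching
    value and unit slope at the junctions. (a, b) set where the tails
    saturate; this is a sketch, not the published Zorro formulas."""
    x = np.asarray(x, dtype=float)
    lower = a * np.tanh(np.minimum(x, 0.0) / a)                 # x < 0 tail
    upper = b * np.tanh((np.maximum(x, 1.0) - 1.0) / b) + 1.0   # x > 1 tail
    core = np.clip(x, 0.0, 1.0)                                 # linear center
    return np.where(x < 0, lower, np.where(x > 1, upper, core))
```

Matching both the value and the unit slope at x = 0 and x = 1 is what makes the junctions C1; the tails saturate at -a and 1 + b, keeping outputs bounded.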

Benchmark Protocols and Metrics

Benchmarks are conducted on:

  • Fully connected networks (MNIST): measuring depth stability and the maximal depth at which $\geq 40\%$ of parameter sets achieve $\geq 90\%$ validation accuracy.
  • Convolutional networks (CIFAR-10, Fashion MNIST, EMNIST, MNIST): mean and standard deviation of validation accuracy over 10 runs; comparisons versus ReLU and GELU with $p$-values.
  • Vision transformers (CIFAR-100): Top-5 accuracy, statistical comparison to the original activations.

Example Table: CNN Validation Accuracy (mean ± std, %, 30 epochs)

Activation   | CIFAR-10   | Fashion MNIST | MNIST
-------------|------------|---------------|-----------
ReLU         | 59.1 ± 1.3 | 90.0 ± 0.3    | 98.8 ± 0.2
Zorro_sloped | 61.5 ± 1.3 | 90.9 ± 0.2    | 98.9 ± 0.1

The best central-linear Zorro variants match or exceed ReLU/GELU in all configurations (statistically significant at $p < 0.05$ on multiple datasets); parameter sweeps maintain stability at depths up to 40.

Parameter Sensitivity and Practical Recommendations

Default parameters (e.g., $a_s = 0.4$, $a_i = 5$, $b = 0.4$; sloped variant $m = 1.2$) yield high depth stability and robust convergence across architectures, with minimal computational overhead relative to classical activations. Batch normalization is not required for stable performance (empirically confirmed).

The Zorro activation benchmark establishes $C^1$-smooth, tunable activations as direct, drop-in replacements for classic nonlinearities, with competitive or improved accuracy and convergence, and broad parametric expressivity (Roodschild et al., 28 Sep 2024).

4. Cross-Domain Relevance and Theoretical Significance

The Zorro Benchmark nomenclature, across three research domains, consistently denotes performance-critical, model-agnostic, and highly calibrated frameworks for benchmarking value, precision, or trainability:

  • In data economics, the benchmark is query-level, model-independent valuation.
  • In astronomy, it is the precision and dynamic range at the extreme limits of resolving power and faint object detection, with rigorous instrumental calibration.
  • In machine learning, it quantifies both theoretical smoothness and empirical efficiency across deep architectures, tightly linked to gradient flow and vanishing/exploding gradient regimes.

This cross-domain convergence underscores a shared emphasis on robust, interpretable, and reproducible benchmarks as the basis for model, instrument, or algorithm selection and valuation.

5. Key Implications, Limitations, and Future Prospects

  • Data Valuation (VoD): The Zorro benchmark enables data markets where compensation and access are proportional to demonstrable impact, with privacy-aware valuation at scale; limitations include dependence on accurate matrix completion and the absence of context-specific modeling (e.g., strategic behavior or multi-agent interactions).
  • Speckle Interferometry: Zorro benchmarks set the standard for current and future optical imaging systems, with residual limitations determined by atmospheric dispersion correction and the fidelity of system calibration at the diffraction limit.
  • Neural Network Nonlinearities: The benchmark defines a flexible, unified parameter space for activation functions, allowing principled comparison and adaptive selection; specific performance may still be architecture- and dataset-dependent.

A plausible implication is that the Zorro benchmark methodology—in which absolute, interpretable, and model-free value or performance is operationalized at the finest granularity—can generalize across domains where proprietary, noisy, or black-box models prohibit direct benchmarking against ground truth. This suggests future directions in highly modular benchmarking protocols for other domains, particularly those balancing privacy, proprietary algorithms, and open measurement.

