Image-Selection Relation: Lensing & Vision
- Image-Selection Relation (ISR) is a formalism that isolates image-defining properties by separating geometric focusing from residual lensing deflections.
- In gravitational lensing, ISR underpins mass-sheet transformations that preserve image positions while rescaling magnification and time delays.
- In vision–language evaluation, ISR supports the BISON protocol by providing interpretable, fine-grained metrics for text-to-image matching.
The Image-Selection Relation (ISR) is a formalism used in two distinct contexts: optical gravitational lensing theory and the evaluation of vision–language models. In gravitational lensing, ISR provides the mathematical structure underlying image formation, magnification, and time-delay invariance under mass-sheet transformations. In computer vision, ISR underpins the Binary Image Selection (BISON) evaluation protocol for text-to-image matching models, delivering interpretable metrics for fine-grained image–text correspondence. Both applications employ an ISR framework to isolate the image-defining properties of complex mappings from either physical lensing or embedding-based model scoring.
1. Mathematical Structure in Gravitational Lensing
In gravitational lensing, the total ray deflection at position $\vec{\theta}$ in the deflector plane is given by

$$\vec{\alpha}(\vec{\theta}) = F\left(\vec{\theta} - \vec{\beta}\right),$$

where $\vec{\beta}$ is the (scaled) unlensed source position, and $F$ is the geometric distance factor. This separates into two terms: a "geometric focusing" term

$$\vec{\alpha}_{\mathrm{foc}}(\vec{\theta}) = F\,\vec{\theta}$$

and a remainder,

$$\vec{\alpha}_{\mathrm{ISR}}(\vec{\theta}) = \vec{\alpha}(\vec{\theta}) - F\,\vec{\theta},$$

which defines the Image-Selection Relation:

$$\vec{\alpha}(\vec{\theta}_i) - F\,\vec{\theta}_i = -F\,\vec{\beta} \qquad \text{for every image position } \vec{\theta}_i.$$

The ISR specifies that, after removing geometric focusing, the image-forming lens must deflect all candidate rays by the same constant vector. With $\vec{c} \equiv -F\,\vec{\beta}$, the relation reads $\vec{\alpha}_{\mathrm{ISR}}(\vec{\theta}_i) = \vec{c}$ (Gorenstein, 4 Jan 2026).
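The "same constant vector" condition can be checked numerically on a standard toy model. The sketch below uses a point-mass lens (my own illustrative choice, not a model from the cited paper) in one dimension, with scaled deflection $\theta_E^2/\theta$ and Einstein radius $\theta_E$: both images of a single source must map back to the same source position $\beta$, i.e. every image ray satisfies the same relation.

```python
# Numeric sanity check of the ISR on a point-mass toy lens (illustrative,
# not taken from the cited paper). Lens equation: beta = theta - thetaE**2/theta.
# Both images of one source must recover the same beta, i.e. after removing
# geometric focusing every image ray is deflected by the same constant vector.
import math

thetaE = 1.0   # Einstein radius (assumed units)
beta = 0.3     # unlensed source position (assumed value)

# Image positions solve theta**2 - beta*theta - thetaE**2 = 0.
disc = math.sqrt(beta**2 + 4.0 * thetaE**2)
images = [(beta + disc) / 2.0, (beta - disc) / 2.0]

for theta in images:
    deflection = thetaE**2 / theta   # scaled deflection alpha(theta)
    residual = theta - deflection    # should equal beta for every image
    print(f"theta = {theta:+.4f}  ->  theta - alpha = {residual:+.4f}")
```

Both printed residuals equal $\beta = 0.3$, confirming that the two image rays share one constant source offset.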
2. Scaling Symmetry and Connection to the Mass-Sheet Transformation
A key feature of the ISR in lensing is its invariance under uniform rescaling:

$$\vec{\alpha}_{\mathrm{ISR}}(\vec{\theta}) \;\to\; \lambda\,\vec{\alpha}_{\mathrm{ISR}}(\vec{\theta}), \qquad \vec{\beta} \;\to\; \lambda\,\vec{\beta}.$$

This symmetry leaves image positions unchanged while scaling magnifications as $\mu \to \mu/\lambda^{2}$ and time delays as $\Delta t \to \lambda\,\Delta t$. Restoring the geometric focusing yields the classic Mass-Sheet Transformation (MST): the original mass profile is rescaled and a uniform sheet is added, preserving image locations but rescaling magnification and delays (Gorenstein, 4 Jan 2026). In convergence notation, the transformed profile is

$$\kappa_{\lambda}(\vec{\theta}) = \lambda\,\kappa(\vec{\theta}) + (1 - \lambda).$$
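The MST invariance can be demonstrated in a one-dimensional toy setup (again a point-mass deflection law of my own choosing): applying $\alpha_{\lambda}(\theta) = \lambda\,\alpha(\theta) + (1-\lambda)\,\theta$ maps the same image positions to the rescaled source $\lambda\beta$, so image locations are preserved.

```python
# 1-D sketch of the mass-sheet transformation on a point-mass toy lens:
# rescale the deflection and add a uniform sheet,
#   alpha_mst(theta) = lam * alpha(theta) + (1 - lam) * theta.
# The same image positions then map to the rescaled source lam * beta.
import math

def alpha(theta, thetaE=1.0):
    return thetaE**2 / theta  # point-mass scaled deflection

def alpha_mst(theta, lam):
    return lam * alpha(theta) + (1.0 - lam) * theta

beta, lam = 0.3, 0.8  # assumed source position and MST parameter
disc = math.sqrt(beta**2 + 4.0)
for theta in [(beta + disc) / 2.0, (beta - disc) / 2.0]:  # the two images
    beta_orig = theta - alpha(theta)          # = beta
    beta_mst = theta - alpha_mst(theta, lam)  # = lam * beta: same images, rescaled source
    print(f"theta = {theta:+.4f}: beta = {beta_orig:+.4f}, beta_mst = {beta_mst:+.4f}")
```

Since $\theta - \alpha_{\lambda}(\theta) = \lambda(\theta - \alpha(\theta))$, the derivative $d\beta/d\theta$ also picks up a factor $\lambda$, which is the 1-D shadow of the $\mu \to \mu/\lambda^{2}$ magnification rescaling.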
3. ISR in Vision–Language Evaluation: The BISON Protocol
In text-to-image matching, ISR formalizes the evaluation setup where a model selects, for each fine-grained text query $c$, the correct image from a pair of semantically similar candidates $\{I_1, I_2\}$. The selection function is:

$$\hat{I}(c) = \arg\max_{I \in \{I_1, I_2\}} s(c, I),$$

where $s(c, I)$ is the model's compatibility function. For retrieval systems, $s(c, I) = \cos\big(\phi(I), \psi(c)\big)$ with image and text embeddings $\phi(I)$ and $\psi(c)$, while for captioning models, $s(c, I) = \log p(c \mid I)$ (Hu et al., 2019).
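For the retrieval case, the selection rule is a cosine-similarity argmax over the two candidates. A minimal sketch, with toy embedding vectors standing in for real image/text encoders (none of the values below come from the BISON paper):

```python
# Sketch of the ISR selection rule for a retrieval-style scorer, using cosine
# similarity between toy embeddings. The vectors are illustrative stand-ins,
# not outputs of the models evaluated in the BISON paper.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def select(query_emb, image_embs):
    """Return the index of the candidate image that best matches the query."""
    scores = [cosine(query_emb, e) for e in image_embs]
    return int(np.argmax(scores)), scores

query = np.array([0.9, 0.1, 0.0])          # toy text embedding psi(c)
candidates = [np.array([1.0, 0.0, 0.1]),   # fine-grained match
              np.array([0.2, 0.9, 0.1])]   # semantically similar distractor
idx, scores = select(query, candidates)
print(idx, [round(s, 3) for s in scores])  # picks candidate 0
```

Swapping `cosine` for a caption log-likelihood $\log p(c \mid I)$ gives the captioning-model variant of the same rule.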
4. Dataset Construction and Properties (COCO-BISON)
The COCO-BISON dataset, derived from the COCO validation split, implements ISR/BISON evaluation in three stages:
| Stage | Description | Examples Retained |
|---|---|---|
| 1 | Pairwise similarity via FastText caption embeddings | 67,564 candidate pairs |
| 2 | Annotator selection of discriminative captions | 61,861 triples (91.6%) |
| 3 | Verification by independent annotators | 54,253 triples (87.7%) |
Key statistics include 54,253 query–image triples, broad coverage of the COCO-val images, and a text-query distribution matching that of the training corpus (Hu et al., 2019).
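Stage 1 of the pipeline above can be sketched as nearest-neighbor pairing over caption embeddings. The sketch below substitutes a deterministic bag-of-words count vector for the FastText embeddings actually used for COCO-BISON, and the captions are invented examples:

```python
# Illustrative stage-1 candidate-pair mining: embed every caption, then pair
# each image with its most similar other image. A bag-of-words count vector
# stands in for the FastText embeddings used in the real pipeline.
import numpy as np

captions = {
    "img1": "man riding brown horse on beach",
    "img2": "man riding dark horse near water",
    "img3": "bowl of fruit on wooden table",
}

# Vocabulary over all caption words; each caption becomes a count vector.
vocab = {w: i for i, w in enumerate(
    sorted({w for c in captions.values() for w in c.split()}))}

def embed(caption):
    vec = np.zeros(len(vocab))
    for w in caption.split():
        vec[vocab[w]] += 1.0
    return vec

embs = {k: embed(c) for k, c in captions.items()}

def most_similar(key):
    """Return the other image whose caption embedding is closest in cosine similarity."""
    others = [o for o in captions if o != key]
    sims = [float(np.dot(embs[key], embs[o]) /
                  (np.linalg.norm(embs[key]) * np.linalg.norm(embs[o])))
            for o in others]
    return others[int(np.argmax(sims))]

print(most_similar("img1"))   # → img2, the near-duplicate horse scene
```

Stages 2 and 3 then replace this automatic pairing with human judgments, which is what keeps the retained triples discriminative.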
5. ISR’s Advantages Over Traditional Retrieval and Captioning Metrics
Traditional image-retrieval metrics such as Recall@k, and captioning scores (BLEU, CIDEr, METEOR, SPICE), suffer from ambiguous negatives and poor correspondence with human judgments of correctness. Notably, 56% of retrieval "errors" are in fact genuine matches, and captioning metrics reward generic captions. ISR/BISON directly addresses these issues:
- Binary accuracy is defined as
  $$\mathrm{Acc} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\!\left[\hat{I}(c_i) = I_i^{*}\right],$$
  where $I_i^{*}$ is the annotated correct image for query $c_i$.
- The contrasting images are explicitly labeled, and negatives are fine-grained semantic distractors.
- The protocol is interpretable, has low variance, and focuses evaluation on fine-grained grounding rather than generic correspondence (Hu et al., 2019).
6. Experimental Protocols and Observed Performance
Experimental results on COCO-BISON demonstrate consistent ordering among retrieval and captioning models, with BISON accuracy universally higher than Recall@1:
| System | Recall@1 | Recall@5 | BISON Accuracy |
|---|---|---|---|
| ConvNet+BoW | 45.19 | 79.26 | 80.48 |
| ConvNet+Bi-GRU | 49.34 | 82.22 | 81.75 |
| Obj+Bi-GRU | 53.97 | 85.26 | 83.90 |
| SCAN i2t | 52.35 | 84.44 | 84.94 |
| SCAN t2i | 54.10 | 85.58 | 85.89 |
For captioning systems, BISON accuracy exposes the gap to human-level matching: models outperform humans on BLEU/CIDEr but not on BISON accuracy (human: 100%) (Hu et al., 2019). This suggests BISON is more reflective of true visual–textual matching.
7. Compact Algorithm and Theoretical Summary
The ISR/BISON evaluation operates algorithmically as follows:
- For each triple $(c_i, I_i^{+}, I_i^{-})$, compute model scores $s_i^{+} = s(c_i, I_i^{+})$ and $s_i^{-} = s(c_i, I_i^{-})$.
- Predict $\hat{y}_i = 1$ if $s_i^{+} > s_i^{-}$, else $\hat{y}_i = 0$.
- BISON accuracy is the fraction of triples where $\hat{y}_i = 1$.
Formally,

$$\mathrm{Acc}_{\mathrm{BISON}} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\!\left[s(c_i, I_i^{+}) > s(c_i, I_i^{-})\right].$$
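The algorithm reduces to a few lines of code. A minimal sketch, with toy score pairs (the numbers below are invented, not results from the paper):

```python
# BISON accuracy from precomputed score pairs: the fraction of triples whose
# positive (correct) image out-scores its distractor. Scores are toy numbers.
def bison_accuracy(score_pairs):
    """score_pairs: iterable of (s_positive, s_distractor) per triple."""
    pairs = list(score_pairs)
    hits = sum(1 for s_pos, s_neg in pairs if s_pos > s_neg)
    return hits / len(pairs)

scores = [(0.91, 0.48), (0.55, 0.62), (0.73, 0.70), (0.80, 0.10)]
print(bison_accuracy(scores))   # 0.75 (3 of 4 triples correct)
```

In practice each pair would come from the compatibility function $s(c, I)$ of the model under evaluation, applied to the verified COCO-BISON triples.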
A plausible implication is that ISR in both physical lensing and vision–language modeling naturally isolates the image-defining mechanics (via deflection or compatibility scoring) and makes manifest the core invariance properties or interpretability of model outputs. The geometric–optical origin of image invariance in mass-sheet transformations directly parallels ISR’s role in evaluating model grounding precision.
References
- "Evaluating Text-to-Image Matching using Binary Image Selection (BISON)" (Hu et al., 2019)
- "The Optical Origin of the Mass-Sheet Transformation" (Gorenstein, 4 Jan 2026)