
Object Texture Intensity (OTI) in Vision

Updated 31 January 2026
  • OTI is a quantitative framework that measures the texture strength on semantic objects, offering key insights into image and 3D model analysis.
  • It aggregates high-frequency filter responses and multi-view metrics to assess adversarial attackability and tracking robustness in vision systems.
  • OTI’s model-free and interpretable metrics enable practical applications in active learning, adversarial training, and robust visual tracking evaluation.

Object Texture Intensity (OTI) is a quantitative framework for measuring the strength and richness of texture present on semantic objects within images or rendered views of 3D models. OTI has emerged as a critical metric for analyzing vulnerability to adversarial perturbations in computer vision as well as for benchmarking and predicting the performance of visual tracking systems under varying textural conditions. Contemporary OTI methodologies are model-free, interpretable, and computationally efficient, aggregating texture responses either over semantic masks in 2D images or across multiple rendered viewpoints of 3D objects. Theoretical and empirical studies demonstrate OTI’s predictive power for attackability in machine learning and the stability of visual point tracking under both normal and complex motion.

1. Mathematical Definition and Formalism

The precise definition of OTI depends on the application domain. For 2D natural images and attackability analysis, OTI is formulated as the normalized sum of absolute high-frequency texture responses within the object of interest:

$$\mathrm{OTI}(x) = \frac{1}{C \cdot H \cdot W}\,\left\| \mathrm{object}(x) \odot (f * x) \right\|_1$$

where $x \in \mathbb{R}^{C \times H \times W}$ is an image, $\mathrm{object}(x)$ is a binary mask indicating the semantic object, $f * x$ is the response to a high-frequency texture filter (such as Sobel), and $\odot$ denotes elementwise multiplication. The sum is taken over all channels and spatial locations, and normalization is by the total number of elements in $x$ (Liang et al., 24 Jan 2026).
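As a minimal numpy sketch of this definition, the following computes OTI for a channels-first image and a binary mask. The Sobel-x kernel, the zero-padding choice, and the helper names are illustrative assumptions, not specifics from the cited paper:

```python
import numpy as np

# 3x3 Sobel-x kernel: one possible choice for the high-frequency filter f
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

def filter2d(img, kernel):
    """Same-size 2D filtering with zero padding (numpy-only).

    Implemented as cross-correlation; under the absolute value in the
    OTI formula the sign flip relative to true convolution is irrelevant.
    """
    kh, kw = kernel.shape
    pad = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(img, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * pad[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def oti(x, obj_mask, kernel=SOBEL_X):
    """OTI(x) = || object(x) . (f * x) ||_1 / (C * H * W).

    x: (C, H, W) float image; obj_mask: (H, W) binary object mask.
    """
    C, H, W = x.shape
    resp = np.stack([filter2d(x[c], kernel) for c in range(C)])
    return float(np.abs(resp * obj_mask).sum() / (C * H * W))
```

Because the mask only zeroes out responses, shrinking the object region can never increase the score, matching the intuition that OTI aggregates texture strictly inside the object.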

For 3D objects and the analysis of video frame tracking, OTI is operationalized through a set of five complementary metrics computed on patches across multi-view renders. These descriptors encompass:

  • Gray-Level Co-occurrence Matrix (GLCM) statistics: homogeneity, energy, correlation, contrast, dissimilarity
  • Local Binary Pattern (LBP) histogram: entropy, variance
  • Keypoint counts (ORB/FAST)
  • RGB channel variance
  • Fourier energy partitioning into low/high frequency components

Aggregated values are assigned discrete low/medium/high texture labels via histogram binning and majority vote (Huang et al., 17 Mar 2025).
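As an illustration of the first descriptor family above, a gray-level co-occurrence matrix and three of its standard statistics can be sketched in plain numpy. The quantization level and pixel offset are illustrative defaults, not values from the cited papers:

```python
import numpy as np

def glcm_stats(gray, levels=8, offset=(0, 1)):
    """Minimal gray-level co-occurrence matrix (GLCM) statistics.

    numpy-only sketch; `levels` (intensity quantization) and `offset`
    (displacement defining co-occurring pixel pairs) are illustrative.
    `gray` is a 2D image with values in [0, 1].
    """
    q = np.clip((gray * levels).astype(int), 0, levels - 1)
    dy, dx = offset
    h, w = q.shape
    a = q[:h - dy, :w - dx].ravel()   # reference pixels
    b = q[dy:, dx:].ravel()           # neighbours at the given offset
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (a, b), 1)        # count co-occurring level pairs
    p = glcm / glcm.sum()             # normalize to a joint distribution
    i, j = np.indices(p.shape)
    return {
        "contrast":    float(np.sum(p * (i - j) ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + np.abs(i - j)))),
        "energy":      float(np.sqrt(np.sum(p ** 2))),
    }
```

A perfectly flat patch concentrates all co-occurrence mass on the diagonal (contrast 0, homogeneity 1), while strongly textured patches spread mass off-diagonal, which is exactly the behaviour the low/medium/high binning exploits.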

2. Computation Pipeline and Algorithmic Steps

OTI for Image Attackability

The canonical pipeline for computing OTI in adversarial vulnerability analysis involves:

  1. Semantic Mask Acquisition: Obtain $\mathrm{object}(x)$ using semantic segmentation, saliency detection, thresholded Grad-CAM maps, or manual annotation.
  2. High-Frequency Filtering: Convolve $x$ with a filter $f$ (e.g., Sobel operator) to highlight local texture.
  3. Masking: Zero the filter response outside the object region.
  4. Aggregation: Compute the $\ell_1$ norm over the masked filter response.
  5. Normalization: Divide by the total number of elements ($C \cdot H \cdot W$), yielding a scalar OTI.

The entire procedure runs in $O(C\,H\,W)$ time and is highly amenable to parallelized hardware execution. The masked filter response can also be rendered as a heatmap, so the score is inherently visually inspectable (Liang et al., 24 Jan 2026).

OTI for 3D Object Texture Classification

For 3D point tracking assessment, the procedure includes:

  1. Rendering each 3D model from multiple ($K$) viewpoints.
  2. Tiling each view into patches or grids.
  3. Computing the full suite of OTI metrics on each patch.
  4. Averaging per-patch features across views to obtain model-level descriptors.
  5. Binning each metric’s global distribution (over all models) into low/mid/high quantiles (3:4:3 split).
  6. Assigning the final OTI label to each model by majority vote across its five metric bins; ties are broken in favor of the stronger (higher-intensity) label (Huang et al., 17 Mar 2025).
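The binning and voting steps (5 and 6) can be sketched as follows; the quantile cut points reproduce the 3:4:3 split described above, while the function names are illustrative:

```python
import numpy as np
from collections import Counter

def bin_metric(values, split=(0.3, 0.7)):
    """Bin one metric's global distribution into low/mid/high (0/1/2).

    The (0.3, 0.7) quantile cut points yield the 3:4:3 split.
    """
    lo, hi = np.quantile(values, split)
    v = np.asarray(values)
    return np.where(v <= lo, 0, np.where(v <= hi, 1, 2))

def oti_label(metric_bins):
    """Majority vote over a model's per-metric bins.

    Ties are broken in favour of the stronger (higher-intensity) label.
    """
    counts = Counter(metric_bins)
    best = max(counts.values())
    return max(b for b, n in counts.items() if n == best)
```

Each model contributes one bin per metric (five in total), and `oti_label` collapses them into the final discrete texture class.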

3. Theoretical Foundations and Interpretability

From the adversarial perspective, OTI connects to the geometry of classifier decision boundaries and to frequency analysis of adversarial perturbations:

  • Decision-boundary View: Under regularity assumptions, the margin to the boundary, $d(x)$, is approximately a linear combination of object area and global texture intensity:

$$d(x) \approx \alpha\,\mathrm{OAR}(x) + \beta\,\mathrm{ITI}(x)$$

Hence, samples with lower OTI are expected to be closer to the boundary and more susceptible to small adversarial perturbations, while high-OTI images are more robust (Liang et al., 24 Jan 2026).

  • Frequency-domain View: Adversarial noise preferentially occupies mid- and high-frequency bands; objects with intrinsically strong high-frequency content (higher OTI) can "mask" or absorb noise more effectively. OTI thus serves as an effective mid/high-frequency energy quantifier, in line with empirical observations in both attack and tracking contexts (Liang et al., 24 Jan 2026, Huang et al., 17 Mar 2025).
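The frequency-domain view can be made concrete with a small numpy sketch that partitions 2D spectral energy into low- and high-frequency components by a radial cutoff; the cutoff fraction is an illustrative choice, not a value from the cited papers:

```python
import numpy as np

def freq_energy_split(gray, radius_frac=0.25):
    """Partition 2D spectral energy into low/high-frequency components.

    `radius_frac` (cutoff radius as a fraction of the half-diagonal) is
    an illustrative parameter. Returns (low_energy, high_energy).
    """
    F = np.fft.fftshift(np.fft.fft2(gray))   # DC moved to the centre
    E = np.abs(F) ** 2                       # power spectrum
    h, w = gray.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2)   # radial frequency
    cutoff = radius_frac * np.hypot(h / 2, w / 2)
    low = E[r <= cutoff].sum()
    return float(low), float(E.sum() - low)
```

A flat patch puts all its energy at DC (low band), while fine texture shifts energy outward, which is the sense in which a high-OTI object "absorbs" mid/high-frequency adversarial noise.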

OTI is model-free and interpretable, producing direct spatial localization of vulnerability and texture features in contrast to model-dependent or opaque feature vectors.

4. Empirical Validation and Quantitative Benchmarks

Image Attackability Experiments

Comprehensive experiments on ImageNet-50K and Kvasir-SEG evaluate OTI under untargeted/targeted transfer-based and black-box query attacks, including attacks against adversarially trained models.

Key findings using ranked subsamples ($\alpha = 10\%$):

  • Single-surrogate untargeted attacks: +13.39% attack success rate (ASR) over random.
  • Ensemble-based untargeted: +13.55% ASR.
  • Single-surrogate targeted: +6.79% ASR.
  • Attacking adversarially trained models: +12.24% ASR.
  • Query-based: OTI selection achieves desired attacks with substantially lower perturbation norms.
  • In segmentation tasks, ranking by OTI results in larger drops in F1/IoU for the most vulnerable samples (Liang et al., 24 Jan 2026).

Visualizations show that low-OTI images typically have small/blurred objects, while high-OTI images present large, sharply textured targets. OTI heatmaps confirm bright regions around strong texture, with Grad-CAM overlays of adversarially attacked images aligning attack focus to low-OTI zones.

Point Tracking in Video and 3D Benchmarks

On the GIFT benchmark, tracking accuracy degrades monotonically as OTI shifts from high to low:

| Texture Intensity | $\delta^{\mathrm{vis}}_{\mathrm{avg}}$ (%) | AJ Score |
|---|---|---|
| Low (N-L) | 56.2 | 46.5 |
| Medium (N-M) | 63.0 | 52.1 |
| High (N-H) | 69.0 | 56.6 |

All evaluated point-tracking and optical-flow baselines demonstrate a strong positive correlation between texture strength (OTI) and tracking robustness, with low-OTI models typically suffering 10–15 percentage point deficits in accuracy relative to high-OTI cases (Huang et al., 17 Mar 2025).

5. Applications in Adversarial Robustness and Vision

OTI is leveraged for:

  • Active Learning: Prioritizing annotation or training of low-OTI samples (i.e., those most attackable) to enhance dataset robustness.
  • Adversarial Training: Selecting low-OTI candidates for adversarial augmentation, focusing defense computational resources where margin is smallest.
  • Black-box Attack Enhancement: Allocating attack queries preferentially to lowest-OTI images to maximize success rates under query budget constraints.
  • Predictive Assessment in Tracking: Using OTI bins to forecast loss of point tracking fidelity across scene categories and trajectories (Liang et al., 24 Jan 2026, Huang et al., 17 Mar 2025).
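The ranking step shared by the first three applications can be sketched as follows (a hypothetical helper, not from the cited papers):

```python
import numpy as np

def most_attackable(oti_scores, alpha=0.10):
    """Indices of the lowest-OTI fraction `alpha` of a dataset.

    Low OTI predicts a small decision-boundary margin, so these are the
    samples to prioritize for attack queries, adversarial augmentation,
    or annotation.
    """
    scores = np.asarray(oti_scores)
    k = max(1, int(alpha * len(scores)))
    return np.argsort(scores)[:k]
```

Since OTI is computed once offline, this selection adds negligible cost to an attack or training loop.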

Practically, practitioners are advised to use reliable segmentation/saliency networks for object masking, compute OTI metrics in a one-time offline process, and integrate OTI into hybrid vulnerability assessment or defense pipelines.

6. Assumptions, Limitations, and Domain Extensions

OTI validity is contingent on:

  • Segmentation Quality: Inaccurate object masks can introduce considerable bias into OTI assessments.
  • Filter Choice: The canonical Sobel operator approximates mid/high-frequency content but may not capture all frequency nuances; domain-specific or more sophisticated filter banks can enhance fidelity.
  • Modality Restriction: Current OTI formulations are for images and image-objects; extension to non-visual data (e.g., audio, text) would require analogous texture descriptors.
  • Discrete Labeling in 3D: In the GIFT protocol, low/mid/high OTI categorization is based on empirical quantile splits, with majority-vote aggregation; the final scalar OTI may behave differently across datasets (Liang et al., 24 Jan 2026, Huang et al., 17 Mar 2025).

7. Contextual Significance and Future Directions

Object Texture Intensity (OTI) has become a standard tool in two distinct but related research areas: assessing adversarial vulnerability in image classification/segmentation systems, and benchmarking texture-governed performance effects in robust point tracking. OTI bridges low-level signal analysis with high-level model behavior, offering a transparent, computationally light, and semantically grounded approach to quantifying the role of object texture in modern vision systems.

A plausible implication is the extension of OTI-style metrics to multi-modality, more complex semantic hierarchies, or dynamic video streams, where temporal and spatial textural coherence may further influence system performance.

References:

  • OTI for attackability and interpretable defense/attack selection: (Liang et al., 24 Jan 2026)
  • OTI multi-metric protocol for 3D/tracking: (Huang et al., 17 Mar 2025)
