Static Physics-Informed BoF for SPM Imaging
- The static physics-informed BoF approach converts AFM/MFM images into fixed-length histograms using a learned dictionary of physical words.
- It integrates descriptor calculations and energy-based weighting to capture nanoscale morphology and magnetic structures with high robustness.
- The method serves as a bridge to autonomous multi-objective Bayesian optimization by mapping high-dimensional imaging data to interpretable surrogates.
The static physics-informed Bag-of-Features (BoF) representation is a computational methodology for converting atomic force microscopy (AFM) and magnetic force microscopy (MFM) images of combinatorial materials libraries into robust, interpretable, fixed-length feature vectors that encode physicochemically meaningful information. This approach enables quantitative, multi-objective structure–property analysis and autonomous exploration of complex materials landscapes by serving as the interface between high-dimensional imaging data and advanced optimization frameworks, such as multi-objective Bayesian optimization (MOBO) (Barakati et al., 9 Jan 2026).
1. Foundational Principles and Motivation
The static physics-informed BoF representation models an SPM image as a histogram over "physical words," which are learned dictionary elements representing local patterns in the real/phase space of the image. Given the space of local patch descriptors $\mathcal{X} \subset \mathbb{R}^d$, a dictionary $D = \{d_1, \dots, d_K\}$ is constructed so that each image $I$ is encoded as a feature vector $h(I) = (h_1, \dots, h_K)$, where

$$h_k = \sum_i \mathbb{1}\!\left[\, k = \arg\min_{k'} \|x_i - d_{k'}\|_2 \,\right],$$

and $h_k$ is the count (or weighted count) of patches most similar to physical word $d_k$. This encoding is translation-invariant and robust to scan offsets, resolution changes, and local noise, as it discards absolute spatial coordinates and aggregates local statistics, fulfilling key requirements for SPM data analysis.
The static property indicates that the BoF feature vector for a given image is fixed upon extraction, providing reproducible input for surrogate modeling and acquisition functions in MOBO. By employing physically motivated dictionaries and patch descriptors, the representation encapsulates essential nanoscale morphology and magnetic structure, directly supporting the optimization of competing objectives such as roughness, domain size, and contrast.
2. Feature Extraction Pipeline
The pipeline consists of sequential steps that map raw AFM/MFM images to collections of local descriptors and, subsequently, to the BoF histogram:
- Preprocessing: Each image $I(x, y)$ is first flattened via plane subtraction

$$\tilde{I}(x, y) = I(x, y) - (a x + b y + c),$$

where $(a, b, c)$ minimizes the least-squares error, followed by denoising using a Gaussian filter with kernel $G_\sigma$.
- Patch Extraction: Filtered images are divided into overlapping patches $p_i$, each yielding a descriptor $x_i \in \mathbb{R}^d$.
- Descriptor Calculation: Five classes of features, encapsulated in Table 1 of the cited work, include:
- Root-mean-square roughness $R_q$ and height-distribution moments for AFM,
- Autocorrelation-derived correlation length $\xi$,
- Voronoi-based mean particle diameter $\bar{d}$,
- MFM domain size $\lambda$ (via the peak frequency of the power spectrum),
- Domain magnitude/contrast $\Delta\phi$.
These descriptors are calculated according to physically and statistically justified formulas, e.g., for roughness:

$$R_q = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (z_i - \bar{z})^2}.$$
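As an illustration, two of the AFM descriptors above (RMS roughness and the autocorrelation-derived correlation length) can be sketched in a few lines of NumPy. The 1/e threshold for the correlation length and the synthetic test patch are illustrative assumptions, not prescriptions from the cited work:

```python
import numpy as np

def rms_roughness(z):
    """R_q: root-mean-square deviation of heights from the patch mean."""
    return np.sqrt(np.mean((z - z.mean()) ** 2))

def correlation_length(z, pixel_size=1.0):
    """Correlation length: lag at which the autocorrelation profile
    first drops below 1/e of its zero-lag value (illustrative criterion)."""
    zc = z - z.mean()
    # circular autocorrelation via FFT (Wiener-Khinchin theorem)
    acf = np.fft.ifft2(np.abs(np.fft.fft2(zc)) ** 2).real
    acf /= acf[0, 0]
    profile = acf[0, : z.shape[1] // 2]        # 1-D profile along one axis
    below = np.where(profile < 1.0 / np.e)[0]
    return (below[0] if below.size else len(profile)) * pixel_size

rng = np.random.default_rng(0)
patch = rng.standard_normal((64, 64))          # white-noise stand-in for a height map
print(rms_roughness(patch), correlation_length(patch))
```

For uncorrelated noise the profile decays essentially at the first lag, so the correlation length collapses to one pixel; structured surfaces yield longer decays.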
3. Bag-of-Features Model Construction
BoF encoding consists of three main stages:
- Dictionary Learning: All patch descriptors $\{x_i\}$ are pooled across the dataset and clustered (typically by $k$-means) to identify representative atoms $\{d_k\}_{k=1}^{K}$.
- Assignment: Each descriptor $x_i$ is assigned either via hard nearest-neighbor or soft Gaussian-weighted assignment:

$$a_i = \arg\min_k \|x_i - d_k\|_2 \quad \text{or} \quad w_{ik} = \frac{\exp\!\left(-\|x_i - d_k\|^2 / 2\sigma^2\right)}{\sum_{k'} \exp\!\left(-\|x_i - d_{k'}\|^2 / 2\sigma^2\right)}.$$

- Histogram Encoding: For each image, $h_k = \sum_i w_{ik}$ is computed for $k = 1, \dots, K$.
The histogram thus quantifies the abundance and diversity of distinct local patterns in the image, reducing thousands of pixel values to a tractable set of interpretable features.
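The three stages above can be sketched with a plain $k$-means dictionary and the soft Gaussian assignment; function names, the toy two-cluster data, and all parameters are illustrative assumptions:

```python
import numpy as np

def learn_dictionary(X, K, iters=20, seed=0):
    """Plain k-means: returns K cluster centers ('physical words')."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return centers

def bof_histogram(X, centers, sigma=None):
    """Hard (sigma=None) or soft Gaussian assignment, L1-normalized."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    if sigma is None:
        h = np.bincount(d.argmin(axis=1), minlength=len(centers)).astype(float)
    else:
        w = np.exp(-d**2 / (2 * sigma**2))
        w /= w.sum(axis=1, keepdims=True)   # rows sum to 1, as in soft assignment
        h = w.sum(axis=0)
    return h / h.sum()

# Toy data: two well-separated descriptor clusters of 50 patches each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (50, 3)), rng.normal(1, 0.1, (50, 3))])
D = learn_dictionary(X, K=2)
print(bof_histogram(X, D))
```

With two balanced clusters the resulting histogram is close to uniform; imbalanced pattern populations would show up directly as skewed bin weights.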
4. Physics-Informed Constraints and Weighting
To align BoF features with underlying physical phenomena, patch assignment weights are modulated by energy-based quantities:
- Exchange-Energy Weighting (MFM):

$$E_{\mathrm{ex}} = A \int |\nabla m|^2 \, dV,$$

where $m$ is the local magnetization and $A$ the exchange stiffness.
- Surface-Energy Weighting (AFM):

$$E_{\mathrm{s}} = \gamma \int \sqrt{1 + |\nabla z|^2} \, dx \, dy,$$

with surface tension $\gamma$.
The average weight over each patch $p_i$, $\bar{w}_i$, is incorporated such that $h_k = \sum_i \bar{w}_i \, w_{ik}$. This emphasizes domains that are physically relevant, modulating the dictionary-based encoding by the energetic landscape at the nanoscale.
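A sketch of how an energy-based patch weight can modulate the histogram, using the surface-energy (AFM) variant with $\gamma = 1$ and a finite-difference gradient; these discretization choices, and the hard-assignment form of the weighted histogram, are assumptions for illustration. The exchange-energy MFM weight would replace the height gradient with the magnetization gradient:

```python
import numpy as np

def surface_energy_weight(z, gamma=1.0):
    """Mean discretized surface-energy density of a patch:
    gamma * mean sqrt(1 + |grad z|^2). Rough patches weigh more."""
    gy, gx = np.gradient(z)
    return gamma * np.sqrt(1.0 + gx**2 + gy**2).mean()

def weighted_histogram(assignments, weights, K):
    """h_k = sum_i w_i * 1[a_i == k], then L1-normalized."""
    h = np.bincount(assignments, weights=weights, minlength=K).astype(float)
    return h / h.sum()

rng = np.random.default_rng(2)
flat  = 0.01 * rng.standard_normal((32, 32))   # smooth patch -> weight near gamma
rough = 2.0  * rng.standard_normal((32, 32))   # rough patch  -> larger weight
w = np.array([surface_energy_weight(p) for p in (flat, rough)])
h = weighted_histogram(np.array([0, 1]), w, K=2)
print(w, h)
```

Even with one patch per word, the rough patch's bin receives more mass than the smooth one's, which is the intended physics-informed emphasis.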
5. Dimensionality Reduction and Normalization
After histogram construction, normalization and dimensionality reduction are performed:
- $L_1$ normalization: $h \leftarrow h / \|h\|_1$
- $L_2$ normalization (optional): $h \leftarrow h / \|h\|_2$
- Principal Component Analysis (PCA): Applied to the ensemble $\{h^{(n)}\}$ to obtain compact representations $z^{(n)} \in \mathbb{R}^{m}$ with $m \ll K$, facilitating efficient optimization in reduced feature space.
This step addresses feature scaling and redundancy, ensuring that the surrogate models operate on statistically stable, non-degenerate inputs.
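The normalization and PCA step can be sketched with an SVD-based projection; the ensemble here is synthetic and the dimensions are illustrative:

```python
import numpy as np

def l1_normalize(H):
    """Row-wise L1 normalization of a stack of histograms."""
    return H / np.abs(H).sum(axis=1, keepdims=True)

def pca_reduce(H, m):
    """Project the histogram ensemble onto its top-m principal components."""
    Hc = H - H.mean(axis=0)                     # center the ensemble
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    return Hc @ Vt[:m].T                        # scores z^{(n)} in R^m

rng = np.random.default_rng(3)
H = rng.random((20, 50))                        # 20 images, K = 50 physical words
Z = pca_reduce(l1_normalize(H), m=5)
print(Z.shape)
```

The projected scores are zero-mean by construction, which keeps the downstream Gaussian-process surrogates on statistically stable inputs.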
6. Integration with Multi-Objective Bayesian Optimization
BoF vectors serve as inputs for surrogate models and acquisition functions in a MOBO workflow:
- Surrogates: For each objective $f_j$ (e.g., $R_q$, $\xi$, $\lambda$, $\Delta\phi$), a Gaussian process $f_j(h) \sim \mathcal{GP}(\mu_j, k_j)$ is fitted.
- Acquisition: Candidate compositions are scored by the batch q-Expected Hypervolume Improvement (qEHVI) criterion:

$$\alpha_{\mathrm{qEHVI}}(\{x_1, \dots, x_q\}) = \mathbb{E}\!\left[\mathrm{HV}\!\left(\mathcal{P} \cup \{f(x_1), \dots, f(x_q)\}\right) - \mathrm{HV}(\mathcal{P})\right],$$

where $\mathrm{HV}(\cdot)$ is the dominated hypervolume.
- Closed Loop: Iterative selection, measurement, encoding, and update steps allow rapid mapping of feature landscapes with a minimal acquisition budget.
This approach turns static imaging data into actionable feedback, actively steering experimental exploration of combinatorial materials spaces.
7. Pareto-Front Mapping and Autonomous Discovery
Pareto mapping quantifies trade-offs among competing objectives at the feature level:
- Pareto Dominance: A vector $u$ dominates $v$ if $u_j \geq v_j$ for all objectives $j$ and $u_j > v_j$ for at least one $j$ (under a maximization convention; minimized objectives are negated).
- Front Construction: The Pareto front $\mathcal{P}$ is maintained, with non-dominated solutions updated as new measurements are acquired.
- Hypervolume Indicator: $\mathrm{HV}(\mathcal{P})$ is used for acquisition targeting, measuring the multi-objective coverage.
This methodology enables interpretable mapping of structure–property trends, identification of clusters and trade-off regimes, and selection of optimal candidate compositions within high-dimensional feature landscapes.
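Pareto dominance and the dominated hypervolume can be sketched directly for the two-objective case (maximization convention; function names and test points are illustrative — a production MOBO workflow would use a vetted library implementation):

```python
import numpy as np

def pareto_front(F):
    """Return the maximization Pareto front of objective vectors F (n x m):
    keep every point not dominated by any other point."""
    keep = []
    for i, f in enumerate(F):
        dominated = any(np.all(g >= f) and np.any(g > f)
                        for j, g in enumerate(F) if j != i)
        if not dominated:
            keep.append(i)
    return F[keep]

def hypervolume_2d(P, ref):
    """Dominated area for a 2-objective maximization front P w.r.t. a
    reference point; assumes P contains distinct non-dominated points."""
    P = P[np.argsort(-P[:, 0])]            # sweep in descending first objective
    hv, prev_y = 0.0, ref[1]
    for x, y in P:
        hv += (x - ref[0]) * (y - prev_y)  # add the new rectangular slab
        prev_y = y
    return hv

F = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0], [1.0, 1.0]])
P = pareto_front(F)
print(P, hypervolume_2d(P, ref=np.array([0.0, 0.0])))  # (1,1) is dominated; HV = 6.0
```

The qEHVI acquisition used in the MOBO loop scores candidates exactly by the expected increase of this hypervolume, so even this toy calculation conveys what the optimizer is maximizing.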
8. End-to-End Workflow Summary
The static physics-informed BoF approach is operationalized in autonomous probe microscopy workflows according to the following structured algorithm:
1. Offline:
• Acquire AFM/MFM scans, preprocess (flatten, filter), extract descriptors {x_i}, learn dictionary {d_k}.
2. Initialize:
• Choose initial compositions {x_n}, perform measurement, extract BoF histograms h^{(n)}.
• Fit GP surrogates f_j(h).
3. Iterative Loop:
a) Acquisition: Optimize qEHVI, select candidates.
b) Measurement: Acquire, extract descriptors.
c) Encoding: Compute assignment, BoF histogram h^{(t)}.
d) Update: Augment data, retrain surrogates.
e) Pareto Update: Maintain non-dominated vectors.
f) (Optional) PCA/normalization refresh.
4. Termination: Stop when budget/convergence achieved.
5. Output: Report Pareto-optimal compositions and corresponding features.
This approach generalizes beyond specific modalities or materials systems, as demonstrated for Au-Co-Ni combinatorial films, and offers extensibility to diverse imaging and feature sets. The static, physics-informed BoF methodology thus underpins closed-loop, multi-objective discovery frameworks and advances the interpretability and efficiency of autonomous SPM research (Barakati et al., 9 Jan 2026).