Frequency-Dominant Neighborhood Structure (F-DNS)
- F-DNS is a unifying framework that captures frequent local patterns in graphs and images by analyzing frequency characteristics of neighborhoods.
- It employs BFS-based candidate generation and degree-frequency histograms for graphs, alongside DCT-driven perceptual hashing in images.
- Experimental results show scalable graph mining, robust invariance in image hashing, and effective node feature extraction for various ML tasks.
Frequency-Dominant Neighborhood Structure (F-DNS) represents a unifying framework for extracting and encoding dominant local patterns in both graph-structured data and images. By focusing on the frequency characteristics of local neighborhoods—whether these are graph-theoretic r-neighborhoods, neighbor-degree histograms, or spatial-frequency domains—F-DNS enables efficient pattern mining, robust feature hashing, and local-to-global inference across heterogeneous data modalities.
1. Formal Definitions and Core Mathematical Structures
Across the literature, F-DNS takes distinct but conceptually related forms:
A. Graph Mining (Single-Graph Setting)
F-DNS is formalized as the set of all frequent r-neighborhood patterns in a single labeled graph $G$, with:
- An r-neighborhood of a vertex $v$ is the subgraph induced over the vertices within distance $r$ of $v$, with edges as in $G$ and $v$ as the designated pivot.
- A neighborhood pattern $P$ matches a vertex $v$ if there exists an injective pivoted subgraph isomorphism from $P$ into $G$ that preserves vertex and edge labels and maps the pivot of $P$ to $v$.
- The support of $P$ is the proportion of vertices of $G$ that $P$ matches (Han et al., 2013).
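As a rough illustration of pivoted matching and support, the following sketch brute-forces the match test by backtracking on a toy author-paper graph. The graph encoding, the helper names (`make_graph`, `matches_at`, `support`), and the choice to normalize by the total vertex count are assumptions made for this example, not the implementation of Han et al.

```python
def make_graph(labels, edges):
    """labels: {vertex: label}; edges: iterable of (u, w, edge_label), undirected."""
    adj = {v: {} for v in labels}
    for u, w, lab in edges:
        adj[u][w] = lab
        adj[w][u] = lab
    return {"labels": dict(labels), "adj": adj}


def matches_at(pattern, pivot, graph, v):
    """True if an injective, label-preserving map sends the (connected) pattern
    into the graph with the pattern's pivot mapped to graph vertex v."""
    p_lab, p_adj = pattern["labels"], pattern["adj"]
    g_lab, g_adj = graph["labels"], graph["adj"]
    if p_lab[pivot] != g_lab[v]:
        return False
    # BFS order from the pivot: every later vertex touches an earlier one.
    order, seen, i = [pivot], {pivot}, 0
    while i < len(order):
        for nb in p_adj[order[i]]:
            if nb not in seen:
                seen.add(nb)
                order.append(nb)
        i += 1

    def extend(mapping, idx):
        if idx == len(order):
            return True
        p = order[idx]
        anchor = next(q for q in p_adj[p] if q in mapping)
        for cand in g_adj[mapping[anchor]]:
            if cand in mapping.values() or g_lab[cand] != p_lab[p]:
                continue
            # Each pattern edge to an already-mapped vertex must exist with the same label.
            ok = all(mapping[q] in g_adj[cand] and g_adj[cand][mapping[q]] == lab
                     for q, lab in p_adj[p].items() if q in mapping)
            if ok:
                mapping[p] = cand
                if extend(mapping, idx + 1):
                    return True
                del mapping[p]
        return False

    return extend({pivot: v}, 1)


def support(pattern, pivot, graph):
    """Fraction of graph vertices matched by the pivot (assumed normalization)."""
    hits = sum(matches_at(pattern, pivot, graph, v) for v in graph["labels"])
    return hits / len(graph["labels"])


# Toy data: author a1 wrote two papers, author a2 wrote one.
g = make_graph(
    {"a1": "author", "a2": "author", "p1": "paper", "p2": "paper", "p3": "paper"},
    [("a1", "p1", "writes"), ("a1", "p2", "writes"), ("a2", "p3", "writes")],
)
# Pattern "author writes at least two papers", pivoted at the author x.
pattern = make_graph(
    {"x": "author", "y": "paper", "z": "paper"},
    [("x", "y", "writes"), ("x", "z", "writes")],
)
print(support(pattern, "x", g))  # 0.2: only a1 of the 5 vertices matches
```

Because the map is injective, the two pattern papers must land on two distinct graph papers, which is what separates "at least two papers" from "at least one".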
B. Graph Embeddings via Neighbor-Degree Frequency
F-DNS is instantiated as histograms or matrices reflecting the frequencies of neighbor degrees up to a given BFS depth:
- The (vanilla/minimal/dynamic) NDF vector encodes, for a node $v$, the counts of immediate neighbors with various degrees, optionally binned by intervals for dynamic graphs.
- Higher-order structures aggregate these frequencies at increasing BFS radii and may be normalized, forming the NDFC or CDF matrices (Shirbisheh, 2022).
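A minimal sketch of the neighbor-degree frequency idea, assuming an adjacency-set representation. The function names `ndf_vector` and `binned_ndf`, and the way bins are given as sorted lower bounds, are choices made for this illustration rather than the definitions in Shirbisheh (2022).

```python
from bisect import bisect_right
from collections import Counter


def ndf_vector(adj, v, max_degree):
    """Entry d-1 holds the number of neighbors of v whose degree is d."""
    counts = Counter(len(adj[u]) for u in adj[v])
    return [counts.get(d, 0) for d in range(1, max_degree + 1)]


def binned_ndf(adj, v, starts):
    """Same counts, grouped into degree intervals.

    starts: sorted lower bounds, e.g. [1, 3, 6] means [1,2], [3,5], [6, inf).
    starts[0] must be 1 so that every neighbor falls into some bin.
    """
    vec = [0] * len(starts)
    for u in adj[v]:
        vec[bisect_right(starts, len(adj[u])) - 1] += 1
    return vec


# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}

print(ndf_vector(adj, 0, max_degree=3))  # [1, 2, 0]: one degree-1 and two degree-2 neighbors
print(ndf_vector(adj, 3, max_degree=3))  # [0, 0, 1]: the single neighbor has degree 3
print(binned_ndf(adj, 0, starts=[1, 2]))  # [1, 2]
```

Binning keeps the vector length fixed when degrees grow over time, which is the motivation the text gives for the dynamic variant.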
C. Perceptual Hashing in Images
F-DNS constitutes a global feature vector that aggregates local dominant frequency similarity patterns:
- The image is transformed via the 2D Discrete Cosine Transform (DCT).
- Over each sliding window in frequency space, the dominant frequency structure is captured by computing, at each central coefficient, the Euclidean distance between the center patch and each neighboring patch of a fixed size.
- Summing these local maps over the frequency domain and aggregating yields the F-DNS hash, a real-valued vector (typically 64-dimensional) (Biswas et al., 2020).
2. Algorithmic Frameworks and Computational Properties
Frequent Neighborhood Pattern Mining in Graphs
The mining of F-DNS proceeds via an Apriori-style, BFS-based enumeration:
- Candidate Generation: Start with all small, frequent “building block” patterns—paths pivoted at one end and up to the radius bound.
- Pattern Joining and Pruning: For each size $k$, candidate patterns are generated by joining pairs of frequent size-$(k-1)$ patterns. Candidates with any infrequent subpattern are discarded, in accordance with the downward-closure property (DCP): if $P$ is a subpattern of $Q$, then $\mathrm{supp}(P) \ge \mathrm{supp}(Q)$.
- VID-list Optimization: For each pattern, maintain the list of matching vertices (VID-list) to speed up support computation by intersecting the VID-lists of the joined patterns. This yields a substantial speedup in the join-and-verify steps (Han et al., 2013).
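The join, prune, and VID-list steps can be sketched on a deliberately simplified model in which a pattern is a frozenset of building-block names rather than a labeled graph. The function `next_level`, the `verify` callback, and the toy data are hypothetical stand-ins; real neighborhood patterns would need a pivoted isomorphism test in place of the subset check.

```python
from itertools import combinations


def next_level(frequent, vid_lists, verify, min_count):
    """One Apriori step: join frequent size-(k-1) patterns into size-k candidates.

    frequent  : set of frozensets (toy stand-ins for neighborhood patterns)
    vid_lists : {pattern: set of vertex ids whose neighborhood matches it}
    verify    : callable(pattern, vertex) -> bool, the exact match test
    min_count : minimum number of matching vertices for a pattern to be frequent
    """
    result = {}
    for a, b in combinations(frequent, 2):
        cand = a | b
        # Joinable only if the two parents differ in exactly one block.
        if len(cand) != len(a) + 1 or cand in result:
            continue
        # Downward closure: every size-(k-1) subpattern must itself be frequent.
        if any(cand - {x} not in frequent for x in cand):
            continue
        # A vertex matching cand matches both parents, so only the
        # intersection of their VID-lists needs to be verified.
        pool = vid_lists[a] & vid_lists[b]
        if len(pool) < min_count:
            continue
        vids = {v for v in pool if verify(cand, v)}
        if len(vids) >= min_count:
            result[cand] = vids
    return result


# Toy data: which building blocks each vertex's neighborhood contains.
vertex_blocks = {1: {"A", "B", "C"}, 2: {"A", "B"}, 3: {"A", "C"}, 4: {"B"}}


def verify(pattern, v):
    return pattern <= vertex_blocks[v]


level1 = {frozenset([b]): {v for v, bs in vertex_blocks.items() if b in bs}
          for b in "ABC"}
level2 = next_level(set(level1), level1, verify, min_count=2)
level3 = next_level(set(level2), level2, verify, min_count=2)
print(level2)  # {A,B} matched by vertices 1 and 2; {A,C} by 1 and 3; {B,C} is dropped
print(level3)  # {}: {A,B,C} is pruned because its subpattern {B,C} is infrequent
```

In this toy model the intersection already equals the true VID-list; for graph patterns it is only a superset, which is why the `verify` step remains.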
Local Graph Embedding and Isomorphism Testing
Features are extracted by a BFS to depth $r$ centered at each node $v$:
- Step 1: Compute the NDF vector as degree-frequency bins over the immediate neighbors of $v$.
- Step 2: For each BFS level $k$ up to $r$, compute mean neighbor-degree frequencies over the $k$-th BFS “circle” (NDFC) or raw frequencies (CDF).
- Step 3: Stack these as row vectors to construct node-specific matrices for downstream ML or isomorphism refinement.
- Complexity: For radius $r$ and average degree $d$, the work per node grows with the size of the depth-$r$ BFS ball; all steps use adjacency lists (no matrix assembly needed) and are highly parallelizable (Shirbisheh, 2022).
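The three steps above can be sketched as follows, under one reading of the text: row $k$ of the node matrix is the element-wise mean of the NDF vectors of the nodes at BFS distance exactly $k$. The names `bfs_circles`, `ndf`, and `ndfc_matrix` are illustrative, and the exact normalization in Shirbisheh (2022) may differ.

```python
from collections import Counter, deque


def bfs_circles(adj, v, r):
    """Return [C_0, ..., C_r], where C_k lists the nodes at distance exactly k from v."""
    dist = {v: 0}
    circles = [[v]] + [[] for _ in range(r)]
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == r:  # do not expand beyond the radius
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                circles[dist[w]].append(w)
                queue.append(w)
    return circles


def ndf(adj, v, max_degree):
    """Entry d-1 holds the number of neighbors of v whose degree is d."""
    counts = Counter(len(adj[u]) for u in adj[v])
    return [counts.get(d, 0) for d in range(1, max_degree + 1)]


def ndfc_matrix(adj, v, r, max_degree):
    """Stack one row per BFS circle: the mean NDF vector of that circle's nodes."""
    rows = []
    for circle in bfs_circles(adj, v, r):
        if not circle:  # the graph ran out of nodes before depth r
            rows.append([0.0] * max_degree)
            continue
        vecs = [ndf(adj, u, max_degree) for u in circle]
        rows.append([sum(col) / len(circle) for col in zip(*vecs)])
    return rows


# Path graph 0 - 1 - 2 - 3, using adjacency lists only.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
matrix = ndfc_matrix(adj, 0, r=2, max_degree=2)
print(matrix)  # [[0.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
```

Each node's matrix depends only on its own BFS ball, so the per-node calls are independent and can be distributed across workers.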
Image Perceptual Hashing
The F-DNS hash algorithm consists of:
- Preprocessing: Convert to greyscale, apply Gaussian smoothing.
- DCT Computation: Compute over the entire preprocessed image.
- Sliding Window Feature Extraction: For each frequency coefficient, extract central and neighboring patches; compute pairwise Euclidean distances.
- Aggregation: Sum all local F-DNS maps to produce a global signature vector (e.g., 64-dimensional).
- Similarity: Pearson correlation of F-DNS hashes is used for matching; classification is template-driven and non-parametric (Biswas et al., 2020).
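The pipeline above can be sketched in pure Python for small grayscale arrays. This is an approximation under stated assumptions, not the algorithm of Biswas et al.: preprocessing is skipped, the DCT is an unnormalized DCT-II, border positions where a patch would leave the array are ignored, and the defaults (3×3 patches, an 8×8 window of offsets giving 64 entries) are illustrative. The names `fdns_hash`, `pearson`, and `classify` are hypothetical.

```python
import math


def dct_1d(x):
    """Unnormalized DCT-II of a sequence."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def dct_2d(img):
    """Separable 2D DCT: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in img]
    cols = [dct_1d(c) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def fdns_hash(img, patch=3, window=8):
    """Sum, over all valid centers, the distances from the center patch to each
    offset patch inside the window; the result has window*window entries."""
    coeffs = dct_2d(img)
    n, m = len(coeffs), len(coeffs[0])
    h = patch // 2
    span = range(-(window // 2), window - window // 2)
    offsets = [(dy, dx) for dy in span for dx in span]
    lo = window // 2 + h                      # margin needed on the top/left
    hi = (window - window // 2 - 1) + h       # margin needed on the bottom/right
    signature = [0.0] * len(offsets)
    for y in range(lo, n - hi):
        for x in range(lo, m - hi):
            for j, (dy, dx) in enumerate(offsets):
                d2 = 0.0
                for py in range(-h, h + 1):
                    for px in range(-h, h + 1):
                        diff = coeffs[y + py][x + px] - coeffs[y + dy + py][x + dx + px]
                        d2 += diff * diff
                signature[j] += math.sqrt(d2)
    return signature


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def classify(query, templates):
    """Template matching: return the label whose stored hash correlates best."""
    return max(templates, key=lambda label: pearson(query, templates[label]))


# Synthetic 16x16 image and a brightness-scaled copy.
img = [[float((3 * x + 5 * y) % 17) for x in range(16)] for y in range(16)]
brighter = [[2.0 * p for p in row] for row in img]
sig, sig_bright = fdns_hash(img), fdns_hash(brighter)
print(len(sig), round(pearson(sig, sig_bright), 6))
```

Because the DCT is linear, uniform brightness scaling only rescales the signature, and Pearson correlation is unaffected by rescaling. Robustness to the other perturbations discussed in the text is not demonstrated by this sketch.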
3. Semantic and Theoretical Significance
A. Local-to-Global Inference in Graphs
- F-DNS captures the “local topology” around graph vertices, summarizing how many vertices share a particular labeled, topological pattern (e.g., “authors with at least two papers,” “self-citation cycles”).
- In single-graph settings, counting the frequency/proportion of pivots matching a local pattern provides a richer, more informative support measure than the traditional “exists/does not exist” used in subgraph mining (Han et al., 2013).
B. Isomorphism and Centrality
- The multilevel degree-frequency histograms underlying F-DNS can distinguish many pairs of non-isomorphic graphs, in some cases where 1-WL color refinement fails (Shirbisheh, 2022).
- Parametric centrality families derived from BFS exploration, which aggregate “circle sizes” with parameterized exponential weights, yield features that closely track classic measures such as closeness and PageRank.
C. Perceptual Robustness in Images
- F-DNS hashes provide invariance to content-preserving transforms, especially geometric transformations (rotation, scaling) and various noise operations.
- The DCT basis allows for a compact separation of informative (high-energy) and less informative (low-energy) spatial components, enabling robust recognition, even across significant distortions (Biswas et al., 2020).
4. Experimental Evidence and Quantitative Results
Graph Mining (Han et al., 2013)
- On EntityCube, F-DNS mining scales efficiently with minimum support thresholds as low as 0.0001.
- VID optimizations yield over an order-of-magnitude candidate reduction and 80% reduction in per-candidate verification time.
- On ArnetMiner, size-4 neighborhood patterns are mined in under a minute, finding 1,000 significant patterns for author pivots.
- Patterns include “author writes ≥2 papers” (support 31.4% of authors), “conference accepts ≥2 papers from same author” (25.4%), and cyclic/co-authorship motifs (up to 10% of all patterns).
Graph Embeddings (Shirbisheh, 2022)
- Flattened NDFC matrices input to shallow feed-forward neural nets achieve 90–98% accuracy in predicting PageRank and closeness centrality, with accuracy maintained under random edge perturbations and on unseen graphs.
- No global matrix factorization or solve required; models are lightweight (4–6 layers, minutes of training).
Image Perceptual Hashing (Biswas et al., 2020)
- On standard image and web page screenshot datasets, F-DNS maintains its Pearson correlation under all tested perturbations except rotation, where the correlation drops but still exceeds that of RP-IVD.
- On DUSI-2K (2,500 Tor screenshots, 16 categories), F-DNS hashing with a template-based classifier achieves 98.75% accuracy, exceeding RP-IVD (95.84%) and Inception-ResNet-v2 (85.19%).
5. Strengths, Limitations, and Extensions
Strengths
- F-DNS encodes local structural regularities that are highly informative for tasks ranging from graph pattern mining and node embedding to robust perceptual hashing.
- The support measure in F-DNS preserves the DCP, enabling efficient candidate pruning and scalable algorithms.
- In perceptual hashing, the resulting features maintain high discrimination with low dimensionality (e.g., 64 floats) and enable non-parametric classification without extensive training (Biswas et al., 2020).
- NDF-based embeddings provide transferable, inductive node features suitable for dynamic and evolving graphs, requiring only local exploration (Shirbisheh, 2022).
Limitations
- In hashing, real-valued F-DNS descriptors necessitate floating-point storage and matching; binary quantization, which Biswas et al. (2020) did not attempt, could offer further compactness and speed.
- For large images, the sliding window computation, while linear, can be computationally intensive—multi-resolution analysis or keypoint prioritization could address this (Biswas et al., 2020).
- In graph mining, worst-case exponential isomorphism checks may be required but are heavily mitigated by locality and pruning (Han et al., 2013).
Potential Extensions
- Binarization and use of locality-sensitive hashing for rapid nearest neighbor search or database indexing in perceptual applications.
- Substitution of the DCT with other frequency decompositions (e.g., wavelets) for domains where local stationarity does not hold (Biswas et al., 2020).
- Restricting F-DNS computation to salient keypoints or high-distinctiveness regions in images to improve computational efficiency.
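To make the binarization idea concrete, here is a small sketch that thresholds a real-valued signature at its median and compares the resulting bit strings by Hamming distance. Median thresholding is an assumption chosen for illustration; the source does not specify a quantization scheme, and the sample vectors are made up.

```python
from statistics import median


def binarize(signature):
    """Map each entry to 1 if it exceeds the signature's median, else 0."""
    m = median(signature)
    return [1 if x > m else 0 for x in signature]


def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


# Made-up signatures: `near` is a slightly perturbed copy of `ref`, `far` is unrelated.
ref = [0.2, 3.1, 1.5, 0.9]
near = [0.3, 2.8, 1.6, 1.0]
far = [2.0, 0.1, 0.2, 1.9]

print(binarize(ref))                            # [0, 1, 1, 0]
print(hamming(binarize(ref), binarize(near)))   # 0
print(hamming(binarize(ref), binarize(far)))    # 4
```

Bit strings of this kind can be bucketed by locality-sensitive hashing for fast lookup, at the cost of discarding the magnitude information that Pearson matching uses.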
6. Applications and Broader Impact
F-DNS underpins a variety of practical and theoretical advancements:
- Graph Pattern Discovery: Enables the mining of frequent, interpretable motifs in knowledge graphs and citation networks (e.g., self-citation cycles, author-venue reuse) with direct semantic interpretation (Han et al., 2013).
- Graph Isomorphism and Node Feature Learning: Supplies a suite of local descriptors for isomorphism testing and accurate regression/classification of node-level graph-theoretic quantities using simple ML models (Shirbisheh, 2022).
- Image Similarity and Classification: Provides a robust, template-driven mechanism for classification of web screenshots, including obfuscated or variably rendered Tor domains, with state-of-the-art robustness to content-preserving edits (Biswas et al., 2020).
A plausible implication is that the local, frequency-dominant perspective—whether via BFS-driven neighborhood statistics or frequency-domain analysis—captures the essence of recurring structure across disparate data types, rendering F-DNS a foundational concept for feature extraction, pattern recognition, and efficient large-scale data mining.