Online Group Feature Selection
- Online Group Feature Selection (OGFS) is a method that selects relevant feature groups from a data stream by leveraging group structure to maximize between-class variance and minimize within-class variance.
- It employs a two-stage process using spectral analysis for intra-group selection and Lasso regression to eliminate redundant features across groups.
- OGFS has demonstrated robust performance in image recognition and bioinformatics, showing stability under varying group arrival orders and improving classification accuracy with fewer features.
Online Group Feature Selection (OGFS) refers to a class of algorithms specifically designed for selecting relevant features from a dynamically arriving data stream, where features arrive in groups rather than individually. OGFS maintains the structural integrity of feature groups, such as those encountered when multi-modal or multi-descriptor features are computed in batches (e.g., SIFT descriptors in images, sensor arrays, biological pathway-based genes). The OGFS methodology emphasizes discriminative power and redundancy control under real-time or streaming constraints, utilizing principled statistical and optimization criteria as new groups are acquired and incrementally incorporated into the selected subset (Wang et al., 2016, Jing et al., 2014).
1. Problem Setting and Formalization
Let $X \in \mathbb{R}^{n \times d}$ denote the data matrix with $n$ instances and $d$ total features, and $y$ the class labels. In OGFS, features are not assumed to be available at once; instead, they arrive as a sequence of groups $G_1, G_2, \dots$, with group $G_t$ arriving at timestep $t$. Each feature $f_j \in G_t$ is a vector of length $n$. The algorithm, unaware of future groups, incrementally updates the selected subset $U$, where $U \subseteq G_1 \cup \dots \cup G_t$ after timestep $t$. Typical stopping criteria include reaching a target cardinality $k$, achieving a desired classification accuracy on $U$, or exhausting the feature stream.
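The streaming setup can be simulated offline by slicing a fixed matrix into consecutive column blocks. The Python sketch below is only an illustration of the arrival protocol: the helper `stream_groups`, the group sizes, the target cardinality `k`, and the placeholder "keep the first feature of each group" policy are all hypothetical and stand in for the actual selection logic.

```python
import numpy as np

def stream_groups(X, group_sizes):
    """Yield (t, column_indices, block) for consecutive feature groups of X."""
    start = 0
    for t, size in enumerate(group_sizes, start=1):
        cols = np.arange(start, start + size)
        yield t, cols, X[:, cols]
        start += size

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 7))      # n = 6 instances, d = 7 features
group_sizes = [3, 2, 2]          # hypothetical group sizes, summing to d
k = 4                            # hypothetical target cardinality

selected = []                    # column indices of the selected subset U
for t, cols, block in stream_groups(X, group_sizes):
    # Placeholder policy (not OGFS): keep the first feature of each group.
    selected.append(int(cols[0]))
    if len(selected) >= k:       # cardinality-based stopping criterion
        break
```

The loop only ever sees the groups that have arrived so far, which is the constraint the selection stages have to respect.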
The selection objective is threefold:
- Maximize between-class variance,
- Minimize within-class variance,
- Reduce redundancy among selected features across all groups.
2. Intra-Group Feature Selection via Spectral Analysis
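At each group arrival, OGFS applies a spectral criterion to decide, in an online manner, whether a feature should be provisionally included. Affinity matrices $S_b$ (between-class) and $S_w$ (within-class) are constructed from the class labels. In the standard Fisher-style construction, with $n_l$ the number of instances in class $l$:
$$(S_w)_{ij} = \begin{cases} 1/n_l, & y_i = y_j = l \\ 0, & \text{otherwise} \end{cases} \qquad (S_b)_{ij} = \begin{cases} 1/n - 1/n_l, & y_i = y_j = l \\ 1/n, & \text{otherwise} \end{cases}$$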
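With degree matrices $D_b$, $D_w$ and Laplacians $L_b = D_b - S_b$, $L_w = D_w - S_w$ of the between-class and within-class affinity matrices $S_b$, $S_w$, for any current feature set $U$, the spectral ratio is defined as:
$$F(U) = \frac{\operatorname{tr}(X_U^\top L_b X_U)}{\operatorname{tr}(X_U^\top L_w X_U)}$$
where $X_U$ is the submatrix of $X$ restricted to features selected in $U$.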
For each incoming feature $f_j$, the increment $F(U \cup \{f_j\}) - F(U)$ is evaluated. If the increment exceeds a user-set threshold $\epsilon$, $f_j$ is included in the provisional intra-group set $G_t'$. This criterion enforces the selection of features that enhance between-class discrimination or compact within-class scatter as measured spectrally (Wang et al., 2016, Jing et al., 2014).
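The intra-group criterion can be sketched in NumPy as follows. Several choices here are assumptions rather than details confirmed by the text: the Fisher-style affinity graphs (within-class entries $1/n_l$, between-class entries $1/n$ minus the within-class entry), the convention that the ratio of an empty set is 0, the small guard on the denominator, and all function names. Each candidate is compared against the current selection only, not against features accepted earlier in the same group.

```python
import numpy as np

def fisher_laplacians(y):
    """Build L_b and L_w from labels using Fisher-style affinity graphs (assumed)."""
    y = np.asarray(y)
    n = y.size
    S_w = np.zeros((n, n))
    for label in np.unique(y):
        members = (y == label)
        S_w[np.ix_(members, members)] = 1.0 / members.sum()
    S_b = 1.0 / n - S_w          # 1/n - 1/n_l inside a class, 1/n elsewhere
    L_b = np.diag(S_b.sum(axis=1)) - S_b
    L_w = np.diag(S_w.sum(axis=1)) - S_w
    return L_b, L_w

def spectral_ratio(X_U, L_b, L_w, floor=1e-12):
    """Trace ratio tr(X_U^T L_b X_U) / tr(X_U^T L_w X_U); 0 for an empty set."""
    if X_U.shape[1] == 0:
        return 0.0
    num = np.trace(X_U.T @ L_b @ X_U)
    den = np.trace(X_U.T @ L_w @ X_U)
    return num / max(den, floor)  # floor avoids division by zero

def intra_group_select(X, selected, group_cols, L_b, L_w, epsilon):
    """Return the group's columns whose inclusion raises the ratio by more than epsilon."""
    selected = list(selected)
    base = spectral_ratio(X[:, selected], L_b, L_w)
    kept = []
    for j in group_cols:
        gain = spectral_ratio(X[:, selected + [j]], L_b, L_w) - base
        if gain > epsilon:
            kept.append(j)
    return kept

# Toy data: column 0 separates the classes, column 1 is noise, column 2 separates them perfectly.
y = np.array([0, 0, 0, 1, 1, 1])
X = np.column_stack([
    [0.0, 0.1, -0.1, 1.0, 1.1, 0.9],
    [0.5, -0.5, 0.0, -0.5, 0.5, 0.0],
    [0.0, 0.0, 0.0, 2.0, 2.0, 2.0],
])
L_b, L_w = fisher_laplacians(y)
first_pass = intra_group_select(X, [], [0, 1], L_b, L_w, epsilon=0.1)
second_pass = intra_group_select(X, [0], [1, 2], L_b, L_w, epsilon=0.1)
```

Because a trace ratio behaves like a weighted average of per-feature scores, a new feature raises it only when its own between/within ratio exceeds the current value, so the bar for acceptance rises as stronger features accumulate.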
3. Inter-Group Feature Selection via Sparse Regression
Following the intra-group pass, OGFS seeks to further prune redundancy both within and across groups using Lasso regression. For the current selection $U$ and interim group $G_t'$, an augmented feature matrix $\tilde{X}$ is constructed from the features in $U \cup G_t'$. The algorithm solves:
$$\hat{\beta} = \arg\min_{\beta} \; \|y - \tilde{X}\beta\|_2^2 + \lambda \|\beta\|_1$$
Features corresponding to nonzero entries in $\hat{\beta}$ are retained; features with a zero coefficient are discarded. This step leverages the sparsity-inducing $\ell_1$ penalty to encourage a compact, non-redundant selection for subsequent group arrivals (Wang et al., 2016, Jing et al., 2014).
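A minimal sketch of the pruning step, assuming the penalized objective $\|y - \tilde{X}\beta\|_2^2 + \lambda\|\beta\|_1$ and using plain cyclic coordinate descent as a stand-in for whichever Lasso solver the authors used. The function names, the iteration count, and the choice to center (but not rescale) the columns and labels are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal operator of the l1 penalty)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise ||y - X b||_2^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    residual = np.asarray(y, dtype=float).copy()   # y - X @ beta with beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                            # constant column: leave at zero
            residual += X[:, j] * beta[j]           # residual excluding feature j
            rho = X[:, j] @ residual
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
            residual -= X[:, j] * beta[j]           # residual with the updated weight
    return beta

def inter_group_prune(X, y, candidate_cols, lam):
    """Keep the candidate columns whose Lasso coefficient is nonzero."""
    cols = list(candidate_cols)
    block = X[:, cols] - X[:, cols].mean(axis=0)
    target = np.asarray(y, dtype=float)
    target = target - target.mean()
    beta = lasso_cd(block, target, lam)
    return [c for c, b in zip(cols, beta) if b != 0.0]

# Toy data: column 1 is a scaled copy of column 0, column 2 is uncorrelated with the labels.
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
x0 = np.array([-1.0, -1.2, -0.8, 1.0, 1.2, 0.8])
x2 = np.array([1.0, -1.0, 0.0, 1.0, -1.0, 0.0])
X = np.column_stack([x0, 0.5 * x0, x2])
kept = inter_group_prune(X, y, [0, 1, 2], lam=1.0)
```

In this toy case the scaled copy and the uncorrelated column both receive a zero coefficient once column 0 has absorbed the signal. Since the columns are not rescaled, the outcome is sensitive to feature scale, so normalization is a design decision worth making explicitly.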
4. Algorithmic Summary and Complexity
The overall OGFS pipeline consists of repeated, two-stage processing for each arriving group:
- Intra-group selection: For each feature $f_j$ in group $G_t$, compute $F(U \cup \{f_j\}) - F(U)$ and include $f_j$ in the provisional set $G_t'$ if the increment exceeds the threshold $\epsilon$.
- Inter-group Lasso: Apply sparse regression over $U \cup G_t'$; keep only those features with nonzero regression weights, updating $U$.
- Stopping check: Terminate if the selected feature set reaches cardinality $k$, achieves the desired accuracy, or the stream ends.
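The three steps above can be combined into one loop, sketched here in NumPy under stated assumptions: Fisher-style affinity graphs, a spectral ratio of 0 for the empty set, a penalized Lasso solved by plain coordinate descent, and a stopping check based only on cardinality and stream exhaustion (the accuracy criterion is omitted). All names, including `ogfs_sketch`, are hypothetical; this is not the authors' reference implementation.

```python
import numpy as np

def laplacians(y):
    """Between- and within-class Laplacians from Fisher-style graphs (assumed)."""
    n = y.size
    S_w = np.zeros((n, n))
    for label in np.unique(y):
        members = (y == label)
        S_w[np.ix_(members, members)] = 1.0 / members.sum()
    S_b = 1.0 / n - S_w
    return np.diag(S_b.sum(axis=1)) - S_b, np.diag(S_w.sum(axis=1)) - S_w

def ratio(X_U, L_b, L_w):
    """Trace ratio of a feature subset; defined as 0 for the empty set."""
    if X_U.shape[1] == 0:
        return 0.0
    return np.trace(X_U.T @ L_b @ X_U) / max(np.trace(X_U.T @ L_w @ X_U), 1e-12)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for ||y - X b||_2^2 + lam * ||b||_1."""
    beta = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    residual = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if col_sq[j] == 0.0:
                continue
            residual += X[:, j] * beta[j]
            rho = X[:, j] @ residual
            beta[j] = np.sign(rho) * max(abs(rho) - lam / 2.0, 0.0) / col_sq[j]
            residual -= X[:, j] * beta[j]
    return beta

def ogfs_sketch(X, y, groups, epsilon, lam, k):
    """Process feature groups in arrival order and return selected column indices."""
    L_b, L_w = laplacians(y)
    y_centered = y - y.mean()
    U = []
    for group in groups:
        # Stage 1: intra-group selection against the current subset U.
        base = ratio(X[:, U], L_b, L_w)
        provisional = [j for j in group
                       if ratio(X[:, U + [j]], L_b, L_w) - base > epsilon]
        # Stage 2: Lasso over U plus the provisional features.
        candidates = U + provisional
        if candidates:
            block = X[:, candidates] - X[:, candidates].mean(axis=0)
            beta = lasso_cd(block, y_centered, lam)
            U = [c for c, b in zip(candidates, beta) if b != 0.0]
        # Stage 3: stop once the target cardinality is reached.
        if len(U) >= k:
            break
    return U

# Toy stream: group 1 = a weak feature and its scaled copy; group 2 = noise and a stronger feature.
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
a = np.array([0.0, 0.0, -1.0, 0.0, 0.0, 1.0])
b = np.array([-1.0, -1.0, 0.0, 1.0, 1.0, 0.0])
noise = np.array([1.0, -1.0, 0.0, 1.0, -1.0, 0.0])
X = np.column_stack([a, 0.5 * a, noise, b])
selected = ogfs_sketch(X, y, groups=[[0, 1], [2, 3]], epsilon=0.1, lam=1.0, k=2)
print(selected)  # [0, 3]
```

In this example the scaled copy passes the spectral test (the trace ratio is scale-invariant) but is removed by the Lasso stage, while the noise column is already rejected by the spectral test. This illustrates why the two stages are complementary.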
The intra-group computation relies on incremental trace updates of the spectral ratio, which are cheap per feature as long as the selected subset stays small, while the inter-group Lasso regression is empirically linear or low-order polynomial in the current feature count. Total running time therefore grows with the number of groups and the number of features per group, and space is governed by the selected subset and Laplacian storage (Wang et al., 2016, Jing et al., 2014).
5. Experimental Results and Empirical Characteristics
OGFS has been benchmarked on both image and biological data where group structure is natural or synthesized:
- Image benchmarks:
  - CIFAR-10 (color image classification): OGFS achieved 49.6% accuracy with ≈2k features; using only the intra-group step yielded 51.2% with ≈5k features, offering a controllable trade-off between accuracy and compactness.
  - Caltech-101 (object recognition): OGFS outperformed all online baselines by an absolute margin of 6–13% with 1–2k features.
  - LFW (face verification): OGFS achieved ~81% accuracy versus 77% for Grafting and 66% for OSFS, using ~1.5k features.
- UCI/microarray benchmarks:
  - OGFS surpassed other online methods on 6/8 tasks and achieved comparable or better accuracy than offline LARS/MI selection with fewer features.
- Robustness: OGFS exhibited empirical stability under random permutation of group arrival order, with accuracy variance on the order of 1% (Wang et al., 2016, Jing et al., 2014).
Compared baselines included Alpha-investing, OSFS, Grafting (online) and mutual information, LARS, GBFS (offline). Metrics tracked were accuracy, number of features (compactness), and CPU time.
6. Theoretical Analysis and Guarantees
No formal convergence or regret-type guarantees are provided for OGFS. However, empirical evaluation demonstrates stability with respect to group order and efficacy in both discriminative power and redundancy pruning. The approach systematically combines local spectral discriminability with global sparsity control at each group addition. The performance and output subset depend explicitly on group quality and on the spectral threshold $\epsilon$ and the Lasso regularization weight $\lambda$, both of which require tuning (Wang et al., 2016, Jing et al., 2014).
7. Applications, Limitations, and Extensions
OGFS is applicable where feature groups arrive dynamically or possess inherent structure: multi-descriptor visual analysis, sensor networks with grouped measurements, bioinformatics with grouped genes, or any streaming data pipeline where "early, small-footprint" discriminative models are needed. The method assumes meaningful group boundaries and requires choosing the threshold $\epsilon$ for the spectral step and the regularization weight $\lambda$ for the Lasso step. There is no theoretical guarantee of global optimality. A plausible implication is that further advances in theoretical understanding and adaptive parameter selection could extend OGFS to domains with weaker or less well-defined structural grouping.
References
- "Online Feature Selection with Group Structure Analysis" (Wang et al., 2016)
- "Online Group Feature Selection" (Jing et al., 2014)