Semi-supervised Feature Analysis by Mining Correlations among Multiple Tasks (1411.6232v2)

Published 23 Nov 2014 in cs.LG

Abstract: In this paper, we propose a novel semi-supervised feature selection framework by mining correlations among multiple tasks and apply it to different multimedia applications. Instead of independently computing the importance of features for each task, our algorithm leverages shared knowledge from multiple related tasks, thus improving the performance of feature selection. Note that we build our algorithm on the assumption that different tasks share common structures. The proposed algorithm selects features in a batch mode, by which the correlations between different features are taken into consideration. Besides, considering the fact that labeling a large amount of training data in the real world is both time-consuming and tedious, we adopt manifold learning, which exploits both labeled and unlabeled training data for feature space analysis. Since the objective function is non-smooth and difficult to solve, we propose an iterative algorithm with fast convergence. Extensive experiments on different applications demonstrate that our algorithm outperforms other state-of-the-art feature selection algorithms.

Citations (227)


Summary

The paper introduces SFMC, a novel framework that combines semi-supervised learning with multi-task feature selection to identify informative features.
It employs manifold learning, Laplacian regularization, and l2,1-norm sparsity to efficiently address noisy and redundant features in high-dimensional spaces.
Extensive experiments on multimedia and motion datasets show improved MAP scores, demonstrating practical scalability and robustness in limited-label scenarios.

Semi-supervised Feature Analysis by Mining Correlations among Multiple Tasks

The paper introduces a feature selection framework that builds on semi-supervised learning and exploits the structure shared among multiple related tasks. By jointly using labeled and unlabeled data, the approach targets problems common in high-dimensional feature spaces, such as noisy or redundant features.

The proposed framework is termed Semi-supervised Feature Analysis by Mining Correlations among Multiple Tasks (SFMC). It targets two limitations of traditional feature selection methods. First, they often evaluate the importance of each feature in isolation, ignoring correlations between features. Second, they select features independently for each task, which forgoes the opportunity to leverage inter-task relationships. SFMC addresses both by combining principles from semi-supervised learning and multi-task feature selection.

Methodology

The methodology is built around a regularized framework. The model employs manifold learning to incorporate both labeled and unlabeled data. The objective function integrates a Laplacian regularization term, which captures the manifold structure of the data, and is complemented by $l_{2,1}$-norm regularization, which enforces row sparsity in the feature selection matrix so that entire features are kept or discarded. Additionally, a trace norm regularization term is included to capture the information shared across related tasks, facilitating knowledge transfer between them.
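The three regularizers named above can be illustrated with a minimal NumPy sketch. It is not the paper's exact objective: the function names, the 0/1-weighted k-nearest-neighbor graph, and the unnormalized Laplacian are assumptions chosen for illustration only.

```python
import numpy as np

def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (one row per feature)."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def trace_norm(W):
    """Sum of the singular values of W (also called the nuclear norm)."""
    return float(np.sum(np.linalg.svd(W, compute_uv=False)))

def knn_graph_laplacian(X, k=5):
    """Unnormalized Laplacian L = D - A of a symmetric kNN graph.

    X has one sample per row, shape (n, d). Edge weights are 0/1,
    which is an assumption made for this sketch.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(dist, np.inf)           # a point is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]    # k nearest neighbors per sample
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    A = np.maximum(A, A.T)                   # symmetrize the adjacency matrix
    return np.diag(A.sum(axis=1)) - A

def manifold_smoothness(F, L):
    """Tr(F^T L F): small when neighboring samples get similar predictions."""
    return float(np.trace(F.T @ L @ F))
```

Because the $l_{2,1}$-norm sums row norms, it pushes whole rows of the weight matrix to zero, which is what makes it a feature selector rather than a per-entry sparsifier. The trace norm instead encourages a low-rank matrix, a common way to express that tasks share structure.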

Because the resulting objective function is non-smooth and therefore difficult to optimize directly, the authors propose an iterative algorithm. Its updates are backed by theorems that guarantee convergence, and the authors report that it converges within a few iterations, which keeps the computational cost low and makes the method practical to apply.
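To give a sense of how an iterative scheme can handle a non-smooth $l_{2,1}$ term, the sketch below applies the widely used iteratively reweighted approach to a simplified problem: least squares with an $l_{2,1}$ penalty only. This is an assumption-laden illustration, not the authors' update rule; it omits the Laplacian and trace norm terms, and the function name and parameters are hypothetical.

```python
import numpy as np

def l21_reweighted_least_squares(X, Y, alpha=1.0, n_iter=30, eps=1e-8):
    """Approximately minimize ||X W - Y||_F^2 + alpha * ||W||_{2,1}.

    X: (n, d) samples in rows, Y: (n, c) targets. Returns W of shape (d, c)
    and the history of (eps-smoothed) objective values.
    """
    d = X.shape[1]
    D = np.eye(d)                      # initial reweighting matrix
    XtX = X.T @ X
    XtY = X.T @ Y
    history = []
    for _ in range(n_iter):
        # With D fixed, the problem is a weighted ridge regression with a
        # closed-form solution.
        W = np.linalg.solve(XtX + alpha * D, XtY)
        # Update D from the current row norms; eps avoids division by zero
        # when a row of W shrinks to (near) zero.
        row_norms = np.sqrt(np.sum(W ** 2, axis=1) + eps)
        D = np.diag(1.0 / (2.0 * row_norms))
        loss = np.sum((X @ W - Y) ** 2) + alpha * np.sum(row_norms)
        history.append(float(loss))
    return W, history

# Usage: rank features by the row norms of W, largest first.
# W, history = l21_reweighted_least_squares(X, Y, alpha=1.0)
# ranking = np.argsort(-np.linalg.norm(W, axis=1))
```

Each pass replaces the non-smooth penalty with a quadratic surrogate that can be solved exactly, and the smoothed objective does not increase from one pass to the next. Features are then typically ranked by the row norms of `W`.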

Experimental Evaluation

Experiments evaluate SFMC across several domains: video classification, image annotation, human motion recognition, and 3D motion data analysis. The proposed algorithm is reported to outperform state-of-the-art feature selection methods across varying percentages of labeled data, which indicates that the framework remains robust when labeled training data is scarce. The authors provide detailed numerical results showing higher Mean Average Precision (MAP) scores than the baseline approaches on multiple datasets, including CCV, NUS-WIDE, HMDB, and HumanEva.
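For readers unfamiliar with the metric, the following sketch computes MAP under its standard definition: average precision per class over a ranked list, then the mean over classes. The function names and the choice to skip classes with no positive samples are assumptions; the paper's exact evaluation protocol is not described here.

```python
import numpy as np

def average_precision(y_true, scores):
    """Average precision of one ranked list (binary relevance labels)."""
    y_true = np.asarray(y_true, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = y_true[order]                      # relevance in ranked order
    n_pos = hits.sum()
    if n_pos == 0:
        return float("nan")                   # undefined without positives
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # Average the precision values at the ranks of the relevant items.
    return float(np.sum(precision_at_k * hits) / n_pos)

def mean_average_precision(Y_true, S):
    """Mean of per-class average precision; columns are classes."""
    Y_true = np.asarray(Y_true)
    S = np.asarray(S)
    aps = [average_precision(Y_true[:, j], S[:, j])
           for j in range(Y_true.shape[1])]
    return float(np.nanmean(aps))             # classes without positives are skipped
```

As a worked example, with labels `[1, 0, 1, 0]` and scores `[0.9, 0.8, 0.7, 0.1]`, the relevant items sit at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2 ≈ 0.833.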

Implications and Future Directions

The SFMC framework presents significant implications for practical applications where labeled data is limited or expensive to obtain. The ability to harness unlabeled data for improved feature selection performance can greatly enhance tasks such as multimedia annotation and complex data analyses in scientific research.

On the theoretical side, the paper motivates further work on algorithms that exploit inter-task dependencies in a more refined manner. Deeper integration of multi-task learning principles could yield finer-grained insight into the structural patterns shared across tasks, and the approach could potentially extend to other domains such as text analysis, bioinformatics, and social networks.

In conclusion, the combination of semi-supervised learning and multi-task feature selection offers a promising direction for the development of feature selection algorithms in high-dimensional contexts. Future research may explore more nuanced regularization techniques while examining the broad applicability of SFMC across diverse machine learning and data mining applications.
