- The paper reformulates the non-smooth 2,1-norm regularized multi-task learning problem into two equivalent smooth convex problems, which are then solved with Nesterov’s method.
- It demonstrates rapid convergence, with the proposed algorithms typically reaching a solution within about 30 iterations while running faster than existing techniques.
- Its approach offers practical advantages in fields like biomedical informatics and text classification by enabling scalable joint feature selection.
Multi-Task Feature Learning Via Efficient 2,1-Norm Minimization
In their paper, Liu, Ji, and Ye study the optimization problem underlying joint feature selection across multiple related tasks via 2,1-norm regularization. This model is especially useful in fields such as biomedical informatics and computer vision, where multiple predictors are expected to share similar sparsity patterns.
Abstract
The primary focus of the paper is the challenging optimization problem posed by the non-smooth nature of the 2,1-norm regularization. The authors propose efficient computation techniques by reformulating the original problem into two equivalent smooth convex optimization problems. These are then solved using Nesterov’s method, which is optimal among first-order methods for smooth convex optimization.
Introduction
Multi-task learning aims to leverage the shared information among related tasks to achieve improved overall performance. The 2,1-norm regularization approach is particularly appealing because it encourages multiple predictors to exhibit similar sparsity patterns, which can be advantageous in various applications like medical diagnostics and text classification.
Problem Formulation
The core problem is the 2,1-norm regularized regression model for joint feature selection across multiple tasks. The model can be derived within a probabilistic framework by assuming an appropriate exponential family prior; however, the resulting optimization problem is non-trivial because the 2,1-norm regularization term is non-smooth.
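In generic notation (the symbols below are illustrative and may differ from the paper’s), the model for k tasks with a weight matrix W of size d × k typically takes the form:

```latex
\min_{W \in \mathbb{R}^{d \times k}}
  \sum_{s=1}^{k} \mathrm{loss}\!\left(X_s w_s,\; y_s\right)
  \;+\; \lambda \,\lVert W \rVert_{2,1},
\qquad
\lVert W \rVert_{2,1} \;=\; \sum_{j=1}^{d} \lVert w^{j} \rVert_{2}.
```

Here w_s is the predictor for task s (a column of W), w^j is the j-th row (the weights of feature j across all tasks), and λ controls the strength of joint row-wise sparsity. The non-smoothness comes from the row-wise Euclidean norms, which are not differentiable at rows equal to zero.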
Proposed Solutions
The authors propose to reformulate the non-smooth optimization problem into two smooth convex optimization problems:
- First Reformulation (aMTFL1): Additional variables are introduced to absorb the non-smooth term, yielding a smooth objective over a convex constraint set. The key advantage is that the required Euclidean projection onto this set can be computed analytically in linear time.
- Second Reformulation (aMTFL2): The non-smooth term is moved into the constraints, forming a 2,1-ball constrained smooth convex optimization problem. The Euclidean projection onto the 2,1-ball is more involved, but it can still be computed efficiently (a rough sketch of this projection follows the list).
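As an illustration of the kind of projection involved in the second reformulation, here is a minimal sketch in NumPy. It uses the standard reduction of the 2,1-ball projection to an l1-ball projection of the row norms; the function names and the sort-based l1 projection are choices made for this sketch, not the authors’ implementation.

```python
import numpy as np

def project_l1_ball(v, z):
    """Project a non-negative vector v onto {t : t >= 0, sum(t) <= z}, z > 0."""
    if v.sum() <= z:
        return v.copy()
    # Sort-based projection (O(d log d)); linear-time variants also exist.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - z) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (css[rho] - z) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def project_l21_ball(W, z):
    """Euclidean projection of W (features x tasks) onto {V : ||V||_{2,1} <= z}."""
    row_norms = np.linalg.norm(W, axis=1)
    target = project_l1_ball(row_norms, z)
    # Rescale each row to its projected norm; rows with zero norm stay zero.
    scale = np.divide(target, row_norms, out=np.zeros_like(row_norms),
                      where=row_norms > 0)
    return W * scale[:, None]
```

For example, project_l21_ball(W, z) returns the closest matrix to W (in Frobenius norm) whose 2,1-norm is at most z, and leaves W unchanged whenever it already lies inside the ball.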
Both reformulations are then solved using Nesterov’s method, which requires only the function value and gradient at each iteration and benefits from a fast convergence rate on smooth convex functions.
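To make the overall scheme concrete, here is a minimal sketch of an accelerated projected-gradient loop of the kind described above, assuming a fixed step size; the helper names, the fixed step, and the absence of line search are simplifications for this sketch rather than the authors’ algorithm.

```python
import numpy as np

def accelerated_projected_gradient(grad, project, W0, step, n_iters=100):
    """Nesterov-style accelerated gradient with a projection step.

    grad(W)    -> gradient of the smooth loss at W
    project(W) -> Euclidean projection onto the feasible set
                  (e.g. the 2,1-ball projection sketched earlier)
    step       -> fixed step size, assumed <= 1/L for an L-Lipschitz gradient
    """
    W_prev = W0.copy()
    W = W0.copy()
    t_prev, t = 1.0, 1.0
    for _ in range(n_iters):
        # Search point: extrapolate from the two most recent iterates.
        S = W + ((t_prev - 1.0) / t) * (W - W_prev)
        # Gradient step on the smooth part, then project back onto the feasible set.
        W_next = project(S - step * grad(S))
        W_prev, W = W, W_next
        t_prev, t = t, (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    return W
```

Each iteration costs one gradient evaluation and one projection, which keeps the per-iteration cost low while retaining the O(1/k^2) convergence rate of Nesterov’s method for smooth convex problems.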
Empirical Evaluation
The authors validate their approach through empirical evaluations on datasets like School and Letter. Their results demonstrate that the proposed methods converge quickly, often within 30 iterations, and outpace existing methods like MTL-FEAT and gradient descent in computational efficiency.
Practical and Theoretical Implications
Practically, the proposed methods enable efficient and scalable joint feature selection in multi-task learning scenarios, making them highly suitable for large-scale applications in fields such as medical diagnosis and text classification. Theoretically, the reformulation techniques presented could be extended to other non-smooth regularization problems, broadening the scope of efficient optimization methods in machine learning.
Conclusions and Future Work
The paper concludes that the proposed methods offer an efficient solution to the 2,1-norm regularized multi-task learning problem. Future work could involve exploring adaptive line search methods and comparing the proposed algorithms with coordinate gradient descent methods, aiming to further enhance practical performance and extend applicability to real-world problems.
Summary
Liu, Ji, and Ye present a comprehensive study of the 2,1-norm regularized multi-task feature learning problem and offer efficient computational techniques to handle its inherent non-smoothness. Their reformulations and use of Nesterov’s method significantly improve computational efficiency, providing both practical and theoretical advances in the optimization of multi-task learning.