
Improving variable selection properties by leveraging external data (2502.15584v2)

Published 21 Feb 2025 in math.ST, stat.ME, and stat.TH

Abstract: Sparse high-dimensional signal recovery is only possible under certain conditions on the number of parameters, sample size, signal strength and underlying sparsity. We show that leveraging external information, as is possible with data integration or transfer learning, makes it possible to push these mathematical limits. Specifically, we consider external information that allows splitting parameters into blocks, first in a simplified case, the Gaussian sequence model, and then in the general linear regression setting. We show that block-based $\ell_0$ penalties that depend on the external information attain model selection consistency under milder conditions than standard $\ell_0$ penalties, and that they also attain faster model recovery rates. We first provide results for oracle-based $\ell_0$ penalties that have access to perfect sparsity and signal strength information. Subsequently, we propose an empirical Bayes data analysis method that does not require oracle information and for which efficient computation is possible via standard MCMC techniques. Our results provide a mathematical basis to justify the use of data integration methods in high-dimensional structural learning.
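
To make the idea of a block-based $\ell_0$ penalty concrete, here is a minimal sketch in the Gaussian sequence model $y_i = \theta_i + \varepsilon_i$ with unit noise variance. It relies only on the standard fact that $\ell_0$-penalized least squares with a separable penalty $\lambda$ reduces to hard thresholding, selecting coordinate $i$ when $y_i^2 > \lambda$. The function name `block_l0_select`, the block labels, and the penalty values are assumptions chosen for illustration; they are not the penalties or the empirical Bayes procedure derived in the paper.

```python
import numpy as np

def block_l0_select(y, blocks, penalties):
    """Select coordinates by hard thresholding with block-specific l0 penalties.

    Hypothetical helper (not from the paper): coordinate i is selected
    when y[i]**2 exceeds the penalty assigned to its block.
    """
    y = np.asarray(y, dtype=float)
    # Look up the penalty of the block each coordinate belongs to.
    lam = np.array([penalties[b] for b in np.asarray(blocks).tolist()], dtype=float)
    return y**2 > lam

# Toy observations; block labels stand in for the external information.
y = np.array([3.0, 0.5, 2.0, 2.0])
blocks = np.array([0, 0, 1, 1])

# Standard l0 penalty: the same (illustrative) value for every coordinate.
standard = block_l0_select(y, blocks, {0: 5.0, 1: 5.0})

# Block-based penalty: a smaller (illustrative) value for block 1,
# as if external data suggested that block contains more signals.
blockwise = block_l0_select(y, blocks, {0: 5.0, 1: 3.0})

print(standard)
print(blockwise)
```

In this toy example the lower penalty on block 1 lets the two moderate observations in that block be selected, whereas the single common penalty excludes them. How the block penalties should actually be set is the subject of the paper's oracle and empirical Bayes results.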
