
Penalized Linear Models for Highly Correlated High-Dimensional Immunophenotyping Data (2504.07771v2)

Published 10 Apr 2025 in stat.AP

Abstract: Accurate prediction and identification of variables associated with outcomes or disease states are critical for advancing diagnosis, prognosis, and precision medicine in biomedical research. Regularized regression techniques, such as lasso, are widely employed to enhance interpretability by reducing model complexity and identifying significant variables. However, when these methods are applied to biomedical datasets, e.g., immunophenotyping datasets, two major challenges may lead to unsatisfactory results: 1) high correlation between predictors, which leads to the exclusion of important variables that are correlated with included predictors during variable selection, and 2) the presence of skewness, which violates key statistical assumptions of these methods. Current approaches that fail to address these issues simultaneously may lead to biased interpretations and unreliable coefficient estimates. To overcome these limitations, we propose a novel two-step approach, the Bootstrap-Enhanced Regularization Method (BERM). BERM outperforms existing two-step approaches and demonstrates consistent performance in terms of variable selection and estimation accuracy across simulated sparsity scenarios. We further demonstrate the effectiveness of BERM by applying it to a human immunophenotyping dataset, identifying important immune parameters associated with the autoimmune disease type 1 diabetes.
