
Stratified Learning: A General-Purpose Statistical Method for Improved Learning under Covariate Shift (2106.11211v2)

Published 21 Jun 2021 in stat.ML, astro-ph.CO, and cs.LG

Abstract: We propose a simple, statistically principled, and theoretically justified method to improve supervised learning when the training set is not representative, a situation known as covariate shift. We build upon a well-established methodology in causal inference, and show that the effects of covariate shift can be reduced or eliminated by conditioning on propensity scores. In practice, this is achieved by fitting learners within strata constructed by partitioning the data based on the estimated propensity scores, leading to approximately balanced covariates and much-improved target prediction. We demonstrate the effectiveness of our general-purpose method on two contemporary research questions in cosmology, outperforming state-of-the-art importance weighting methods. We obtain the best reported AUC (0.958) on the updated "Supernovae photometric classification challenge", and we improve upon existing conditional density estimation of galaxy redshift from Sloan Digital Sky Survey (SDSS) data.
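The stratification idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the one-dimensional synthetic data, the hand-written logistic regression used as the propensity model, the choice of five equal-sized strata, and the per-stratum "learner" (simply the mean of the source labels in that stratum) are all assumptions made to keep the example short. The function names `fit_propensity`, `assign_strata`, and `stratified_predict` are hypothetical.

```python
import math
import random


def fit_propensity(x, is_source, lr=0.1, steps=2000):
    """Estimate e(x) = P(source | x) with 1-D logistic regression.

    Plain gradient ascent on the mean log-likelihood; an illustrative
    stand-in for whatever propensity model one would use in practice.
    """
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for xi, si in zip(x, is_source):
            p = 1.0 / (1.0 + math.exp(-(w * xi + b)))
            grad_w += (si - p) * xi
            grad_b += si - p
        w += lr * grad_w / n
        b += lr * grad_b / n
    return w, b


def assign_strata(scores, k):
    """Partition points into k equal-sized strata by propensity-score rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * k // len(scores)
    return strata


def stratified_predict(x_src, y_src, x_tgt, k=5):
    """Fit one (trivial) learner per stratum on source data, predict targets."""
    x_all = x_src + x_tgt
    is_source = [1] * len(x_src) + [0] * len(x_tgt)
    w, b = fit_propensity(x_all, is_source)
    scores = [1.0 / (1.0 + math.exp(-(w * xi + b))) for xi in x_all]
    strata = assign_strata(scores, k)

    n_src = len(x_src)
    global_mean = sum(y_src) / n_src
    stratum_means = []
    for s in range(k):
        ys = [y_src[i] for i in range(n_src) if strata[i] == s]
        # Fall back to the global mean if a stratum has no source points.
        stratum_means.append(sum(ys) / len(ys) if ys else global_mean)

    preds = [stratum_means[strata[n_src + j]] for j in range(len(x_tgt))]
    return preds, strata, (w, b)


# Synthetic covariate shift: source x ~ N(0, 1), target x ~ N(1, 1).
random.seed(0)
x_src = [random.gauss(0.0, 1.0) for _ in range(200)]
x_tgt = [random.gauss(1.0, 1.0) for _ in range(200)]
y_src = [math.sin(xi) + random.gauss(0.0, 0.1) for xi in x_src]

preds, strata, (w, b) = stratified_predict(x_src, y_src, x_tgt)
```

Within each stratum the source and target points have similar estimated propensity scores, which is what makes their covariates approximately balanced. In a realistic setting the per-stratum mean would be replaced by a full classifier or regressor fitted on the source data of that stratum.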

Authors (4)
  1. Maximilian Autenrieth (6 papers)
  2. David A. van Dyk (40 papers)
  3. Roberto Trotta (51 papers)
  4. David C. Stenning (21 papers)
Citations (3)
