On the definition of a confounder
Abstract: The causal inference literature has provided a clear formal definition of confounding expressed in terms of counterfactual independence. The literature has not, however, come to any consensus on a formal definition of a confounder, as it has given priority to the concept of confounding over that of a confounder. We consider a number of candidate definitions arising from various more informal statements made in the literature. We consider the properties satisfied by each candidate definition, principally focusing on (i) whether under the candidate definition control for all "confounders" suffices to control for "confounding" and (ii) whether each confounder in some context helps eliminate or reduce confounding bias. Several of the candidate definitions do not have these two properties. Only one candidate definition of those considered satisfies both properties. We propose that a "confounder" be defined as a pre-exposure covariate C for which there exists a set of other covariates X such that the effect of the exposure on the outcome is unconfounded conditional on (X,C) but such that for no proper subset of (X,C) is the effect of the exposure on the outcome unconfounded given the subset. We also provide a conditional analogue of the above definition, and we propose that a variable that helps reduce but not eliminate bias be referred to as a "surrogate confounder." These definitions are closely related to those given by Robins and Morgenstern [Comput. Math. Appl. 14 (1987) 869-916]. The implications that hold among the various candidate definitions are discussed.
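The proposed definition can be written compactly in counterfactual notation. The symbols below are assumptions made for this sketch rather than notation quoted from the abstract: A is the exposure, Y_a is the outcome that would be observed if the exposure were set to a, and S, X, T are sets of pre-exposure covariates.

```latex
% No confounding conditional on a covariate set S:
\[
  Y_a \perp\!\!\!\perp A \mid S \quad \text{for all } a .
\]
% Proposed definition: a pre-exposure covariate C is a confounder if there
% exists a set X of other covariates such that
\[
  Y_a \perp\!\!\!\perp A \mid (X, C) \quad \text{for all } a ,
\]
% while no proper subset of (X, C) suffices:
\[
  \text{for every } T \subsetneq (X, C): \quad
  Y_a \not\perp\!\!\!\perp A \mid T \quad \text{for some } a .
\]
```

Read this way, (X, C) is a minimal sufficient adjustment set that contains C.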
Explain it Like I'm 14
What this paper is about (big picture)
This paper tries to answer a simple-sounding question that turns out to be tricky: What exactly is a confounder? In studies that look for causes (like “Does exercise reduce heart disease?”), a confounder is a factor that can mix up cause and effect. The authors show that some common ways people define “confounder” don’t always work, and they propose a clear, practical definition that matches how scientists actually use the term.
The main goal and questions
The authors ask:
- Can we give a precise, mathematical definition of a “confounder” that matches everyday scientific use?
- Which possible definitions make sure that:
  1) If you control for all confounders, you truly remove the problem called confounding.
  2) Each confounder really helps reduce or fully remove the bias (the error) in your estimate of a cause-and-effect relationship.
How they approached the problem (methods in simple terms)
To study “confounders,” the authors used two main tools:
- Counterfactuals (what-if thinking): Imagine the same person in two worlds—one where they get the exposure (like a medicine) and one where they don’t—and compare the outcomes. Confounding is present if who gets the exposure is related to what their outcome would have been in the “what-if” worlds.
- Causal diagrams (arrow maps): Draw variables as dots and draw arrows to show what causes what. These maps help spot “backdoor paths,” which are sneaky routes by which non-causal factors can make it look like there’s a cause when there isn’t.
They collected several definitions of “confounder” that people use (formally or informally) and tested each one against two basic properties:
- If you adjust for all variables that count as confounders under that definition, is confounding gone?
- Does each confounder help reduce or remove bias in at least some analysis?
They also used simple math examples to show when a definition succeeds or fails.
Two key ideas explained simply
- Confounding: Like trying to judge whether umbrella use “causes” wet clothes on a rainy day. Rain itself affects both umbrellas and wetness, so rain is a confounder. If you don’t account for rain, you might think umbrellas cause wet clothes.
- Minimal sufficient adjustment set: The smallest group of variables you must adjust for to fairly compare “exposed” and “unexposed” people. Think of it like the essential ingredients you must include for a recipe to turn out right—no extras, no missing essentials.
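The umbrella example can be made concrete with a small exact calculation. This is only a sketch: every probability below is a made-up number chosen for illustration, and the toy world assumes umbrellas actually protect against getting wet when it rains. It compares the crude risk difference (ignoring rain) with the risk difference after adjusting for rain by standardization.

```python
# Hypothetical probabilities for illustration only; none come from the paper.
P_RAIN = 0.5
P_UMBRELLA_GIVEN_RAIN = {True: 0.8, False: 0.1}
P_WET = {  # P(wet | rain, umbrella): umbrellas help when it rains
    (True, True): 0.3, (True, False): 0.9,
    (False, True): 0.05, (False, False): 0.05,
}


def p_rain(rain):
    return P_RAIN if rain else 1 - P_RAIN


def p_umbrella(umbrella, rain):
    p = P_UMBRELLA_GIVEN_RAIN[rain]
    return p if umbrella else 1 - p


def crude_risk(umbrella):
    """P(wet | umbrella), ignoring rain."""
    weights = {r: p_rain(r) * p_umbrella(umbrella, r) for r in (True, False)}
    total = sum(weights.values())
    return sum(P_WET[(r, umbrella)] * w for r, w in weights.items()) / total


def adjusted_risk(umbrella):
    """Standardize over rain: sum over r of P(wet | r, umbrella) * P(r)."""
    return sum(P_WET[(r, umbrella)] * p_rain(r) for r in (True, False))


crude_rd = crude_risk(True) - crude_risk(False)
adjusted_rd = adjusted_risk(True) - adjusted_risk(False)
print(f"crude risk difference:    {crude_rd:+.3f}")
print(f"adjusted risk difference: {adjusted_rd:+.3f}")
```

With these numbers the crude difference is positive (umbrella users look more likely to get wet), while the rain-adjusted difference is negative, which is the direction of the effect built into the toy world. Here {rain} plays the role of a minimal sufficient adjustment set.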
What they found (and why it matters)
The authors examined six candidate definitions for “confounder.” Here’s what they learned, in plain language:
- Defining a confounder as “anything associated with both the exposure and the outcome” can fail. Sometimes adjusting for such a variable can actually make things worse (this can happen with a special kind of variable called a “collider,” which can create a false link if you adjust for it).
- Defining a confounder as “anything that blocks a backdoor path” (based on the causal diagram) also isn’t enough by itself. Some variables block a path but don’t reliably help reduce or remove bias in realistic analysis situations.
- Defining a confounder as a variable that’s in every possible smallest-needed set is too strict. Sometimes there are multiple smallest-needed sets, and no single variable appears in all of them. That would make it look like there are “no confounders,” even though confounding still exists.
- The definition that worked best: A confounder is any pre-exposure variable that belongs to at least one minimal sufficient adjustment set. In other words, it’s one of the essential ingredients in at least one correct recipe for adjustment. This definition passed both tests:
  - If you collect all variables that meet this definition, adjusting for them removes confounding.
  - Each such variable can help eliminate bias in some analysis setup.
- Defining confounders as “anything that reduces bias” or “anything that changes your estimate when adjusted for” can be misleading. These ideas depend on the scale (for example, risk difference vs odds ratio) and can be fooled by mathematical quirks, so they don’t guarantee true deconfounding.
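Several of the findings above can be checked by brute force on a small causal diagram. The sketch below rests on assumptions that are not in the paper: the toy graph is invented, and "sufficient" is tested with the graphical backdoor criterion as a stand-in for the counterfactual definition. It lists the minimal sufficient adjustment sets, then compares "in at least one minimal set" with "in every minimal set".

```python
from itertools import combinations

# Hypothetical toy diagram (not a figure from the paper). A = exposure, Y = outcome.
EDGES = [
    ("A", "Y"),
    ("C1", "A"), ("C1", "C2"), ("C2", "Y"),              # backdoor path A <- C1 -> C2 -> Y
    ("U1", "A"), ("U1", "M"), ("U2", "M"), ("U2", "Y"),  # M is a collider between U1 and U2
]
COVARIATES = ["C1", "C2", "U1", "U2", "M"]  # all pre-exposure: none descends from A


def ancestors(nodes, edges):
    """Return the given nodes together with all of their ancestors."""
    found, frontier = set(nodes), list(nodes)
    while frontier:
        child = frontier.pop()
        for parent, c in edges:
            if c == child and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


def d_separated(x, y, given, edges):
    """d-separation test using the moralized ancestral graph."""
    keep = ancestors({x, y} | set(given), edges)
    sub = [(p, c) for p, c in edges if p in keep and c in keep]
    links = {frozenset(e) for e in sub}          # drop edge directions
    for p1, c1 in sub:                           # connect parents that share a child
        for p2, c2 in sub:
            if c1 == c2 and p1 != p2:
                links.add(frozenset((p1, p2)))
    seen, frontier = {x}, [x]                    # search while skipping conditioned nodes
    while frontier:
        node = frontier.pop()
        for link in links:
            if node in link:
                (other,) = link - {node}
                if other not in given and other not in seen:
                    seen.add(other)
                    frontier.append(other)
    return y not in seen


def is_sufficient(adjust, edges, exposure="A", outcome="Y"):
    """Backdoor check: A and Y are d-separated once arrows out of A are removed.
    Assumes `adjust` contains no descendants of the exposure."""
    no_outgoing = [(p, c) for p, c in edges if p != exposure]
    return d_separated(exposure, outcome, set(adjust), no_outgoing)


def minimal_sufficient_sets(covariates, edges):
    sufficient = [set(s) for r in range(len(covariates) + 1)
                  for s in combinations(covariates, r) if is_sufficient(s, edges)]
    return [s for s in sufficient if not any(t < s for t in sufficient)]


minimal_sets = minimal_sufficient_sets(COVARIATES, EDGES)
confounders = set().union(*minimal_sets)          # in at least one minimal set
in_every_set = set.intersection(*minimal_sets)    # in every minimal set
print(minimal_sets, confounders, in_every_set)
```

In this toy graph there are two minimal sets, {C1} and {C2}, so no variable is in every minimal set even though confounding is present, while the union {C1, C2} is itself sufficient. Calling `is_sufficient({"C1", "M"}, EDGES)` returns False, which shows how adding the collider M undoes an otherwise valid adjustment.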
The authors also introduce a helpful label:
- Surrogate confounder: A variable that can reduce bias but can’t, by itself (or even with some common companion variables), fully remove confounding. It’s useful, but it’s not one of the truly essential ingredients.
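A rough numerical picture of a variable that helps without fully fixing the problem: in the sketch below, U is the real common cause and C is only a noisy measurement of it. All names and probabilities are hypothetical, the true effect of the exposure is set to zero, and bias is measured on the risk-difference scale, so this illustrates the idea rather than the paper's formal definition.

```python
from itertools import product

# Hypothetical toy model; the numbers are for illustration only.
P_U = 0.5                          # P(U = 1), the real common cause
P_C_MATCHES_U = 0.8                # C is a noisy copy of U
P_A_GIVEN_U = {1: 0.8, 0: 0.2}     # exposure depends on U
P_Y_GIVEN_U = {1: 0.6, 0: 0.1}     # outcome depends on U only, so the true effect of A is 0


def bern(p, value):
    return p if value == 1 else 1 - p


def joint():
    """Yield (u, c, a, probability) over all binary combinations."""
    for u, c, a in product((0, 1), repeat=3):
        p = bern(P_U, u) * bern(P_C_MATCHES_U, int(c == u)) * bern(P_A_GIVEN_U[u], a)
        yield u, c, a, p


def risk(a, stratum):
    """E[Y | A = a, stratum], where stratum is a dict such as {"c": 1} or {}."""
    cells = [(u, p) for u, c, aa, p in joint()
             if aa == a and all({"u": u, "c": c}[k] == v for k, v in stratum.items())]
    total = sum(p for _, p in cells)
    return sum(P_Y_GIVEN_U[u] * p for u, p in cells) / total


def risk_difference(adjust_for=None):
    """Crude risk difference, or one standardized over "u" or "c"."""
    if adjust_for is None:
        return risk(1, {}) - risk(0, {})
    rd = 0.0
    for level in (0, 1):
        p_level = sum(p for u, c, a, p in joint() if {"u": u, "c": c}[adjust_for] == level)
        rd += p_level * (risk(1, {adjust_for: level}) - risk(0, {adjust_for: level}))
    return rd


crude = risk_difference()
adjusted_for_c = risk_difference("c")
adjusted_for_u = risk_difference("u")
print(f"crude: {crude:.3f}, adjusted for C: {adjusted_for_c:.3f}, adjusted for U: {adjusted_for_u:.3f}")
```

Since the true effect is zero, any nonzero value is bias. Adjusting for the noisy C shrinks the bias (from 0.300 to about 0.221 with these numbers) but does not remove it, whereas adjusting for U itself removes it.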
Why this is important
- It gives researchers a clear, consistent way to decide which variables to adjust for. That’s crucial for making fair comparisons in observational studies (studies without random assignment).
- It warns against common mistakes—like adjusting for the wrong kind of variable (a collider)—which can accidentally create bias.
- It separates “must-have” variables (true confounders) from “nice-to-have” helpers (surrogate confounders).
What this means going forward (impact and implications)
- Better study design: Scientists can plan to measure variables that are part of at least one minimal sufficient adjustment set. That increases the chance their results reflect true cause-and-effect.
- Smarter analysis: Analysts can focus on the right adjustment sets, avoid harmful adjustments, and understand when some variables only partly help.
- Clearer communication: Using the proposed definition helps everyone use “confounder” consistently, making research results easier to trust and compare.
A simple takeaway
Think of finding confounders like packing for a trip. You need certain essentials to make the trip work. The paper says: a true confounder is one of those essentials in at least one complete, minimal packing list. Bring all the essentials from any minimal list, and you’ll be prepared (no confounding). Some extra items (surrogates) may help a bit, but they’re not the must-haves.