Fast and accurate imputation of summary statistics enhances evidence of functional enrichment (1309.3258v1)

Published 12 Sep 2013 in q-bio.QM, q-bio.GN, q-bio.PE, and stat.AP

Abstract: Imputation using external reference panels is a widely used approach for increasing power in GWAS and meta-analysis. Existing HMM-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants (increasing to 87% (60%) when summary LD information is available from target samples) versus 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and is computationally very fast. As an empirical demonstration, we apply our method to 7 case-control phenotypes from the WTCCC data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of $\chi^2$ association statistics) compared to HMM-based imputation from individual-level genotypes at the 227 (176) published SNPs in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of 4 lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic vs. non-genic loci for these traits, as compared to an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses.

Summary

Overview of Gaussian Imputation of Summary Statistics

In the paper titled "Fast and accurate imputation of summary statistics enhances evidence of functional enrichment," the authors introduce a method for genome-wide association studies (GWAS) that imputes summary association statistics under a Gaussian model. It is aimed at cases where individual-level genotype data are inaccessible because of privacy and logistical constraints. The method uses reference panels, such as those provided by the 1000 Genomes Project, to perform imputation directly on summary association statistics. This stands in contrast to traditional Hidden Markov Model (HMM)-based methods, which require individual-level genotypes.

Key Contributions

The paper presents several key advancements:

Imputation Methodology: The authors developed an imputation approach for summary statistics that approximates the distribution of association statistics using a multivariate Gaussian model. This approach accounts for linkage disequilibrium (LD) without the need for individual-level data, utilizing the covariance structure among SNPs derived from reference panels.
Accuracy and Efficiency: In simulations using 1000 Genomes data, Gaussian imputation comes close to the accuracy of HMM-based approaches. It recovers 84% of the effective sample size for common variants (>5%) and 54% for low-frequency variants (1-5%), rising to 87% and 60% when summary LD information is available from the target samples. HMM-based imputation, the gold standard, achieves 89% and 67% respectively but cannot be applied to summary statistics.
Computational Speed: The method is highly computationally efficient, offering orders of magnitude reductions in runtime compared to HMM-based imputation, particularly when pre-computed weights from reference data are utilized.
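
The conditional-Gaussian idea behind the contributions above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `impute_z`, the array layout, and the ridge parameter `lam` (one common way to account for the limited size of the reference panel, with a default chosen here purely for illustration) are all assumptions.

```python
import numpy as np

def impute_z(z_typed, ld_typed, ld_cross, lam=0.1):
    """Conditional-mean imputation of z-scores under a multivariate Gaussian.

    z_typed  : (t,)   z-scores at typed SNPs
    ld_typed : (t, t) reference-panel correlations among typed SNPs
    ld_cross : (m, t) reference-panel correlations, untyped vs. typed SNPs
    lam      : ridge term added to the diagonal (assumed, not from the source)
    """
    z_typed = np.asarray(z_typed, dtype=float)
    ld_typed = np.asarray(ld_typed, dtype=float)
    ld_cross = np.asarray(ld_cross, dtype=float)

    # Regularize the typed-SNP LD matrix before solving against it.
    regularized = ld_typed + lam * np.eye(ld_typed.shape[0])

    # Weights W = ld_cross @ inv(regularized), computed with a linear solve
    # (regularized is symmetric, so the transpose trick is valid).
    weights = np.linalg.solve(regularized, ld_cross.T).T

    # Imputed z-score is a weighted sum of the typed z-scores.
    z_imputed = weights @ z_typed

    # Variance of each imputed statistic under the null: diag(W @ ld_typed @ W.T).
    var_imputed = np.einsum("ij,jk,ik->i", weights, ld_typed, weights)
    return z_imputed, var_imputed, weights
```

In this sketch the weights depend only on the reference panel, not on the study's z-scores, which is why they could be computed once and reused across studies.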

Empirical Applications

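The paper validates the imputation methodology on real GWAS data: 7 case-control phenotypes from the Wellcome Trust Case-Control Consortium (WTCCC) and height in the British 1958 Birth Cohort (1958BC). At the 227 (176) published SNPs in the WTCCC (1958BC height) data, Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size of HMM-based imputation from individual-level genotypes, as quantified by the ratio of $\chi^2$ association statistics. The correlation between imputed association statistics and those derived from HMM-based methods is also high (r = 0.94).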
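
The abstract quantifies recovery of effective sample size by the ratio of $\chi^2$ association statistics between the two imputation approaches. A rough sketch of one plausible reading, the ratio of summed $\chi^2$ values (squared z-scores) over the same set of SNPs, is shown below. The function name `chi2_ratio` and the choice of a ratio of sums rather than an average of per-SNP ratios are assumptions, not details taken from the paper.

```python
def chi2_ratio(z_imputed, z_reference):
    """Ratio of summed chi-square statistics (z**2) over the same SNPs.

    z_imputed   : list of z-scores from summary-statistic imputation
    z_reference : list of z-scores from the comparison method, same SNP order
    """
    if len(z_imputed) != len(z_reference) or len(z_imputed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    chi2_imputed = sum(z * z for z in z_imputed)
    chi2_reference = sum(z * z for z in z_reference)
    if chi2_reference == 0:
        raise ValueError("reference chi-square sum is zero")
    return chi2_imputed / chi2_reference
```

A value near 1 would indicate that the imputed statistics carry about as much association signal as the reference statistics at those SNPs; values above 1 are possible by chance at a finite set of SNPs.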

Functional Enrichment Analysis

An application of the imputation approach is explored in functional enrichment analysis. By enhancing the set of variants tested through imputation, the authors report increased statistical power for detecting enrichment of disease-associated variants across functional categories. Specifically, analyses of lipid traits demonstrate that imputed data increases both the magnitude and statistical significance of enrichment at genic loci compared to non-genic loci.
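
To make the genic vs. non-genic comparison concrete, here is a deliberately simple sketch. It is not the paper's enrichment statistic, which this summary does not specify: the function `mean_chi2_enrichment` and its definition (mean $\chi^2$ at genic SNPs divided by mean $\chi^2$ at non-genic SNPs) are illustrative assumptions only.

```python
def mean_chi2_enrichment(z_scores, is_genic):
    """Mean chi-square at genic SNPs divided by mean chi-square at non-genic SNPs.

    z_scores : list of association z-scores, one per SNP
    is_genic : list of booleans, True where the SNP is annotated as genic
    Note: an illustrative statistic, assumed here; not the paper's definition.
    """
    if len(z_scores) != len(is_genic):
        raise ValueError("z_scores and is_genic must have equal length")
    genic = [z * z for z, g in zip(z_scores, is_genic) if g]
    non_genic = [z * z for z, g in zip(z_scores, is_genic) if not g]
    if not genic or not non_genic:
        raise ValueError("need at least one SNP in each category")
    non_genic_mean = sum(non_genic) / len(non_genic)
    if non_genic_mean == 0:
        raise ValueError("non-genic mean chi-square is zero")
    return (sum(genic) / len(genic)) / non_genic_mean
```

Under this assumed statistic, running the same calculation on typed-only and on imputed SNP sets would show whether adding imputed variants changes the apparent enrichment.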

Implications and Future Directions

The introduction of Gaussian imputation from summary statistics is a significant advance in genetic epidemiology, improving functional enrichment analyses and offering a new tool for studies reliant on public summary data. While there might be a slight deflation in association statistics when summary LD is unavailable, ongoing improvements in reference panel sizes and the sharing of LD statistics with summary data could address this issue. Future work may extend this approach to accommodate low-coverage sequencing data, which is becoming increasingly prevalent in genome-wide studies.

Overall, the paper offers a robust, computationally efficient method for imputation that promises to enhance the power and scope of functional genomic investigations. By making public summary statistics usable for imputation, it broadens the reach of published genetic findings and paves the way for more comprehensive meta-analyses and cross-study inferences.