Computation of ancestry scores with mixed families and unrelated individuals (1606.08416v1)

Published 27 Jun 2016 in stat.ME

Abstract: The robustness of genotype ancestry scores, such as eigenvector projections, to family relationships has received increasing attention in genetic association studies, because these scores are widely used to control for spurious association. We use a motivating example from the North American Cystic Fibrosis (CF) Consortium genetic association study, with 3444 individuals and 898 family members, to illustrate the challenge of computing ancestry scores when both unrelated individuals and sets of closely related family members are included. We propose new methods for obtaining ancestry scores and demonstrate that they outperform existing methods. The current standard is to compute loadings (left singular vectors) from the unrelated individuals and then to compute projected scores for the remaining family members. However, projected ancestry scores from this approach suffer from shrinkage toward zero. We consider the following alternative strategies in turn: (i) within-family data orthogonalization, (ii) matrix substitution based on decomposition of a target family-orthogonalized covariance matrix, (iii) covariance-preserving whitening, which retains covariances between unrelated pairs while orthogonalizing family members, and (iv) using family-averaged data to obtain loadings. With the exception of within-family orthogonalization, the proposed approaches perform similarly to one another and are superior to the standard approaches. We illustrate their performance through simulation and an analysis of the CF dataset.
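The standard approach described above (loadings from unrelated individuals, projection for everyone else) and the shrinkage it produces can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' method or data: the two-population allele-frequency model, the sample sizes, and the helper names `simulate_genotypes`, `fit_loadings`, and `project` are all hypothetical. Individuals are stored in rows, so the SNP loadings are the right singular vectors here; they correspond to the left singular vectors of the transposed SNP-by-individual matrix mentioned in the abstract. For simplicity the held-out individuals are simulated as unrelated, since shrinkage affects any sample that was not used to estimate the loadings.

```python
import numpy as np


def simulate_genotypes(n_per_pop, n_snps, rng):
    """Hypothetical two-population model: 0/1/2 genotype counts, individuals in rows."""
    base = rng.uniform(0.1, 0.9, size=n_snps)
    shift = rng.normal(0.0, 0.05, size=n_snps)  # assumed population differentiation
    freqs = [np.clip(base + shift, 0.05, 0.95), np.clip(base - shift, 0.05, 0.95)]
    blocks = [rng.binomial(2, f, size=(n_per_pop, n_snps)) for f in freqs]
    labels = np.repeat([0, 1], n_per_pop)
    return np.vstack(blocks).astype(float), labels


def fit_loadings(G_train, k):
    """Standardize with training statistics and take SNP loadings from a thin SVD."""
    mean = G_train.mean(axis=0)
    sd = G_train.std(axis=0)
    sd[sd == 0] = 1.0  # guard against SNPs that are monomorphic in the training set
    Z = (G_train - mean) / sd
    U, d, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:k].T  # p x k loadings (right singular vectors, rows = individuals)
    train_scores = U[:, :k] * d[:k]  # identical to Z @ loadings
    return mean, sd, loadings, train_scores


def project(G_new, mean, sd, loadings):
    """Project individuals who were not used to estimate the loadings."""
    return ((G_new - mean) / sd) @ loadings


rng = np.random.default_rng(2016)
G, labels = simulate_genotypes(n_per_pop=100, n_snps=5000, rng=rng)
is_train = np.arange(G.shape[0]) % 2 == 0  # 50 "unrelated" training individuals per population
mean, sd, loadings, train_scores = fit_loadings(G[is_train], k=2)
held_scores = project(G[~is_train], mean, sd, loadings)

# Shrinkage diagnostic: typical PC1 magnitude of projected vs. training individuals.
shrink_ratio = np.abs(held_scores[:, 0]).mean() / np.abs(train_scores[:, 0]).mean()
print(f"PC1 shrinkage ratio (projected / training): {shrink_ratio:.2f}")
```

A ratio below 1 means the projected scores are pulled toward zero relative to the scores of the individuals used to fit the loadings, which is the problem the alternative strategies (i)-(iv) are designed to address. The effect grows as the number of SNPs increases relative to the number of training individuals.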