Needles in the Haystack: Identifying Individuals Present in Pooled Genomic Data

Biology – Quantitative Biology – Genomics

Scientific paper

Details

32 pages including 7 figures. Versions: V3 contains major revision of content, including a new derivation in the Results, rewr

DOI: 10.1371/journal.pgen.1000668

Recent publications have described and applied a novel metric that quantifies the genetic distance of an individual with respect to two population samples, and have suggested that the metric makes it possible to infer the presence of an individual of known genotype in a sample for which only the marginal allele frequencies are known. However, the assumptions, limitations, and utility of this metric remained incompletely characterized. Here we present an exploration of the strengths and limitations of that method. In addition to analytical investigations of the underlying assumptions, we use both real and simulated genotypes to test empirically the method's accuracy. The results reveal that, when used as a means by which to identify individuals as members of a population sample, the specificity is low in several circumstances. We find that the misclassifications stem from violations of assumptions that are crucial to the technique yet hard to control in practice, and we explore the feasibility of several methods to improve the sensitivity. Additionally, we find that the specificity may still be lower than expected even in ideal circumstances. However, despite the metric's inadequacies for identifying the presence of an individual in a sample, our results suggest potential avenues for future research on tuning this method to problems of ancestry inference or disease prediction. By revealing both the strengths and limitations of the proposed method, we hope to elucidate situations in which this distance metric may be used in an appropriate manner. We also discuss the implications of our findings in forensics applications and in the protection of GWAS participant privacy.
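
The abstract does not state the metric's formula. As a rough illustration only, the sketch below assumes one form such a metric can take: for each SNP j, D_j = |y_j - ref_j| - |y_j - pool_j|, where y_j is the individual's allele dosage (0, 0.5, or 1), ref_j is the allele frequency in a reference sample, and pool_j is the allele frequency in the pooled sample. The D_j are then summarized by a one-sample t statistic across SNPs. The function names, the simulation parameters, and the choice of this exact formula are assumptions made for the example and are not taken from the paper.

```python
import math
import random


def distance_t_statistic(genotype, ref_freqs, pool_freqs):
    """t statistic of D_j = |y_j - ref_j| - |y_j - pool_j| across SNPs.

    Assumed form of the distance metric; large positive values mean the
    individual sits closer to the pool than to the reference sample.
    """
    d = [abs(y - r) - abs(y - m)
         for y, r, m in zip(genotype, ref_freqs, pool_freqs)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n)


def draw_genotype(freqs, rng):
    """One diploid genotype per SNP, coded as allele dosage 0, 0.5, or 1."""
    return [((rng.random() < p) + (rng.random() < p)) / 2 for p in freqs]


def simulate(n_snps=10000, n_people=25, seed=0):
    """Idealized setting: independent SNPs, pool and reference samples
    drawn from the same underlying population (hypothetical parameters)."""
    rng = random.Random(seed)
    true_freqs = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
    pool = [draw_genotype(true_freqs, rng) for _ in range(n_people)]
    reference = [draw_genotype(true_freqs, rng) for _ in range(n_people)]
    outsider = draw_genotype(true_freqs, rng)

    # Only marginal allele frequencies of each sample are used below.
    pool_freqs = [sum(col) / n_people for col in zip(*pool)]
    ref_freqs = [sum(col) / n_people for col in zip(*reference)]

    t_member = distance_t_statistic(pool[0], ref_freqs, pool_freqs)
    t_outsider = distance_t_statistic(outsider, ref_freqs, pool_freqs)
    return t_member, t_outsider
```

Calling `simulate()` returns the statistic for one pool member and one non-member. The simulation deliberately satisfies the idealized assumptions (a shared source population and independent SNPs). The abstract's point is that these assumptions are hard to control in practice, so a large statistic on real data should not by itself be read as proof of membership.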
