TOPMed: Analyzing human genomes to understand mutations, human evolution
What kind of questions would you ask if you had access to the full DNA makeup of tens of thousands of people? Would you look for disease-causing mutations? Would you be curious about how modern humans are related?
Those are some of the issues that researchers are studying as part of a project called TOPMed, the Trans-Omics for Precision Medicine Program. Supported by the National Heart, Lung, and Blood Institute, TOPMed involves about 1,000 researchers—including several from the University of Michigan School of Public Health—and 30 working groups conducting 80 studies around the world.
In a landmark study published in Nature, researchers explain how they’re using existing data, sophisticated algorithms and collaboration to look into human evolution at the genetic level after sequencing the whole genome of 53,831 people of diverse genetic backgrounds.
Researchers hope their work will eventually lead to the improvement of disease diagnoses, treatment and prevention.
Co-authors affiliated with U-M include Daniel Taliun, Sarah Gagliano Taliun, Jonathon LeFaive, Hyun Min Kang, Sayantan Das, Thomas Blackwell, Albert Smith, Keng-Han Lin, Jacob Pleiness, Xutong Zhao, Sebastian Zöllner and Goncalo Abecasis, all of the Department of Biostatistics and Center for Statistical Genetics at U-M’s School of Public Health; Sharon Kardia, Jennifer Smith, Lawrence Bielak and Patricia Peyser, of the Department of Epidemiology at U-M’s School of Public Health; and Cristen Willer of the departments of Internal Medicine and of Human Genetics at U-M.
They discuss the study and ongoing research.
Why is this work important?
Sebastian Zöllner: DNA sequencing has improved rapidly in both quality and cost over the last 10 years, but until recently our use of sequencing to understand the contribution of rare genetic variation to disease risk has been limited by sample size. The TOPMed project was aimed at augmenting sample sizes by sequencing 150,000 people. Our current sample is the largest single dataset available to researchers around the world to take a deep dive into the human genome, and to compare and contrast changes and commonalities at a large scale. We were able to identify 400 million variants, 78% of which had not been described before.
How will the TOPMed project change genetics research?
Albert Smith: TOPMed is by far the largest sequence dataset available for use as a reference for genotype imputation. Over the course of the TOPMed project, we have secured generous data contributions from underlying cohorts for inclusion in the imputation server. An imputation server featuring the TOPMed panel was released to the scientific community last April and has been adopted rapidly, with nearly 14 million genomes imputed by more than 1,200 researchers worldwide to date.
Sarah Gagliano Taliun: The imputation server provides several benefits for researchers. The first is computational convenience. The server allows researchers to perform the computationally intensive imputation procedure in a secure manner without requiring a dedicated computer system. Second, because the TOPMed imputation panel provides a large number of genetic variants across diverse populations, imputation of genotypes using this panel can be more accurate.
How is the TOPMed project unique?
Sebastian Zöllner: One unique aspect of this research is the diversity of the individuals from whom we have DNA sequences. In the past, genetics has focused on data that is easy to generate. Thus, studies of individuals of European descent are overrepresented and studies of other ethnicities are underrepresented.
TOPMed has focused on studying diverse populations, resulting in more nuanced data that enables different questions. In this paper, for example, we recapitulate that African American and Caribbean populations have the greatest genetic variability, which is consistent with a gradual loss of variability as we trace humans’ African origin and their migration out of Africa. We also identify variants that are under selection in different populations. As these variants seem to mediate immune responses, we can get a glimpse of the pathogenic challenges that early populations in different parts of the world faced.
Daniel Taliun: TOPMed is a great example of how valuable collaborations are in research. TOPMed enabled the collection of the largest and most diverse dataset of human genomes to date, and it brought together top experts from interdisciplinary fields to solve challenging problems related to human health and disease. It was an extraordinary experience for me to be a part of this effort, and I am thankful to all the TOPMed collaborators and paper authors for making this work possible.
Tell me about your research.
Sebastian Zöllner: My work focuses on point mutations that generate variants we use for genetic studies. A point mutation is a single nucleotide change in the genetic code. These changes are the result of biochemical or biophysical processes in the genome.
About 60 such point mutations happen in every individual per generation. So, over the history of our sample, more than 400 million such point mutations occurred, creating the genetic variation that now explains most of the heritable differences between people. Until recently, we thought of each of these point mutations as independent events, but recent studies in families have questioned this idea.
We analyzed the spacing between very rare mutations in individuals of African descent, European descent and Asian descent to look for stronger evidence against this hypothesis of independent events. We were able to confirm that some mutations happen in a clustered fashion in a very similar pattern across all individuals, regardless of ancestry. This is evidence that some mutation events create multiple changes and that point mutations are not necessarily independent.
Because our population data provides so many observations, we can estimate underlying parameters and identify regions in the genome where these clustered mutations are more likely to happen. These estimates improve our understanding of the biochemical processes that drive these related mutations. Understanding the mutation process better, in turn, shows us how the variants that explain heritable differences arise.
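The idea of testing independence by looking at the spacing between mutations can be illustrated with a small sketch. This is not the method used in the study. It is a toy comparison that assumes a single uniform mutation rate across a region, so that independent mutations would be spaced roughly exponentially. The function name, the positions and the 10-base-pair threshold are all hypothetical choices made for illustration.

```python
import math

def short_gap_excess(positions, region_length, threshold):
    """Return (observed, expected) fractions of gaps no longer than `threshold`.

    `expected` assumes independent mutations at a uniform rate, which makes
    the gaps between neighbours approximately exponentially distributed.
    This is an illustrative assumption, not the study's model.
    """
    pos = sorted(positions)
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    if not gaps:
        raise ValueError("need at least two positions")
    rate = len(pos) / region_length              # mutations per base pair
    expected = 1.0 - math.exp(-rate * threshold)  # P(gap <= threshold)
    observed = sum(g <= threshold for g in gaps) / len(gaps)
    return observed, expected

# Hypothetical rare-variant positions in a 100,000 bp region.
positions = [1000, 1005, 1010, 50000, 90000, 90003]
observed, expected = short_gap_excess(positions, 100_000, threshold=10)
print(f"observed short-gap fraction: {observed:.2f}")
print(f"expected under independence: {expected:.5f}")
```

In this made-up example, far more neighbouring mutations fall within 10 base pairs of each other than the independence assumption predicts, which is the kind of excess that would point toward clustered mutation events. A real analysis would also need to account for variation in mutation rate along the genome and for sampling noise.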
What comes next?
The TOPMed project provides an important contribution for identifying and understanding the impact of genetic variation on heritable diseases. The dataset now consists of more than 150,000 individuals from diverse backgrounds. In addition to sequencing data, TOPMed now encompasses other genomics data such as RNA-seq, methylation and metabolomics. You can explore the sequencing data on the BRAVO variant browser.