Update (April 1): The preprint from Karen Miga and her colleagues was published yesterday in Science, along with several other papers on research pertaining to the Telomere-to-Telomere (T2T) Consortium’s efforts to create a complete human reference genome.
The Human Genome Project was a tour de force that resulted in the first draft human genome sequence in 2000, but it wasn’t actually complete. The work left sequence gaps that genomicist Karen Miga of the University of California, Santa Cruz, calls the “final unknown” in remarks to STAT. In total, about 8 percent of the more than 3-billion-base-pair human genome—mostly repeats that are computationally challenging to assemble—has remained unsequenced in the two decades since that first draft.
Filling in those gaps has “never been done before,” Miga tells STAT, “and the reason it hasn’t been done before is because it’s hard.” But with an international group of collaborators, Miga last month (May 27) posted a preprint that starts to do just that, adding nearly 200 million DNA bases to the known human genome sequence and discovering some 115 potentially protein-coding genes in the process.
“It’s exciting to have some resolution to the problem areas,” Kim Pruitt, a bioinformatician at the US National Center for Biotechnology Information in Bethesda, Maryland, who was not involved in the research, tells Nature.
Miga and her colleagues used long-read sequencing technologies from Pacific Biosciences and Oxford Nanopore to interrogate the DNA extracted from a cell line derived from a uterine growth called a hydatidiform mole. This structure forms through the fertilization of an egg with no nucleus, meaning that the mole carries only DNA from the sperm, and none from the person whose uterus it was growing in—a genetic anomaly that made it easier to decipher more of the genome because it didn’t involve sorting out the genetic contributions of two parents.
Researchers generated cell lines from this hydatidiform mole years ago, so it’s possible that mutations arose in the genome before it was sequenced for this latest project. If so, the new genetic information “may be largely the detritus that accumulates as a cell line is propagated over many years in culture,” Elaine Mardis, the co–executive director of the Institute for Genomic Medicine at Nationwide Children’s Hospital who did not participate in the work, tells STAT.
Miga tells STAT that she thinks the new sequences are biologically relevant because the cells were frozen for years and not serially passaged that whole time. However, she notes to Nature that a few regions need further confirmation. Because the sperm that fertilized the egg to form the mole carried an X chromosome, the team has not yet dug into the gaps that remain in the human Y chromosome sequence, something the researchers are working on now.