In February 2001, a draft version of the human genomic sequence, over three billion DNA base pairs in length, was completed. Knowing the sequence of the human genome encouraged the development of biotechnology that has advanced personalized medicine and biomedical research and catalyzed provocative discoveries in human evolution and migration.
Yet, perhaps the most fascinating discovery was the confirmation that only a surprisingly small percentage (<2%) of the entire genomic sequence codes for protein. Proteins are the workhorses of the cell and body. And while scientists had previously identified functional roles for other small sections of the human genome, the purpose of a vast majority of the rest was unknown. This prompted scientists to ponder how much of the rest of our genomic sequence is biologically important.
The NIH launched the Encyclopedia of DNA Elements (ENCODE) Project in 2003 to answer this question and others concerning the architecture of the human genome. This consortium of laboratories was tasked with cataloging all of the important and functional elements within the human genome. We now know that well over 80% of our DNA sequence has some form of functionality beyond encoding genes that spur the creation of protein. This was much higher than what most researchers anticipated.
It appears that certain genomic regions are vital for normal human growth and developmental timing, from embryo to sexual maturity. Other regions are important for general cellular maintenance and genomic upkeep. The specific sequences that govern these functions are not genes in the traditional sense, but rather regulatory sites. They can regulate how and when our actual genes are expressed by serving as locations for proteins to bind to initiate or repress gene expression. Disruption or mutation of these critical areas can impair the cell and can cause diseases like cancer.
ENCODE also identified thousands of new genes in the genome that code only for RNA, a molecular cousin of DNA that has its own functions and purposes within the cell and body. For example, some small RNA molecules modulate the quantity of proteins in the cell, while others assist in the creation of those proteins or act as message couriers between cells. Some RNAs serve as scaffolding support for multi-protein complexes or contribute to numerous other essential roles for the cell to grow and function properly. A great deal of current research is identifying how each of these RNAs functions, and this has given researchers and clinicians deeper insight into the workings of the cell.
The ENCODE Project has shown that the architecture of the human genome, and the genomes of all other living or viral organisms, is both elegantly built and often extremely complex. Billions of years of evolutionary selective pressures, including random mutational events, fusions, duplications, deletions, and genetic transfers between species have shaped the orientation and architecture of all the genomes we are trying to decipher.
Now that we are beginning to understand what can occur throughout the genome, scientists have begun to ask deeper questions. Did evolutionary pressures get genomic structure right the first time around? What if we were to tinker with that order?
Synthetic biologists want to know the answers to those questions and are reshuffling evolution to drive at the answers. Synthetic biology is a subspecialty of biology and bioengineering focused on designing and creating artificial genes, chromosomes, and genomes. Its practitioners chemically synthesize DNA in the laboratory in the precise sequence they desire to test what happens when changes are made to the naturally inspired DNA sequence.
This was first accomplished in 1979 when H. Gobind Khorana’s laboratory at MIT chemically synthesized the first synthetic gene from the E. coli genome. The technology was advanced for its day, but slow and time-consuming. The sequence of the synthetic gene was a mere one hundred base pairs and change.
In 1990, roughly a decade later, Abbott Laboratories in Chicago built the first synthetic bacterial DNA plasmid, just over two thousand bases in length. A plasmid is a circular piece of DNA that can transfer itself between bacterial (and sometimes between other) cells. Plasmids usually carry multiple genes that can benefit the recipient cell, including genes related to antibiotic resistance. The transfer of plasmid DNA between bacteria is a major mechanism of bacterial evolution and diversity.
Abbott Laboratories’ synthetic plasmid was engineered to be smaller than what is found in nature, with altered functional elements within its sequence to make the plasmid more practical for use in the laboratory and in cloning techniques commonly used in the 1990s. Synthetic plasmids engineered with specific genes and regulatory elements are widely used in laboratories today and help promote the development of precision cancer research, next-generation antibiotic discovery, and early gene therapies.
Engineering and building a gene or a small DNA plasmid from scratch is one thing, but building the complete synthetic genome of an entire organism was a far more audacious goal. Dr. Eckard Wimmer at Stony Brook University in New York set his laboratory to this task and successfully constructed the complete 7,500 base pair-long genome of the polio virus. The synthetic genome matched the natural sequence of polio so closely that, when mixed with the necessary cocktail of proteins inside a test tube, it produced viral particles. These viral particles were able to infect mice and were transmissible, albeit with less efficacy than the natural virus.
Dr. Wimmer’s group also engineered additional changes into the sequence of the polio genome. One type of change created a DNA tag, akin to a molecular barcode, that is unique to Wimmer’s polio genome and made it easy to distinguish from the natural polio genome. This precautionary step is common practice in synthetic biology to discern the synthetic from the natural based on sequence alone. Despite the group’s good intentions, their work raised serious ethical and safety concerns. The most pressing of these lay in the realization that if the natural genomic sequence of an organism or virus is known, it was now possible to build it from scratch and make new edits.
A controversial scientific project spilled over into public discourse when Dr. Yoshihiro Kawaoka at the University of Wisconsin and Dr. Ron Fouchier announced their work on manipulating the H5N1 avian flu viral genome. In doing so, they had made H5N1 more infectious and more readily able to avoid detection by cells of the immune system.
A debate arose within the scientific community and among the general public over whether the specific viral sequences and alterations from this work should be openly available. Open publication would mean that the sequence of a more virulent strain of the flu could be downloaded and read by anyone. At the time, DNA synthesis was no longer a technically complicated process to perform. The fear was that someone else could use the newly published sequence and make H5N1 even more aggressive and infectious, particularly within humans. This same concern applied to work researchers were doing with anthrax, Ebola, the Spanish flu, and other deadly pathogens.
It’s easy to envision a scenario where the sequence of a pathogenic organism is built and manipulated for nefarious reasons. The NIH thought so as well. In 2014, a moratorium on all gain-of-function pathogenic manipulation was instituted until all safety, publication, and communication issues could be sorted out. This was lifted for work on the flu virus early in 2019. The NIH announced new funding for Dr. Kawaoka’s work following adjustments in his laboratory’s safety protocols and the establishment of additional institutional oversight.
Like the plots of many science fiction stories and novels, it’s not a far reach to imagine mad scientists creating designer viruses to infect and kill humans. But there are important reasons for this work in virology, as well as in other fields. Redesigning the genomes of human pathogens may help us answer very important questions in biology. Dr. Kawaoka’s work may decipher novel ways to prevent pandemics, develop new therapies and vaccines, and alleviate suffering in those with the flu or other similar illnesses. This work may also shed light on the biological mechanisms of other human diseases as synthetic biologists set their sights on larger genomes.
The J. Craig Venter Institute (JCVI) is a nonprofit research institute conducting genomic research and piloting synthetic biology initiatives. JCVI built the first complete synthetic genome of a living organism, the sexually transmitted bacterium Mycoplasma genitalium, in 2008. JCVI subsequently proved that the complete synthetic genome of a related bacterium, Mycoplasma mycoides, could substitute for a bacterium’s natural genome and direct normal cellular growth and division. Synthetic organisms were now a reality.
The idea for this work is multifaceted. Using a synthetic genome allows researchers to pluck away at different genes and regions of the genome to further understand basic cellular function and genome organization. These experiments can offer clues into the nature of antibiotic resistance, bacterial community dynamics, or how bacteria evolved into eukaryotes. This may also help solve some perplexing challenges left in understanding how complex organisms like plants and animals evolved from single cells.
In 2016, JCVI announced the creation of a Mycoplasma bacterium using a synthetic version of its genome but with nearly half of its natural genes removed. This bacterium was named Mycoplasma laboratorium, or Synthia for short. Synthia is the world’s first completely synthetically engineered organism with a modified and minimized genome. It cost approximately $40 million to construct and is the first direct proof that engineered genomes can produce living organisms capable of replication and division.
This work offers shocking insight into the assumed necessity of most genes in an organism’s genome. The completed Synthia can now be tested to see if changes in the order of these genes matter. This may reveal whether the blueprints of our evolutionary past are more malleable than was once thought.
As soon as genome synthesis became practical in the laboratory, other initiatives began in the late 2000s to synthesize the genomes of even larger organisms, including yeast. There are already several yeast chromosomes synthesized in their entirety, each hundreds of thousands to over a million DNA bases in length. Current efforts are focusing on individually introducing each synthetic chromosome back into yeast until there are yeast cells with a completely artificial genome. This monumental accomplishment will allow researchers to gain substantial insight into genomic complexity. Yeast has long served as a model for human genetics and the synthetic version will offer additional insights into human pathologies. It is likely that the first synthetic yeast will be introduced to the world within a handful of years.
But where would science be if boundaries weren’t being pushed to their maximum? The announcement of the Human Genome Project Write (HGP-Write) in 2016 was the first step toward that boundary. HGP-Write is an international effort led by geneticists Drs. George Church and Jef Boeke, along with other synthetic biologists. Primary goals of HGP-Write are to reduce the costs of synthetic biology and develop new technologies to make DNA synthesis more efficient, more reproducible, and more reliable. The ultimate goal is to construct the first artificial human genome and cell.
The HGP-Write consortium believes that artificially engineering the human genome will allow for multiple breakthroughs in biomedical research. These breakthroughs include understanding immune cell resistance to viruses, growing artificial organs for transplantation, and having a new testing ground to understand all of the functional elements of our genomic sequence. These efforts are meant to synergize with those in ENCODE to truly interpret the power of the human genome.
HGP-Write participants are aware of the complex ethical considerations related to artificially creating a human genome and transferring it into a human cell. Fundamental ethical and moral questions need to be answered: Who owns this genome? How does it differ from a natural human genomic sequence? How far into development would cells be allowed to go? And what if this technology leads to the complete, synthetic design of a living human being?
Synthetic biologists are not ignoring these important questions. A major priority of the HGP-Write initiative is public discourse on these endeavors and how best to safeguard them. The use of watermarks and barcodes, scientific regulatory and public oversight, and even engineered fail-safes to limit the scope of these projects are all on the table and are playing a major role in the development of projects related to this field.
Humanity is already dealing with the fallout of last year’s announcement of twin girls born in China with CRISPR-edited genomes. Worldwide calls for moratoriums and enhanced regulation of CRISPR-based genome editing are ongoing, but not yet completely effective. Similar moratoriums have been discussed on designing synthetic genomes. Several molecular fail-safes are currently being implemented for CRISPR-modified organisms and will ideally be engineered within synthetic genomes. However, the outcomes of these measures have yet to be completely assessed.
Since the HGP-Write project is still a long way from the completion of a synthetic human genome, there is still time to explore and discuss the benefits and risks of its implementation. Otherwise, we run the risk of developing a technology that outpaces regulation and could have a life-altering impact, as some cases involving CRISPR have already shown.
Science fiction has long focused on technological advances like these and the impact of new biological, chemical, and physical discoveries in society. As Dr. Ian Malcolm famously said about engineered fail-safes in the dinosaurs of Jurassic Park, “Life finds a way.” It will be best to figure out soon how synthetic life can be safely regulated. This will help us find a way to use this new tool to discover the last secrets of the cell responsibly and with the best possible outcomes.
Douglas Dluzen, PhD, is a senior science writer and editor at the NIMHD. He is a geneticist and has previously studied the genetic contributors to aging, cancer, hypertension, and other age-related diseases. He loves to write science and science fiction while sitting on the couch with his wife Julia (who has immeasurably helped him fact-check and edit his work), son Parker, and daughter Cedar.