Early last summer, Francis S. Collins, M.D., Ph.D. ’74, FW ’84, joined President Clinton and the CEO of Celera Genomics, J. Craig Venter, M.D., at the White House on a day that marked a turning point in the history of medicine. Scientists in a half-dozen countries, working together to sequence the 3.1 billion nucleotide base pairs that make up the human genome, had completed their first pass of the data several years ahead of schedule and would quickly be filling in the gaps to finish decoding the human book of life, as the seemingly endless string of adenines, guanines, cytosines and thymines has been described.
It was a moment in which millions of non-scientists became interested in genetics, a sudden curiosity reflected in front-page coverage around the world and the roller coaster for biotechnology stocks that preceded and followed the announcement. If people were excited by the news, they were also confused—not only by the complexities of molecular genetics but by what exactly was being announced. If the genome was 97 percent mapped, 85 percent sequenced and only 24 percent verified, what was so special about June 26, the day of the announcement?
The public proclamation, like the genome work itself, may have been sped up by the race between the government-funded project and private upstart Celera to finish what is a gargantuan task, made possible by automated sequencing around the clock and massive computing power. (Venter brashly announced in 1998 that he would beat the public consortium’s timetable and finish the job by 2000, five years ahead of its original target.) But although the working draft was not quite complete, the news of June 26 was indeed extraordinary. Not only had scientists determined the exact sequence of the vast majority of the chemical building blocks that make up human DNA, but they had also strung together this information across the entire human genome; despite the gaps, these overlapping sequences stretched from one end of every chromosome to the other.
“We have caught the first glimpses of our instruction book, previously known only to God,” said Collins, who received his Ph.D. in physical chemistry from Yale in 1974 and trained as a fellow in genetics and pediatrics in the laboratories of Sherman M. Weissman, M.D., and Bernard G. Forget, M.D., in the early 1980s.
Among those called on to interpret the news was Richard P. Lifton, M.D., Ph.D., chair of the Department of Genetics at the School of Medicine and a Howard Hughes Medical Institute investigator. (Lifton is also a member of the National Advisory Council to the National Human Genome Research Institute and of the NIH Oversight Committee for the Human Genome Sequencing Project.) “It’s an awesome accomplishment,” he told Jim Lehrer on the PBS NewsHour program the day of the announcement, “one that will have a profound impact on human biology and medicine for the next century. Who we are, why we are the way we are, why we succumb to different diseases—these are no longer open-ended questions but are bounded ones.”
So what comes next? In late summer, Lifton sat down with Yale Medicine Editor Michael Fitzsousa to discuss the impact of the genome project, the opportunities it provides investigators seeking the causes of rare and common diseases, and the likely next steps in Yale laboratories and around the world. Lifton, who came to Yale from Harvard in 1993 and heads the newly created Center for Genetics in Medicine at the School of Medicine, was the first to define the genetic underpinnings of hypertension, which affects 50 million people in the United States alone. With his colleagues he has identified 12 of the 13 genes known to play a role in regulating blood pressure, mostly through studies of families with rare disorders. In July, he and research fellow David S. Geller, M.D., Ph.D., reported in Science that they had discovered a mutation responsible for an inherited form of hypertension during pregnancy, a complication that affects some 8 million women and their infants each year (see Findings).
Human Genome Project Director Francis Collins and his private-sector counterpart, J. Craig Venter, announced in June that the sequence of the entire human genome had been deciphered, at least in working-draft form. What significance does this have for medicine?
This really is a monumental achievement. The significance of it is that we can begin to see the outlines of a new future for medicine. We recognize that virtually every human disease—from cancer to heart disease, to asthma, to neuropsychiatric and other disorders—has significant inherited contributions. However, the road to identifying those components has been a narrow and twisting one. We haven’t known how many genes there are in the genome, what each gene itself is, where they are on chromosomes.
Having the human genome sequence really changes the way one thinks. We are no longer walking blindfolded through the forest not knowing how many trees there are, where they are, or when we’re going to stumble. We now have a precise map of where we’re going.
What exactly do the sequence data tell us?
There are a finite number of genes—probably 35,000 to 45,000, maybe as many as 100,000. So the inherited contribution to disease has to reside in the DNA sequences of those genes or the nearby components that regulate the expression of those genes. And so we go from this very open-ended problem to a bounded one, where we know all the genes and, in short order, will know all the common variations in the genes. It really becomes a matter of determining which variants in which genes contribute to the development of different human diseases. In many ways, it’s analogous to where chemistry was before and after the development of the periodic table of elements. Imagine if you were the chemist trying to figure out the composition of a compound before you knew what all of the elements were. Now that we have the human genome sequence, it’s a matter of figuring out which genes are involved in which particular diseases.
What’s the next step for the gene mappers?
The draft version of the human genome sequence permits us to begin to identify, from the 3 billion base pairs of the human genome, all of the genes encoded in that genome. We can estimate that perhaps half of all human genes are undiscovered and will be identified by combining this raw sequence with other databases.
That will be one important step. In parallel, we will begin identifying all of the common variations that occur in these genes in human populations. Another process will be to go from the draft version of the human genome, which is 97 percent complete, to the full version, which we anticipate will come by the year 2003. Ambiguities as to the order of particular sequences within the chromosomes will then be resolved. We’ll have the whole sequence.
What we have now has been compared to a book with all the pages in order but the letters on each page scrambled. Is that unscrambling what will take place over the next few years?
Yes. In some cases we have pages that are complete. In other cases there are words and letters that need to be arranged properly. However, the information that we have today is a tremendous advance for the investigation of the inherited causes of disease. Having the genome sequence provides a tremendous boost to genetic research all over the world.
What’s an example of that? Say I’m a basic scientist. How are my prospects as an investigator different from what they were perhaps a year ago?
I think there are at least three areas that will be strongly influenced by this. The first of these is genetic investigation. For the last decade, investigators have been mapping the chromosomal locations of disease genes by comparing the inheritance of chromosome segments to the inheritance of diseases in families. And many of these projects have located genes on chromosomes but have not yet been able to identify the gene in which mutations cause the disease. If you are able to refine the location of the disease gene only to a big chromosome segment that may contain 10 million base pairs, it’s an extremely arduous task to identify all of the genes in that interval and then test which of them have mutations that cause the disease of interest. Having the genome sequence provides a tremendous bypass to that part of the project. Now you know all of the genes that lie within this 10-million-base-pair interval. Rather than putting an army of postdocs on the project to go through the heavy lifting of identifying all of the genes of that interval, you have somebody sit down at a computer terminal and parse through the sequence to identify all of the genes in that DNA sequence. All over the world, this is providing tremendous acceleration for human genetic studies. Projects that have lain dormant for a period of years are suddenly going to be brought to completion.
A second area will be the identification of new targets for therapeutic use. For example, many drugs now in clinical use target G-protein-coupled receptors, which sit at the cell surface and are activated by proteins or small molecules; nuclear hormone receptors, which sit inside the cell and regulate transcription of genes; or ion channels and transporters that mediate passage of electrolytes in and out of tissues. Well, we’ve known about a number of these receptors, but it has been recognized that there are many more in the human genome that are ripe for discovery. Because these different types of targets share common elements, it will be relatively simple to identify all of the members of these gene families and to think about which of these might be targets for novel therapies. This is a first step, but it’s important.
A third area in which the genome data will be enormously helpful is in identifying biochemical pathways that are altered in human disease states. We will have the ability to monitor the expression of every gene in a cell and to ask how that pattern of gene expression is altered in response to disease—or in response to a particular intervention. Up to now, most scientists have been able to deal with only one or a few genes at a time, having to make good guesses as to which pathways might be involved in disease processes. Now we can ask that question on a much larger and more comprehensive scale.
The genome project has received enormous attention, it has affected financial markets, and it seems to be affecting the way the public sees disease and health. Are great breakthroughs in medicine just around the corner?
In medicine we’ve done our best therapeutically when we have understood in great detail the underlying pathogenesis of disease. So I am optimistic that this greater understanding of human disease will ultimately translate into improved therapies. The timing and the pathway to achieving new treatments, however, are much harder to predict. In some cases we may readily identify new targets that are amenable to development of small-molecule agonists or antagonists. In other cases we may find new proteins that can very quickly lead to the development of new therapies. An example of that would be some of the growth factors for the hematopoietic lineage that are already in clinical use. That said, it will not always be the case that understanding the biology of a disease can be translated quickly into a treatment. A good example would be sickle cell anemia, where we’ve understood the molecular basis of the disease since 1953 but have yet to have a cure for the large majority of affected patients. Similarly, the bacterium causing tuberculosis was identified over a century ago, but it took 50 years to develop a cure for this disease. One has to recognize that the road from understanding the causation of disease to having effective treatment will be quite varied. In some cases there may be rapid successes, but in others it may be a very long process, and we should be prepared for that and not falsely raise the expectations of the public.
In pursuing the goal of translating basic science knowledge into clinical interventions, what strategies seem to have the most potential?
The obvious key to this enterprise is increased collaboration between basic scientists and clinicians. The opportunities here are really unprecedented. When I started as a graduate student in 1975, it was very hard to think about productive projects that one could do at the interface between molecular genetics and human disease. Today, this has completely changed. There is tremendous opportunity in almost every disease area. If, 25 years ago, you were interested in diabetes, productive avenues might have included trying to identify genes that are expressed in the pancreas or in fat cells, with the hope that these might be involved in some way in the pathogenesis of diabetes—a relatively indirect approach. Now we can take the clinical problems that we’re interested in, study the disease directly with genetic approaches complemented by a monitoring of gene expression, and expect that we’re actually going to learn something fundamental about the disease pathogenesis. This is qualitatively different than what we could do a generation ago. What is needed to make that work is expertise on both the clinical side and the basic science side and bridges between them.
There certainly has never been a time in the history of medicine in which there has been a more rapid unraveling of the pathogenesis of human disease. And this is just an extraordinarily exciting time to be interested in human disease biology.