How are they doing this?

What is enabling science to open this Pandora's box is the computer. The amount of raw data produced by mapping and sequencing the three billion base pairs of human DNA will be astronomical, and the only practical way to categorize, store and use that information is with computers. Computers have not only made the project technically feasible; they have also helped secure its funding. The U.S. Department of Energy, a major financial contributor, agreed to fund part of the effort because it would encourage the growth and development of supercomputers (Davis 21).

The Human Genome Project has spurred many computer innovations, so the DOE's investment in computer technology has not been wasted. Because the project's staff consists almost entirely of biologists rather than computer engineers, most of this technology is not entirely new; it is existing technology that has been updated, customized and pushed a little further. Computers play three major roles in the genome project: a parallel processing computer, a super-chip, and a pattern-recognition program borrowed from the Pentagon (Davis 215).

The main computing power for the U.S. program is the Connection Machine (CM), built by Thinking Machines, a Massachusetts company (Davis 201). The CM is a parallel processing computer and works differently from most computers. A conventional computer keeps memory and the processing unit on separate chips, so it wastes time fetching data from memory before it can operate on it. The CM combines the two, so retrieving and processing happen in a single step and almost no time is lost retrieving stored data. The benefit becomes clear when one considers how long it would take a conventional machine to feed a DNA strand seven million bases long through its CPU. A parallel processing machine is ideally suited to such a task.

The Connection Machine uses hundreds of these combined processors, all working simultaneously, to scan an entire field of information at once. This matters because a conventional computer examines an image one dot at a time. By examining the whole field at once, the CM compares data 1,000 times faster than the most powerful conventional mainframe computers (Davis 201).
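The idea of splitting one large comparison into many independent pieces can be sketched in Python. This is a hypothetical illustration, not how the Connection Machine was actually programmed: the function names are invented, the task (counting the positions where two equal-length DNA strands differ) is chosen only as an example, and Python threads merely model the split into chunks without reproducing the CM's hardware or speed. The point is that each chunk can be compared without looking at any other chunk, which is what lets many processors work on a field of data at the same moment.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical illustration of data-parallel decomposition; not the CM's real programming model.


def count_differences(a: str, b: str) -> int:
    """Sequential version: compare two strands one position at a time."""
    return sum(1 for x, y in zip(a, b) if x != y)


def count_differences_parallel(a: str, b: str, workers: int = 4) -> int:
    """Split both strands into chunks, compare each chunk independently,
    then add up the partial counts.

    Each chunk depends only on its own slice of the data. Python threads
    only model this decomposition here; they do not make it faster.
    """
    if len(a) != len(b):
        raise ValueError("strands must be the same length")
    size = max(1, -(-len(a) // workers))  # ceiling division: chunk length per worker
    starts = range(0, len(a), size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(
            lambda s: count_differences(a[s:s + size], b[s:s + size]), starts
        )
        return sum(partials)


if __name__ == "__main__":
    x = "ACGTACGTTGCA" * 1000
    y = "ACGAACGTTGCT" * 1000
    print(count_differences(x, y), count_differences_parallel(x, y))
```

Both functions return 2000 for the sample strands, since each 12-character repeat differs at two positions. Splitting the work this way only pays off on hardware with many processors; on an ordinary CPU the sequential loop is usually at least as fast.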

Another important aid to the project is a super-chip invented by Applied Biosystems, Inc. Analyzing a genome requires deciphering long stretches of DNA, identifying the patterns they contain, and then translating those patterns into individual genes. The chip can scan up to ten million DNA characters per second. In a test against a VAX minicomputer and a Cray-2 supercomputer, each system analyzed a 10,000-base gene and compared it with the 30 million base pairs in GenBank, the primary library for all of the project's results. The chip took ten minutes, the Cray-2 supercomputer two days, and the VAX ten days.
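The reported times are easier to compare in a common unit. The short Python calculation below uses only the figures given in the paragraph above, converting each time to minutes and expressing it as a multiple of the chip's time:

```python
# Running times reported for the benchmark above, converted to minutes.
MINUTES_PER_DAY = 24 * 60

times_in_minutes = {
    "Applied Biosystems chip": 10,
    "Cray-2 supercomputer": 2 * MINUTES_PER_DAY,
    "VAX minicomputer": 10 * MINUTES_PER_DAY,
}

chip = times_in_minutes["Applied Biosystems chip"]
for system, minutes in times_in_minutes.items():
    # How many times longer each system took than the chip.
    print(f"{system}: {minutes:,} min, {minutes / chip:,.0f}x the chip's time")
```

By these figures the chip finished about 288 times faster than the Cray-2 (2,880 minutes) and 1,440 times faster than the VAX (14,400 minutes).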

The chip was originally developed for the Department of Defense to scan the large number of cables and reports the Pentagon receives each day. It recognizes specific patterns of information within these documents with extreme precision.
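The chip's internal design is not described here, but the basic task of locating every occurrence of a fixed pattern in a long string of characters, whether DNA letters or the text of a report, can be sketched in software. The Python function below is a hypothetical illustration using a naive sliding-window scan; it is not the algorithm the Applied Biosystems chip actually used, and `find_pattern` is an invented name.

```python
def find_pattern(sequence: str, pattern: str) -> list[int]:
    """Return every 0-based position where `pattern` occurs in `sequence`.

    Hypothetical naive sliding-window scan: the pattern is compared against
    each same-length window of the sequence in turn. Overlapping matches
    are reported.
    """
    if not pattern:
        return []
    k = len(pattern)
    return [i for i in range(len(sequence) - k + 1) if sequence[i:i + k] == pattern]


if __name__ == "__main__":
    print(find_pattern("GATTACAGATTACA", "TACA"))  # [3, 10]
```

A naive scan like this checks every window, so its cost grows with both the sequence length and the pattern length; dedicated hardware speeds up the same kind of comparison by doing much of it at once.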

The Applied Biosystems chip works with a new type of pattern-recognition software. Unlike most search software, it is not slowed down by complex requests. Ordinarily, a complex query takes longer because the computer must check every criterion the query sets out. The new software does not work through the query step by step; instead, it looks for the patterns inherent in the query and finds their duplicates in the database. The more complex the query's pattern, the easier it is for the software to find the correct DNA sequence (Davis 218).

Christopher Myles