Discover Magazine June 1997

Discover 18, No. 6, 76-83 (June 1997)

The Star Machine

BY GARY TAUBES

When a team of astrophysicists wanted to simulate
a moderate-size star cluster, they figured the world's
fastest computer could help. And so it could, they learned,
given 3,000 years for calculating. So these computer neophytes
did the natural thing. They built an even faster computer.

IN 1983, DAIICHIRO SUGIMOTO
expressed an opinion about globular clusters that did not endear him to researchers who study these bizarre astronomical objects. The clusters are spherical collections of stars--hundreds of thousands to millions of them--and are found predominantly in the outskirts of galaxies. Sugimoto, an astrophysicist at the University of Tokyo, suggested that the inner core of these clusters must undulate in and out like some extragalactic version of a beating heart. Otherwise they would keep expanding and eventually dissipate, which seems not to have happened, since there are still plenty around to be studied. Because Sugimoto was known for his work in the evolution of single stars rather than million-star clusters, the experts discounted his proposition as naive speculation. "I was a stranger to them,"" says Sugimoto. He believed he could prove his point if he could simulate a globular cluster on a computer, "but to do that," he says, "I needed big powers of computation, and at that time the power of supercomputers was not enough."

"Not enough" is an understatement. Simulating the evolution of a globular cluster requires a computer to calculate continuously the gravitational pull exerted by every star in the cluster on every other star, a calculation known as an n-body problem. Simulating a modest globular cluster--a 100,000-body problem--would require roughly 10 billion billion simple arithmetic calculations. Considering that there are roughly 30 million seconds in a year, and the fastest supercomputers at the time could do 100 million operations in one second, a single simulation would have taken some 3,000 years to complete. This was, as one of Sugimoto's collaborators later put it, too long to wait.

So it is that the fastest computer in the world today, or at least one of the two fastest, was not made by IBM or Intel or Cray or any of the other great supercomputing companies, or by brilliant 20-year-old computer nerds at MIT or Caltech, or by government computer experts at the ultrasecret National Security Agency (at least, we're pretty sure it wasn't). It was made by Sugimoto and his students and colleagues because they didn't have a computer fast enough to do what they wanted. The computer, called the Gravity Pipeline, or GRAPE for short, was the first computer to perform a trillion operations in a single second, and it is the runaway favorite to be the first to do a quadrillion operations a second, a speed known as a petaflop in the lingo of computing. It will probably reach that speed within the next five years.

GRAPE and its creators, all astrophysicists, are in a unique position. In the past year a handful of federal agencies--NASA, the Department of Energy, the National Science Foundation, the aforementioned NSA, and the Defense Advanced Research Projects Agency--have agreed that they are confronted with problems of national significance requiring petaflop or faster computing. These problems include everything from enormous scientific simulations to designing new weapons, modeling the explosions of nuclear warheads, and cracking codes. As a result, these agencies have initiated a program to create supercomputers that can operate at a petaflop, and have identified several extraordinarily ambitious designs that might someday work, and one that assuredly will. That one is the Gravity Pipeline. GRAPE and computers like it therefore represent at least part of the future of supercomputing; the crucial remaining question is how much.

THE STUDY OF GLOBULAR
clusters is pursued not for fame, money, or visions of Nobel prizes but purely for a scientific curiosity. At 10 to 12 billion years old, these massive accumulations of stars are among the oldest objects in the universe, and among the most enigmatic as well. Although they remain fairly stable, for instance, their core is as dense with stars as any spot in the galaxy, and almost as violent. Stars collide, fuse, ricochet past one another, and join forces to form X-ray binaries, millisecond pulsars, and other bizarre objects. The dynamics of these clusters are strange and mysterious: the dense core flings out stars like a Roman candle throwing off sparks, and these discarded stars will often leave the cluster, heading outward for parts unknown. The cluster will then collapse inward, making the interior even more violent rather than less. Yet the clusters apparently do not contain black holes--probably because the formation of thousands of binary and even triple star systems somehow stabilizes the collapse. Nor does the heat and radiation emitted by the myriad stars cause them to drift apart and slowly dissipate into the universe.

So astrophysicists study them because they are curious phenomena and also because "they are good laboratories," says Piet Hut, who is an astrophysicist at the Institute for Advanced Study in Princeton and a founding member of the GRAPE collaboration. In the neighborhood of our sun, stars are so far apart they rarely collide. At the center of the galaxy, traffic gets much heavier. The nucleus of our galaxy is as violent and as thick with stars as the worst clusters, although it is also dense with gas and dust clouds swirling and churning, and at the very heart there is most likely a massive black hole, which is not the case with clusters. To understand an environment as complicated as the inner galaxy, astrophysicists have to start with simpler systems. "A cluster is a pure isolated system with just 100,000 bodies," says Hut, "in a separate little clump orbiting the galaxy. If we cannot understand that, then there is no hope to try to understand the nucleus of the galaxy." Hut describes working on globular clusters as similar to "playing scales on the piano, learning the pure systems before you bring in the extra complications."

The complications in globular clusters, however, are bad enough. The trouble is those n bodies. The n-body problem, which has been called the greatest of all unsolved mathematical puzzles, has frustrated mathematicians since Newton, including such acclaimed intellects as Lagrange in the eighteenth century and Poincare in the nineteenth. The question is easy enough to pose: Given some astronomical bodies in gravitational contact--say the two bodies of Earth and the moon, or the three bodies of Earth, moon, and sun--find a mathematical expression that will describe their trajectories as the gravitational attraction between all the bodies pulls them around. Such an expression might provide a few equations into which you could plug a few numbers, perform a series of mathematical operations, and end up with the specifications of where all the bodies will be at a designated moment. The equations would be based on Newton's laws of motion, which describe how each of your bodies moves through space, and his law of gravity, which tells you how the force of gravity works on each.

"THE MACHINE I BUILT
cost a few thousand bucks,
was the size of a bread box,
and ran at a third the speed of
the fastest computer in the world
at the time. And I didn't need
anyone's permission to run it."

But knowing what an n-body problem entails does not imply that anyone has ever been able to solve one, with the modest exception of when n equals 2. If you're dealing only with Earth and the moon, for example, you can come up with equations that will precisely predict the orbits of the two bodies. When you move up to three bodies, the problem can be solved only if you have some excruciatingly simplified system: say, all three have to move in one plane as though rolling across an enormous billiard table, or the mass of one of them--the sun, for instance--is gargantuan compared with the mass of the other two. With four or more bodies interacting, the difficulty of the problem escalates, so to speak, astronomically.

There is one way to tackle n-body problems: "namely, by doing enough arithmetic," as the legendary physicist Richard Feynman put it. This is where the computer comes in. The computer simply simulates the evolution of the globular cluster. It begins with what a mathematician would call the initial conditions: in the case of a globular cluster, the number of stars, their masses, their positions in space when the simulation begins, and their direction and speed. It calculates the force of gravity between every two virtual stars, and for each star, adds up all the forces in all the various directions and calculates how that total force will alter the star's trajectory. The computer adjusts the velocities and trajectories of each of the stars accordingly, then repeats the calculations, over and over again, as a simulated clock ticks off the years.

A researcher might want to see how the globular cluster evolves as the calculation runs for a few million years or a few billion (in steps of years, or thousands of years, or even millions--the smaller the steps, the more precise the simulation). Given their starting points, where have the stars ended up? Then he might change the initial conditions, which, after all, come in an infinite variety of possibilities, and run the simulation again. Even for three bodies, says Hut, "if you change the initial conditions just a little, you get a very different answer."

What makes this approach to studying globular clusters so tricky is that the size of the n is not three or ten, but hundreds of thousands or millions, and the number of calculations needed to do a single simulation goes up by more than the third power of the number of bodies you want to simulate. Thus, simulating the evolution of a globular cluster of 100,000 stars would require 10 billion billion operations, as Hut pointed out in a 1988 paper he wrote with Steve McMillan of Drexel University and Jun Makino, then a student of Sugimoto's. Add ten times as many stars, and it requires more than a thousand times as many operations. And in the 1980s no computer yet existed that could complete the task in the lifetime of your average astrophysicist.

BY THE LATE 1980S,
supercomputers could run a billion operations a second, but they would still have required 10 billion seconds, or 300 years of computing, to simulate a modest-size (100,000 star) globular cluster. By the end of the millennium, there are likely to be a handful of supercomputers capable of running at a teraflop, or a trillion operations per second. These would still require several months of computing time to do a single simulation of a single 100,000-star globular cluster, and they would take centuries to do a globular cluster of a million stars. It would take a computer capable of a petaflop to handle a large enough variety of globular clusters in a reasonable length of time, says Hut, so that astrophysicists could begin to think they understand them.

That Sugimoto and his colleagues decided to build such a computer was due to a combination of circumstances and personalities. For starters, they were all somewhat unconventional in their work. Sugimoto, for instance, directed a laboratory at the University of Tokyo dedicated to a subject he called "general systems studies," investigating complex systems wherever they might be found. Hut says when he first visited Sugimoto in Tokyo, he walked into his office "and there were three people sitting there, one of them working on ecology, the other on traffic flow in Tokyo, the third on traffic flow in the galaxy. All three working under the same adviser. On the shelves were books about rain forests, books on traffic signals in Tokyo, and books on galactic dynamics."

Hut himself had studied his astrophysics in the Netherlands, where he wrote his dissertation on "a scattering of things" (a little black hole theory, a little Big Bang theory, some binary star work), stapling together the eight or nine articles he had already published by that point. He had toyed with the question of whether comets were responsible for the extinction of the dinosaurs and has since become something of a philosopher of science as well.

Then there was Makino, who as a 23-year-old graduate student in 1986 was already becoming an expert on n-body problems. Makino was one of the first to analyze exactly what part of the n-body calculations take up the most computer time and how that changes when more bodies are added. Hut says the two met because he was organizing a conference at Princeton on the subject of simulating stellar systems with supercomputers and had invited Sugimoto to give a talk. Sugimoto couldn't make it and instead suggested Makino. Hut was hesitant at first about inviting anyone that young and inexperienced to talk at a major conference. As for Makino, he had never been out of Japan before and is reticent by nature, notably so when he has to speak English. His talk went well, however, and Hut invited him to stick around the institute for a few weeks. By the end of that time, the two were collaborating on the problem confronting Sugimoto as well.

Hut and Makino spent part of 1987 in Cambridge, Massachusetts, working with Thinking Machines, the company that built the fastest supercomputers in the world at the time. Their experience was discouraging. The two astrophysicists, says Makino, quickly realized that even Thinking Machines' extraordinary computers wouldn't suit their needs. "At that time," he says, "we were just astrophysicists using computers. We still had no intention of building one."

By the following year that intention had changed, in part because of a conversation Sugimoto had with his friend Yoshihiro Chikada, an astrophysicist now at the National Astronomical Observatory of Japan. Chikada had built his own supercomputer to analyze data from radio telescopes, and when Sugimoto told him of the lack of the right computers to analyze his globular cluster problem, Chikada asked, "Why don't you build one yourself?" In the early 1980s, another friend of Hut's, an MIT computer scientist named Gerald Sussman, had built his own computer to simulate the solar system (which is a 10-body problem), and Hut and Makino spent a fair portion of their three months in Cambridge visiting him at MIT.

SUSSMAN'S MACHINE,
built out of spare parts and some chips donated by friends at Hewlett-Packard, was called the Digital Orrery, named after the mechanical contraptions that illustrate the orbits of the planets around the sun. Sussman had started building his digital version in 1983 for the same reason Hut, Sugimoto, and Makino began contemplating building a computer in 1988. "No computer in the world that was cheap enough was fast enough," says Sussman. "The machine I built cost a few thousand bucks; it would sit on my desk; it was the size of a bread box, and it ran at about a third of the speed of the fastest computer in the world at the time. I didn't have to ask permission from anybody to run it for six months, because I made it, and it was made for one problem."

Such a computer is called a special-purpose device, or SPD. SPDs differ from general-purpose computers, such as the one likely to be sitting on your desk, in how they process information. For the general-purpose machine, a program, or list of instructions, is fed into the computer's central processing unit, where the instructions are executed one by one. If you want to multiply two numbers, for instance, explains Makino, the processor is told it has to go to a specific place in the memory device, retrieve the numbers, load them into the particular part of the machine that does arithmetic, perform the multiplication, and put the result back in memory. All of this takes time, and most of the time the processor is sitting around waiting for the data to come or go from memory, or executing commands to make it come and go; in comparison, very little time is actually spent multiplying.

A special-purpose machine has no instructions. It is built to do one specific task and does it automatically. It doesn't need instructions, because it cannot do anything but the task for which it was built. In an SPD, the memory locations, the wires that connect the memory to the processors, and the processors themselves are all designed to do one calculation. The processor never has to find a number in memory, for instance, because that number is always waiting to be multiplied when the particular set of transistors is ready to do the multiplication. It's the computer equivalent of an automated production line to mass-produce a single item.

Hut and Makino realized that while a program written to do simulations of globular clusters might be tens of thousands of lines long, the bulk of the lines--maybe 98 percent of them--were delegated to doing special tasks, such as determining how the stars in the cluster evolve, or handling collisions between stars, or working out cases in which stars just barely missed each other or captured each other to become binaries or three-star systems. These calculations were all necessary to model the evolution of the globular clusters, but although they constituted most of the program, they didn't account for most of the time it took the program to run. The program was spending virtually all its time on the few tens or hundreds of lines of code that did the n-body calculation, figuring out the gravitational force between every two pairs of stars.

"As it turns out," says McMillan, who has been with the GRAPE project since 1989, "that force calculation is sufficiently simple that it can be rendered as an electric circuit, ultimately as a printed circuit. Those hundred lines of program are then replaced by a piece of hardware." That piece of hardware, which can be put on a single silicon chip, becomes the production line, or pipeline, which explains the computer's name. The positions and masses of two different stars (call them A and B) are fed into the circuit at the beginning. The chip then computes the distance between them and uses Newton's law to calculate A's acceleration due to B (which simply means how hard B is tugging at A) by dividing B's mass by the square of its distance from A. Instead of being stored in memory, the result is sent into a circuit that collects A's acceleration due to every other star in the cluster and uses that information to calculate A's new position. Meanwhile, using the same method, chips have been calculating the new positions of all the other stars. The whole process starts again from the beginning, with the new positions fed back into the pipeline. Of course, not every operation will be performed by that helpful chip--the computer will still need other kinds of circuits and memory to handle special cases. But the bulk of the calculations will use only the pipeline chips.

NOT ONLY WAS GRAPE IV
the first teraflop supercomputer ever
built, but it confirmed Sugimoto's
theory that globular cluster cores
oscillate like a beating heart.

In 1988 Sugimoto received a few thousand dollars in funding to build a pipeline-based computer, and Makino and a few of his fellow students started working on it. "In April 1989," says Hut, who spent that year on sabbatical in Tokyo, "they bought their first oscilloscope and were trying to figure out how to hook it up. They had no background. They really started from scratch." In six months, however, they had their first working prototype, a machine that could do 8 million force calculations a second. By 1991, several prototypes later, Sugimoto had used that success to get nearly $2 million for research, which covered them for the next five years and led to the GRAPE IV computer, designed primarily by Makino and his colleague Makoto Taiji.

GRAPE IV has nearly 1,700 identical pipelines, which means it can calculate the mutual attraction of 1,700 different pairs of stars simultaneously. Each pipeline then calculates one force interaction every three ticks of an internal clock, and that clock happens to tick 32 million times a second. So with 1,700 pipelines working simultaneously, GRAPE IV can do 18 billion force calculations every second, which is over a trillion arithmetic operations a second, and it can wrap up a simulation of a small (say, 50,000 star) globular cluster in only a month. (A 100,000 star cluster would still take several months.) Not only was GRAPE IV the first teraflop supercomputer ever built, but its simulations of small globular clusters confirmed Sugimoto's theory that the cores of these clusters oscillate. For Sugimoto, who will be retiring next year, this alone may have made the endeavor worth the effort invested.

OVER THE NEXT FEW YEARS,
Hut, Makino, McMillan, and their colleagues hope to raise the $10 million necessary to build the next version of GRAPE as an international project. This version is likely to be the first petaflop computer ever built and should be able to simulate all globular clusters, not just the small ones. Hut says the jump in speed will be easy, because the current GRAPE IV was designed conservatively, using old, relatively slow technology, to make sure it would work. By the year 2000 chips will be 100 times faster, and the petaflop barrier will be attainable simply by using ten times as many chips. "All the work was to get to GRAPE IV," says Hut. "To go from there to petaflops is a relatively small amount of extra work."

This is not the case with any other computing technology, which suggests that although GRAPE itself is capable of doing little more than simulating close encounters of the astrophysical kind, it may be a sign of things to come. For the past three years, the government's computer experts have been sifting through ideas about how to build their petaflop supercomputers and have narrowed the search down to a handful that seem to be worth pursuing. These range from a computer built with the fastest commercial technology, which might achieve petaflop speed by the year 2015 and cost in the neighborhood of $100 million, to a machine made of exotic superconducting technology and holographic optical memory devices, which could be built perhaps by 2008 and cost $60 million or so. Then there's the next version of GRAPE, which should hit petaflop performance by 2000 and cost no more than $10 million.

GRAPE is the only one of the computers being considered, says Hut, "that wouldn't need any miracles" to reach a petaflop. The miracle required by general-purpose computers is necessary to circumvent problems that arise because the speed of light is finite. As computer processors get faster and faster, the memory has to be placed closer and closer in the computer so that it can provide the requested data fast enough to keep the processor working. Imagine, for example, a processor that can do one calculation every 10 trillionths of a second, which is the fastest possible chip imaginable by known technology. All the data that processor might need would have to be placed within 3 millimeters, which is the distance that light can travel in 10 trillionths of a second. Nothing can travel faster. If the data are any farther away, then the processor sits around twiddling its silicon fingers while the data are coming and going, and the fact that the processor is very, very fast becomes irrelevant.

GRAPE gets around this problem by having the relevant data sitting and waiting for the processor rather than vice versa. To be sure, the nation as a whole has no need for a petaflop computer that can do nothing but astrophysical simulations. But GRAPE has another advantage besides not requiring a miracle. It is also so inexpensive that computers like it might possibly be designed specifically to handle other problems requiring extraordinary computer speed. If so, then those who need extremely fast computers in the future will simply build special-purpose, GRAPE-like machines. "We can identify 12 major problems that require petaflop speeds, ranging from national security through medicine, commerce, science, and engineering," says Caltech computer scientist Thomas Sterling, a member of the petaflop initiative.

For example, a petaflop computer might be a big help with medical diagnoses: a doctor could input a laundry list of symptoms and vital signs, which the computer would match with known illnesses. Petaflop computing could radically cut the development time for new planes, says Sterling, allowing engineers to model airflow so successfully that airplanes could be designed without scale models or wind tunnels. Demographic data could be processed to provide everything from global economic models to finely detailed models of consumer reaction to direct-marketing appeals or to a 5-cent hike in the price of cough drops.

"Each problem may be amenable to special-purpose approaches, like GRAPE, so we would build 12 separate machines," says Sterling. "That's a large number, but it's not unthinkable, especially if these machines are much lower in cost, and the performance they deliver would come as much as 15 years sooner than with any general-purpose machine."

This would bring computing back to where it started, says Hut. In the early days, before computers went digital, they were built for specific purposes, and that's all they did. "With the digital revolution," he says, "everything was digitized and everything became homogeneous and everyone had general-purpose computers. But now that you run into the speed of light barrier, the future may be special-purpose computing again. If you don't have time to communicate, you're just stuck. While special-purpose computers like GRAPE can march on and on."