GRID Computing - the next BIG thing?

Grid computing is the latest stage in the evolution of supercomputers. As computers were applied to large, complex problems, supercomputers, the powerful machines needed for those jobs, were designed and built. This made it possible to tackle larger and more complex problems, which in turn demanded still more powerful computers. Multi-processor supercomputers, which used several processors in one machine, were then built to achieve the needed computing power.

Eventually, it was found that common (cheap) PC-based processors, which had grown as powerful as the supercomputers of a decade earlier, could be clustered together to form very powerful "super" computers at less cost. So much so that by the end of 2002, PC clusters accounted for 2 of the top ten most powerful supercomputers in the world (at #5, a Linux Networx MCR cluster of 2304 Xeon 2.4 GHz processors; at #8, an HPTi Aspen System cluster of 1536 Xeon 2.2 GHz processors).

A limitation of clusters is that they are dedicated facilities built in one location. Why not connect computers across a network to make a more powerful computer? This is what Grid computing is about: it aggregates the processing power of hundreds of computers connected in a network. Unlike a cluster, which depends on high-speed data links, Grid computing can make use of relatively low-speed data links.

What It Is

Grid computing is a form of distributed, large-scale cluster computing. It applies the resources of many computers in a network to a single scientific or technical problem that requires a large amount of computer processing or access to very large amounts of data. To do this, Grid computing uses software that can divide a program and its data into pieces and farm them out to several computers in a network, which may be confined to a single corporation or be part of an open, public, global collaboration.
Grid computing makes more cost-effective use of computer resources. Problems that cannot be approached without an enormous amount of computing power are solved by harnessing a large number of computers and managing them so that they collaborate toward a common objective. Thus, it has the potential to provide tremendous computational power at low cost.

The Grid is named for the electricity grid, where all electric devices can plug into the system and access power resources regardless of where they are located. Ian Foster of the University of Chicago and colleague Carl Kesselman of the University of Southern California, generally credited with the Grid concept through their 1999 book The Grid, chose the analogy because "the electric power Grid provides access to power and, like the computer, has had a dramatic impact on human capabilities and society ... providing pervasive, dependable, consistent, and inexpensive access to advanced computation capabilities, databases, sensors, and people, computational Grids will have a similar transforming effect."

Grid computing concepts were first explored in the 1995 I-WAY experiment, in which high-speed networks were used to connect computing resources at more than a dozen sites across North America. From this activity, a number of Grid research projects went on to develop the core technologies for "production" Grids in various scientific communities and technical disciplines. Examples include the European Data Grid, the Particle Physics Data Grid, and the Grid Physics Network, which will analyse data from nuclear physics experiments; the NSF's National Technology Grid and NASA's Information Power Grid in the US, which are meant to serve university and NASA researchers; and the Network for Earthquake Engineering Simulation Grid, which will connect engineers with the experimental facilities, data archives, and computer simulation systems used to design better buildings, bridges, and other structures.
A well-known example of Grid computing in the public domain is the ongoing "Search for Extraterrestrial Intelligence" (SETI@Home) project, in which thousands of people share the unused processor cycles of their PCs in the vast search for signs of "rational" signals from outer space.

Advances in communications technology have led to more decentralized approaches to using computing power. There are over 400 million PCs around the world, many as powerful as an early 1990s supercomputer, yet most are idle much of the time. Every large institution has hundreds of such PCs. Grid computing seeks to exploit these idle workstations and PCs to create powerful distributed computing systems with supercomputer capabilities and global reach.

The opportunity to use idle computers has been recognized for some time. In 1985, Miron Livny showed that most workstations are idle two-thirds of the time and proposed a system to harness those idle cycles for useful work. The resulting Condor system, which exploits the multitasking ability of the Unix operating system and the connectivity of the Internet, is now used extensively to harness idle processors in workgroups or departments for routine data analysis and for solving advanced problems in mathematics. At the University of Wisconsin, for example, Condor regularly delivers 400 CPU days per day of "free" computing power to researchers there and elsewhere, more than they can get from any supercomputer center.

The spread of the Internet and the proliferation of powerful PCs have made larger projects possible. In 1997, Scott Kurowski established the Entropia network to apply idle computers around the world to problems of scientific interest. In just five years, this network grew to more than 130,000 computers with an aggregate speed of over four teraflops, nearly equivalent in power to a top ten supercomputer. Among its achievements is the identification of the largest known prime number.
The next big project in Internet computing was David Anderson's SETI@home project, which enlisted many personal computers to analyze data from the Arecibo radio telescope for signals that might indicate extraterrestrial intelligence. Combining popular appeal with the sound technology that effective Internet computing requires, SETI@home is now running on nearly half a million PCs and delivering 1,000 CPU years per day, making it the fastest special-purpose computer in the world.

What It Does

A distributed computing architecture consists of small software agents installed on a number of client systems and one or more dedicated distributed-management servers. There are also requesting clients, which submit job requests along with a list of the resources they require.

An agent running on a processing client detects when the local system is idle, notifies the management server that it is available for processing, and requests an application package. The client receives an application package from the server, runs the package with its spare CPU cycles, and sends the results back to the server. The application runs in the background, without affecting normal use of the computer. If the user of the local system needs to run their own applications, control is immediately returned and processing of the distributed application package ends.

What It Brings

The advantages of this type of architecture are processing power at low cost, efficient utilization of computing resources, and expandability.

The performance gain over typical enterprise servers is huge. In a case study of a commercial bank, computation time for a series of complex interest rate modeling tasks was reduced from 15 hours on a dedicated cluster of four workstations to 30 minutes on a Grid of around 100 desktop computers; processing 200 trades took 44 minutes on a dedicated system but only 33 seconds on a Grid of 100 PCs.
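To make the agent cycle described under What It Does more concrete, here is a minimal single-process sketch. It is only an illustration under stated assumptions: the names `ManagementServer` and `agent_step`, the squared-sum "application", and the choice to put an interrupted package back on the queue are all hypothetical and are not taken from any particular Grid product.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ManagementServer:
    """Hypothetical stand-in for a distributed-management server."""
    pending: deque = field(default_factory=deque)  # packages waiting for a client
    results: dict = field(default_factory=dict)    # package id -> result

    def request_package(self):
        # Hand the next package to an idle agent, or None if the queue is empty.
        return self.pending.popleft() if self.pending else None

    def return_package(self, package):
        # Assumption: an interrupted package is re-queued for another client.
        self.pending.appendleft(package)

    def submit_result(self, package_id, result):
        self.results[package_id] = result


def agent_step(server, is_idle, user_returns):
    """Run one pass of the agent cycle on a processing client.

    is_idle:      callable, True when the local system is idle.
    user_returns: callable, True if the local user reclaims the machine
                  before the package finishes.
    """
    if not is_idle():
        return "busy"                   # never compete with the local user
    package = server.request_package()  # announce availability, ask for work
    if package is None:
        return "no work"
    package_id, numbers = package
    if user_returns():
        server.return_package(package)  # give control back immediately
        return "interrupted"
    # Toy "application": sum of squares of the numbers in the package.
    server.submit_result(package_id, sum(n * n for n in numbers))
    return "done"


if __name__ == "__main__":
    server = ManagementServer(pending=deque([(0, [1, 2, 3]), (1, [4, 5])]))
    print(agent_step(server, lambda: True, lambda: False))
    print(server.results)
```

A real agent would run the package in a low-priority background process and talk to the server over the network; the sketch only shows the order of the steps.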
Even supercomputer-level processing power is attainable for a fraction of the cost. One of the most powerful computers, IBM's ASCI White, is rated at 12 TeraFLOPS and cost $110 million, while SETI@home currently delivers about 15 TeraFLOPS and has cost about $0.5 million so far. Further savings come from the minimal need for the electrical power, environmental controls, and extra infrastructure that a supercomputer requires, and from the fact that distributed applications can be written in common programming languages like C or C++.

Grid computing allows more efficient use of existing system resources. Analysts have shown that up to 80 percent of the CPU cycles on a company's desktop computers are not used. Even servers spread across multiple departments are typically used inefficiently. Server and workstation obsolescence can be drastically delayed by allocating certain applications to a Grid of client machines or servers, which can consist of older desktop PCs and servers dedicated to distributed computing tasks.

The ability to increase computing power as needed is also a great advantage of distributed computing. Despite their massive processing power, supercomputers are not very expandable once installed. A distributed computing installation is very extendable: simply add more computers to the Grid when needed. Those computers might be added from within or even from outside the organization.

What It Computes

Not all applications are suitable for distributed computing. An application whose individual tasks need access to huge data sets is better suited to larger systems than to individual PCs. If terabytes of data are involved, a supercomputer makes sense, as communications can take place across the system's very high-speed backplane without bogging down the network. Server clusters and other dedicated system clusters are more appropriate for slightly less data-intensive applications. Applications that require fast or near-real-time response are not a good fit for distributed computing either.
Generally, the most appropriate applications are those with "loosely coupled, non-sequential tasks in batch processes with a high compute-to-data ratio". They must also avoid overloading the network by sending large amounts of data to each client.

For a distributed application using numerous PCs, the required data should fit very easily in the PC's memory, with room to spare. The application should be capable of being partitioned into independent tasks that can be processed concurrently. The individual PC should be able to process tasks and small blocks of data and report results that, when combined with those from other PCs, produce coherent output. These tasks should be small enough to produce results within a few hours or a few days.

Aside from the popular SETI@Home computing project, other types of applications can also take advantage of Grid computing. To enhance their public image and demonstrate the effectiveness of their platforms, most of the distributed computing vendors have set up pro bono computing projects that recruit CPU cycles across the Internet, such as Parabon's Compute-Against-Cancer, which harnesses a large number of computers to track patient responses to chemotherapy, and Entropia's FightAIDS@Home, which evaluates prospective targets for anti-AIDS drugs.

In The Future

The major challenges crop up as the scale of Grid computing expands. As soon as a project extends beyond the bounds of a single corporate organization, security and standardization challenges become very significant. Most of today's vendors offer applications for use within the corporate firewall, although others are staking out the global Grid territory.

Spearheading the development of open source Grid software is Globus, a project by researchers at the Argonne National Laboratory, the University of Southern California, the University of Chicago, and elsewhere. Globus software was designed from the start with global scale and scalability in mind.
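The partitioning requirement described under What It Computes can be illustrated with a small sketch. Counting primes in a range is used here purely as an assumed example of a task with a high compute-to-data ratio: each work unit is just a pair of integers, yet it takes real CPU time to process. The function names (`make_work_units`, `process_unit`, `combine`) are hypothetical.

```python
def make_work_units(start, stop, unit_size):
    """Split the range [start, stop) into independent (lo, hi) work units."""
    return [(lo, min(lo + unit_size, stop))
            for lo in range(start, stop, unit_size)]


def is_prime(n):
    """Trial division; deliberately simple and CPU-bound."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def process_unit(unit):
    """Work done by one PC: count the primes in its own sub-range."""
    lo, hi = unit
    return sum(1 for n in range(lo, hi) if is_prime(n))


def combine(partial_counts):
    """Merge per-PC results into one coherent answer."""
    return sum(partial_counts)


if __name__ == "__main__":
    units = make_work_units(0, 100, 30)
    # Each unit could run on a different PC, in any order.
    print(combine(process_unit(u) for u in units))
```

Because no unit depends on another, the units can be handed out in any order and to any number of machines, and the combined result does not change.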
The Globus Toolkit has become the de facto standard for groups building Grids and developing Grid applications. The spirit is very collaborative: the group has agreed to release all of its work as open source.

Working out standards for communications among platforms is part of the typical early chaos that occurs in any new technology. In the peer-to-peer realm, the Peer-to-Peer Working Group, started by Intel, is looking to devise standards for communications among different types of peer-to-peer platforms. The Global Grid Forum, an association of about 200 companies, is looking to devise Grid computing standards. There are also vendor-specific efforts such as Sun's open source JXTA platform, which provides a collection of protocols and services that allow peers to advertise themselves and communicate with each other securely. Platforms that can easily integrate with existing security infrastructures, yet can facilitate communications among them, are necessary for widely distributed Grid computing.

For security, most of the current platforms use strong encryption such as Triple DES. The application packages that are sent to PCs are digitally signed to make sure a rogue application does not infiltrate the system. Identical application packages are usually sent to more than one PC and the results of each are compared; any set of results that differs from the rest becomes suspect. Even with encryption, data can still be snooped while being processed in the client's memory, so most platforms use application data chunks that are small enough that snooping them will not provide useful information.

Big corporations such as Sun, IBM, and Microsoft are now pushing their support of Web services. These companies will, in the near future, integrate Grid protocols into their products and compete with better platform implementations, resource management, and storage capability.
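The result comparison described in the security discussion above, where identical packages go to several PCs and any result that differs from the rest becomes suspect, could be sketched as follows. The strict-majority rule and the name `verify_results` are assumptions made for illustration; actual platforms may use other acceptance policies.

```python
from collections import Counter


def verify_results(results_by_client):
    """Compare results returned for the SAME application package.

    results_by_client maps a client id to the (hashable) result it returned.
    Returns (accepted_result, suspect_clients). If no result is returned by
    a strict majority of clients, nothing is accepted and every client is
    treated as suspect.
    """
    if not results_by_client:
        raise ValueError("no results to compare")
    counts = Counter(results_by_client.values())
    accepted, votes = counts.most_common(1)[0]
    if votes * 2 <= len(results_by_client):
        return None, sorted(results_by_client)
    suspects = sorted(client for client, result in results_by_client.items()
                      if result != accepted)
    return accepted, suspects


if __name__ == "__main__":
    print(verify_results({"pc-a": 42, "pc-b": 42, "pc-c": 41}))
```

A suspect result does not prove tampering; it may simply come from faulty hardware, which is why the sketch only flags clients rather than excluding them.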
Sun Microsystems has launched its GridEngine, a tool that allows pooling of the compute resources of up to 80 workstations. IBM has announced that it will "grid-enable" its entire product line and is joining with Globus to adopt the Open Grid Services Architecture, a proposal to industrialize the Globus Toolkit by integrating Web services and Grid technologies. Other Grid backers include Platform Computing and Entropia, with Intel, Hewlett-Packard/Compaq, Oracle, and others closely watching developments in the field.

For now, while Grid computing is still in its infancy, its promise lies mostly in harnessing computer system resources within the confines of a specific workgroup. It will take a few years before systems on the Internet share computer resources as effortlessly as they share information.

-o0o-