What is Clustering?
Clustering has been widely used over the past decade to build high-performance and high-availability computer systems. The term refers to connecting two or more computers across a network so that they work together, for example as a parallel processing environment in which processes are distributed across the machines.
Beowulf Supercomputers - The origins of clustering.
Compute Clusters for High Performance - suitable for modelling, rendering and compiling.
High Availability Clusters - suitable for database replication servers or website serving.
Beowulf Supercomputers
The term Beowulf Supercomputer was coined in the mid-1990s by Thomas Sterling and Don Becker at NASA-Goddard to describe a dedicated cluster constructed from commodity hardware and running open source software. There are two classes of Beowulf. A Class I cluster is built entirely from commodity hardware and standard technology, and is generally far less expensive than a Class II cluster, which uses specialised hardware to improve performance.
Beowulfs are designed with high performance in mind. The original concept was to provide supercomputing power at commodity prices, but clusters now dominate the Top500 rankings (Top500.org), a list of the 500 most powerful supercomputers in the world.
Clusters for High Performance
High Performance computing clusters are designed for processing-intensive applications and provide large numbers of computing cycles.
Message Passing and Shared Memory
There are two types of High Performance Computers competing in the HPC market: message passing clusters and shared memory machines. Message passing clusters make use of networks such as Gigabit Ethernet, InfiniBand and Myrinet to provide interprocessor communication. Shared memory machines generally provide many processors that have direct access to the same memory and disks, i.e. they are on the same board. Shared memory machines may also be clustered to improve performance. Message passing is a significantly cheaper alternative to using shared memory machines and, for workloads that do not need frequent communication between processors, can provide similar levels of performance.
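The difference between the two models can be illustrated on a single machine. The sketch below is an analogy only, assuming Python threads stand in for processors and a queue stands in for the network. In the shared memory version every worker updates one variable under a lock; in the message passing version the workers share nothing and send their partial results to a coordinator. The function names are hypothetical.

```python
import queue
import threading


def sum_shared_memory(chunks):
    """Workers add into one shared total, guarded by a lock."""
    total = [0]                      # shared state visible to every worker
    lock = threading.Lock()

    def worker(chunk):
        partial = sum(chunk)
        with lock:                   # serialise access to the shared value
            total[0] += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


def sum_message_passing(chunks):
    """Workers share nothing; each sends its partial sum as a message."""
    inbox = queue.Queue()            # stands in for the network link

    def worker(chunk):
        inbox.put(sum(chunk))        # send the result to the coordinator

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The coordinator receives one message per worker and combines them.
    return sum(inbox.get() for _ in chunks)


data = [list(range(0, 50)), list(range(50, 100))]
print(sum_shared_memory(data), sum_message_passing(data))
```

Both functions return the same total. What differs is where the coordination cost falls: on the lock in the first case, and on the messages in the second.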
Distributed Processing
There are numerous message passing libraries and standards available.
- One of the earliest is Parallel Virtual Machine (PVM).
- Message Passing Interface (MPI) is a more recent standard with several implementations.
- MPICH is one widely used implementation of MPI.
- LAM/MPI (Local Area Multicomputer) is another implementation of MPI.
Many of these are designed to work best with C, C++ and Fortran; however, there are bindings and implementations for other languages, such as Java and C#.
Distributed Rendering
Distributed Rendering can significantly speed up the production of complex graphics and animations. Rendering is one of the most scalable processes: because frames or image regions can usually be rendered independently, doubling the number of nodes comes close to doubling the speed. There are two straightforward methods of distributed rendering: splitting a single scene into smaller parts that are rendered in parallel, and rendering the frames of an animation in parallel.
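Both methods come down to dividing the work into independent pieces. The following is a minimal sketch of that division only, not of a real render farm scheduler: the function names, the inclusive frame numbering and the square tile size are all assumptions made for illustration.

```python
def split_frames(first, last, nodes):
    """Divide frames first..last (inclusive) into one contiguous range per node."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    total = last - first + 1
    if total <= 0:
        return []
    base, extra = divmod(total, nodes)
    ranges = []
    start = first
    for i in range(nodes):
        # The first `extra` nodes take one additional frame each.
        count = base + (1 if i < extra else 0)
        if count == 0:
            break                    # more nodes than frames
        ranges.append((start, start + count - 1))
        start += count
    return ranges


def split_tiles(width, height, tile):
    """Cut a single image into rectangles (x, y, w, h) of at most tile x tile pixels."""
    return [
        (x, y, min(tile, width - x), min(tile, height - y))
        for y in range(0, height, tile)
        for x in range(0, width, tile)
    ]


print(split_frames(1, 10, 4))
print(len(split_tiles(1920, 1080, 512)))
```

Each frame range or tile can then be handed to a different node. In practice frames and tiles vary in cost, so a real scheduler would likely hand out work dynamically rather than in fixed shares.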
Distributed Compiling
Distributed compiling is important when building large programs or recompiling a program for a different architecture, because source files can be compiled independently on separate nodes.
Clusters for High Availability
High Availability clusters provide redundancy to maximise uptime and to cope with high-load situations. They have become invaluable in the past 5-10 years in giving large websites the capacity to deal with millions of visitors.
Database replication
Most large dynamic websites generate content in real time to an individual user's specification, which can put a significant load on database servers. By using a replication service it is possible to spread the load across multiple machines and hence provide a responsive database backend.
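One common way to spread that load is to send writes to a single primary server and share reads among its replicas. The sketch below assumes this single-primary arrangement, which the text above does not specify. The class name, the host names and the rule that only SELECT statements count as reads are hypothetical simplifications.

```python
import itertools


class ReplicatedDatabaseRouter:
    """Pick a server for each statement: reads rotate, writes go to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._readers = itertools.cycle(list(replicas) or [primary])

    def route(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT":
            return next(self._readers)   # round-robin over the replicas
        return self.primary              # INSERT, UPDATE, DELETE, etc.


router = ReplicatedDatabaseRouter("db-primary", ["db-replica-1", "db-replica-2"])
for statement in ("SELECT * FROM pages", "SELECT * FROM users", "UPDATE users SET seen = 1"):
    print(statement, "->", router.route(statement))
```

A real deployment would also need to account for replication lag, since a replica may briefly return older data than the primary.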