November 20, 2006
Evergrid Inc., a provider of quality of application service management for next-generation datacenters, has announced its entry into the high performance computing market with patent-pending high availability and resource management software that lets massively parallel distributed applications run at near 100 percent reliability on high performance computing (HPC) clusters.
The Evergrid software sits between the operating system and the applications, and captures the collective state of the application and its I/O across all processors. By recording that state, Evergrid can checkpoint an application and recover it from failures rapidly with minimal overhead. The software also lets data centers preempt lower priority applications in favor of higher priority ones, with little or no data lost. The software installs on Linux systems and requires no modifications to either the OS or the application. It scales to thousands of nodes at a time, with less than five percent performance overhead.
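The checkpoint-and-recover idea described above can be illustrated with a minimal sketch. This is not Evergrid's implementation: the product is described as working transparently beneath the application and across all processors, whereas the sketch below is a single-process, application-level toy. The function names (`save_checkpoint`, `load_checkpoint`, `run`) and the `fail_at` parameter are hypothetical and exist only for illustration.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then atomically rename it over the old checkpoint."""
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach disk before the rename
    os.replace(tmp_path, path)  # a crash never leaves a half-written checkpoint


def load_checkpoint(path, default):
    """Return the last saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, interval, fail_at=None):
    """Sum the integers 0..total_steps-1, checkpointing every `interval` steps.

    `fail_at` (hypothetical) simulates a node failure at that step.
    """
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % interval == 0:
            save_checkpoint(path, state)
    return state["total"]
```

If a run fails partway through, calling `run` again with the same checkpoint path resumes from the last saved step instead of from the beginning, so only the work done since the last checkpoint is repeated.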
"As open source and commodity hardware have become de facto standards, large data centers today are increasingly deploying their mission critical applications on huge clusters of servers," said Ameet Patel, partner and CTO, Acartha Group, and former technology executive at JPMorgan Chase. "But traditional datacenter configurations are rigid, complex, underutilized and expensive. The market desperately needs a solution that treats commodity servers like we used to treat mainframes. Datacenters want to schedule high priority jobs on pools of commodity servers that can quickly recover from inevitable failures."
Despite a host of recent advances in hardware and software, downtime for compute-intensive applications is an ever-worsening problem in high performance technical computing (HPTC) environments. The growth of commodity server clusters has led to higher failure rates and a lower mean time between failures (MTBF), both because of the large number of nodes and because of the length of time users want to run parallel applications. In addition, in an attempt to meet quality of service objectives, data centers have dedicated individual servers to particular applications, resulting in over-provisioning. The result is an environment of low utilization, poor reconfiguration flexibility and high cost.
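The relationship between node count, job length and failures can be made concrete with a back-of-the-envelope calculation. The sketch below assumes that nodes fail independently at a constant rate (an exponential failure model) and that any single node failure stops the job. Neither assumption comes from Evergrid, and the node MTBF used in the usage note is an illustrative figure, not a measurement.

```python
import math


def cluster_mtbf_hours(node_mtbf_hours, nodes):
    """MTBF of the whole cluster when any one node failure counts as a failure."""
    return node_mtbf_hours / nodes


def prob_job_survives(node_mtbf_hours, nodes, job_hours):
    """Probability that a job using every node runs to completion with no failure."""
    return math.exp(-job_hours * nodes / node_mtbf_hours)
```

Under these assumptions, a node with an MTBF of 50,000 hours looks very reliable on its own, yet `cluster_mtbf_hours(50_000, 1000)` gives a cluster-wide MTBF of only 50 hours, and `prob_job_survives(50_000, 1000, 50)` shows that a 50-hour job has roughly a 37 percent chance of finishing without a restart.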
"When we built System X at Virginia Tech we found that the reliability of large clusters was an important issue," said Srinidhi Varadarajan, CTO and founder of Evergrid. "Even with excellent hardware the runtime of large jobs was restricted by the mean time between failures of 1000's of processors. We decided very quickly that we needed to do something about system availability, and that was our impetus for founding Evergrid."
Evergrid's new fault tolerant software prevents downtime by automating the checkpointing, migration and recovery of applications, thus offering automatic failover across multiple nodes and tiers. With Evergrid, even failure of multiple processors does not stop an application from functioning continuously. In addition, Evergrid's efficient and robust management software provisions servers from bare metal up through the application and allows preemptive allocation of resources to high priority applications. This level of functionality allows quality of service objectives to be met. All this is done with complete transparency to the user.
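Preemptive allocation to high priority applications can be sketched as a simple priority queue. This is a toy model under stated assumptions, not Evergrid's scheduler: there is one shared resource, time advances in whole ticks, and a preempted job keeps its progress (standing in for a checkpoint). The `Job` class and `simulate` function are hypothetical names introduced for this example.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int                          # lower number = higher priority
    name: str = field(compare=False)
    remaining: int = field(compare=False)  # work units left; kept across preemption


def simulate(arrivals):
    """Run jobs on one resource, one work unit per tick.

    `arrivals` maps a tick number to the list of jobs submitted at that tick.
    Returns the name of the job that ran at each tick (None if idle).
    """
    ready = []     # min-heap ordered by priority
    timeline = []
    tick = 0
    while ready or any(t >= tick for t in arrivals):
        for job in arrivals.get(tick, []):
            heapq.heappush(ready, job)
        if ready:
            job = heapq.heappop(ready)     # always the highest priority job
            timeline.append(job.name)
            job.remaining -= 1
            if job.remaining > 0:
                heapq.heappush(ready, job)  # progress preserved, may be preempted
        else:
            timeline.append(None)
        tick += 1
    return timeline
```

For example, if a low priority job `Job(5, "batch", 3)` arrives at tick 0 and a high priority job `Job(1, "urgent", 2)` arrives at tick 1, the batch job runs for one tick, yields to the urgent job, and then resumes with its remaining two units of work rather than starting over.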
"Evergrid provides commodity server clusters with the industry's first and only transparent, fault tolerant system, and also the first and only preemptive scheduler for distributed applications," said David Anderson, CEO of Evergrid. "Our product is truly massively scalable. The closest competitor can scale to only eight nodes with performance overhead of more than 40 percent. We designed Evergrid to grow to a remarkable 100,000 nodes or more."
Evergrid's infrastructure software is designed for demanding, compute-intensive sectors such as aerospace, financial services and petrochemical research. Initially, Evergrid's software targets high performance technical computing applications that are computationally intensive and use high speed interconnects. In the future, Evergrid will also provide solutions for the high performance enterprise computing (HPEC) and transaction processing database markets.
Evergrid is a spin-off of California Digital, a company that created two of the highest performance supercomputers. The company is funded by a number of private investors, led by the Acartha Group.
Evergrid demonstrated its virtualization software technology last week at Supercomputing 2006 (SC06). Evergrid was one of only 54 show participants chosen to present poster submissions displaying emerging ideas and early results of advanced research in high performance computing, networking and storage.