
Since 1986 - Covering the Fastest Computers
in the World and the People Who Run Them


Google Researchers Reveal Lessons Learned in Large-Scale Cloud Storage

Researchers from Google recently addressed the issue of availability in globally distributed storage systems, noting that while there is plenty of information about how individual components of storage systems fail, few have looked at the more positive side of the storage coin: the overall availability of large cloud-based storage services.

The work is based on the results of a one-year study of Google's main storage infrastructure. The authors note that "highly-available cloud storage is often implemented with complex, multi-tiered distributed systems built on top of clusters of commodity servers and disk drives" and that, accordingly, "sophisticated management, load balancing and recovery techniques are needed to achieve high performance and availability amidst an abundance of failure sources that include hardware, software, network connectivity and power issues."

To arrive at their conclusions, the authors built a series of statistical models that explore different design choices, including varying replication factors and data placement strategies. Using these models, the researchers examine availability against a range of system parameters tested and encountered across Google's fleet.
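The flavor of such a model can be illustrated with a toy calculation. The sketch below is not Google's actual model; it simply shows how replication factor and per-node unavailability combine under an independence assumption (the function name and all numbers are illustrative):

```python
# Toy availability model: data is unreachable only if the nodes holding
# every replica are down at the same time. This assumes node failures
# are independent -- an assumption the Google study shows is often
# violated in production.

def unavailability(p_node_down: float, replicas: int) -> float:
    """Probability that all replicas are simultaneously unavailable,
    given independent per-node unavailability p_node_down."""
    return p_node_down ** replicas

# Illustrative per-node unavailability of 0.1% (not a measured figure).
p = 0.001
for r in (1, 2, 3):
    print(f"replicas={r}: unavailability={unavailability(p, r):.2e}")
```

Under independence, each extra replica multiplies unavailability by the per-node failure probability, which is what makes replication look so powerful on paper.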

Among the key findings is a strong correlation "among node failures that dwarfs all other contributions to unavailability in our [Google's] production environment." The authors also conclude that "though disk failures can result in permanent data loss, the multitude of transitory node failures account for most unavailability."
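A rough sketch of why correlated failures dominate: if some events (a shared power bus, a rack switch) take down every replica's node at once, additional replicas in that failure domain buy nothing. The numbers and function names below are illustrative, not drawn from the study:

```python
# Compare unavailability with and without a correlated failure mode.
# q models an event (e.g. shared power or network) that takes down all
# replicas at once; p is the independent per-node unavailability.

def unavail_independent(p: float, r: int) -> float:
    """All r replicas fail independently."""
    return p ** r

def unavail_correlated(p: float, r: int, q: float) -> float:
    """With probability q the whole failure domain is down regardless
    of r; otherwise replicas fail independently."""
    return q + (1 - q) * p ** r

p, q = 0.001, 1e-5  # illustrative values
for r in (2, 3):
    print(f"r={r}: independent={unavail_independent(p, r):.1e}, "
          f"correlated={unavail_correlated(p, r, q):.1e}")
```

Even a tiny correlated-failure probability puts a floor under unavailability that extra replication cannot lower, which mirrors the paper's finding that correlated node failures dwarf other contributions.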

Full study on Scribd
