December 04, 2008
SEATTLE, Dec. 4 -- Amazon Web Services LLC (AWS), a subsidiary of Amazon.com Inc., today launched “Public Data Sets on AWS,” providing access to a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community, and as with all AWS services, users pay only for the compute and storage they consume with their own applications. Data sets already available include various U.S. Census databases from the U.S. Census Bureau, 3-D chemical structures provided by Indiana University, and an annotated form of the Human Genome from Ensembl. More data sets will be available soon, including a wide range of economic statistics from the Bureau of Economic Analysis and additional scientific data sets.
Previously, large data sets such as the Human Genome and U.S. Census data required many hours to locate, download and customize. Now, anyone can access these large data sets from their Amazon Elastic Compute Cloud (Amazon EC2) instances and start computing on the data within minutes. By growing the number of people with access to important and useful data, and making it easy to compute on that data with cost-efficient services such as Amazon EC2, AWS hopes to fuel innovation and further accelerate the pace of new discoveries.
“For over five years, AWS has been working to lower the barriers to entry, level the playing field, and make it possible for our customers to be successful based on their ideas, not on their resources,” said Adam Selipsky, vice president of product management and developer relations for Amazon Web Services. “Public Data Sets on AWS is the latest of these efforts, and we can’t wait to see the discoveries and innovations that could stem from this ecosystem.”
Select public data sets are hosted on Amazon EC2 for free as Amazon Elastic Block Store (Amazon EBS) snapshots. Amazon EC2 customers can access this data by creating their own personal Amazon EBS volumes, using the public data set snapshots as a starting point. They can then access, modify and perform computation on these volumes directly using their Amazon EC2 instances and just pay for the compute and storage resources that they use. Where available, researchers can also use pre-configured Amazon Machine Images (AMIs) with tools like Inquiry by BioTeam to perform their analysis.
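The snapshot-to-volume workflow described above can be sketched in a few lines of Python. This is only an illustration and is not part of the announcement: it assumes the present-day boto3 SDK (which the release does not mention), and the function names, the snapshot ID, the instance ID and the device name are all hypothetical placeholders. The `ec2` argument is expected to behave like the object returned by `boto3.client("ec2")`.

```python
from typing import Any


def volume_from_public_snapshot(ec2: Any, snapshot_id: str, availability_zone: str) -> str:
    """Create a personal EBS volume from a public data set snapshot.

    `ec2` is assumed to be a boto3 EC2 client (or anything with the same
    methods). The volume must be created in the same Availability Zone as
    the instance that will use it.
    """
    response = ec2.create_volume(
        SnapshotId=snapshot_id,               # hypothetical, e.g. "snap-EXAMPLE"
        AvailabilityZone=availability_zone,
    )
    return response["VolumeId"]


def attach_to_instance(ec2: Any, volume_id: str, instance_id: str,
                       device: str = "/dev/sdf") -> None:
    """Wait until the new volume is available, then attach it to an instance.

    The device name is an assumption; the right value depends on the
    instance's operating system and existing block devices.
    """
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)


def mount_public_data_set(ec2: Any, snapshot_id: str, availability_zone: str,
                          instance_id: str) -> str:
    """Run both steps and return the ID of the newly created volume."""
    volume_id = volume_from_public_snapshot(ec2, snapshot_id, availability_zone)
    attach_to_instance(ec2, volume_id, instance_id)
    return volume_id
```

Once the volume is attached, it still has to be mounted from inside the instance before the data can be read. From that point on, the volume and the instance are billed to the account that created them, which matches the pay-for-what-you-use model described in the release.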
“Public Data Sets on AWS will enable me and many of my colleagues to collaborate with each other by sharing our commonly used data sets, research environments and tools,” said Dr. Peter Tonellato from the Harvard Medical School. “We can set up a controlled environment in minutes, run our computational analysis for a couple of hours, and shut down the environment. Our results are completely repeatable. I only pay for the compute time I use, and more importantly I can spend more time focusing on research, not downloading and setting up computational infrastructure.”
“Bioinformatics is a hugely exciting area which is providing much insight into our understanding of biology and, particularly, the genetic basis of many human diseases like cancer and diabetes. The genome is a complex thing, however; it presents us with a potential source of invaluable information but also with great challenges in how to store, analyze and annotate it, and how to make both the raw genomic information and our annotations available to as many people as possible,” said Dr. Glenn Proctor, Ensembl software coordinator at the EBI. “Ensembl’s approach has always been to try and lower the barriers to entry so that a researcher using a desktop PC in a lab or a laptop in an airport departure lounge has access to high-quality, up-to-the-minute genetic information that they can use in their work. Amazon EC2 allows us to go even further and make all our data available in a robust, scalable and flexible form that anyone with an AWS account can use.”
For more information about the Public Data Sets on AWS, to get started using a data set, or to submit a data set, visit aws.amazon.com/publicdatasets.
Amazon.com Inc., a Fortune 500 company based in Seattle, opened on the World Wide Web in July 1995 and today offers Earth's Biggest Selection. Amazon.com Inc. seeks to be Earth's most customer-centric company, where customers can find and discover anything they might want to buy online, and endeavors to offer its customers the lowest possible prices. Amazon.com and other sellers offer millions of unique new, refurbished and used items in categories such as books, movies, music & games, digital downloads, electronics & computers, home & garden, toys, kids & baby, grocery, apparel, shoes & jewelry, health & beauty, sports & outdoors, and tools, auto & industrial.
Amazon Web Services provides Amazon's developer customers with access to in-the-cloud infrastructure services based on Amazon's own back-end technology platform, which developers can use to enable virtually any type of business. Examples of the services offered by Amazon Web Services are Amazon Elastic Compute Cloud (Amazon EC2), Amazon Simple Storage Service (Amazon S3), Amazon SimpleDB, Amazon Simple Queue Service (Amazon SQS), Amazon Flexible Payments Service (Amazon FPS), and Amazon Mechanical Turk.
Amazon and its affiliates operate websites, including www.amazon.com, www.amazon.co.uk, www.amazon.de, www.amazon.co.jp, www.amazon.fr, www.amazon.ca, and the Joyo Amazon Web sites at www.joyo.cn and www.amazon.cn.