December 15, 2006
It's a fact you probably learned in elementary school: more than 70 percent of the earth's surface is water. Lakes, ponds, streams, rivers, oceans, and seas form a vast, complex system that is both affected by and profoundly affects our lives. Rising populations and rapid urbanization raise concerns about maintaining adequate supplies of potable water; agriculture and industry rely on water and also produce pollution that alters ecosystems; and in countless other ways water is at the heart of challenging environmental, economic, and social dilemmas.
CLEANER, the Collaborative Large-Scale Engineering Analysis Network for Environmental Research, has been launched by the National Science Foundation to address the challenges of understanding this complex world of water. The goal of CLEANER is to bring together sensors, data management and mining techniques, and modeling to enable scientists and engineers to collect, integrate, and analyze data and to better collaborate and share information regardless of geographic boundaries.
"To study and understand the state of water in the United States, we will have to use many types of data that have different scales, units, formats, quality, and levels of uncertainty," explains NCSA's Barbara Minsker, principal investigator of the CLEANER Project Office, which is charting the way forward for the complex, multi-year project. "Creating cyberinfrastructure that helps our community find, obtain, transform, analyze, and assimilate these data into many types of models is both a great challenge and a great opportunity."
NCSA is taking a lead role in developing a cyberenvironment to support both CLEANER research and the CLEANER planning process. Cyberenvironments integrate distributed computing and data resources -- including scientific and engineering applications, graphical user interfaces and portals for easy interaction with the applications, workflow and collaboration software, and integrated data analysis and visualization capabilities -- into end-to-end scientific processes, providing a boost in productivity.
Components of the cyberenvironment
NCSA is developing four integrated prototype technologies to demonstrate the potential power of a cyberenvironment to support CLEANER and other environmental projects.
The four components of this prototype cyberenvironment are:
1. The CyberCollaboratory, a Web portal to allow sharing of resources, models, data, and ideas. Using the CyberCollaboratory, individuals who are separated by geographic distance can collaborate in a common digital lab, sharing knowledge and information, analyzing data, and solving problems.
"The CyberCollaboratory provides an infrastructure for integrating different services," explains Yong Liu, a senior research scientist at NCSA and one of the CyberCollaboratory developers.
From this portal, researchers can access tools and data, such as an oil spill simulator that uses data housed at the Shoreline Environmental Research Facility in Texas or the Streamflow Analyst system developed at Utah State University. In addition, the CyberCollaboratory provides communication tools -- including chat services, message boards, document repositories, and videoconferencing -- that allow distributed teams to work as seamlessly as those sharing the same physical space.
The CyberCollaboratory is already being used by several communities, including researchers studying the Neuse River in North Carolina and scientists investigating coastal waters. The CyberCollaboratory is also used as part of the CLEANER planning process; during the CLEANER all-hands meeting in March 2006, more than 200 users logged into the CyberCollaboratory to share documents, discuss drafts, and chat about planning issues.
"We're definitely using the CyberCollaboratory on a daily basis," says Jami Montgomery, executive director of the CLEANER project office.
2. The CyberIntegrator, which provides a mechanism for easy integration of heterogeneous software tools to support modeling and analysis of complex environmental systems. Workflows execute a sequence of tasks on one or several local or remote processors -- for example, obtaining data from remote sensors, transforming data to prepare it for analysis, performing analysis or modeling, or visualizing results. Meta-workflows allow heterogeneous workflows and software tools, often created by different users using multiple software technologies, to be linked and executed within a user-friendly, interactive system while using all available computational resources.
"The idea behind meta-workflow is that most workflow tools expect you to use a single tool, but in our community, people are already using multiple tools," Minsker says. The CyberIntegrator allows researchers to bring together heterogeneous tools in a single interface.
3. The Metadata Repository, which stores information on the activities in each component of the cyberenvironment; the system constructs statements relating the names, datasets, tools, and documents as subjects and objects that are captured as provenance graphs. This information can then be used by other tools to provide coordination, alerts, subscriptions, and knowledge networking.
4. CI-KNOW (Cyber Infrastructure Knowledge Networks on the Web), a tool that supports social and knowledge networking. CI-KNOW mines the information captured in the Metadata Repository to develop customized recommendations for users, guiding them to people, documents, data, images, tools, and workflows that might be helpful to them. (See "NCSA builds social networking tools" in this issue of Access.)
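The statements described for the Metadata Repository (item 3) can be pictured as subject-predicate-object records linked into a graph. The Python sketch below is purely illustrative and is not NCSA's implementation: the class names, the predicate vocabulary ("generated_by", "used", "run_by"), and the sample entries are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Statement(NamedTuple):
    """One provenance statement: subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str


class ProvenanceGraph:
    """Hypothetical in-memory store of provenance statements."""

    def __init__(self):
        # Maps each subject to the statements it appears in as subject.
        self._out = defaultdict(list)

    def add(self, subject, predicate, obj):
        self._out[subject].append(Statement(subject, predicate, obj))

    def lineage(self, start):
        """Return everything reachable from `start`, in discovery order."""
        seen = []
        stack = [start]
        while stack:
            node = stack.pop()
            for statement in self._out.get(node, []):
                if statement.obj not in seen:
                    seen.append(statement.obj)
                    stack.append(statement.obj)
        return seen


# Illustrative entries only; the names are invented for this sketch.
graph = ProvenanceGraph()
graph.add("hypoxia_forecast", "generated_by", "forecast_workflow")
graph.add("forecast_workflow", "used", "bay_sensor_readings")
graph.add("forecast_workflow", "run_by", "researcher_a")

print(graph.lineage("hypoxia_forecast"))
```

A recommendation tool in the spirit of CI-KNOW could walk such a graph to find the people, data, and workflows connected to an item a user is viewing.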
As development of the cyberenvironment continues, an Event Broker will be integrated with these technologies. This will allow events in the cyberenvironment (such as the acquisition of new data from certain sensors or with certain values) to trigger execution of specific meta-workflows (such as real-time modeling).
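One way to picture this event-driven triggering is a small publish/subscribe loop, sketched below in Python. It is a minimal illustration under stated assumptions, not the design of the planned Event Broker: the `EventBroker` class, the event fields, the station names, and the 2.0 mg/L dissolved-oxygen threshold are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]
Condition = Callable[[Event], bool]
Action = Callable[[Event], None]

# Assumed threshold for illustration; not taken from the article.
LOW_OXYGEN_MG_L = 2.0


class EventBroker:
    """Hypothetical broker: runs an action when an event meets a condition."""

    def __init__(self):
        self._subscriptions: List[Tuple[Condition, Action]] = []

    def subscribe(self, condition: Condition, action: Action) -> None:
        self._subscriptions.append((condition, action))

    def publish(self, event: Event) -> int:
        """Deliver an event; return how many actions it triggered."""
        fired = 0
        for condition, action in self._subscriptions:
            if condition(event):
                action(event)
                fired += 1
        return fired


triggered_runs = []


def run_hypoxia_model(event: Event) -> None:
    # Stand-in for launching a meta-workflow such as real-time modeling.
    triggered_runs.append(event["station"])


def is_low_oxygen(event: Event) -> bool:
    return (
        event.get("type") == "sensor_reading"
        and event.get("dissolved_oxygen_mg_l", float("inf")) < LOW_OXYGEN_MG_L
    )


broker = EventBroker()
broker.subscribe(is_low_oxygen, run_hypoxia_model)

normal = {"type": "sensor_reading", "station": "station_1", "dissolved_oxygen_mg_l": 5.8}
low = {"type": "sensor_reading", "station": "station_2", "dissolved_oxygen_mg_l": 1.4}

fired_normal = broker.publish(normal)
fired_low = broker.publish(low)
```

Only the second reading falls below the assumed threshold, so only it triggers the stand-in workflow. Keeping conditions and actions as separate callables lets new triggers be added without changing the code that receives sensor data.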
Cyberenvironment in action
CLEANER researchers across the country are already envisioning how the cyberenvironment will enable new avenues of research.
For researchers monitoring hypoxia (oxygen depletion) in Corpus Christi Bay, for example, it's currently impossible to adapt their efforts to unfolding events. Manual sampling should be increased when the possibility of hypoxia is high, but the researchers cannot integrate the diverse sensor data (some downloaded only once a week) and models to predict when they should send people into the field to collect samples.
To address this need, the water research cyberenvironment is being developed to enable near real-time adaptive monitoring. The CyberCollaboratory will alert researchers when hypoxic conditions are expected. Scientists could then discuss the predictions using the portal's chat and message board features, developing a plan to step up their data-gathering efforts. The data collected from the manual sampling effort could then be transmitted back to the cyberenvironment's data store, perhaps triggering simulations and models via the CyberIntegrator. These results could in turn be discussed through the CyberCollaboratory.
"This type of end-to-end system will create a new paradigm for environmental research," says Minsker, "allowing interdisciplinary teams to collaborate to address complex issues."
This research is supported by the National Science Foundation and the Office of Naval Research.
For further information visit http://cleaner.ncsa.uiuc.edu.
Article provided courtesy of NCSA