December 09, 2011
BRUYÈRES-LE-CHÂTEL, France -- NFS is a well-known protocol for exporting remote file systems that has been in use since the late 1980s. Over more than 20 years of widespread use, the protocol has evolved considerably, from the very basic NFSv2 of 1989 to the brand new NFSv4.1, whose specification was published by the IETF in early 2010. This latest version of the protocol includes the pNFS feature, which makes it possible to separate the metadata path from the data paths, as modern parallel file systems do, optimizing both data and metadata management by removing classical bottlenecks.
The Military Applications Department of the French Atomic Energy Authority (CEA/DAM) has been developing a generic NFS server running in user space, named “NFS-Ganesha”, since 2005. It has been in production at the TERA compute center since January 2006 and available as open-source software on SourceForge since July 2008. The server has evolved considerably over the past five years, gaining many new features (mostly related to NFSv4 and NFSv4.1/pNFS) through successful collaborations with industrial partners.
More information about installing/configuring NFS-Ganesha can be found at http://nfsganesha.sourceforge.net.
You can download NFS-Ganesha at http://sourceforge.net/projects/nfs-ganesha/files/nfsganesha/.
A generic massively multithreaded server in User Space
NFS-Ganesha has many design differences when compared with the classic NFS server shipped with most Linux distributions. First of all, it runs in user space, which presents several advantages.
NFS-Ganesha is a massively multithreaded program. At startup, it spawns a large number of workers (threads dedicated to processing NFS requests) and dispatchers (threads that receive requests and assign them to a worker by pushing the request onto that worker's queue). This fits well with today's architectures, where machines tend to have more and more processors and cores (machines with 16 or 32 cores, at 8 cores per socket, are relatively common). Memory management is then a critical consideration: the resource allocator must not serialize the threads. NFS-Ganesha embeds its own memory manager, based on the buddy-block algorithm used in the kernel. Each thread allocates a large chunk of memory and divides it into various combinations of smaller blocks (with sizes that are powers of 2). When released, blocks are freed by being pushed back into the buddy-block pools and reassembled. This is done by adjusting pointers without moving data, making it very efficient and fast. The internal design itself is thread-safe and avoids bottlenecks: normally, a lock should not be contended by more than two or three different threads, and there is no “big lock” that would ultimately serialize the threads. Currently, NFS-Ganesha runs in production at CEA's TERA compute facility with hundreds of workers and thousands of dispatchers.
A third point is the architecture of NFS-Ganesha itself. The daemon has been designed as a highly layered product. From the highest layers (the transport layer, with IPv6 support, and the protocol layers, managing NFSv2/NFSv3/NFSv4/NFSv4.1 and 9P2000.L) down to the cache and backend layers, each module is defined as a standalone library with a well-known structure and API. The lowest layer is called the FSAL (which stands for File System Abstraction Layer) and acts as a generic interface to the file system used as a backend. Several FSALs are currently supported.
More FSALs are coming in future releases.
An NFS server with interesting features for HPC
NFS-Ganesha was born from the needs of HPC computing centers. From the very beginning, it has been designed to fit the specific workload produced by supercomputers, making it possible to deal with HPC-specific issues. Several companies with a wealth of HPC experience (such as IBM, Panasas and LinuxBox) have joined the NFS-Ganesha project and contributed to its improvement. While most of these contributions are aimed at providing an interface to their own products, their participation has greatly enhanced the stability, functionality and, most importantly, the reach of NFS-Ganesha. This makes NFS-Ganesha a server well adapted to serving NFS to “many clients” that are themselves part of a big compute cluster.
Another HPC-oriented feature of NFS-Ganesha is pNFS. This feature is defined by the NFSv4.1 RFC (RFC 5661). By using pNFS, a client can use different servers, some dedicated to metadata and others to data. This is very similar to what modern parallel file systems (Lustre, Panasas, Ceph, ...) do by separating metadata servers from data servers, which makes pNFS a model that fits the classical HPC architecture very well. The pNFS specification is very open, but three layouts (layouts are the structures and mechanisms used by the client to access the data on the data servers) are well defined in the RFCs. NFS-Ganesha currently implements the first of them, the “files layout”, and collaborates with industry partners to enhance and debug it. The other layouts (one based on block devices and one based on the OSD2 protocol) are on the NFS-Ganesha roadmap, with partnerships already established. CEA's position is simple: soon (this is already the case in Fedora 15), every Linux-based machine will ship with the NFSv4.1 features, including pNFS. If the NFS server is pNFS-ready and the exported namespace itself has parallel capabilities, then pNFS will be a natural and portable way to access data in parallel, strongly enhancing the I/O rate via NFS. NFS-Ganesha has supported pNFS since version 1.1.0, and many improvements are planned in this area for future releases.
About the CEA
The French Alternative Energies and Atomic Energy Commission (CEA) leads research, development and innovation in four main areas: low-carbon energy sources, global defense and security, information technologies and healthcare technologies. The CEA’s leadership position in the world of research is built on a cross-disciplinary culture of engineers and researchers, ideal for creating synergy between fundamental research and technology innovation. With its 15,600 researchers and collaborators, it has internationally recognized expertise in its areas of excellence and has developed many collaborations with national and international, academic and industrial partners.
Information about HPC at CEA can be found at http://www-hpc.cea.fr/index-en.htm
Source: French Alternative Energies and Atomic Energy Commission (CEA)