Distributed file systems do not share block-level access to the same storage but use a network protocol. These are commonly known as network file systems, even though they are not the only file systems that use the network to send data. Depending on how the protocol is designed, distributed file systems can restrict access to the file system based on access lists or capabilities on both the servers and the clients. The difference between a distributed file system and a distributed data store is that a distributed file system allows files to be accessed using the same interfaces and semantics as local files: for example, mounting and unmounting, listing directories, reading and writing at byte boundaries, and the system's native permission model. Distributed data stores, by contrast, require a different API or library and have different semantics (most often those of a database).
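The "same interfaces and semantics as local files" point can be made concrete with a small sketch. The function below reads a byte range using only the standard local-file interface; the path and byte offsets are made up for illustration, and the same code would work unchanged whether the path sits on a local disk or on an NFS or CephFS mount.

```python
import os
import tempfile

def read_range(path, offset, length):
    """Read `length` bytes starting at `offset`, using only the
    standard local-file interface (open/seek/read)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# A local temp file stands in for a file that could equally live on a
# network mount (e.g. /mnt/shared/data.bin): the client code is identical,
# which is exactly the point of access transparency.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello, distributed world")
    path = f.name

chunk = read_range(path, 7, 11)
print(chunk)  # b'distributed'
os.unlink(path)
```

A distributed data store, by contrast, would replace `open`/`seek`/`read` with its own client library and query semantics.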
Design goals

Distributed file systems may aim for "transparency" in a number of aspects. That is, they aim to be "invisible" to client programs, which "see" a system similar to a local file system. Behind the scenes, the distributed file system handles locating files, transporting data, and potentially providing the other features listed below.
• Access transparency: clients are unaware that files are distributed and can access them in the same way as local files.
• Location transparency: a consistent namespace encompasses both local and remote files. The name of a file does not reveal its location.
• Concurrency transparency: all clients have the same view of the state of the file system. This means that if one process is modifying a file, any other processes on the same system or on remote systems that are accessing the file will see the modifications in a coherent manner.
• Failure transparency: clients and client programs should operate correctly after a server failure.
• Heterogeneity: file service should be provided across different hardware and operating system platforms.
• Scalability: the file system should work well in small environments (one machine, a dozen machines) and also scale gracefully to larger ones (hundreds through tens of thousands of systems).
• Replication transparency: clients should not need to be aware of the file replication performed across multiple servers to support scalability.
• Migration transparency: files should be able to move between different servers without the client's knowledge.
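The location, replication, and migration goals above can be illustrated with a toy sketch (not any real system's API; all names and servers here are invented). A flat namespace maps each file name to a set of replica servers, so the name itself encodes no location, and replicas can be added or moved without the client noticing.

```python
import random

# Toy metadata service: the file name says nothing about where the bytes
# live (location transparency), and each file may be stored on several
# servers (replication transparency). Entries are hypothetical.
REPLICA_MAP = {
    "/projects/report.txt": ["server-a", "server-c"],
    "/projects/data.csv":   ["server-b"],
}

def resolve(name):
    """Return one server holding the file. The client never needs to know
    which replica it got, so servers can be swapped behind this call
    (migration transparency)."""
    return random.choice(REPLICA_MAP[name])

print(resolve("/projects/report.txt"))  # 'server-a' or 'server-c'
```

In real systems this resolution step is performed by a metadata server (as in HDFS or Lustre) or by a distributed algorithm such as consistent hashing, but the client-visible property is the same: names are stable while data placement is free to change.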
History

The Incompatible Timesharing System used virtual devices for transparent inter-machine file system access in the 1960s. More file servers were developed in the 1970s. In 1976, Digital Equipment Corporation created the File Access Listener (FAL), an implementation of the Data Access Protocol as part of DECnet Phase II, which became the first widely used network file system. In 1984, Sun Microsystems created the file system called the "Network File System" (NFS), which became the first widely used Internet Protocol based network file system.
Later examples include Lustre, PanFS, the Google File System, Mnet, and the Chord project.
Examples

• Alluxio
• BeeGFS (Fraunhofer)
• CephFS (Inktank, Red Hat, SUSE)
• Windows Distributed File System (DFS) (Microsoft)
• Infinit (acquired by Docker)
• GfarmFS
• GlusterFS (Red Hat)
• GFS (Google Inc.)
• GPFS (IBM)
• HDFS (Apache Software Foundation)
• IPFS (InterPlanetary File System)
• iRODS
• LizardFS (Skytechnology)
• Lustre
• MapR FS
• MooseFS (Core Technology / Gemius)
• ObjectiveFS
• OneFS (EMC Isilon)
• OrangeFS (Clemson University, Omnibond Systems), formerly Parallel Virtual File System
• PanFS (Panasas)
• Parallel Virtual File System (Clemson University, Argonne National Laboratory, Ohio Supercomputer Center)
• RozoFS (Rozo Systems)
• SMB/CIFS
• Torus (CoreOS)
• VaultFS (Swiss Vault)
• WekaFS (WekaIO)
• XtreemFS

==Network-attached storage==