- Latency - How long does a single operation on the smallest fundamental unit take to complete?
- Throughput - How many bytes of data per second can we read or write?
- Parallel scaling - When several requests are issued at the same time, or nearly the same time, how are latency and throughput affected?
- Resilience to failure of storage components - How many, and which types of, storage hardware failures can be tolerated without interrupting system availability, and how is performance affected?
- Resilience to failure of network components - If the network is partitioned or fails, does the storage system remain consistent, and how are partially complete operations handled?
- Semantics - What capabilities are offered to clients? Can existing files be re-written? Are there folders? Is random access possible? How are files locked and shared? Are there any transactional semantics?
- Location Transparency
- At the basis of almost any distributed storage system are the factors involved in local storage systems.
- The problems presented by local storage are simpler and less compounded than in their distributed counterparts.
In all but the simplest filesystems, there is a concept of a folder and a path.
In UNIX operating systems, folder entries are held within inodes whose file type, stored in the mode, is set to directory.
The contents of such an inode are a list of filenames and pointers to the inodes of those files and/or folders.
Resolving a path involves accessing the root folder, then accessing each folder recursively until reaching the file or finding the path to be invalid.
An example of path traversal. When traversing paths, the path may cross into different filesystems.
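As a rough illustration of the traversal described above, here is a minimal sketch of resolving a path against a toy in-memory inode table; the `Inode` class and the `resolve` helper are assumptions made for this example, not part of any real filesystem API.

```python
# A minimal sketch of UNIX-style path resolution over a toy inode table.
DIRECTORY, FILE = "directory", "file"

class Inode:
    def __init__(self, kind, entries=None, data=b""):
        self.kind = kind                  # DIRECTORY or FILE (kept in the mode in a real FS)
        self.entries = entries or {}      # name -> inode number, only used for directories
        self.data = data                  # file contents, only used for regular files

def resolve(path, inodes, root_ino=0):
    """Walk each path component from the root, returning the final inode number."""
    ino = root_ino
    for name in filter(None, path.split("/")):
        node = inodes[ino]
        if node.kind != DIRECTORY or name not in node.entries:
            raise FileNotFoundError(path)
        ino = node.entries[name]
    return ino

# Example: /home/alice/notes.txt resolved against a three-level hierarchy.
inodes = {
    0: Inode(DIRECTORY, {"home": 1}),
    1: Inode(DIRECTORY, {"alice": 2}),
    2: Inode(DIRECTORY, {"notes.txt": 3}),
    3: Inode(FILE, data=b"hello"),
}
assert resolve("/home/alice/notes.txt", inodes) == 3
```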
- Flat file service
- implements operations on the contents of files
- UFID (Unique File Ids) used to refer to files
- new UFID assigned when file created
- Directory service
- provides mapping between text names and UFIDs
- Functions to create, update, … directories
- Client module
- runs on client computer
- provides APIs to access files
- holds information about network location of file server and directory server
- sometimes caching at client
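A minimal sketch of this split between the flat file service, the directory service, and the client module; the class and method names (`FlatFileService`, `DirectoryService`, and friends) are illustrative assumptions rather than a real API.

```python
# A minimal sketch of the flat-file-service / directory-service split.
import uuid

class FlatFileService:
    """Operates only on file contents, addressed by UFID."""
    def __init__(self):
        self._files = {}                       # UFID -> bytes

    def create(self):
        ufid = uuid.uuid4().hex                # new UFID assigned at creation
        self._files[ufid] = b""
        return ufid

    def read(self, ufid, offset, length):
        return self._files[ufid][offset:offset + length]

    def write(self, ufid, offset, data):
        old = self._files[ufid]
        self._files[ufid] = old[:offset] + data + old[offset + len(data):]

class DirectoryService:
    """Maps human-readable names to UFIDs; knows nothing about contents."""
    def __init__(self):
        self._names = {}                       # text name -> UFID

    def add_name(self, name, ufid):
        self._names[name] = ufid

    def lookup(self, name):
        return self._names[name]

# The client module would combine the two: look up the name, then use the UFID.
files, dirs = FlatFileService(), DirectoryService()
ufid = files.create()
dirs.add_name("report.txt", ufid)
files.write(dirs.lookup("report.txt"), 0, b"draft")
assert files.read(dirs.lookup("report.txt"), 0, 5) == b"draft"
```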
- Upload/download model
- read/write file operations
- entire file transferred to client
- requires space on client
- Products like SkyDrive and DropBox work like this
- Remote Access Interface
- large number of operations
- seek, changing file attributes, read/write part of file…
- does not require space (as much) on client
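The difference between the two models can be sketched as follows; `RemoteServer` and both `edit_*` helpers are hypothetical stand-ins for whatever RPC interface a real system would expose.

```python
# A sketch contrasting the upload/download model with the remote access model.

class RemoteServer:
    def __init__(self):
        self.files = {"doc.txt": b"hello world"}

# Upload/download model: the whole file crosses the network and must fit on the client.
def edit_upload_download(server, name, edit):
    local_copy = server.files[name]            # download the entire file
    local_copy = edit(local_copy)              # edit locally
    server.files[name] = local_copy            # upload the entire file on close

# Remote access model: only the touched byte range crosses the network.
def edit_remote_access(server, name, offset, data):
    old = server.files[name]
    server.files[name] = old[:offset] + data + old[offset + len(data):]

server = RemoteServer()
edit_upload_download(server, "doc.txt", lambda b: b.upper())
edit_remote_access(server, "doc.txt", 0, b"J")
assert server.files["doc.txt"] == b"JELLO WORLD"
```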
- Key issue for distributed file systems
- whether all clients have the same VIEW of the directory hierarchy
- Location Transparency
- path names give no hint as to where the files are located
- e.g., /server1/dir1/dir2/X indicates X located on server1 but NOT where server1 is located
- Problem: if X needs to be moved to another server (e.g., due to space) - say server2 - programs with the path string built in will no longer work!
- Location Independence
- files can be moved without changing their names
- Three common approaches to file and directory naming
- Machine + path naming, such as /machine/path or machine:path (location dependent)
- Mounting remote file systems onto the local file hierarchy (location dependent)
- A single name space that looks the same on all machines (location independent)
- UNIX semantics: a read always returns the result of the most recent write (the system enforces an absolute time ordering)
- UNIX semantics can be achieved by
- read/write going to server
- no caching of files
- sequential processing by server
- BUT in distributed systems, this may perform poorly!
- How to improve performance?
- requires caching
- Modify Semantics?
- “changes to an open file are initially visible only to the process that modified the file. When the file is closed, the changes become visible to others”
- Called Session Semantics
- Q: What is the result of multiple (simultaneous) updates of a cached file?
- final result depends on who closed last!
- one of the results, but which one it is cannot be specified (easier to implement)
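A minimal sketch of session semantics under these assumptions: each open takes a private copy, and whichever process closes last determines the server's final contents. `SessionFile` and the `server` dictionary are illustrative only.

```python
# Session semantics: writes go to a private copy and only become visible to
# other clients when the session is closed ("last close wins").

server = {"notes.txt": b"v0"}

class SessionFile:
    def __init__(self, name):
        self.name = name
        self.copy = server[name]        # private copy taken at open time

    def write(self, data):
        self.copy = data                # visible only to this process for now

    def close(self):
        server[self.name] = self.copy   # changes published at close

a, b = SessionFile("notes.txt"), SessionFile("notes.txt")
a.write(b"from A")
b.write(b"from B")
a.close()
b.close()                               # B closed last, so B's version wins
assert server["notes.txt"] == b"from B"
```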
- Immutable Files
- can only create and read files
- can replace existing file atomically
- to modify a file, create new one and replace
- what if two try to replace the same file?
- what if one is reading while another tries to replace?
- Need to understand file usage in order to
- implement common operations well
- achieve efficiency
- Satyanarayanan (CMU) studied file usage patterns on UNIX
- How should the system be organized?
- are clients and servers different?
- how are file and directory services organized?
- caching/no caching
- server
- client
- how are updates handled?
- sharing semantics?
- stateful versus stateless lookup
- Stateless advantages
- Fault tolerance
- No OPEN/CLOSE calls needed
- No server space wasted on tables
- No limits on number of open files
- No problem if client crashes
- For example,
- each request is self-contained
- if server crashes - no information lost
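For example, a stateless read can be sketched as a single self-contained call carrying the file id, offset, and count, so the server needs no open-file table; the names below (`stateless_read`, the `DISK` dictionary) are assumptions for illustration.

```python
# A minimal sketch of a stateless read request: every call carries the full
# context, so the server can answer it even after a crash and restart.

DISK = {"ufid-42": b"The quick brown fox jumps over the lazy dog"}

def stateless_read(ufid, offset, count):
    """Self-contained request: no OPEN/CLOSE, no per-client state on the server."""
    return DISK[ufid][offset:offset + count]

# The client tracks its own position; the server never sees an OPEN or CLOSE.
pos = 0
first = stateless_read("ufid-42", pos, 9)
pos += len(first)
second = stateless_read("ufid-42", pos, 10)
assert first + second == b"The quick brown fox"
```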
- One of the most important design considerations
- impacts performance
- If caching – how should it be done?
- Server Disk
- (+) most space
- (+) one copy of each file
- (+) no consistency problem
- (-) performance
- each access requires disk access -> server memory -> network -> client memory
- Server Memory
- keep MRU files in server's memory
- if request satisfied from cache ==> no disk transfer BUT still network transfer
- Unit of caching? Whole files
- (+) high speed transfer
- (-) too much memory
- Blocks
- (+) better use of space
- What to replace when cache full?
- LRU
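A minimal sketch of block caching in server memory with LRU replacement, using Python's `OrderedDict`; the capacity and the `read_from_disk` callback are arbitrary choices for the example.

```python
# Block cache with LRU replacement: evict the least recently used block when full.
from collections import OrderedDict

class BlockCache:
    def __init__(self, capacity, read_from_disk):
        self.capacity = capacity
        self.read_from_disk = read_from_disk   # fallback when a block misses
        self.blocks = OrderedDict()            # (ufid, block_no) -> bytes, in LRU order

    def get(self, ufid, block_no):
        key = (ufid, block_no)
        if key in self.blocks:
            self.blocks.move_to_end(key)       # mark as most recently used
            return self.blocks[key]
        data = self.read_from_disk(ufid, block_no)
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used block
        return data

reads = []
cache = BlockCache(2, lambda u, b: reads.append((u, b)) or f"data-{b}".encode())
cache.get("f", 0); cache.get("f", 1); cache.get("f", 0); cache.get("f", 2)
cache.get("f", 1)                              # block 1 was evicted, re-read from disk
assert reads == [("f", 0), ("f", 1), ("f", 2), ("f", 1)]
```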
Client Caching
- Disk
- slower
- more space
- Memory
- less space
- faster
Where to cache?
- User Address Space
- cache managed by system call library
- library keeps most heavily used files
- when process exits - written back to server
- (+) simple
- (+) low overhead
- (-) only effective if files are repeatedly used
- Kernel
- (-) kernel needed in all cases (even for a cache hit)
- (+) cache survives beyond the process (e.g., two-pass compiler - the file from the first pass is still available in cache)
- Cache manager as a separate user-level process
- (+) keeps the kernel free of file system code
- (+) more flexible
- (-) little control over memory space allocation
- e.g., virtual memory may result in a disk operation even on a cache hit
- client caching introduces inconsistency
- one or more writers and multiple readers
- Write-thru
- similar to write-through between a processor cache and memory
- when a block is modified - immediately sent to the server (also kept in cache)
- problem
- client on machine 1 reads file
- modifies file (server updated)
- client on machine 2 reads and modifies the file
- server updated
- another client on machine 1 reads file
- gets local copy (which is stale)
- solution
- cache manager checks with the server before providing a cached file to the client
- if local copy up-to-date
- provide to client
- else get from server
- the RPC for the check is not as expensive as a file access
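A sketch of this check, assuming the server exposes a cheap `stat`-style RPC returning the last-modification time and an expensive `fetch` for the full contents; all class and method names are illustrative.

```python
# Client-side caching with validation: ask the server whether the cached copy
# is still current before serving it, instead of re-transferring the file.
import itertools

clock = itertools.count()                      # stand-in for a global clock

class Server:
    def __init__(self):
        self.data, self.mtime = b"v1", next(clock)

    def stat(self):                            # cheap validation RPC
        return self.mtime

    def fetch(self):                           # expensive full transfer
        return self.data, self.mtime

    def store(self, data):
        self.data, self.mtime = data, next(clock)

class CachingClient:
    def __init__(self, server):
        self.server = server
        self.cached, self.cached_mtime = None, None

    def read(self):
        if self.cached is not None and self.server.stat() == self.cached_mtime:
            return self.cached                 # cache is still valid
        self.cached, self.cached_mtime = self.server.fetch()
        return self.cached

server = Server()
c1 = CachingClient(server)
assert c1.read() == b"v1"                      # first read fills the cache
server.store(b"v2")                            # another client updates the file
assert c1.read() == b"v2"                      # validation detects the stale copy
```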
- Performance problems
- read is fine
- each write generates network traffic (very expensive)
- compromise - periodic updates (say 30 sec) of writes
- collected and sent to server
- eliminates the writing of many scratch files completely (they are often deleted before the delayed write ever happens)
- Note: the semantics have changed under delayed writes
- Write-on-Close
- session semantics
- wait (delay - say 30 sec) after close to see if file deleted
- in that case write eliminated
- Centralized Control
- File server keeps track of who has file and in what mode
- if new request for read - server checks to see if file opened read/write
- if read - grant request
- if write - deny request
- when file closed - inform others
- Many variations possible
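One possible sketch of such centralized control, assuming a simple many-readers-or-one-writer policy (real variations differ); `CentralizedServer` and its methods are hypothetical.

```python
# Centralized control: the server tracks who has each file open and in what
# mode, granting reads only when no writer holds the file.

class CentralizedServer:
    def __init__(self):
        self.open_files = {}                       # name -> {"mode": str, "clients": set}

    def open(self, client, name, mode):
        entry = self.open_files.get(name)
        if entry is None:
            self.open_files[name] = {"mode": mode, "clients": {client}}
            return True
        if mode == "read" and entry["mode"] == "read":
            entry["clients"].add(client)           # multiple readers allowed
            return True
        return False                               # deny: a writer is involved

    def close(self, client, name):
        entry = self.open_files[name]
        entry["clients"].discard(client)
        if not entry["clients"]:
            del self.open_files[name]              # file fully closed, others may open it

server = CentralizedServer()
assert server.open("A", "f", "read")
assert server.open("B", "f", "read")               # second reader granted
assert not server.open("C", "f", "write")          # writer denied while readers exist
server.close("A", "f"); server.close("B", "f")
assert server.open("C", "f", "write")              # granted once the file is closed
```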
- Multiple copies of files for
- increased reliability so no data is lost
- increased availability when one server is down
- improved performance through division of load
- send the update to each copy in sequence
- problem - if process updating crashes in the middle ==> inconsistent copies
- Primary Copy Replication
- one server designated as primary
- primary updated first (changes made locally by the primary server)
- primary server propagates the updates to the secondary copies
- reads can be done from any copy
- to guard against primary copy failure
- updates first stored on stable storage
- BUT if the primary copy is down - no update can be made!!
- Quorum voting
- requires clients to acquire the permission of multiple servers before reading/writing a file
- file replicated on N servers - to update, the client needs to contact a majority, N/2 + 1 servers
- file changed and a new version number assigned
- To read - client contacts N/2 + 1 servers
- will always see the latest version (at least one contacted server holds it)
- Number of replicas - N
- Read quorum - Nr
- Write quorum - Nw
- Constraint: Nr + Nw > N (so every read quorum overlaps every write quorum)
- a read/write requires the participation of the corresponding quorum
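A minimal sketch of quorum reads and writes with N = 5 and Nr = Nw = 3 (so Nr + Nw > N); version numbers decide which replica in a quorum is newest. The data structures and function names are assumptions for illustration, not a real protocol implementation.

```python
# Quorum-based (voting) replication: every read quorum overlaps every write
# quorum, so at least one contacted replica always holds the latest version.
import random

N, NR, NW = 5, 3, 3                               # 3 + 3 > 5, quorums overlap

replicas = [{"version": 0, "data": b""} for _ in range(N)]

def write(data):
    """Contact a write quorum, take the highest version seen, install version+1."""
    quorum = random.sample(range(N), NW)
    new_version = max(replicas[i]["version"] for i in quorum) + 1
    for i in quorum:
        replicas[i] = {"version": new_version, "data": data}

def read():
    """Contact a read quorum and return the copy with the highest version."""
    quorum = random.sample(range(N), NR)
    latest = max((replicas[i] for i in quorum), key=lambda r: r["version"])
    return latest["data"]

write(b"first")
write(b"second")
assert read() == b"second"                        # any read quorum sees the newest write
```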
- NFS - Network File System
- designed to allow an arbitrary collection of clients and servers to share a common file system
- Design Goals
- heterogeneity
- access transparency
- local and remote accesses look the same - e.g., normal UNIX system calls
- failure transparency
- stateless
- idempotent
- performance transparency
- client caching
- server caching
- Location Transparency
- client establishes file name space by adding remote file systems to local name space
- file system exported by servers (node holding it)
- file system remote-mounted by client
- Not Supported in Design Goals
- Replication transparency
- separate service for replication (NIS)
- Concurrency
- Naïve locking
- Scalability
- limited
- originally designed to support 5-10 clients
- VFS Layer
- maintains table with one entry for each open file
- entry called a v-node (indicates whether the file is local or remote)
- the v-node points to an i-node (for local files) or an r-node (for remote files)
- Typical Operation
- Read
- locate v-node
- determine local or remote
- transfers occur in 8 KB blocks (normally)
- automatic prefetching (read-ahead) of next block
- Write
- writes not immediately written to server
- 8K bytes collected before writing
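A rough sketch of the v-node dispatch idea: each open file gets a v-node that points either at a local i-node or at an r-node that issues RPCs to the server. The class names, the fake server, and the 8 KB constant usage here are illustrative, not actual NFS code.

```python
# VFS-layer dispatch: a read() on a v-node goes to a local i-node or to a
# remote r-node transparently, depending on how the file was opened.

BLOCK = 8192                                     # NFS traditionally moves 8 KB blocks

class LocalINode:
    def __init__(self, data):
        self.data = data

    def read(self, offset, count):
        return self.data[offset:offset + count]  # served from the local disk/cache

class RemoteRNode:
    def __init__(self, server, handle):
        self.server, self.handle = server, handle

    def read(self, offset, count):
        # An RPC to the server; a real client would also prefetch the next block.
        return self.server.nfs_read(self.handle, offset, count)

class VNode:
    def __init__(self, node):
        self.node = node                         # i-node or r-node, chosen at open time

    def read(self, offset, count=BLOCK):
        return self.node.read(offset, count)

class FakeServer:
    def __init__(self, files):
        self.files = files

    def nfs_read(self, handle, offset, count):
        return self.files[handle][offset:offset + count]

open_files = {
    3: VNode(LocalINode(b"local bytes")),
    4: VNode(RemoteRNode(FakeServer({"h1": b"remote bytes"}), "h1")),
}
assert open_files[3].read(0, 5) == b"local"
assert open_files[4].read(0, 6) == b"remote"
```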
- Caching
- server caches (to reduce disk accesses)
- client maintains cache for
- for file attributes (I-nodes)
- for file data
- cache block consistency problems
- each cache block has an associated timer
- the entry is discarded when the timer expires
- when a file is opened, the server is asked for the file's last modification time
- UNIX semantics are therefore not completely enforced
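A minimal sketch of the timer-based scheme, assuming cached attributes are trusted for a fixed window and refetched from the server afterwards; the 3-second timeout and the `AttrCache` class are arbitrary choices for the example.

```python
# NFS-style timer-based cache consistency: a cached entry is trusted only for
# a fixed time window, after which the server must be consulted again.
import time

CACHE_TIMEOUT = 3.0                               # seconds an entry stays trusted

class AttrCache:
    def __init__(self, server_lookup):
        self.server_lookup = server_lookup        # RPC to fetch fresh attributes
        self.entries = {}                         # name -> (attrs, time cached)

    def get(self, name):
        entry = self.entries.get(name)
        if entry is not None:
            attrs, cached_at = entry
            if time.monotonic() - cached_at < CACHE_TIMEOUT:
                return attrs                      # still within the trust window
        attrs = self.server_lookup(name)          # expired or missing: go to the server
        self.entries[name] = (attrs, time.monotonic())
        return attrs

calls = []
cache = AttrCache(lambda name: calls.append(name) or {"size": 42})
cache.get("a.txt")                                # miss: one server call
cache.get("a.txt")                                # hit: served from cache, no call
assert calls == ["a.txt"]
```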