Content

Overview of File Systems
File System Design
Sharing Files
Unix File System
Consistency & Crash Recovery
Journaling File Systems
Log-Structured File Systems (LFS)

Overview of File Systems

Abstraction for storing, organizing, & accessing persistent data
- Data organized as objects called files
- Files accessed via system calls

Open
- Program can keep the file description returned to imporve file access efficiency
Read/Write: r/w bytes from/to current position; update position
Seek: move to new position
Close
Create, rename, delete, get/set attributes

Open
Readdir: read entries in a directory
- No writedir to avoid dir metadata corruption
Seekdir
Close
Create, rename, delete, get/set attributes
Link/Unlink: add/remove a name for an existing file
- Hard link: different directory entries have same inode number
  - Inodes maintains reference count
  - Points to files not directories to avoid cycles (ensure directory deletion ok)
  - Becomes DAG
- Symbolic link (short cut): a file containing data of another file's name (redirect)
  - File type: shortcut (l)

File System Design

OS needs to:
- maintain information about files & directories
- store files durably
- handle machine crashes e.g. recover data

Disk Blocks

Disks are accessed at the granularity of sectors; a file system allocates data in blocks
- Reduces overhead managing individual blocks
- Increase internal fragmentation

File System Tasks

Free block management
- Bitmap: separate area on disk
  - Allows allocating contiguous blocks to file easily
  - Only need 1 bitmap block in memory at a time
  - Extra space for bitmap
Block allocation & replacement: maps (potentially non-contiguous) blocks to file
- Contiguous allocation
  - Good performance for sequential reading
  - File growth requires copying
  - Fragmentation; need periodic compaction
  - Good for CD-ROMS: file sizes known, files never deleted
- Linked list allocation: first word in block points to next block
  - Random accesses slow
- File allocation table (FAT): keep linked list information in memory
  - Random accesses faster
  - Entire table in memory, poor scalibility of file system
  - More efficient than i-node on small files
- I-node based allocation: tree to store index information
  - Allows growth of index information without spreading this information too much
  - Optimized for small files
  - Root of tree: inode
    - 12 direct block pointers
    - 1 indirect pointer
    - 1 double indirect pointer
    - 1 triple indirect pointer
```
                  disk partitions
|MBR||||  |                                   |    |    |
          /                                    \
         |Super|Inode |Block |Inode |File & dir|
         |block|bitmap|bitmap|blocks|blocks    |
```
  - Block placement
    - Problems
      - Data blocks scatter across disk in aging file systems -> long seeks
      - Inodes at beginning of disk, far from data -> going back and forth -> long seeks
  - BSD Fast File System
    - Disk partitioned into groups of cylinders
    - Superblocks placed in every group & in offset manner to recover from damage
    - Place these in same cylinder group:
      - Inode, data blocks in a file
      - Files in a directory
      - Place in nearby group if group full
Directory management: map file names to location of starting block of file
```
  |file name|file attributes|starting block #|
  ...

  # Unix
  |file name|i-node #| # i-node contains file attributes
  ...
```
- File names
  - Short, fixed length names
  - Variable length names
    - Size of directory entry variable
    - Options
      - Entries allocated contiguously:
        
        |len|name|inode|len|name|inode|...
        -> Inefficient for search; fragmentation
      - Allocate pointers to file names in the beginning of directory; heap at end to store names
        
        |p1|hash of name1| # fixed size |p2|hash of name2| ... p1-> |name1|inode| # variable size p2-> |name2|inode|name3|inode|
- File deletion
  - Directory entry removed from directory
  - All blocks in file returned to free list
- Path lookup: e.g. /D1/D2/F
  - Super block (locatin of inode blocks area; typically already cached in memory)
  - Inode of / -> data blocks of /
  - Inode of D1 -> data blocks of D1
  - Inode of D2 -> data blocks of D2
  - Inode of F (returned in open) -> data blocks of F
Buffer cache management: cache disk block in memory
- Operations
  - Block lookup
```
  # hash table to lookup blocks in memory

        key
  |device|block #| -> |disk block in memory| -> ...
  ...
```
  - Block miss
  - Block flush
- Issues
  - Limited size: need replacement algorithms
  - Competes with VM system about frames
    - Separate cache: inefficient use of cache if not doing file I/O
    - Unified cache: large file I/O affects performance of other programs
- Read ahead: prefetch next block from disk

Issues
- Concurrent access
  - Sequential consistency
    - A read call sees data from most recently finished write call
    - All processors see same order of writes
- Protection
  - Subject: who can access a file
  - Object: the file
  - Action: how can they access a file
    - read, write, execute, append, change protection, delete, etc.
  - Mechanisms
    - Access control lists (ACL): object -> subjects & actions
      - Easy to grant, revoke
      - Becomes large when heavily shared -> use groups
    - Capabilities: subject -> objects & actions
      - Easy to transfer
      - Hard to revoke

Unix File System

# In memory

Parent's fd table   Open file table
-----------         |File position|
-----------         |R/W          |
----------- ------> |Ptr to i-node|
...                 ---------------
                    |             |
Child's fd table  / |             |
-----------      /        ...
----------- -----
-----------
...

# Child can copy parent's fd table when forked

fd = open(name,mode)
- Path lookup: find inode of file
- Cache inode in buffer cache
- Check permissions
- Set up entry in open file table
- Set up entry in fd table
- Return fd
byte_count = read(fd, buffer, buffer_size)
- Figure out data/indirect blocks to read
- Read from disk into buffer cache
- Copy data to user buffer
- Update file position
- Return # of bytes read
byte_count = write(fd, buffer, num_bytes)
- Figure out data/indirect blocks to write
- Read from disk into buffer cache
- Copy data from user buffer to buffer cache
- Update i-node
- Mark modified buffers dirty (inode, free maps, indirect, data blocks, etc.)
- Schedule writing dirty buffers to disk
- Update file position
- Return # of bytes written
close(fd)

Mounting File Systems

File system namespace: set of names for all files in a file system
File system mounting glues a file system namespace into the namespace of another file system

Consistency & Crash Recovery

Deleting a Unix File

Remove file's directory entry in directory data block
Mark inode of file as free in inode bitmap; mark file blocks as free in block bitmap; update metadata in inode of directory

Crash in between leads to storage leak.

Switch steps. Crash in between leads to dangling pointer.

Reducing File System Inconsistency

Write-through metadata blocks
- Avoid dangling pointers
- Still inconsistent for the last file system operation before a crash
Write-back data blocks
- Most blocks are data blocks, improves file system performance
- Data blocks can be lost without affecting file system consistency
Crash recovery: restore file system consistency
- Full scan of file system to recover inconsistent states
- Long time since disk capacities increase faster than disk throughput
Avoid crash recovery:
- Use battery-backed RAM
  - Ensure enough power to write all dirty blocks to disk
- Failure atomicity: a file system operation either doesn't happen at all or happens completely
  - Undo recovery
    1. Copy old block on disk to spare block on disk (mark done)
    2. Copy new block in memory to block on disk
    3. Remove spare block
  - Redo recovery
    1. Copy new block in memory to spare block on disk (mark done)
    2. Copy new block in memory to old block on disk
    3. Remove spare block

Journaling File Systems

Write-ahead logging
- Write new versions of blocks in a journal (circular log)
- Commit when done
- File system then updates asynchronously
- Copy journal blocks to file system on crash where commit is present
Steps
1. Write blocks (e.g. B,I,D) to journal
2. Write commit block to journal
```
 |...|TransactionHeaderBlock|B'|I'|D'|Commit|

 # TransactionHeaderBlock contains block # of the following blocks
```
3. Install: copy updated B,I,D to file system
4. Free transaction in journal
Observation based on FFS
- As memory gets larger, need to optimize for writes
- Reads: read from buffer cache, less of performance problem
- Writes: becomes heavy for synchronous operations & data integrity
  - Also, writes are not well clustered i.e. directory, inodes, data are scattered

Log-Structured File Systems (LFS)

Write all file system data & metadata in a contiguous log.

LFS reads
- When inodes updated, written in log (scattered in log)
- Inode-map: an array in memory to locate inodes
  - Inode # -> inode location in log
LFS writes
- When inodes updated, inode-map has to be updated & stored on disk (inode-map itself stored in the log)
- Uses checkpoint region in a fixed area on disk
  - Locates inode-map blocks in log
  - Serves as a commit point
  - Region updated when inode-map blocks written
```
|superblock|checkpoint region|      segment      |segment|...|
                            /                     \
                           |segment|inode|inode|...|
                           |header |     | map |...|

# Checkpoint region -> inode map -> inode -> data
```
  - Keep 2 checkpoint regions to ensure failure atomicity on checkpoint region itself
Log space reclamation
- Reclaim segments when:
  - blocks overwritten
  - blocks deleted
  - live blocks have to be copied out of segments
Wear leveling: writing to SSD in log fashion
- Since each block in SSD has limited endurance, log fashion ensures writes to be spread evenly on disk

Comparison on Journaling File Systems & Log-Structured File Systems

Journaling

Motivation: speed up crash recovery
Benefit: speeds up crash recovery
Drawback: every block write becomes two block writes

Log-Structured

Motivation: reads will be absorbed by buffer cache, optimize for writes by issuing all writes sequentially
Drawback:
1. Reads not sequential if file created sequentially
2. Need garbage collection -> affect normal I/O performance
Writes: 1 or more; 1 for writing to log, more for cleaning process

Resources

Log-structured File Systems

File Systems

Content

Overview of File Systems

File System Design

Disk Blocks

File System Tasks

Unix File System

Mounting File Systems

Consistency & Crash Recovery

Deleting a Unix File

Reducing File System Inconsistency

Journaling File Systems

Log-Structured File Systems (LFS)

Comparison on Journaling File Systems & Log-Structured File Systems

Journaling

Log-Structured

Resources

results matching ""

No results matching ""

Content

Overview of File Systems

Basic File-Related Calls

Basic Directory-Related Calls

File System Design

Disk Blocks

File System Tasks

Sharing Files

Unix File System

File-Related System Calls

Mounting File Systems

Consistency & Crash Recovery

Deleting a Unix File

Reducing File System Inconsistency

Journaling File Systems

Log-Structured File Systems (LFS)

Comparison on Journaling File Systems & Log-Structured File Systems

Journaling

Log-Structured

Resources

results matching ""

No results matching ""