What is the ext4 file system?

The ext4 file system is a journaling file system that is the default in many Linux distributions. It is the successor to the ext3 file system. A development version appeared in Linux kernel 2.6.19, and ext4 was declared stable in kernel 2.6.28, released in December 2008. ext4 aims to provide better performance, reliability and storage limits than ext3.

What are the key features of ext4?

Some of the key features and improvements in ext4 over ext3 include:

  • Supports volumes up to 1 exbibyte (EiB) and files up to 16 tebibytes (TiB) with 4 KiB blocks
  • Faster file system checking, because e2fsck can skip unused block groups and inode table regions (uninit_bg feature)
  • Larger inodes (256 bytes by default) with room for nanosecond timestamps and in-inode extended attributes
  • Uses extents instead of block mapping for faster and more efficient allocation and storage
  • Persistent preallocation to reduce file fragmentation
  • Unlimited number of subdirectories (was limited to 32000 in ext3)
  • Backward compatibility: existing ext3 and ext2 file systems can be mounted with the ext4 driver
  • Improved journaling with checksumming to improve reliability
  • Improved timestamps with nanosecond resolution

How does ext4 improve performance over ext3?

There are several ways ext4 provides better performance compared to the older ext3 file system:

  • Faster file system checks – ext4 marks unused block groups and inode table regions so that e2fsck can skip them.
  • Extents – ext4 uses extents instead of block mapping to allocate files. This improves contiguous allocation and reduces fragmentation.
  • Persistent preallocation – ext4 can reserve disk space for a file in advance, so the file's data blocks are likely to be contiguous.
  • Improved directory operations – ext4 uses HTree indexing for directories, which reduces the disk reads needed for a lookup.
  • Multiblock allocation – ext4 can allocate multiple blocks to a file at once instead of one block at a time.
  • Delayed allocation – ext4 postpones block allocation until data is flushed to disk, so the allocator sees the full write and can choose larger contiguous ranges.
  • Unlimited subdirectories – the ext3 limit of 32000 subdirectories per directory is gone.

Overall, ext4 removes the ext3 bottlenecks in file system checks, directory lookups, block allocation and directory limits, which leads to better performance.
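
Persistent preallocation can be tried from the command line. The following is a minimal sketch, assuming the util-linux fallocate tool and GNU stat are installed and that the scratch file created by mktemp lands on a file system that supports preallocation. The variable names are illustrative only.

```shell
#!/bin/sh
# Reserve 8 MiB for a scratch file without writing any data, then compare
# the apparent file size with the space actually allocated on disk.
f=$(mktemp)
fallocate -l 8M "$f"

size_bytes=$(stat -c %s "$f")
# %b is the number of allocated blocks, %B the size of each of those blocks.
alloc_bytes=$(( $(stat -c %b "$f") * $(stat -c %B "$f") ))

echo "size=$size_bytes allocated=$alloc_bytes"
rm -f "$f"
```

On a file system with preallocation support, the allocated figure should be at least as large as the file size even though nothing was written.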

What are extents in ext4 and how do they improve performance?

An extent is a contiguous run of physical blocks allocated to a file in ext4. Each extent records the logical block of the file where it begins, the physical block on disk where it starts, and its length in blocks.

For example, a file allocated with 3 extents may look like this:

  • Extent 1: Logical block 0, Start block 64, Length 4096 blocks
  • Extent 2: Logical block 4096, Start block 8192, Length 1024 blocks
  • Extent 3: Logical block 5120, Start block 16384, Length 512 blocks

Extents provide the following benefits over traditional block mapping:

  • Reduced file fragmentation – files are allocated in contiguous extents, which improves performance.
  • Faster file allocations and lookups – an entire extent can be located at once instead of mapping each block.
  • Metadata efficiency – an extent needs only a start and a length instead of one pointer per block.
  • Large file support – files spanning terabytes can be allocated without block mapping overhead.

By using extents, ext4 eliminates the need to map each individual data block, which makes allocation and access more efficient, especially for large files.
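
To make the lookup concrete, here is a small sketch of how a logical block of a file can be translated to a physical block through an extent list. The extent values and the lookup_block function are hypothetical and only model the idea; ext4 itself stores extents in a tree inside the inode.

```shell
#!/bin/sh
# Toy extent map: one extent per line as "logical_start physical_start length",
# all in blocks. The values are made up for illustration.
extents="0 64 4096
4096 8192 1024
5120 16384 512"

# Translate a logical block number of the file into a physical block number.
# Prints nothing if the block lies outside every extent.
lookup_block() {
    target=$1
    echo "$extents" | while read -r logical physical length; do
        if [ "$target" -ge "$logical" ] && [ "$target" -lt $((logical + length)) ]; then
            echo $((physical + target - logical))
            break
        fi
    done
}

lookup_block 0      # prints 64
lookup_block 4100   # prints 8196
lookup_block 5631   # prints 16895
```

Three short records are enough to describe 5632 blocks, where a block map would need one pointer for each block.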

What are the limits of ext4?

Here are some of the key limits in the ext4 file system:

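  • Maximum file size – 16 TiB (with 4 KiB blocks)
  • Maximum file system size – 1 EiB (with 4 KiB blocks)
  • Maximum number of subdirectories – unlimited (was 32000 in ext3)
  • Maximum number of links to a file – 65000
  • Maximum blocks per group – 8 × block size, so 32,768 blocks (128 MiB) with 4 KiB blocks
  • Number of groups – file system size divided by group size, for example 8,192 groups for a 1 TiB file system with 4 KiB blocks
  • Maximum number of inodes – about 4 billion (32-bit inode numbers), fixed when the file system is created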

Limits such as a maximum file system size of 1 EiB let ext4 scale to even the largest storage systems available today.
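
The headline figures follow from the address widths. The sketch below assumes 4 KiB blocks, 32-bit logical block numbers within a file and 48-bit physical block numbers. The variable names are illustrative, and a shell with 64-bit arithmetic is assumed.

```shell
#!/bin/sh
# Derive the size limits from the address widths, assuming 4 KiB blocks.
block_size=4096

# Files are indexed with 32-bit logical block numbers.
max_file_bytes=$(( (1 << 32) * block_size ))
# The file system addresses physical blocks with 48 bits.
max_fs_bytes=$(( (1 << 48) * block_size ))
# One bitmap block (block_size * 8 bits) tracks the blocks of a group.
blocks_per_group=$(( 8 * block_size ))
group_bytes=$(( blocks_per_group * block_size ))

echo "max file size:        $(( max_file_bytes >> 40 )) TiB"
echo "max file system size: $(( max_fs_bytes >> 60 )) EiB"
echo "blocks per group:     $blocks_per_group"
echo "block group size:     $(( group_bytes >> 20 )) MiB"
```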

What are the new features in ext4?

Here are some of the major new features introduced with ext4 file system:

  • Extents – Replaces block mapping with extents for faster contiguous allocations.
  • Persistent preallocation – Files can be preallocated on disk to reduce fragmentation.
  • Delayed allocation – Block allocation is postponed until data is flushed from the page cache to disk, which lets the allocator choose larger contiguous ranges.
  • Unlimited subdirectories – No limit on number of subdirectories under a directory.
  • Multiblock allocation – Allocate multiple blocks to file at once instead of one block at a time.
  • Bigger file sizes – Maximum file size of 16 TiB and maximum file system size of 1 EiB.
  • Faster fsck – Unused block groups and inode table regions are marked (uninit_bg) so e2fsck can skip them, and flex_bg places the metadata of several groups together.
  • Improved timestamps – Timestamps up to nanosecond precision instead of seconds.
  • Write barriers enabled by default – Preserve the ordering of journal commits on devices with volatile write caches.

These new features provide higher performance, scalability and reliability, and make ext4 practical for large datasets.

How does journaling work in ext4?

Ext4 uses journaling to provide reliability features like recoverability after crashes or power failures. Here is how journaling works in ext4:

  • The journal is a sequential log that records pending metadata updates such as inode changes, file creations and deletions.
  • First, the modified blocks of a transaction are written to the journal.
  • Next, a commit record is written, which marks the transaction as complete in the journal.
  • Then the changes are written to their final locations in the file system, after which the journal space can be reused.
  • After a crash or power failure, the journal is replayed: committed transactions are reapplied and incomplete ones are discarded, which restores consistency.
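
The ordering in the steps above can be illustrated with a toy write-ahead journal. This is only a model of the idea, not ext4's actual journal format. The file layout, record names (DATA, COMMIT) and function names are invented for the example.

```shell
#!/bin/sh
# Toy write-ahead journal kept in a scratch directory.
workdir=$(mktemp -d)
journal="$workdir/journal.log"
store="$workdir/store"
mkdir -p "$store"
: > "$journal"

# Log an update, then its commit record, before touching the store.
journal_write() {  # args: txid key value
    printf 'DATA %s %s %s\n' "$1" "$2" "$3" >> "$journal"
    printf 'COMMIT %s\n' "$1" >> "$journal"
}

# Recovery: apply updates of committed transactions, skip uncommitted ones.
journal_replay() {
    while read -r kind txid key value; do
        [ "$kind" = DATA ] || continue
        if grep -q "^COMMIT $txid\$" "$journal"; then
            printf '%s\n' "$value" > "$store/$key"
        fi
    done < "$journal"
}

journal_write 1 size 4096
# Simulate a crash: transaction 2 is logged but never committed.
printf 'DATA 2 size 8192\n' >> "$journal"

journal_replay
cat "$store/size"   # prints 4096
```

Because transaction 2 has no commit record, replay ignores it and the store keeps the last committed value.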

Some key aspects of ext4 journaling:

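  • The journal normally lives in a reserved area inside the file system and can optionally be placed on a separate device.
  • Checksumming detects corrupted journal blocks before they are replayed.
  • Journal commits are batched for higher throughput.
  • Journal replay happens automatically at mount time, or when e2fsck runs on a file system that was not cleanly unmounted.
  • Journal size can be configured when the file system is created.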

Overall journaling in ext4 provides efficient recovery after crashes and power failures protecting file system integrity.

What are the different ext4 journal modes?

Ext4 supports multiple journal modes which provide different tradeoffs between performance and reliability:

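  • journal – Full data and metadata journaling. Provides the highest reliability, but every data block is written twice, so throughput is lower.
  • ordered – Only metadata is journaled, but data blocks are written to disk before the related metadata is committed. Good reliability.
  • writeback – Only metadata is journaled, and data may reach the disk before or after the metadata commit. Good performance, but files can contain stale data after a crash.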

The default journal mode is ordered. It journals only metadata but forces file data to disk before the metadata that references it is committed, which gives good reliability at little performance cost.

Journal Mode Comparisons

Mode       Data Journaling  Write Ordering  Reliability  Throughput
journal    Yes              Yes             Highest      Lower
ordered    No               Yes             High         Good
writeback  No               No              Lowest       Highest
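
The mode is selected with the data= mount option. As a sketch, an /etc/fstab entry could look like the following, where the device /dev/sdb1 and the mount point /data are placeholders:

```
# <device>  <mount point>  <type>  <options>               <dump>  <pass>
/dev/sdb1   /data          ext4    defaults,data=ordered   0       2
```

Replacing data=ordered with data=journal or data=writeback selects one of the other two modes.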

How to create an ext4 file system?

There are two main steps to create an ext4 file system on a storage device:

  1. Partition the storage device – Use fdisk or another partitioning tool to create a Linux partition.
  2. Make the file system – Use mkfs.ext4 to build the ext4 file system inside that partition.

For example, to create ext4 on /dev/sdb1:

# fdisk /dev/sdb
# mkfs.ext4 /dev/sdb1

mkfs.ext4 also accepts options that set file system properties, for example:

  • Block size – Via -b; valid values are 1024, 2048 and 4096 bytes on most systems, and 4096 is the usual default for all but very small file systems
  • Label – Via -L, sets the file system volume label
  • Journal size – Via -J size=<megabytes>; by default the size is chosen automatically from the file system size

For example, to create ext4 with a 2 KiB block size and a 2 GiB journal (2048 MiB) on a 16 GB /dev/sdb1:

# mkfs.ext4 -b 2048 -J size=2048 /dev/sdb1
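
Before running mkfs.ext4 with a custom journal size, the request can be sanity-checked. The sketch below assumes the bounds described in the mke2fs manual page (at least 1024 blocks, and no more than 10,240,000 blocks or half the file system). The exact check may differ between e2fsprogs versions, and journal_size_ok is a hypothetical helper, not part of any tool.

```shell
#!/bin/sh
# Return success if a journal of the given size fits the assumed bounds.
journal_size_ok() {  # args: journal_mib block_size_bytes fs_mib
    journal_blocks=$(( $1 * 1024 * 1024 / $2 ))
    fs_blocks=$(( $3 * 1024 * 1024 / $2 ))
    # Upper bound: half the file system, capped at 10,240,000 blocks.
    max_blocks=$(( fs_blocks / 2 ))
    if [ "$max_blocks" -gt 10240000 ]; then
        max_blocks=10240000
    fi
    [ "$journal_blocks" -ge 1024 ] && [ "$journal_blocks" -le "$max_blocks" ]
}

# 2048 MiB journal, 2048-byte blocks, 16384 MiB file system.
if journal_size_ok 2048 2048 16384; then
    echo "journal size accepted"
else
    echo "journal size out of range"
fi
```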

How to tune ext4 file system performance?

Here are some ways to tune ext4 file system for better performance:

  • Use an appropriate block size – 4096 bytes suits most workloads, and on most systems the block size cannot exceed the memory page size (4 KiB on x86).
  • Keep the default ordered journal mode for most workloads; consider writeback only if stale file contents after a crash are acceptable.
  • Use the noatime mount option to prevent unnecessary inode writes for read accesses.
  • Increase readahead for mostly sequential read workloads with blockdev --setra or the read_ahead_kb setting under /sys/block/<device>/queue/.
  • Reduce the periodic fsck frequency with tune2fs -c (mount count) and -i (time interval) where availability matters more than routine checks.
  • Spread files across multiple directories if the workload does a lot of small file IO.
  • Leave write barriers enabled (the default); mounting with barrier=0 is only safe when the storage has a battery-backed write cache.

Tuning should be based on monitoring workload patterns and understanding tradeoffs between performance vs integrity/crash consistency.
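
A first step in tuning is to see how the ext4 file systems are currently mounted. The sketch below reads /proc/mounts and reports whether noatime is set. The function name report_ext4_atime and its optional file argument are illustrative assumptions, added so the function can be tried against a sample file.

```shell
#!/bin/sh
# Print each ext4 mount point and whether the noatime option is active.
# Reads /proc/mounts unless another mounts-style file is given.
report_ext4_atime() {
    awk '$3 == "ext4" {
        status = ($4 ~ /(^|,)noatime(,|$)/) ? "noatime" : "noatime not set"
        print $2 ": " status
    }' "${1:-/proc/mounts}"
}

if [ -r /proc/mounts ]; then
    report_ext4_atime
fi
```

Mounts without noatime usually use relatime, which already limits access-time writes, so the gain from switching depends on the workload.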

What are the advantages of ext4 over ext3?

Here are some of the key advantages ext4 provides over the older ext3 file system:

  • Higher throughput – ext4 removes ext3 bottlenecks in block allocation and large-file handling.
  • Larger volumes – ext4 supports volumes up to 1 EiB and files up to 16 TiB.
  • Reduced file fragmentation – extents, delayed allocation and preallocation minimize file fragmentation.
  • Faster file system checks – e2fsck can skip unused block groups and inode table regions.
  • Reliability – ext4 adds checksumming to the journal.
  • Scalability – unlimited subdirectories and 48-bit block addressing remove ext3's scaling limits.
  • Easy transition – existing ext2/ext3 file systems can be mounted with the ext4 driver and upgraded in place with tune2fs, although a file system that uses extents can no longer be mounted as ext3.

In summary, ext4 builds on the ext3 foundation by lifting its scalability and reliability limits while improving performance and reducing storage overhead through extents and delayed allocation.

What are the disadvantages or limitations of ext4?

Some of the disadvantages or limitations of using ext4 include:

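  • No checksumming of file data – checksums cover the journal and, with the metadata_csum feature, file system metadata, but not file contents.
  • Limited crash consistency guarantees – only data=journal mode journals file contents; ordered and writeback modes protect metadata consistency only.
  • No snapshot capabilities – LVM or other solutions needed for snapshots.
  • Encryption is not enabled by default – native per-directory encryption (fscrypt) has been available since Linux 4.1 but must be turned on explicitly, and full-disk encryption relies on a layer such as dm-crypt/LUKS.
  • No transparent compression – requires add-on compression tools.
  • Size constraints – file systems larger than 16 TiB require the 64bit feature, and individual files remain capped at 16 TiB with 4 KiB blocks.
  • Boot loader support depends on the features in use – GRUB 2 reads ext4 directly, but older boot loaders or recently added features may require a separate /boot partition.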

While ext4 improves on ext3 in many ways, it lacks features such as transparent compression, data checksumming and snapshots that are available in more modern file systems.

When to use ext4 and when to avoid it?

Ext4 is well suited for use cases like:

  • General purpose Linux system deployment – it’s the default choice for most distros.
  • High throughput workloads – speeds up file intensive workloads vs ext3.
  • Larger filesystems and big data – can scale up to exabyte filesystem sizes.
  • Performance sensitive applications – delayed allocation and extents help boot and application load times.
  • Media collections or shared storage – large file support and a mature, reliable code base.

Use cases where other filesystems may be more suitable:

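  • Extreme crash consistency or data integrity needed – consider a copy-on-write file system with data checksums such as ZFS or Btrfs, or mount ext4 with data=journal.
  • Full-disk encryption required – layer ext4 on dm-crypt/LUKS, since ext4's native encryption covers only selected directories.
  • EFI system partitions – UEFI firmware expects a FAT file system there, so ext4 cannot be used.
  • Embedded systems with flash – F2FS is optimized for this.
  • Deduplication, compression or snapshots required – ZFS and Btrfs offer these.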

For Linux systems where ext4 strengths match the use case, it usually makes sense as the default file system choice.

Conclusion

The ext4 file system is an evolutionary improvement over ext3. It scales to huge storage sizes with improved performance and reliability while keeping an easy upgrade path from ext3/ext2. Features like extents, delayed allocation, unlimited subdirectories and faster fsck help ext4 overcome the limits of earlier Linux file systems. While newer file systems like Btrfs and ZFS offer more advanced features, ext4 continues to deliver as a mature, well-rounded choice for general Linux usage.