The basics of file systems
The computer market offers a huge variety of options for storing information in digital form. Depending on the requirements, one can choose between such storage devices as internal and external hard drives, SSDs, memory cards, USB flash drives, RAID sets and other kinds of complex storage systems. But regardless of their individual specifics, they all just keep lots of data bits and have no internal mechanism to determine where each individual bit should be placed at a given moment. Certain logic is required to organize those pieces into meaningful files, like documents, pictures, databases, etc., and to retrieve them easily upon request. The following article provides a general overview of the major means of data management on any storage, known as the file system, and explains why there are different filesystem types.
What is a file system?
The file system (sometimes written as filesystem or abbreviated as FS) is a collection of methods and structures used by the computer's OS to arrange data on a digital storage device and to keep track of its vacant space. To better understand the essence of such a complicated technology, one might need to get acquainted with the basic principles behind it.
To begin with, any computer file is stored on a storage medium (e.g. an HDD, SSD or thumb drive) that has a specific capacity. This storage can be viewed as a linear space available for reading, or for both reading and writing, digital information. Each byte of information on it has a particular offset from the start of the storage, known as an address, and can be referenced by that address. In this regard, a storage can be treated as a grid of numbered cells, where each cell is a single byte, and any item saved to the storage gets its own cells.
Traditionally, storage devices reference any byte of information by a pair of values: a sector and an offset within that sector. A sector is a group of bytes (usually 512 bytes) that serves as the minimum addressable unit of physical storage. For example, byte 1040 on a hard disk drive is referenced as sector #3 with an in-sector offset of 16 bytes ([sector]+[sector]+[16 bytes]): two full 512-byte sectors precede it, so it falls into the third one. This scheme optimizes storage addressing, as smaller numbers suffice to refer to any portion of information located on the storage.
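The conversion between a linear byte address and a sector/offset pair can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the 512-byte sector size is the common value mentioned above, and sectors are numbered from 1 to match the example.

```python
SECTOR_SIZE = 512  # bytes; the usual sector size assumed in this article


def to_sector_address(byte_offset, sector_size=SECTOR_SIZE):
    """Split a linear byte offset into (sector number, in-sector offset).

    Sectors are numbered from 1 here, as in the byte-1040 example.
    """
    if byte_offset < 0:
        raise ValueError("byte offset must be non-negative")
    full_sectors, in_sector = divmod(byte_offset, sector_size)
    return full_sectors + 1, in_sector


def to_byte_offset(sector, in_sector, sector_size=SECTOR_SIZE):
    """Inverse conversion: (sector number, in-sector offset) to a byte offset."""
    return (sector - 1) * sector_size + in_sector


print(to_sector_address(1040))  # (3, 16)
print(to_byte_offset(3, 16))    # 1040
```

Real devices typically count sectors from 0, in which case the `+ 1` and `- 1` adjustments would be dropped.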
To make the second part of the address (the in-sector offset) unnecessary, files are usually stored starting from the beginning of a sector and occupy whole sectors. For example, a 10-byte file occupies an entire sector, a 512-byte file also occupies one entire sector, while a 514-byte file occupies two entire sectors.
Each file is stored in "unused" sectors and can be read later by its known position and size. However, how does the OS know which sectors are occupied and which are free? Where are the size, position and name of the file stored? This is exactly what the filesystem is responsible for.
As a whole, the file system is a structured representation of data together with a set of metadata describing this data. It is applied to the storage during the format operation. This structure serves the whole storage and is also part of an isolated storage segment, a disk partition. Usually, the file system operates in blocks rather than sectors. FS blocks are groups of sectors that optimize storage addressing. Modern filesystems generally use block sizes from 1 to 128 sectors (512 to 65536 bytes with 512-byte sectors). Files usually start at the beginning of a block and take up entire blocks.
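Because files take up entire blocks, the space a file occupies is its size rounded up to a whole number of blocks. The following Python sketch shows that arithmetic; the function names are hypothetical, 512-byte sectors are assumed, and an empty file is assumed to need no data blocks.

```python
SECTOR_SIZE = 512  # bytes; assumed sector size


def blocks_needed(file_size, sectors_per_block=1, sector_size=SECTOR_SIZE):
    """Number of whole blocks required to hold file_size bytes."""
    block_size = sectors_per_block * sector_size
    # Ceiling division using integers only.
    return -(-file_size // block_size)


def slack_bytes(file_size, sectors_per_block=1, sector_size=SECTOR_SIZE):
    """Bytes allocated to the file but not used by its contents."""
    block_size = sectors_per_block * sector_size
    return blocks_needed(file_size, sectors_per_block, sector_size) * block_size - file_size


# With one sector per block, this reproduces the whole-sector examples.
print(blocks_needed(10), blocks_needed(512), blocks_needed(514))  # 1 1 2

# With 8 sectors per block (4096 bytes), a 514-byte file fits in one block,
# but most of that block stays unused.
print(blocks_needed(514, sectors_per_block=8))  # 1
print(slack_bytes(514, sectors_per_block=8))    # 3582
```

Larger blocks mean fewer addresses to track, at the cost of more unused space at the end of small files.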
Besides the already mentioned functions, the FS is also in charge of:
- Allocating blocks for new files;
- Assigning names and other important properties to files;
- Grouping files into directories;
- Reading and writing the contents of existing files;
- Deleting files.
Constant write and delete operations within a storage may cause its fragmentation. As a result, files are not always stored as whole units but get divided into fragments. For example, suppose a volume is completely occupied by files of about 4 blocks each (e.g. a collection of photos). A user wants to store a file that will take up 8 blocks and therefore deletes the first and the last files. This releases 8 blocks of space, but the first freed segment is located near the start of the storage, while the second one is near its end. In this case, the 8-block file is split into two parts (4 blocks each) that fill the free "holes". The file system records both fragments as parts of the same file.
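The scenario above can be reproduced with a toy model in Python. This is a minimal sketch rather than how any particular file system allocates space: the volume is assumed to be a 16-block list of used/free flags, allocation is a simple first-fit pass, and all names are hypothetical.

```python
def free_extents(bitmap):
    """Return (start, length) runs of free blocks; False marks a free block."""
    extents = []
    start = None
    for index, used in enumerate(bitmap):
        if not used and start is None:
            start = index
        elif used and start is not None:
            extents.append((start, index - start))
            start = None
    if start is not None:
        extents.append((start, len(bitmap) - start))
    return extents


def allocate(bitmap, blocks_wanted):
    """First-fit allocation that splits a file into fragments when needed."""
    fragments = []
    remaining = blocks_wanted
    for start, length in free_extents(bitmap):
        if remaining == 0:
            break
        take = min(length, remaining)
        fragments.append((start, take))
        remaining -= take
    if remaining:
        raise OSError("not enough free blocks")
    # Mark the chosen blocks as used only once the request is known to fit.
    for start, length in fragments:
        for block in range(start, start + length):
            bitmap[block] = True
    return fragments


# A 16-block volume completely filled by four 4-block files.
volume = [True] * 16

# Delete the first and the last file, freeing blocks 0-3 and 12-15.
for block in list(range(0, 4)) + list(range(12, 16)):
    volume[block] = False

# Store an 8-block file: it ends up in two 4-block fragments.
fragments = allocate(volume, 8)
print(fragments)  # [(0, 4), (12, 4)]
```

The returned list of (start, length) pairs is the kind of information a file system has to keep so that it can later read the fragments back in the right order.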
In addition to the user's data, the filesystem also contains its own parameters (such as the block size), file descriptors (including file sizes, locations, fragments, etc.), names and the directory hierarchy. It may also store security information, extended attributes and other properties.
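To make this list more concrete, here is a small in-memory sketch in Python of how such metadata might be grouped. It is purely illustrative: the class and field names are hypothetical, and real filesystems keep these structures on the storage itself in their own formats.

```python
from dataclasses import dataclass, field


@dataclass
class FileDescriptor:
    name: str
    size: int        # file size in bytes
    fragments: list  # (start block, block count) pairs
    attributes: dict = field(default_factory=dict)  # e.g. security info


@dataclass
class Directory:
    name: str
    files: dict = field(default_factory=dict)           # name -> FileDescriptor
    subdirectories: dict = field(default_factory=dict)  # name -> Directory


@dataclass
class FileSystem:
    block_size: int  # a filesystem-wide parameter, in bytes
    root: Directory = field(default_factory=lambda: Directory("/"))


fs = FileSystem(block_size=4096)
photos = fs.root.subdirectories.setdefault("photos", Directory("photos"))
photos.files["panorama.raw"] = FileDescriptor(
    name="panorama.raw",
    size=8 * 4096,
    fragments=[(0, 4), (12, 4)],  # a file stored in two fragments
    attributes={"read_only": False},
)

descriptor = fs.root.subdirectories["photos"].files["panorama.raw"]
allocated = sum(count for _, count in descriptor.fragments) * fs.block_size
print(allocated)  # 32768
```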
However, filesystems are not all alike. They may differ significantly in their data organization strategies as well as such characteristics as performance, stability and reliability. Some of them may also be suitable only for specific applications.
Why do different filesystem types exist?
When formatting a storage medium, a user is often presented with multiple filesystem options that are referred to as types or formats. The number of variants may seem overwhelming and naturally raises the question of why there is no single file system that would fit all occasions. Such a unified format would probably simplify many things, but, unfortunately, it is not viable in the current circumstances.
First of all, there is no FS that would be best suited to all kinds of needs. Each of them has its own strengths and shortcomings that should be taken into consideration, depending on the situation in which the media is being formatted. Some of them are optimal for general use, while others serve particular purposes or target a specific type of device. Moreover, technologies evolve bit by bit, and newer filesystems become faster and more resilient, scale better to larger storage devices and acquire more advanced features than their predecessors.
Secondly, the filesystem is tightly linked to the operating system, and the latter plays a big part in the choice of an appropriate format. As a rule, each operating system supports its own set of filesystems, which the developers of that OS also work on. As a result, open-source environments offer a great number of filesystems, while proprietary ones offer only a few alternatives. To learn more about the common native formats of modern operating systems, please refer to the corresponding article:
- The filesystems of Windows: FAT/FAT32, exFAT, NTFS, ReFS, HPFS
- The filesystems of Linux: Ext2, Ext3, Ext4, XFS, Btrfs, F2FS, JFS, ReiserFS
There is also a special category of filesystems employed in distributed environments, like storage area networks.
Last update: April 19, 2023