ScoutFS is a new open source, scalable, clustered file system designed and implemented for archiving large data sets to low cost external storage resources such as tape, disk, object, and cloud. Versity uses ScoutFS to power its subscription archiving application called Scout Archive Manager (ScoutAM). For the ScoutAM archiving application to perform work, it must be able to discover large numbers of changes within the file system quickly and efficiently. Legacy archive systems are unable to scale beyond relatively low numbers of files, partly due to a reliance upon file system scans implemented through standard POSIX interfaces. Eliminating file system scans was a critical design goal for ScoutFS.
From edge triggered to level triggered
To highlight the power of the new ScoutFS query interface, let’s consider the current generation file system. The edge triggered Versity file system makes the archive workflow scalable by creating event notifications. File system changes are recorded as events, and those events are collected into a list of files that need to be examined, which saves an entire file system scan on each pass of the archive application.
But occasionally events can be lost, resulting in the need for a full metadata scan. Here’s what happens: events are buffered in a fixed size ring buffer. If the archive application is not running or is not sampling the file system fast enough, the ring buffer can overflow and lose events. The archiver gets a flag signaling the overflow, but the lost events cannot be recovered. When the application detects the overflow, it has to fall back to a full metadata scan.
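The overflow behavior described above can be illustrated with a minimal Go sketch. This is not the Versity implementation; the names `Event`, `EventRing`, `Push`, and `Drain` are hypothetical, and the sketch assumes a ring size greater than zero and a policy of dropping the oldest event when the ring is full.

```go
package main

import "fmt"

// Event is a hypothetical change notification carrying an inode number.
type Event struct {
	Ino uint64
}

// EventRing is a fixed size ring buffer (illustrative only). When it is
// full, the oldest event is overwritten and an overflow flag is set.
type EventRing struct {
	buf        []Event
	head       int  // index of the oldest buffered event
	count      int  // number of buffered events
	overflowed bool // true if any event was dropped since the last Drain
}

// NewEventRing allocates a ring that holds up to size events (size > 0).
func NewEventRing(size int) *EventRing {
	return &EventRing{buf: make([]Event, size)}
}

// Push records an event, dropping the oldest one if the ring is full.
func (r *EventRing) Push(e Event) {
	if r.count == len(r.buf) {
		r.head = (r.head + 1) % len(r.buf)
		r.count--
		r.overflowed = true
	}
	tail := (r.head + r.count) % len(r.buf)
	r.buf[tail] = e
	r.count++
}

// Drain returns the buffered events, oldest first, and reports whether
// any events were lost since the previous Drain.
func (r *EventRing) Drain() ([]Event, bool) {
	out := make([]Event, 0, r.count)
	for i := 0; i < r.count; i++ {
		out = append(out, r.buf[(r.head+i)%len(r.buf)])
	}
	lost := r.overflowed
	r.head, r.count, r.overflowed = 0, 0, false
	return out, lost
}

func main() {
	ring := NewEventRing(4)
	// Six changes arrive before the archiver samples the ring.
	for ino := uint64(1); ino <= 6; ino++ {
		ring.Push(Event{Ino: ino})
	}
	events, lost := ring.Drain()
	fmt.Println(len(events), lost) // 4 true
	if lost {
		fmt.Println("events were lost: falling back to a full metadata scan")
	}
}
```

The consumer learns that something was dropped, but it has no way to learn which inodes were affected. That is why the only safe recovery is a full scan.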
The new ScoutFS architecture is able to fully support a system with up to one trillion files. A level triggered design lets the archive application query into the time ordered transaction sequence of modified inodes starting at any arbitrary location within the sequence. The transaction sequence is a monotonically increasing count over time that will never be reused. Once the archive application completes work for a given inode in a given sequence, it knows that nothing can change earlier in the sequence. If a file is modified later on, it will show up again in the sequence with a new, higher transaction sequence number. In this architecture, the application only needs to record where it left off in the sequence, and only ever needs to start over at that location.
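To make the "record where it left off" idea concrete, here is a small Go sketch of a level triggered consumer. It is an in-memory stand-in, not the ScoutFS interface: `Entry`, `QueryFrom`, and `Archiver` are hypothetical names, and the sketch assumes each entry has a unique sequence number and that the batch size is positive.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry pairs an inode with the sequence number of its latest
// modification. Hypothetical type; sequence numbers are assumed unique.
type Entry struct {
	Seq uint64
	Ino uint64
}

// QueryFrom returns up to limit entries with Seq >= from, in sequence
// order. The entries slice must already be sorted by Seq.
func QueryFrom(entries []Entry, from uint64, limit int) []Entry {
	i := sort.Search(len(entries), func(i int) bool {
		return entries[i].Seq >= from
	})
	end := i + limit
	if end > len(entries) {
		end = len(entries)
	}
	return entries[i:end]
}

// Archiver keeps only a cursor: the next sequence number to examine.
type Archiver struct {
	Cursor uint64
	Seen   int
}

// Pass processes every entry at or after the cursor and advances it.
func (a *Archiver) Pass(entries []Entry, batch int) {
	for {
		res := QueryFrom(entries, a.Cursor, batch)
		if len(res) == 0 {
			return // caught up; resume here on the next pass
		}
		a.Seen += len(res)
		a.Cursor = res[len(res)-1].Seq + 1
	}
}

func main() {
	index := []Entry{{Seq: 10, Ino: 101}, {Seq: 11, Ino: 102}, {Seq: 15, Ino: 103}}
	a := &Archiver{}
	a.Pass(index, 2)
	fmt.Println(a.Cursor, a.Seen) // 16 3

	// Inode 101 is modified later: it reappears with a higher sequence
	// number, and only that entry is returned on the next pass.
	index = []Entry{{Seq: 11, Ino: 102}, {Seq: 15, Ino: 103}, {Seq: 20, Ino: 101}}
	a.Pass(index, 2)
	fmt.Println(a.Cursor, a.Seen) // 21 4
}
```

Because the cursor is a position in the sequence rather than a stream of buffered events, the application can stop for any length of time and resume from the saved value without missing changes.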
Indexed attributes
ScoutFS stores the items for the query interface in a modified log structured merge tree. The structure and its interactions are described in the ScoutFS whitepaper. The characteristics of the structure allow for fast search, fast ordered retrieval starting at any point, and a high likelihood of accessing cached items already in memory. When a file is modified, the associated inode is moved to a new position in the sequence with an updated sequence number. Likewise, when an inode no longer exists in the file system, it is no longer returned in the results of the query.
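The visible behavior of the index (an inode moves to a new position when modified and disappears when deleted) can be sketched in Go. This is a deliberately simplified model that uses plain maps rather than a log structured merge tree, and it assigns one sequence number per change. The names `SeqIndex`, `Touch`, `Remove`, and `Walk` are hypothetical and are not part of ScoutFS.

```go
package main

import (
	"fmt"
	"sort"
)

// SeqIndex is a toy model of a sequence-ordered inode index.
type SeqIndex struct {
	next  uint64            // next sequence number to hand out
	byIno map[uint64]uint64 // inode -> current sequence number
	bySeq map[uint64]uint64 // sequence number -> inode
}

// NewSeqIndex returns an empty index whose first sequence number is 1.
func NewSeqIndex() *SeqIndex {
	return &SeqIndex{
		next:  1,
		byIno: map[uint64]uint64{},
		bySeq: map[uint64]uint64{},
	}
}

// Touch records a create or modify. Any old position for the inode is
// dropped and the inode is given a new, higher sequence number.
func (x *SeqIndex) Touch(ino uint64) uint64 {
	if old, ok := x.byIno[ino]; ok {
		delete(x.bySeq, old)
	}
	seq := x.next
	x.next++ // sequence numbers only increase and are never reused
	x.byIno[ino] = seq
	x.bySeq[seq] = ino
	return seq
}

// Remove drops a deleted inode so later queries no longer return it.
func (x *SeqIndex) Remove(ino uint64) {
	if old, ok := x.byIno[ino]; ok {
		delete(x.bySeq, old)
		delete(x.byIno, ino)
	}
}

// Walk returns the inodes whose sequence number is >= from, ordered by
// sequence number.
func (x *SeqIndex) Walk(from uint64) []uint64 {
	seqs := make([]uint64, 0, len(x.bySeq))
	for s := range x.bySeq {
		if s >= from {
			seqs = append(seqs, s)
		}
	}
	sort.Slice(seqs, func(i, j int) bool { return seqs[i] < seqs[j] })
	inos := make([]uint64, len(seqs))
	for i, s := range seqs {
		inos[i] = x.bySeq[s]
	}
	return inos
}

func main() {
	x := NewSeqIndex()
	x.Touch(101) // sequence 1
	x.Touch(102) // sequence 2
	x.Touch(103) // sequence 3
	x.Touch(101) // modified again: moves to sequence 4
	x.Remove(102)

	fmt.Println(x.Walk(1)) // [103 101]
	fmt.Println(x.Walk(4)) // [101]
}
```

Each inode appears at most once, at the position of its most recent change, so the number of items a query can return is bounded by the number of live inodes rather than by the number of changes ever made.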
Using the query interface
The query interface for the metadata sequence can be thought of as a persistent and reliable inotify system across the entire file system. This allows the userspace application, ScoutAM, to retrieve the sequence of modifications to the file system. It can start anywhere in the sequence, so it can wait at the point where no more results are returned and simply poll for future transactions. This allows the application to discover any work that it should accomplish as files are created or modified. In this example, we can keep a running count of the files created or modified.
A Go library for the Accelerated Query Interface (AQI)
This section describes how the Go library interacts with the ScoutFS query interface. If the inner workings aren’t your thing, skip to the next section for an example of the interface at work.
The query will give us walk inodes entries like this: