Architecture
Understanding Regatta Storage
Overview
Regatta Storage transforms S3 buckets into high-performance local storage with centralized caching and data management. Regatta is a fully managed, durable, high-speed caching layer that sits between your EC2 instance and your existing S3 bucket. You don’t need to deploy any infrastructure or provision capacity to use Regatta. Regatta’s cache automatically expands with the working set size of your application, and you’re only billed for what you use.
Your instance connects to Regatta using encrypted NFSv3, and Regatta translates file-based operations into S3 API calls against your bucket. Regatta automatically stages writes in the durable cache to provide low latency, and asynchronously completes those writes in S3.
Core Components
Centralized Cache
Regatta maintains a centralized, durable, shared cache for frequently accessed data and recently written data. This cache automatically expands with the working set size of your application, and Regatta intelligently caches data based on your access patterns. The Regatta cache is designed to provide sub-millisecond latency for cached reads and writes. Because the Regatta cache is shared across all clients, you can achieve low-latency access to reference data from multiple instances simultaneously.
POSIX File Semantics
Regatta’s durable caching layer allows it to efficiently perform file-based operations, and asynchronously persist writes to S3. Regatta’s POSIX-compliant file system interface is strongly consistent for all connected file clients, and supports all standard POSIX file operations (including renames, appends, file locks, and symbolic links).
Your EC2 instances connect to Regatta using NFSv3. Regatta’s managed, high-speed caching layer runs in our cloud accounts, and synchronizes to the specified S3 bucket in your AWS, GCS, or Cloudflare account. You don’t need to deploy any infrastructure or provision capacity.
How It Works
File System Mount
You can mount a file system using the Regatta mount helper. The helper performs IAM authentication on the mounting instance and establishes an encrypted NFSv3 connection between your EC2 instance and Regatta’s caching layer.
Read Operations
When reading a file, Regatta attempts to serve data from the centralized cache. If the data is not present in the cache, Regatta fetches the backing data from S3 and updates the cache. For each read request, Regatta may intelligently cache additional data based on your access patterns. Regatta also caches file metadata, including directory contents, to minimize the number of S3 API calls that you have to pay for.
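The read path described above follows the general read-through caching pattern. The sketch below is a toy illustration of that pattern only, not Regatta’s implementation: the class name ReadThroughCache, the backing_reads counter, and the dictionary standing in for an S3 bucket are all assumptions made for the example.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Toy read-through cache: serve hits locally, fetch misses from a backing store."""

    def __init__(self, fetch_from_backing: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_backing      # hypothetical stand-in for an S3 read
        self._cache: Dict[str, bytes] = {}
        self.backing_reads = 0                # counts simulated backing-store requests

    def read(self, key: str) -> bytes:
        if key in self._cache:                # cache hit: no backing request needed
            return self._cache[key]
        data = self._fetch(key)               # cache miss: fetch from the backing store
        self.backing_reads += 1
        self._cache[key] = data               # populate the cache for later readers
        return data


# A dictionary stands in for the S3 bucket in this sketch.
bucket = {"data/report.csv": b"a,b\n1,2\n"}
cache = ReadThroughCache(lambda key: bucket[key])

first = cache.read("data/report.csv")     # miss: goes to the backing store
second = cache.read("data/report.csv")    # hit: served from the cache
```

In this sketch the second read never reaches the backing store, which is the property that reduces the number of billed S3 requests.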
Write Operations
When performing mutating operations on the file system (including writes, renames, and directory changes), Regatta first stages this data on its high-speed caching layer to provide strong consistency to other file clients. Regatta will then asynchronously persist these changes to the backing S3 bucket, generally within 5 minutes. If Regatta detects multiple write operations to the same backing S3 object, it will batch these operations into a single upload to minimize requests and reduce costs.
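The staging and batching behavior can be pictured as a write-back buffer that coalesces writes per object. The following is a minimal sketch under stated assumptions, not Regatta’s actual mechanism: WriteBackStager, its flush method, and the uploads list (which stands in for S3 uploads) are hypothetical names introduced for illustration.

```python
from typing import Dict, List, Set, Tuple


class WriteBackStager:
    """Toy write-back buffer: stage writes locally, upload each dirty object once."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytearray] = {}    # staged, immediately readable view
        self._dirty: Set[str] = set()               # objects with unflushed changes
        self.uploads: List[Tuple[str, bytes]] = []  # stand-in for uploads to S3

    def write(self, key: str, offset: int, data: bytes) -> None:
        buf = self._objects.setdefault(key, bytearray())
        end = offset + len(data)
        if len(buf) < end:                          # grow the object if needed
            buf.extend(b"\x00" * (end - len(buf)))
        buf[offset:end] = data                      # apply the write in place
        self._dirty.add(key)

    def read(self, key: str) -> bytes:
        return bytes(self._objects[key])            # readers see staged writes at once

    def flush(self) -> None:
        for key in sorted(self._dirty):             # one upload per dirty object
            self.uploads.append((key, bytes(self._objects[key])))
        self._dirty.clear()


stager = WriteBackStager()
stager.write("output/log.txt", 0, b"hello ")
stager.write("output/log.txt", 6, b"world")
stager.write("output/log.txt", 0, b"HELLO")
staged_view = stager.read("output/log.txt")   # visible before any upload happens
stager.flush()                                 # three writes become a single upload
```

Three writes to the same object produce one upload in this sketch, while reads issued before the flush already observe the latest staged contents.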
Regatta’s high-speed cache automatically expands to the working set size of your application. By default, Regatta will keep data that has been read or written in the cache for up to 1 hour.
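A time-based retention policy like the one described can be sketched as follows. This is an illustration only: the TtlCache class is hypothetical, and it assumes that the one-hour window is measured from the most recent read or write, which the text above does not state explicitly.

```python
from typing import Dict, Optional, Tuple

RETENTION_SECONDS = 3600.0  # the default "up to 1 hour" retention from the text


class TtlCache:
    """Toy cache that drops entries not accessed within the retention window."""

    def __init__(self, retention: float = RETENTION_SECONDS) -> None:
        self._retention = retention
        self._entries: Dict[str, Tuple[float, bytes]] = {}  # key -> (last access, data)

    def put(self, key: str, data: bytes, now: float) -> None:
        self._entries[key] = (now, data)

    def get(self, key: str, now: float) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        last_access, data = entry
        if now - last_access > self._retention:   # too old: evict and report a miss
            del self._entries[key]
            return None
        self._entries[key] = (now, data)          # assumption: a read refreshes the timer
        return data


ttl_cache = TtlCache()
ttl_cache.put("k", b"v", now=0.0)
after_30_min = ttl_cache.get("k", now=1800.0)   # within the window: hit
after_idle = ttl_cache.get("k", now=9000.0)     # 2 hours since last access: miss
```

An evicted entry is not lost in the system being modeled; it would simply be fetched from S3 again on the next read.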
Security
Regatta automatically encrypts all data in transit and at rest. Data in transit is protected by tunneling your instance’s NFSv3 traffic through a TLS connection.
Regatta uses AWS IAM to authorize access to your file system. You can specify which IAM principals have access to the file system in the Regatta web console. On mount, the Regatta client generates a pre-signed AWS STS GetCallerIdentity request, which is used to authenticate the client to the file system.
Data Durability
Regatta is designed to provide 99.999% (5 nines) durability for newly written data before it is synchronized to S3, and 99.99% (4 nines) file system availability.
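To put the availability figure in concrete terms, the short calculation below converts a percentage into a downtime budget. It assumes the target is measured over a 365-day year, which the text above does not specify; the function name allowed_downtime_minutes is introduced here for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes per year that a service may be unavailable at the given availability."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


downtime = allowed_downtime_minutes(99.99)  # roughly 52.56 minutes per year
```

Under that assumption, 4 nines of availability corresponds to a little under an hour of unavailability per year.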
Regatta automatically stores data redundantly within a single AWS Availability Zone. We are actively working on support for file systems which redundantly store data across multiple Availability Zones for the highest data durability. If you need multiple Availability Zone durability, please let us know at support@regattastorage.com.
Write Consistency
The Regatta cache provides strongly consistent, read-after-write atomic POSIX operations for all connected file system clients.
Regatta is generally able to synchronize changes between the file system view and S3 within minutes. Regatta allows you to concurrently access your data from both the file system interface and S3.
For best results, we don’t recommend concurrently editing individual files from both interfaces. For example, if you write to the middle of a file using Regatta and simultaneously call PutObject on the same object via the S3 API, Regatta does not guarantee which edit will be reflected in S3.
Similarly, if you perform operations which are not atomic in the S3 API (such as a directory rename), you may observe partial results of the operations in your S3 bucket while Regatta synchronizes the changes. You may experience inconsistent results if you add or remove files from the directory using the S3 API during this synchronization period.
We recommend segmenting your usage of the file system for writes. For example, structure your application so that PutObject calls which ingest data go to one prefix (for example, ‘input/’), while newly created files on Regatta go to a different directory (for example, ‘output/’). This way, your application never needs to simultaneously edit the same file from both interfaces.
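One way to enforce this segmentation in application code is a small guard that maps each key to the single interface allowed to write it. The sketch below is an application-side convention, not a Regatta feature: the function allowed_writer and the interface labels "s3" and "filesystem" are hypothetical, and the prefixes are the examples from the paragraph above.

```python
S3_WRITE_PREFIX = "input/"     # written only via S3 PutObject calls
FILE_WRITE_PREFIX = "output/"  # written only via the mounted file system


def allowed_writer(key: str) -> str:
    """Return which interface may write the given key under this convention."""
    if key.startswith(S3_WRITE_PREFIX):
        return "s3"
    if key.startswith(FILE_WRITE_PREFIX):
        return "filesystem"
    raise ValueError(f"{key!r} is outside the agreed write prefixes")


def check_write(interface: str, key: str) -> bool:
    """True if this interface is the designated writer for the key."""
    return allowed_writer(key) == interface


ingest_ok = check_write("s3", "input/batch-001.csv")          # S3 writes to input/
result_ok = check_write("filesystem", "output/summary.json")  # files go to output/
conflict = check_write("filesystem", "input/batch-001.csv")   # would risk a conflict
```

Both interfaces can still read every key; the guard only ensures that no single object is written from both sides.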
Comparison to Other Solutions
There are many different solutions which provide file-like interfaces on top of S3 buckets. However, only Regatta provides an interface which is fully POSIX-compatible, stores data in the native format in S3, does not require capacity provisioning, and provides shared cache benefits to multiple instances.
FUSE Adapters
Existing FUSE adapters, such as S3FS and Mountpoint for Amazon S3, are not designed to support full POSIX semantics. For example, Mountpoint for Amazon S3 does not support file locks, random writes, renames, or symbolic links. In order to use these tools, you need to instrument your file applications to understand whether or not they use these unsupported operations, which can be difficult and error-prone. In addition, these tools do not provide shared caching for reference data sets to multiple clients, so large clusters may need to re-download the same data from S3 repeatedly. In comparison, Regatta is designed to support all standard POSIX file operations, which means that it’s out-of-the-box compatible with all file applications.
Block-format Storage
Other file system adapters for S3, such as JuiceFS or ObjectiveFS, require that you store data in your S3 bucket in a proprietary block format. This means that you cannot use your existing data sets in S3 with these tools, and these buckets aren’t compatible with any existing S3 workflows that you use. Regatta stores data in S3 in its native format, so you can simultaneously use and share the data from both a file system and S3. You can also use Regatta to connect to existing data sets in S3.
Enterprise Storage
Some solutions, like Alluxio or FSx for Lustre, require that you either deploy and manage your own infrastructure or that you provision capacity. It’s often difficult to predict the specific storage needs of an application, especially if the application can have unexpected or bursty needs. Regatta is a fully managed, serverless solution that automatically scales with your application. You don’t need to provision capacity or deploy infrastructure, and you’re only billed for what you use.