A problem I’ve been looking at has just transitioned from a concern about the hosting requirements for the SMSyncServer server (just “sync server” below) into an architectural issue with how I am dealing with the file system on the sync server. The problem is as follows. The sync server is a middle-man server that enables synchronization of files between client mobile devices and user cloud storage (e.g., Google Drive). To support atomic transactions, files (e.g., on upload) are currently first stored on the sync server and then transferred to user cloud storage. The files stored on the sync server are temporary: once they have been transferred to user cloud storage, they are removed from the sync server. However, this temporary storage can last arbitrarily long. A collection of large files uploaded from a client app will generally need to persist longer than a collection of small files. Plus, if the client app loses network connectivity, it may not complete the sync operation in a timely manner, and the files will have to persist even longer on the sync server.
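The staging flow described above can be sketched as a small in-memory model. This is a hypothetical illustration, not SMSyncServer’s actual code: `SyncServerStaging`, `upload`, and `commit` are names invented here, plain dicts stand in for the server’s temporary storage and for user cloud storage, and failures during the transfer itself are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class SyncServerStaging:
    """Hypothetical model of the sync server's temporary staging area.

    Files are held here until the client commits, then moved to cloud
    storage as a group and removed from the staging area.
    """
    staged: dict = field(default_factory=dict)  # file name -> bytes

    def upload(self, name: str, data: bytes) -> None:
        # Phase 1: the client uploads each file; nothing reaches cloud storage yet.
        self.staged[name] = data

    def commit(self, expected: list, cloud: dict) -> bool:
        # Phase 2: transfer only if every expected file has been staged,
        # so in this model cloud storage receives all of the files or none.
        if any(name not in self.staged for name in expected):
            return False
        for name in expected:
            # Transfer to cloud storage, then drop the temporary copy.
            cloud[name] = self.staged.pop(name)
        return True


cloud = {}
staging = SyncServerStaging()
staging.upload("a.txt", b"hello")
print(staging.commit(["a.txt", "b.txt"], cloud))  # False: b.txt not uploaded yet
staging.upload("b.txt", b"world")
print(staging.commit(["a.txt", "b.txt"], cloud))  # True
print(sorted(cloud), staging.staged)              # ['a.txt', 'b.txt'] {}
```

The key point for what follows is that the staged files must still be reachable, from whichever process handles `commit`, for however long the client takes to get there.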
And this is where the current architectural problem comes in: to date, I’ve literally been storing these temporary files in the sync server’s local file system. My initial testing was carried out by running the server on my local machine, and since this was not a production environment and had just a single instance of the server, I ran into no problems.
I started thinking about a potential problem after I moved my testing onto an Internet-hosted server (I’ve been using Heroku). For scalability, Heroku runs server instances in “dynos”, and Heroku dynos have an ephemeral local file system:
Each dyno gets its own ephemeral filesystem, with a fresh copy of the most recently deployed code. During the dyno’s lifetime its running processes can use the filesystem as a temporary scratchpad, but no files that are written are visible to processes in any other dyno and any files written will be discarded the moment the dyno is stopped or restarted. For example, this occurs any time a dyno is replaced due to application deployment and approximately once a day as part of normal dyno management.
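To make the consequence concrete, here is a minimal sketch of the timing question: a file staged on a dyno’s local file system survives only if no restart falls between the upload and the client’s commit. The function name and the hour-based restart times are assumptions for illustration; Heroku only says restarts happen approximately once a day.

```python
def staged_file_survives(written_at: float, committed_at: float,
                         restart_times: list) -> bool:
    """Return True if a file written to a dyno's local file system is still
    there when the client commits.

    Any dyno restart discards locally written files, so the file survives
    only if no restart falls after the write and at or before the commit.
    Times are in hours (an assumption for this sketch).
    """
    return not any(written_at < t <= committed_at for t in restart_times)


daily_restarts = [24.0, 48.0, 72.0]  # hypothetical restart schedule
print(staged_file_survives(1.0, 1.5, daily_restarts))    # True: quick, small upload
print(staged_file_survives(20.0, 30.0, daily_restarts))  # False: restart at hour 24
```

Because the staging period can be arbitrarily long (large files, a client that goes offline), no upload schedule can guarantee survival under this model.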
My first thought on seeing the ephemeral character of Heroku’s file system was that I should find a hosting service that had a persistent file system. To this end, I read about OpenShift’s persistent file system:
5. Persistent Access to Secondary Storage
The location of this persistent storage area is provided via the OPENSHIFT_DATA_DIR environment variable. This is a great place to stash user-generated content, or other persistent runtime information that should be kept through reboots and deploys (without being checked in to your project’s source tree).
On OpenShift Online, you can pay for additional storage space. OpenShift Origin and OpenShift Enterprise both allow disk resource allocations to be easily configured per container type. These resource allocation plans are selectable during the app creation process as a “gear size”.
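On OpenShift, using that storage area would amount to little more than reading the environment variable. A minimal sketch, assuming the server falls back to the system temp directory when `OPENSHIFT_DATA_DIR` is not set (for example, during local testing); the `smsync-uploads` subdirectory name is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def staging_directory() -> Path:
    """Pick the directory for the sync server's temporary upload files.

    On OpenShift, OPENSHIFT_DATA_DIR names the persistent data area;
    elsewhere, fall back to the system temp directory. The
    "smsync-uploads" subdirectory name is hypothetical.
    """
    base = os.environ.get("OPENSHIFT_DATA_DIR") or tempfile.gettempdir()
    directory = Path(base) / "smsync-uploads"
    directory.mkdir(parents=True, exist_ok=True)  # create on first use
    return directory


print(staging_directory())  # the directory the server would stage files in
```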
Aha, I thought. It will be simple. I’ll just start using OpenShift. Problem solved. Hmmm. Not so quick. Further reading shows the flaw in this reasoning:
So, yes, I could rely on OpenShift. But if I want to provide the possibility of scaling the sync server, this is not the end of the story. And, of course, scaling your system to support larger loads is really the central feature of systems like OpenShift and Heroku.
The Solution: Re-Architecting the SMSyncServer File System
The solution to this problem lies in a redesign of part of the sync server. I have been relying on the mistaken assumption that I could use the sync server’s local file system to write temporary files and that (a) those files would persist for as long as needed, and (b) those files would be accessible across server instances.
The following diagram illustrates the current file system architecture of the sync server. What is important here is that, with scaling, the sync server would have multiple instances, and each file upload could be handled by a different server instance. However, the sync server API call that the client device makes to initiate the transfer of the uploaded files to cloud storage assumes that all of those files are on the same file system. With scaling, that would not be the case in the current architecture, and the system would break!
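The failure can be reproduced with a small simulation. It is a sketch under stated assumptions: `ServerInstance` and `upload_then_commit` are hypothetical, dicts stand in for file systems, and requests are routed round-robin, which is just one way a load balancer might spread them. Any routing that can send requests to different instances breaks the commit in the same way.

```python
from itertools import cycle
from typing import Dict, List, Optional


class ServerInstance:
    """Hypothetical sync server instance; a dict stands in for its file system."""

    def __init__(self, store: Optional[Dict[str, bytes]] = None):
        # No store given: the instance gets a private one (current architecture).
        # Shared store given: all instances see the same files (re-architected).
        self.store = {} if store is None else store

    def upload(self, name: str, data: bytes) -> None:
        self.store[name] = data

    def missing(self, expected: List[str]) -> List[str]:
        """Files the client expects to commit that this instance cannot see."""
        return [n for n in expected if n not in self.store]


def upload_then_commit(instances: List[ServerInstance],
                       files: Dict[str, bytes]) -> List[str]:
    """Send each upload, then the commit call, to the next instance in
    round-robin order; return the files missing at commit time."""
    route = cycle(instances)
    for name, data in files.items():
        next(route).upload(name, data)
    return next(route).missing(list(files))


files = {"a.jpg": b"...", "b.jpg": b"...", "c.jpg": b"..."}

# Current architecture: each instance writes to its own local file system.
local = [ServerInstance(), ServerInstance()]
print(upload_then_commit(local, files))   # ['a.jpg', 'c.jpg']

# Re-architected: every instance uses one shared store.
shared_store = {}
shared = [ServerInstance(shared_store), ServerInstance(shared_store)]
print(upload_then_commit(shared, files))  # []
```

Handing every instance the same store is what both options below provide, with S3 or MongoDB/GridFS playing the role of `shared_store`.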
Option 1: Amazon S3
In this re-architecting, one option is to adopt the solution suggested by both Heroku and OpenShift:
Any files that require permanence should be written to S3, or a similar durable store. S3 is preferred as Heroku runs on AWS and S3 offers some performance advantages.
An inevitable outcome of using Amazon S3 to store these temporary files is that developers using SMSyncServer would need an additional set of account credentials and an additional fee structure. That is, they’d have to pay both for hosting the SMSyncServer server and for using Amazon S3.
The following diagram illustrates this option of storing files in Amazon S3. Because S3 is a distributed storage system, the temporary files could be accessed from every sync server instance, even when the sync server is distributed across multiple instances. The sync server would manage the uploads from the client to S3 and the transfers from S3 to the user’s cloud storage service. Note that both the uploads from the client to S3 and the transfers from S3 to user cloud storage could use streams (e.g., with multer-s3), bypassing the sync server’s file system entirely.
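The streaming idea can be sketched independently of any particular library. This Python sketch does not show multer-s3’s API; it only illustrates the pattern: read a bounded chunk from the incoming stream and write it to the outgoing one, so a file never lands on the server’s disk. The 64 KiB chunk size is an arbitrary choice, and in-memory `io.BytesIO` objects stand in for the client upload and the S3 object. Python’s standard `shutil.copyfileobj` implements the same loop.

```python
import io
from typing import BinaryIO


def stream_transfer(source: BinaryIO, sink: BinaryIO,
                    chunk_size: int = 64 * 1024) -> int:
    """Copy source to sink one chunk at a time and return the byte count.

    Only one chunk is held in memory at once, and nothing is written to
    the server's local file system.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # end of the incoming stream
            break
        sink.write(chunk)
        total += len(chunk)
    return total


client_upload = io.BytesIO(b"x" * 200_000)  # stand-in for the incoming HTTP body
s3_object = io.BytesIO()                    # stand-in for the S3 upload stream
print(stream_transfer(client_upload, s3_object))  # 200000
```

The same function would serve the second leg (S3 to user cloud storage) with the roles of the two streams swapped in.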
Option 2: MongoDB/GridFS
An alternative to the Amazon S3 storage option is to use MongoDB/GridFS. Since SMSyncServer already uses MongoDB to store metadata and locks, this option would require no additional credentials or fees on a different cloud system. Files smaller than 16 MB (MongoDB’s maximum document size) could be stored in MongoDB directly, while GridFS could be used to store larger files. GridFS allows for streaming of files, and Busboy appears to allow for streaming uploads.
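The split between small and large files can be sketched as a simple routing decision. This is a hypothetical sketch, not SMSyncServer code: `plan_storage` and the 1 MiB headroom below the 16 MiB document limit are assumptions, while 255 KiB is GridFS’s default chunk size. In practice a GridFS driver does the chunking itself; the sketch only shows the decision and the resulting one-document-per-chunk layout.

```python
from typing import List, Tuple

# MongoDB caps a single BSON document at 16 MiB; GridFS's default chunk
# size is 255 KiB. INLINE_LIMIT leaves headroom for the document's other
# fields (the 1 MiB margin is an assumption for this sketch).
MAX_BSON_DOCUMENT = 16 * 1024 * 1024
GRIDFS_CHUNK_SIZE = 255 * 1024
INLINE_LIMIT = MAX_BSON_DOCUMENT - 1024 * 1024


def plan_storage(data: bytes) -> Tuple[str, List[bytes]]:
    """Decide how a temporary upload would be stored in MongoDB.

    Small files go into a single document; larger files are split into
    fixed-size chunks, one document per chunk, as GridFS does.
    """
    if len(data) <= INLINE_LIMIT:
        return "inline", [data]
    chunks = [data[i:i + GRIDFS_CHUNK_SIZE]
              for i in range(0, len(data), GRIDFS_CHUNK_SIZE)]
    return "gridfs", chunks


kind, chunks = plan_storage(bytes(1024 * 1024))
print(kind, len(chunks))  # inline 1
kind, chunks = plan_storage(bytes(20 * 1024 * 1024))
print(kind, len(chunks))  # gridfs 81
```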
The structure of this architecture would be similar to that illustrated above:
In terms of overall SMSyncServer project development priorities, the current discussion is relevant to server scaling. While scaling is important, completing all of the basic features of the system (e.g., conflict management) has higher priority.
About the author: Christopher G. Prince has his B.Sc. in computer science (University of Victoria, B.C., Canada), an M.A. in animal psychology (University of Hawaii, Manoa), an M.S. in computer science (University of Louisiana, Lafayette, USA), and a Ph.D. in computer science (University of Louisiana, Lafayette, USA). His M.S. and Ph.D., while officially in computer science, were unofficially in cognitive science, split between animal psychology and computer science. Chris is a dedicated animal person, and has also developed: Catsy Caty Toy, a customizable and shareable iPhone and iPad app for your cats (http://GetCatsy.com), Petunia, an app for recording and sharing pet health information (http://GetPetunia.com), and WhatDidILike, an iPhone app to keep track of restaurants and food that you like (http://WhatDidILike.com).
“Re-Architecting the SMSyncServer File System” by Christopher G. Prince is licensed under a Creative Commons Attribution 4.0 International License. Permissions beyond the scope of this license may be available at chris@SpasticMuffin.biz.