Making Downloads More Flexible in the SyncServer

9/15/17

I have just run into an interesting issue with SyncServer. When a new user installs the app (or an existing user re-downloads their files), the Shared Images app can have a reasonably large number of files to download, say 100 or so. In general, there can be an arbitrary number of files to download. What if the app runs into a problem while downloading and can’t download them all at once? Well, then the whole download fails. That is, I’ve been relying on an all-or-nothing mechanism. And so far this has been part of SyncServer itself, not just an issue with the Shared Images app.

For example, consider these two callback methods on the iOS SyncServer client:

/* Called at the end of all downloads, on non-error conditions. Only called when there was at least one download.
The client owns the files referenced by the NSURL’s after this call completes. These files are temporary in the sense that they will not be backed up to iCloud, could be removed when the device or app is restarted, and should be moved to a more permanent location. This is received/called in an atomic manner: This reflects the current state of files on the server.
Client should replace their existing data with that from the files.
*/
func shouldSaveDownloads(downloads: [(downloadedFile: NSURL, downloadedFileAttributes: SyncAttributes)])

// Called when deletions have been received from the server. I.e., these files have been deleted on the server. This is received/called in an atomic manner: This reflects a snapshot state of files on the server. Clients should delete the files referenced by the SyncAttributes’s (i.e., the UUID’s).
func shouldDoDeletions(downloadDeletions:[SyncAttributes])

While there are events that are triggered for individual downloads and deletions, so far I’ve been using these individual-oriented events just for debugging, testing, or UI matters. The core part of SyncServer has relied on this atomic characteristic for downloading (or download deleting) a set of files.

Why is this? What is the purpose of this atomic characteristic for a set of files?

I have been trying to provide a simple, consistent snapshot of the data on the server at a point in time. This directly relates to my concept of `master version` to date as well. The master version represents the state of the data on the server (for a user) at a point in time. It seems, however, that this is too simplistic. While this atomic characteristic can be provided in many situations, in some it cannot. The download situation above is one such case: when downloading a relatively large number of files, for the sake of time, the data really has to be exposed to the client app as it arrives. For example, in Shared Images, the user of the app wants to see the images as they are downloaded. They should also be able to restart a download after (for example) a failure, and downloads should not have to be restarted from the beginning. Thus the data that we present to the client will not always be a consistent snapshot. It can be just part of a snapshot.

It seems we have to move more towards an eventually consistent model. In this approach, we keep trying to sync data between the client and server, and after a series of attempts, we should be able to present the app with the current state of the server data. I.e., the client and server will be consistent. The client app will have to be designed with this in mind: That the data being displayed to the user is not always consistent with that on the server. But of course, this will be true in general, independent of the current reasoning. There will be times when files have been modified, uploaded, or deleted on the server, and those files will not yet have made it to some specific client(s).

Practically, how will this impact the Shared Images app and the iOS SyncServer client?

I will need to change the `shouldSaveDownloads` and `shouldDoDeletions` protocol methods above so that they save or delete individual files as the download occurs. With deletions, since files cannot be undeleted, a deletion will always be consistent with the server. With downloads, a given file might be deleted soon after it is downloaded (if, say, someone else were to delete that file during a long-running download). The client will need to be designed to deal with this eventually consistent mode of operation. Similarly, once we get multi-version files, a downloaded file might have a new version that needs to be downloaded soon after its initial download, possibly leading to conflicts. In the client, this individually-oriented operation will mean that the DownloadTracker object for a given download will need to be deleted immediately after the download finishes.
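To make the per-file idea concrete, here is a minimal sketch of what individually-oriented delegate methods might look like. The names `PerFileSyncDelegate`, `FileAttributes`, and `ImageStore`, and the method signatures, are assumptions made up for illustration; they are not the actual SyncServer client API.

```swift
import Foundation

// Hypothetical stand-in for the client's SyncAttributes type; only the UUID is modeled.
struct FileAttributes: Equatable {
    let fileUUID: String
}

// Hypothetical per-file variants of the delegate methods (assumed names and signatures).
protocol PerFileSyncDelegate: AnyObject {
    // Called once for each file, as soon as that file's download completes.
    func shouldSaveDownload(downloadedFile: URL, attributes: FileAttributes)
    // Called once for each file that has been deleted on the server.
    func shouldDoDeletion(attributes: FileAttributes)
}

// Minimal in-memory client store showing eventually consistent handling.
final class ImageStore: PerFileSyncDelegate {
    private(set) var files: [String: URL] = [:]

    func shouldSaveDownload(downloadedFile: URL, attributes: FileAttributes) {
        // A real client would first move the temporary file to a permanent location.
        files[attributes.fileUUID] = downloadedFile
    }

    func shouldDoDeletion(attributes: FileAttributes) {
        // Removing an unknown UUID is a no-op, so a deletion can safely arrive
        // for a file this client never finished downloading.
        files.removeValue(forKey: attributes.fileUUID)
    }
}
```

With this shape, a file that is downloaded and then deleted shortly afterwards simply appears in the store and then disappears again; the client never needs the whole set to arrive at once.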

Do uploads from the client have the same kind of characteristics? Say the client has been disconnected from the network for a long period of time, and has accumulated a lot of files that need to be uploaded. Once the client is connected to the network again, those uploads could take a long time. A given successful upload should not have to be repeated if another upload fails. More specifically, suppose that N uploads were successfully done and then an upload fails. Those N successful uploads should not have to be repeated unless a different version of the file (when we allow this) was uploaded by another user. (Deletion of the file by another user would not have us upload the file again.) We might, of course, need to re-download some files after an upload failure. But we should be able to, after handling any conflicts, continue with our upload. However, while uploads do have similar characteristics in terms of individual file orientation, there is no specific use case demanding modification for uploads right now. So, I’ll delay changing upload operation, at least for the present.

What about the server itself? Should we still have the server refuse to complete an upload or a download if the master version given by the client is out of date? Consider a download: we could be downloading a file that has either been deleted or (eventually) is the wrong version. Consider an upload: we could be uploading a file that has already been changed or deleted. It does seem that the current operation with the master version should be maintained. Otherwise, we’ll be doing work that might be a waste of time. A change in master version will still cause a failure in a download (or upload) and will require that the download (or upload) be restarted. But after our current changes, with downloads, we’ll be able to restart from more or less where we were stopped last time.
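The restart behavior described above can be sketched roughly as follows. This is a simplified in-memory model, not the real server or client: `ServerState`, `ClientState`, `download`, `refreshIndex`, and `syncDownloads` are all hypothetical names, and the server is assumed not to change again while the loop runs.

```swift
import Foundation

// Hypothetical in-memory model of the server's state for one user.
struct ServerState {
    var masterVersion: Int
    var fileUUIDs: Set<String>
}

enum DownloadResult: Equatable {
    case downloaded
    case masterVersionChanged   // the client must refresh its file index
}

// The server refuses a download when the client's master version is out of date.
func download(fileUUID: String, clientMasterVersion: Int, from server: ServerState) -> DownloadResult {
    return clientMasterVersion == server.masterVersion ? .downloaded : .masterVersionChanged
}

// Client-side bookkeeping: what is still to be downloaded and what is already saved.
struct ClientState {
    var masterVersion: Int
    var pending: [String]
    var completed: Set<String>
}

// After a master version failure: adopt the new version, forget files that were
// deleted on the server, and queue only the files not already downloaded.
func refreshIndex(_ client: inout ClientState, from server: ServerState) {
    client.masterVersion = server.masterVersion
    client.completed.formIntersection(server.fileUUIDs)
    client.pending = server.fileUUIDs.subtracting(client.completed).sorted()
}

// Downloads pending files one at a time and returns the number of file
// downloads actually performed. Assumes `server` is fixed during the loop.
func syncDownloads(_ client: inout ClientState, server: ServerState) -> Int {
    var performed = 0
    while let next = client.pending.first {
        switch download(fileUUID: next, clientMasterVersion: client.masterVersion, from: server) {
        case .downloaded:
            client.pending.removeFirst()
            client.completed.insert(next)
            performed += 1
        case .masterVersionChanged:
            refreshIndex(&client, from: server)
        }
    }
    return performed
}
```

The point of the sketch is that a stale master version still fails the request, but the files already in `completed` are not fetched again after the index is refreshed.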

My thanks to Dany Ligas and Rod Thomas for their help in testing Shared Images and SyncServer.