Mobile Database Bi-Directional Synchronization with a REST API

Mobile devices don’t always have a persistent connection to the internet or a local network. Because of this, some apps need to provide offline access so that users or employees can continue to work uninterrupted. To do this, we need local data storage, commonly SQLite, and a way to push our offline changes to the server and receive any new changes made while we were offline. This can become complex very quickly. There are existing libraries for offline capabilities, such as Azure Mobile Client, but they often end up too restrictive and are not designed for complex situations. However, once you see what needs to be done, you might understand why it is so hard to get right.

To provide offline access we need bi-directional (2-way) synchronization, and it is the hardest type of synchronization to implement when multiple clients can work on the same records. If you have an app where users only submit comments or edit their own content, this isn’t as difficult, as each record is effectively single user. What we are going through here is bi-directional synchronization on records that could be updated by multiple clients at any point in time, but the process works for either scenario.

The sample code I use in this post comes from my GitHub repo BiDirectionalSync.

Synchronization Issues

When you need to provide synchronization, there are many issues that can present themselves. If you have control of the API and this is a greenfield project, you are in luck. If not, there are workarounds. Here are some of the issues you need to be aware of.

  • What ID to use when generating data offline, to avoid conflicting IDs on the server and client.
  • Recording when data was updated or deleted on the client.
  • Knowing when the server has updated or deleted data.
  • Knowing when you are offline.
  • Knowing which data to update or remove when synchronizing.
  • Timing issues.

Ideal Scenario

Let’s say you have control of your entire environment: database design, API design and so forth. Here is how you can set up your server data store and API. First, let’s look at the mobile client setup and how it connects to the server.

The point to note here is that your app should know nothing of the synchronization service or API. This makes it incredibly easy to shield your app from changes to the API and synchronization processes. The synchronization service will be the only service that even knows about your API, and your App Service can trigger a sync either at intervals or when each row is updated; the frequency required is up to you.
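As a rough sketch of that separation (these interface names are illustrative, not from the sample repo), the app depends only on a local data abstraction, while the sync service is the sole component that talks to the API:

// Illustrative sketch only - names are not from the BiDirectionalSync repo.
// The app reads and writes through its local store abstraction...
public interface IAppDataService
{
    IList<ClientTableSchema> GetRows();
    void UpdateRow(ClientTableSchema row);
}

// ...while only the sync service knows the API exists. The App Service
// can call Sync() on a timer, on regained connectivity, or after each write.
public interface ISyncService
{
    void Sync();
}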

Server Database

Primary Key

Use a GUID as your primary key. Before you cry “won’t somebody please think of the performance”, make sure that your clustered index is a separate column, on an int value. This will relieve any performance hit and provide you with a GUID primary key, which is fantastic for synchronization. If you want to know more about the clustered index, please take a look at my other post Creating SQL Azure Database, and look at the Clustered Index section.

Deletion Tracking

You want to add a Deleted column to each table you are synchronizing. It’s the quickest and easiest way to track whether a row has been deleted. Otherwise you would have to send all rows to the server and compare which ones are no longer there.

Last Updated Tracking

Keeping this simple, you will want a LastUpdated column in your table to track the last time the server updated the row. You will also want a ClientLastUpdated timestamp to mark when the client recorded the data. This can later be used to decide whether to overwrite, if one client recorded data at a later time but synchronized last. For example, suppose a customer record has its mobile phone number updated, then the customer comes back a few minutes later and updates it again because they gave you the wrong one, but the two updates were entered by different people on two different devices. This can cause an issue.

If the second (correct) update is synchronized before the first (incorrect) one, then when the incorrect one is synchronized it will overwrite the correct value. This is why tracking ClientLastUpdated is important.

It is also important to either store the DateTime as a UTC value, or use DateTimeOffset if you care about the timezone it was created in.
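Putting the above together, a server table might look something like this (a minimal T-SQL sketch; the table and column names are illustrative):

CREATE TABLE Customer
(
    Id UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),  -- GUID primary key for sync
    ClusterKey INT IDENTITY(1,1) NOT NULL,         -- separate int for the clustered index
    Name NVARCHAR(100) NULL,
    Description NVARCHAR(400) NULL,
    Deleted DATETIMEOFFSET NULL,                   -- null = not deleted (soft delete)
    LastUpdated DATETIMEOFFSET NOT NULL,           -- when the server last updated the row
    ClientLastUpdated DATETIMEOFFSET NOT NULL,     -- when the client recorded the data
    CONSTRAINT PK_Customer PRIMARY KEY NONCLUSTERED (Id)
)

CREATE CLUSTERED INDEX IX_Customer_ClusterKey ON Customer (ClusterKey)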

Database Lock

Ensure you lock the database row against reads and updates while you read and compare it during your determination of the sync.

BEGIN TRAN

SELECT * FROM tablename WITH (HOLDLOCK, ROWLOCK)
WHERE Id = @id

-- Read the values, compare them in code, then do an update if needed.

COMMIT TRAN

This ensures that two clients synchronizing at the same time don’t both read and then update simultaneously.

Synchronization Logic

Before we delve into the details of the synchronization logic, here is an overview of it.

The App Service writes data directly to the local client database. When a sync happens, it is best to lock all updates to the database, to avoid many potential issues.
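One simple way to do that is a shared gate that both the App Service’s writes and the sync run must acquire (a sketch; it assumes both services share this instance):

// Sketch: serialize local writes and sync runs with a shared semaphore.
private static readonly SemaphoreSlim _dbGate = new SemaphoreSlim(1, 1);

// The App Service writes through this, never directly during a sync.
public async Task WriteRowAsync(ClientTableSchema row)
{
    await _dbGate.WaitAsync();
    try { UpdateRow(row); }
    finally { _dbGate.Release(); }
}

public async Task SyncAsync()
{
    await _dbGate.WaitAsync(); // blocks app writes for the duration of the sync
    try { Sync(); }
    finally { _dbGate.Release(); }
}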

Client Sync

Note: Please be aware that all this sample code uses in-memory storage and is just here to show the logic. You will need to store certain information, such as lastSync, in a database table or other permanent storage.
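For example, if you use the sqlite-net library on the client, lastSync could live in a small key-value style table (a sketch; SyncInfo and _conn are illustrative names):

// Sketch, assuming the sqlite-net library.
public class SyncInfo
{
    [PrimaryKey]
    public string Key { get; set; }
    public DateTimeOffset Value { get; set; }
}

// On startup:
var info = _conn.Find<SyncInfo>("lastSync");
_lastSync = info?.Value ?? DateTimeOffset.MinValue;

// After each successful sync:
_conn.InsertOrReplace(new SyncInfo { Key = "lastSync", Value = _lastSync });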

First we want to push all our changes to the server. This involves getting all data that has changed since our last sync.

// I keep a record of the last sync time.
private DateTimeOffset _lastSync = DateTimeOffset.MinValue;

// This is just my sample, in-memory data storage
private IList<ClientTableSchema> _rows = new List<ClientTableSchema>();

public void Sync()
{
    // All rows that have changed since last sync 
    var changed = _rows.Where(x => x.LastUpdated >= _lastSync || (x.Deleted != null && x.Deleted >= _lastSync)).ToList(); 
    _server.PushSync(changed.Cast<TableSchema>().ToList());
}

Next we want to pull all changes from the server and update our local store. At this level it’s basically server-wins, because the server has already dealt with the synchronization conflicts and is now telling you what needs to be updated.

public void Sync() 
{ 
    // ... as above ...

    // Pull sync is just getting all records that have changed since that date.
    foreach (var row in _server.PullSync(_lastSync)) 
        if (!_rows.Any(x => x.Id == row.Id)) // Does not exist, hence insert 
            InsertRow(new ClientTableSchema(row)); 
        else if (row.Deleted.HasValue) 
            DeleteRow(row.Id); 
        else 
            UpdateRow(new ClientTableSchema(row));

    _lastSync = DateTimeOffset.Now; 
}

Server Sync

A REST API is stateless, and hence isn’t actually aware of synchronization. The Pull and Push are actually rather simple initially.

public IList<TableSchema> PullSync(DateTimeOffset? since = null) 
{ 
    if (since == null) 
        return _rows; 
    else
    { 
        var list = _rows.Where(x => x.LastUpdated >= since.Value || (x.Deleted != null && x.Deleted > since.Value)).ToList(); 
        return list; 
    } 
}

public void PushSync(IList<TableSchema> rows) 
{
    foreach (var row in rows) 
        if (!_rows.Any(x => x.Id == row.Id)) 
            Insert(row); 
        else 
            Update(row);              
}

However, let’s look at the Update function, which is where the synchronization logic actually lives.

private void Update(TableSchema row) 
{ 
    if (row.Deleted.HasValue) 
    { 
        var dbRow = _rows.Single(x => x.Id == row.Id);
        dbRow.Deleted = row.Deleted; 
        dbRow.LastUpdated = row.LastUpdated; 
    } 
    else 
    { 
        var dbRow = _rows.Single(x => x.Id == row.Id);
        if (dbRow.ClientLastUpdated > row.LastUpdated) 
        { 
             // Conflict.
             // Here you can do a server-wins or client-wins scenario on a
             // whole-row basis, i.e. take the server's word or the client's.

             // e.g. Server wins - ignore the client's changes and just update the times.
             dbRow.LastUpdated = DateTimeOffset.Now; 
             dbRow.ClientLastUpdated = row.LastUpdated; 
        }     
        else // Client is newer than the server
        { 
            dbRow.Name = row.Name; 
            dbRow.Description = row.Description; 
            dbRow.LastUpdated = DateTimeOffset.Now; 
            dbRow.ClientLastUpdated = row.LastUpdated;
        } 
    } 
}

Differential Sync

This sync acts on the assumption that the client will send only what it has updated to the server, in a PUT request. On the client we need to do one additional thing: when a row is modified on the client side, keep a serialized copy of the row as an original to diff against later.

public void UpdateRow(ClientTableSchema row)
{
	var dbRow = _rows.Single(x => x.Id == row.Id);

	// Take a snapshot of the row's current (pre-change) state the first time
	// it is modified since the last sync, so we can diff against it later.
	if (string.IsNullOrEmpty(dbRow.Original))
		dbRow.Original = JsonConvert.SerializeObject(dbRow);

	dbRow.Name = row.Name;
	dbRow.Description = row.Description;
	dbRow.LastUpdated = DateTimeOffset.Now;
}

I just use Json.NET for serialization. Then, when you do a differential sync, check what has changed from the original.

public void DifferentialSync()
{
	var changed = _rows.Where(x => x.LastUpdated >= _lastSync || (x.Deleted != null && x.Deleted >= _lastSync)).ToList();

	var list = new List<string>();
	foreach (var item in changed)
	{
		var original = JsonConvert.DeserializeObject<ClientTableSchema>(item.Original);

		// A better way to implement this is needed. It only detects the first column changed.
		// This would need to group them all together and send them.
		// But again, this is a sample, I can't do all the work for you :)
		// At the moment this will only work with Name or Description

		if (original.Name != item.Name)
			list.Add(JsonConvert.SerializeObject(new { Id = item.Id, LastUpdated=item.LastUpdated, Name = item.Name }));
		else if (original.Description != item.Description)
			list.Add(JsonConvert.SerializeObject(new { Id = item.Id, LastUpdated = item.LastUpdated, Description = item.Description }));

		item.Original = null; // Clear original as now sync'd
	}

	_server.DifferentialSync(list);

	foreach (var row in _server.PullSync(_lastSync))
		if (!_rows.Any(x => x.Id == row.Id)) // Does not exist, hence insert
			InsertRow(new ClientTableSchema(row));
		else if (row.Deleted.HasValue)
			DeleteRow(row.Id);
		else
			UpdateRow(new ClientTableSchema(row));

	_lastSync = DateTimeOffset.Now;
}

Now on the server side, when we see these changes, we determine what has actually changed and update only those columns, leaving everything else as is. This means that when we do the pull afterwards, it will bring down all the changes made, even by other clients.

/// <summary> 
/// This is simulating a put request 
/// </summary> 
/// <param name="jsonUpdate"></param> 
public void DifferentialSync(List<string> jsonUpdate) 
{ 
    foreach (var item in jsonUpdate) 
    { 
        dynamic update = JsonConvert.DeserializeObject(item);
        var lastUpdated = (DateTimeOffset)update.LastUpdated; 
        var id = (Guid)update.Id; 
        var dbRow = _rows.Single(x => x.Id == id);
        if (((JObject)update)["Name"] != null) 
            dbRow.Name = update.Name;
        if (((JObject)update)["Description"] != null) 
            dbRow.Description = update.Description;

        dbRow.LastUpdated = DateTimeOffset.Now; 
        dbRow.ClientLastUpdated = lastUpdated; 

        // Update the row in the DB (because this sample is memory based, it's already done)
    } 
}

Please note that this code is just a sample and is not production-quality code. You can expand it even further by tracking which client has updated each column in the row, to then determine who wins each column update.
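If you went down that path, the server might keep a small piece of metadata per column (a sketch; these names are illustrative and not part of the sample):

// Sketch: the latest known change per (RowId, Column).
public class ColumnChange
{
    public Guid RowId { get; set; }
    public string Column { get; set; }
    public Guid ClientId { get; set; }                    // who sent the value
    public DateTimeOffset ClientLastUpdated { get; set; } // when they recorded it
}

// On an incoming update, only apply a value if it is newer than the last
// recorded change for that column; otherwise it lost the conflict.
bool ShouldApply(ColumnChange existing, DateTimeOffset incomingClientLastUpdated)
    => existing == null || incomingClientLastUpdated > existing.ClientLastUpdated;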

Transactional Log Method

One additional way you could do the sync is to store each change you make in a separate table, similar to a transaction log for the database. You then send this log to the server, have the server process it, and do a pull to receive all the changes. This method will help keep your DB schema cleaner, but may involve more work in creating a format for storing these transactions that the server can process. I would personally prefer this method in larger and more complex databases, where you want column-based differential synchronization.

  1. For each insert or update, store only the changed data in a serialized JSON object.
  2. Record the table name, a DateTime for when the change happened, and what action was taken (update or insert).
  3. Send this log in a bulk update to the server.
  4. The server then processes each update according to its own rules.

This method has the advantage of capturing the exact change to each column, so the server can process changes exactly as they happened within each synchronizing client.
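As a rough sketch of what a log entry and the recording of an update might look like (the names, the _log collection and the ProcessLog call are hypothetical, not from the sample repo):

// Sketch of a client-side transaction log entry.
public class TransactionLogEntry
{
    public Guid Id { get; set; }              // log entry id
    public string TableName { get; set; }     // which table changed
    public string Action { get; set; }        // "insert" or "update"
    public string ChangedJson { get; set; }   // only the changed columns, serialized
    public DateTimeOffset ChangedAt { get; set; }
}

// Recording an update: serialize only what changed and append it to the log.
_log.Add(new TransactionLogEntry
{
    Id = Guid.NewGuid(),
    TableName = "Customer",
    Action = "update",
    ChangedJson = JsonConvert.SerializeObject(new { row.Id, row.Name }),
    ChangedAt = DateTimeOffset.Now
});

// At sync time, send the whole log in one request and clear it on success.
_server.ProcessLog(_log);
_log.Clear();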

Issues

There are some additional issues that you will face after you have implemented the above.

Storage

When tracking changes, you need to keep more information. Using the differential sync above, for example, could easily double your storage requirements, since every modified row also carries a serialized copy of its original state.

Timing

A mobile device’s clock could be minutes, or even worse, hours away from the real time. To resolve this, I would require that clients send their current DateTime in the header of each request. In doing so, assuming a latency range of 100ms – 500ms, we can check the time and see if it’s within bounds. If the time is very far outside, e.g. more than 2 seconds, you can calculate the difference and shift the dates they sent in their updates by that offset.
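On the server, that check might look something like this (a sketch in an ASP.NET-style controller; the X-Client-Time header name and the 2-second threshold are assumptions, and rows is the pushed batch):

// Sketch: estimate the client's clock offset from a header it sends
// with every request.
var clientNow = DateTimeOffset.Parse(Request.Headers["X-Client-Time"]);
var skew = DateTimeOffset.Now - clientNow;

// Within a couple of seconds, put the difference down to network latency
// and trust the client's timestamps as-is.
if (Math.Abs(skew.TotalSeconds) > 2)
{
    // Shift every timestamp the client sent into server time.
    foreach (var row in rows)
        row.LastUpdated += skew;
}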

It’s never going to be perfect, and this is an approach generally used within a business environment. In a consumer environment, where the end device is largely uncontrolled and anyone can hit your API, you may need to rethink your approach to synchronization.

Workarounds

If you don’t have control of the API and/or server database, then you may need to implement various workarounds to help with synchronization.

Int as Primary Key

Many databases have an int as a primary key. If this is the case, I normally add another column locally on the client, called LocalId, and generate this ID for each record added locally. It’s only relevant to the local database; once the server has assigned its own int ID to a pushed record, you store that alongside the LocalId for future updates.
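A sketch of what that looks like on the client (ClientRow, ServerId and the _api call are illustrative names, not from the sample repo):

// Sketch: the client generates its own GUID key; the server's int key
// is filled in after the first successful push.
public class ClientRow
{
    public Guid LocalId { get; set; }   // generated on the device
    public int? ServerId { get; set; }  // null until the server assigns one
    public string Name { get; set; }
}

// After pushing a new row, record the id the server generated so that
// future updates can reference the server's key.
var serverId = _api.Insert(row);
row.ServerId = serverId;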

No Synchronization Aware API

If your REST API is just a plain REST API, with no knowledge of or logic for multi-client synchronization, then you may need to shift this logic into your client app. This is normally done by doing a pull first, detecting changes, resolving them locally, then pushing your finalized changes.
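A minimal sketch of that pull-first flow, reusing the in-memory structures from earlier (a newest-wins rule is assumed, and _api with its GetChangedSince/Put methods is a hypothetical plain REST client):

public void ClientDrivenSync()
{
    // 1. Pull everything that changed on the server and resolve locally.
    foreach (var serverRow in _api.GetChangedSince(_lastSync))
    {
        var local = _rows.SingleOrDefault(x => x.Id == serverRow.Id);
        if (local == null)
            InsertRow(new ClientTableSchema(serverRow));
        else if (serverRow.LastUpdated >= local.LastUpdated)
            UpdateRow(new ClientTableSchema(serverRow)); // server copy is newer
        // else: keep the local copy - it is pushed as the resolved value below
    }

    // 2. Push whatever is still newer locally as the finalized state.
    foreach (var row in _rows.Where(x => x.LastUpdated >= _lastSync))
        _api.Put(row);

    _lastSync = DateTimeOffset.Now;
}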

Summary

As you have read, synchronization is a tricky business. You need to ensure you provide the appropriate locks in numerous places, to avoid those very hard to track down synchronization issues.

Server

  • GUID as primary key, with a separate int column as the clustered index in your database.
  • Add Deleted, LastUpdated and ClientLastUpdated columns to each table.
  • Ensure you implement the appropriate database/row lock when updating.

Client

  • The client app should not know of the synchronization process.
  • Add a transaction log table, or an Original column to each table, to support the synchronization logic.
  • Add Deleted and LastUpdated columns, if they don’t already exist.

Of course, the best solution for bi-directional synchronization is not to require it. If you have that option, take it; otherwise, I hope this has provided you with a workable solution.

Learn More

If you need even finer-grained synchronization, down to the actual data in each column or value, you will want to look at Operational Transformation or Differential Synchronization using deltas. These are more geared toward document synchronization, though, and it is unlikely you will need them in a relational database situation.


12 Comments

  1. Shimmy Weitzhandler

    Great! How about graph management?
    Do we need to take care of each record individually?
    Can I create/update/delete a Customer along with all his phones, addresses, orders and order details?

    1. Adam Pedley

      This is solved by making sure you have soft deletes, meaning you use the Deleted flag to mark rows as deleted rather than actually hard-deleting them. If you have structured your database correctly, then even if rows referencing a deleted row in another table still exist, the app won’t see them.

      This way you can update/insert/delete any row on any table as needed, without worrying about cascading. Which reminds me: on your server, make sure you don’t enable cascade deletion (it’s not on by default), and don’t hard delete anything.

      1. Shimmy

        Hey Adam and thanks for your reply!

        Sometimes hard deletes are required.
        Consider this example:

        abstract class SignedDoc { }
        class PoliceSignedDoc : SignedDoc { }
        class SecuritySignedDoc : SignedDoc { }

        abstract class Report
        {
            public SignedDoc Doc { get; set; }
        }
        class DispatcherReport : Report
        {
            public PoliceSignedDoc PoliceDoc => (PoliceSignedDoc)Doc;
        }
        class ResponderReport : Report
        {
            public SecuritySignedDoc SecurityDoc => (SecuritySignedDoc)Doc;
        }

        class Incident
        {
            public Report Report { get; set; }
        }

        In the above example, an `Incident` may have its report changed to a different type of report. In this case I want the old report, along with its dependent entities (docs), to be hard deleted; it makes no sense to use soft deletes here.

        Before adopting Azure tables, I just want to make sure it covers the complex scenarios I often bump into.

        Anyway, let’s leave the deletes aside and focus just on updating: how hard is it to save an existing graph with an `Incident` as its root, with changes to its `Report` and `Doc`, as well as other dependent many-to-many or otherwise related entities in the graph?
        Additionally, I’m curious how Azure tables deals with TPT scenarios.

        1. Adam Pedley

          Soft deletes are required, unless you use the transactional log approach and send through a delete request to a server that understands it.

          Just to also clarify, this blog post is not about using Azure Mobile Client, it’s about rolling your own solution.

          Any sync solution generally works by recording or noting which row or column changes. So as your local DB is updated, it notes these changes with a LastUpdated flag, or transaction log, or version number, depending upon which solution you choose. When these are sent to a server, it updates them row by row.

          The sync process isn’t aware of graphs, just individual rows.

  2. Eric Brunner

    I have worked with the Azure Mobile Client SDK Offline Sync for one year. Unless you want to be on an experimental trail, I can only recommend using version 3.1.0 of https://www.nuget.org/packages/Microsoft.Azure.Mobile.Client/3.1.0 and the app database https://www.nuget.org/packages/Microsoft.Azure.Mobile.Client.SQLiteStore/3.1.0

    Don’t take version 4.x (there are issues)!

    Tutorial: https://adrianhall.github.io/develop-mobile-apps-with-csharp-and-azure/chapter3/dataconcepts/

    Offline sync is the hardest part of the implementation, so I would not experiment with it unless you have plenty of time 🙂

    That works for production apps.

    1. Adam Pedley

      Thanks for the additional info. I haven’t dealt with Azure Mobile Client offline data sync for a few versions. From a brief read, it seems as though they have updated it to do row-level conflict detection across multiple clients.

      It still doesn’t allow deep fine-tuning of the sync process, which is where I keep stumbling with these solutions. And it’s certainly interesting to go through the details of the sync process, to truly understand what it has to do.

  3. Eric Brunner

    I really appreciate your effort in providing this kind of offline sync solution. If time allows, I will try it in a test run and make some feature comparisons and benchmark tests.

    1. Adam Pedley

      Personally, after going through the whole process, I think the transaction log approach, which isn’t the one I coded, may be the best solution overall in terms of flexibility and performance. I am tempted to code that approach and see if a plugin can be made, without just replicating what Azure Mobile Client did.

  4. Rafał

    Great article! I have a question, maybe a little off topic, but anyway: what would you suggest for handling large data set downloads/uploads (for example 100MB)? Is there any solution or library that helps solve such a problem?

    1. Adam Pedley

      For such a large dataset, I would just get it all, compress it, then send it as one call. This applies if you often have 100MB+ datasets to send over the wire each time, e.g. if only a few rows are added but they contain vast amounts of data.

      If you just have a 100MB database but want to sync it, then you could use the approach discussed in the article, as the actual changes to the dataset would be far smaller, and you would only send the changes across.

      1. Rafał

        I am using a similar sync approach, but only after the first “large” synchronization. However, the first synchronization may take too much RAM when downloading even bigger data and deserializing it. So I am looking for a solution which allows downloading and deserializing the data partially.
