Syncing stops between some devices after correct initial behaviour, but data on some devices and ROS is still in sync


I have a project that is nearing launch, and as a result we are starting to work with more real-world, user-based data. Recently, we have started to notice a problem with syncing between multiple devices for a Swift-based project. I will try to describe what I am seeing, and if there is additional data I can provide, I will be happy to do so. The problem I describe below is for development builds created in Xcode, but we are seeing the same issue in alpha versions that are running on internal devices. Also, these are fully synced realms, and the user is connecting to one shared realm and one user-only realm.

It starts with a version of the app running on either a simulator device or a real device. The user syncs all of their data from the ROS and all works as it should. The user can then connect a second device and, after syncing, if the user adds or deletes data, the two devices are kept in sync. I can close one device, make changes on the other, open the first and everything syncs. Basically, things are working exactly as I think they should.

For about a day, I had two simulator devices working correctly and talking to each other via the ROS. I closed both of those devices down by quitting the simulator on my Mac. I then fired up a fresh install of my app on my iPhone, logged in and synced all the data. I then left the house and, both with mobile data enabled and with it disabled, I made changes to the data in the app on my iPhone, mostly deleting records, but also adding a few new data items. A few hours later, I came back home and fired up the simulator-based devices, waited, and they did not sync the recent changes. I checked things via Realm Studio and the data on the ROS matches what I have on my phone. I can delete items on the phone or in Realm Studio and the phone and the ROS stay perfectly in sync.

Unfortunately, the two simulator devices just sit there and don’t sync. Any attempt to manipulate data on either device is either very slow or causes things to crash. I have SyncManager logging set to all and see the following in the console before the CPU spikes to 100% on a background thread and just sits there. There is obviously a lot more logging, but nothing that looks like an error:

2019-03-30 19:44:10.348364+0100 Logbook[68479:4466181] Sync: Connection[1]: Session[1]: Finished changeset indexing (incoming: 13 changeset(s) / 279714 instructions, local: 15 changeset(s) / 280053 instructions, conflict group(s): 354)
2019-03-30 19:44:10.348454+0100 Logbook[68479:4466181] Sync: Connection[1]: Session[1]: Transforming local changeset [1/15] through 13 incoming changeset(s) with 354 conflict group(s)
2019-03-30 19:44:10.348650+0100 Logbook[68479:4466181] Sync: Connection[1]: Session[1]: Transforming local changeset [2/15] through 13 incoming changeset(s) with 354 conflict group
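
For anyone who wants to capture the same client-side output: this is a minimal sketch of turning sync logging all the way up, assuming the Realm Swift 3.x `SyncManager` API. Where the line goes in an app's startup code is an assumption; the important part is that it runs before any synced realm is opened.

```swift
import RealmSwift

// Assumption: this runs early at launch (for example in the app delegate),
// before the first synced Realm is opened, otherwise the level is not applied.
SyncManager.shared.logLevel = .all
```

At this level the client prints the "Finished changeset indexing" and "Transforming local changeset" lines shown above.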

On the ROS itself, with logging set to all, I see the following as I fire up the problematic client:

[5236]: Connection to sync server established. src= dst=
- GET /health?thisInstance=true HTTP/1.1 200 28 - 0.491 ms
- GET /health HTTP/1.1 200 28 - 2.626 ms
- POST /auth HTTP/1.1 200 794 - 3.922 ms
[5235]: Connection to sync server closed.
[5235]: Closed connection to client
The Realm Object Server proxy has 4 open connections to the 'default' sync server
Open files in Realm Object Server: {"regular":32,"directory":0,"character-device":1,"block-device":0,"pipe":18,"symbolic-link":0,"socket":18,"unknown":4,"total":73}

I am honestly unsure where to even begin troubleshooting this. When things work, they are brilliant, but this fragility in the syncing process is a huge issue for us right now. My gut feeling is that there is something about the data we are saving, as this only started to become an issue when we started loading larger amounts of data for a test user. We also recently upgraded to the newest version of Realm Swift (it was a problem before the upgrade) and that seemed to solve things for a bit; however, the problem has reared its head again.

As I said at the beginning, I am happy to provide more information; I am just not sure what would be useful at this point. The only thing that is jumping out at me is the “354 conflict group(s)”, but I am unsure what that denotes. If there are conflicts, I don’t know where they could be coming from, as no data was changed on either of the simulator devices between when they were working and now.