Is Realm Cloud Down Again?


#1

I can no longer connect to Realm Cloud from my app or from Realm Studio. In Realm Studio I click Connect to Realm Cloud and a blank screen comes up with my instance name at the top; no tabs for Realms, Users or Logs…

Ping cloud.realm.io times out.


#2

Seems like all instances are healthy and reachable on our end. Can you share your instance name so I can dig deeper into the logs?


#3

Instance name sent by PM to @nirinchev

EDIT: Additional info for checking logs: our instance was doing an initial sync to a device. The sync stopped, incomplete, at 1 April 0904 UTC.

This is the second time our instance has disappeared. We still haven’t had an explanation since it disappeared last time.


#4

We did react to alerts on a couple of our customers over the weekend. From the instance name you gave to @nirinchev, we determined that yours is one of them. Our workaround was to allocate more memory and CPU to your instance, due to the load it was experiencing. It should have been functioning normally since 2018-04-01T09:21 UTC.


#5

Just curious, could you estimate when this will be properly fixed? Because the last message sounds like you have problems handling higher loads and that there’s no proper fix for it yet.


#6

@alebsack It hasn’t been working at all, even now. We still can’t connect to any realm on our server instance and, as the image below shows, we see nothing in Realm Studio: no realms, no users, no logs.

EDIT: Our Realm Cloud instance has reappeared after being unavailable since 1 April. There are two log entries only
2018-04-01T09:22:10.503Z Realm Object Server has started
and an authentication error at 2018-04-01T09:22:10.568Z

There were many attempts to authenticate since, but no logs. Am I correct in thinking our instance was destroyed on 1 April and has only just been restored from backup?


#7

Our instance has been down for the last 6 hours. What is the procedure for dealing with these sorts of outages?


#8

@danielsegall Can you please open a ticket at support.realm.io with your instance name?


#9

It’s worth knowing that there are no general Cloud Down issues that affect everyone. We are hosting individual ROS servers for each “Instance”. We have also limited resource usage (CPU, memory and disk) for every free instance, to ensure sufficient capacity for everyone and to ensure excessive use won’t affect other instances. We did that after experiencing some heavy usage/testing on some instances which caused the autoscaling of resources to affect other instances. You might have experienced issues before this protection was added.

So individual instances can still experience issues if you are scaling up testing very quickly as we are limiting the capacity for each instance, and only manually adjusting that based on individual needs. We do get alerted when an instance is close to reaching resource limits and usually just add more capacity - but as that is manual, you may see a delay.

Hope this helps clarify things a bit. Otherwise let me know.


#10

Thanks for the clarification @bmunk. That puts us in a bit of a bind. We are a small ISV with only a few hundred users. But those users depend on us for their core business software. Some of them have been with us for 20 years. Users put their trust in us. We can’t put their businesses at risk by moving them to a new platform without thorough testing. Realm Cloud looked good until we scaled up testing to a single realistic data set, then it went Kaboom and we were offline for 3 or 4 days.

Unless our instance is allowed to scale during beta, how do we satisfy ourselves that Realm Cloud is going to be up to the task of managing a few hundred of these data sets with users banging on them simultaneously?


#11

We’re in a similar boat to @Nosl; this post definitely reflects our feelings as well. Any further information would definitely be appreciated. I’d also point out that, as far as I can see, all instances are ‘free’ in their first month of life and only become otherwise once billing is added after that period. So it seems that if you deploy during the first month, you are not deploying with the same level of backend robustness that you would have afterwards?


#12

We surely get that. We just handle that by allocating you sufficient resources to handle your testing load. Let’s just have a chat about your needs and we can allocate appropriately. Please reach out to our support by creating a ticket at support.realm.io.

To answer one of your questions above, we didn’t restore from backup, but fixed your problem and restarted the ROS instance. But logging won’t be preserved across restarts, so that’s likely why you don’t see those auth attempts.


#13

@danielsegall Same to you: please just reach out to us via support.realm.io and we will get your needs covered.


#14

@bmunk just to clarify that we did get the server back up quickly after submitting a support request, so many thanks for the fast turnaround. But was I right about my assumption that the ‘free period’ has different resource allocation to the billing period?


#15

Thanks @bmunk, but I’m still a bit concerned that our instance was out of action for 3-4 days. It wasn’t just slow due to being throttled by lack of resources. It does seem something went unexpectedly awry on Realm Cloud that necessitated the restart. Are you able to share in general terms what happened and what mitigation is being planned?

We are waiting for a couple of things that Realm have in the pipeline for Realm Cloud before we release a beta of the Realm Cloud version of our app. We can’t know what the demand for the beta will be. We are concerned that if an issue like this was to occur during beta, it could shake our users’ faith in the direction we are taking.

EDIT: Our instance has disappeared again. In Realm Studio we have no users, no realms and no logs. And our usage has not been high. What’s going on?


#16

Can you open a ticket at support.realm.io and provide more information about when things stopped working for you and what you were doing at the time? The forums are a poor medium for troubleshooting such incidents.


#17

@Nosl If you have an issue please open a ticket at support.realm.io - that is an SLA governed support channel - the forums are not - so if someone posts here we do not get alerted.

We have monitoring set up, but sometimes the ROS can report as up even though the actual internals might be experiencing unexpected behavior.


#18

Thanks for the tip @ianward. Our instance has come back. From the logs it looks like it has been restarted again.


#19

I’ve tried to log in to support.realm.io using the account I use to log in to this forum, but my credentials were not recognised.

I then tried to log in using the account our instance runs under, but my credentials were not recognised.

I then created a new account at support.realm.io using the credentials our instance runs under, but when I tried to log in my credentials were not recognised.

Yes our instance has disappeared again and I am unable to create a support ticket.


#20

@Nosl Yes, they are separate systems, so you will need to sign up again. We are thinking of combining the systems but have not done so yet.

I am surprised to hear that a new account did not work. Would you mind sending the exact steps you took to create the account to [email protected] ? That should not happen

We are aware of your tenant outage and were alerted. It seems to occur only on your tenant. We have identified some unexpected behavior due to the deletion of special realms and users on your instance.

Have you been deleting users and realms on a production tenant?