CPU usage in Azure spikes and does not release without restarting the web app


#1

I am running a C# web app in Azure (with Realm v3.3.0) and am seeing the CPU pinned at 100%. Memory stays low, but it looks like the native code is hogging the CPU. I cannot reproduce this on my local PC, possibly because the app runs on the lowest production tier, the Production S1 service plan. I pulled this diagsession from Kudu while the CPU was high, and as you can see realm-wrappers.dll is way up there. The app is still in alpha and has almost no usage other than my own. One minute the CPU is low, then it spikes and stays there until I restart the web app.

The code running on the server is essentially just doling out permissions to organization realms with JWT tokens. It will create a realm locally for an organization so that a FullSyncRealm is created on the server with the appropriate user permissions. The server then deletes the local realm so that it does not cache all of the realm data on the server. It also creates users on the cloud realm by signing in with their token, then signing out.

Here is an image of a diagsession after a spike pulled from kudu.

Here is the CPU usage.

Any assistance would be greatly appreciated. Let me know what other information might be helpful to diagnose the problem.


#2

@adamrhass More than likely this is due to downloading the realm and creating the file on disk. The way realm sync works is that when you open a realm, it streams the operation logs to the sync client. The realm process may then have to run our merge algorithm, which can spike the CPU as it assembles the realm file on disk. Once the file exists on disk, merging changes should be much cheaper, but since you delete it every time, you may see this spike each time you open and download the realm.

I’d recommend using asyncOpen (Realm.GetInstanceAsync in the .NET SDK) to open the realm and not deleting the realm each time.


#3

Does it make sense that it would stay at 100% with virtually no usage? I left it running over the entire Thanksgiving break to see if it would ever come back down from 100%, and it never did. Looking at the logs, I can see that the last call to create a new realm was on 11/14/2018. I am attaching all of the realm code that the server uses below. Again, the goal is just to create users and give them access to realms that the server creates. There is no reason that this server should need to access the data in a realm. I also filed this on GitHub: https://github.com/realm/realm-dotnet/issues/1810. I know that some of the code below is imperfect, so if you see a better way to achieve these goals, please let me know.

    public class RealmManager:IRealmManager
    {
        private IRealmConfiguration _realmConfiguration;
        private IJwtFactory _jwtFactory;
        private Uri _authURL = null;
        private IServiceProvider _serviceProvider = null;
        
        public RealmManager(IRealmConfiguration realmConfiguration, IJwtFactory jwtFactory, IServiceProvider serviceProvider) //, IOrganizationApplicationUserProvider organizationApplicationUserProvider) //, IHostingEnvironment hostingEnvironment)
            
        {
            _realmConfiguration = realmConfiguration;
            _jwtFactory = jwtFactory;
            _serviceProvider = serviceProvider;

            _authURL = new Uri($"https://{_realmConfiguration.Host}");

            var root = Path.GetTempPath();
            var dir = Path.Combine(root, $"realm-{Guid.NewGuid().ToString()}");
            if (Directory.Exists(dir)) Directory.Delete(dir, true);
            if (!Directory.Exists(dir)) Directory.CreateDirectory(dir);
            User.ConfigurePersistence(UserPersistenceMode.NotEncrypted, basePath: dir);
        }
        

        // //////////////////////////////////////////
        //a dummy class to allow realm creation
        public class Dummy : RealmObject
        {
            public string Name { get; set; }
        }
        public void CreateOrganizationRealm(Organization org)
        {
            AsyncContext.Run(async () =>
            {
                try
                {
                    //create the realm on the realm server
                    await WithAdminUserUserAsync(async (admin) => { 
                    var config = GetOrgRealmConfig(org.Id, admin);
                
                        //using async to make sure that we have a realm that is synced with the cloud
                        using (var r = await Realm.GetInstanceAsync(config))
                        {}

                        //removes the local file, does not delete it from realm server, wait 5 seconds before trying
                        var t = Task.Run(async () => {
                            try
                            {
                                var c = GetOrgRealmConfig(org.Id, admin);
                                await Task.Delay(5000); // the delay only takes effect when awaited
                                Realm.DeleteRealm(c);
                            }
                            catch (Exception e) {
                                Trace.TraceError($"could not delete file - {e.Message ?? ""}");
                            }
                        });
                    });
                    //Realm.DeleteRealm(config);
                }
                catch (Exception e)
                {
                    Trace.TraceError($"error while creating realm {e.Message ?? ""}");
                    throw; // rethrow without resetting the stack trace
                }
            });
        }
        
        public Task DeleteOrganizationRealmAsync(Organization org)
        {
            //remove all access to database. Deleting realms from the realm server is discouraged 
            //https://docs.realm.io/platform/self-hosted/customize/working-with-realms-on-the-server/deleting-realms#deleting-a-realm-with-a-http-delete-call
            AsyncContext.Run(async () =>
            {
                using (var orgUserProvider = _serviceProvider.GetService<IOrganizationApplicationUserProvider>())
                {
                    var orgUsers = await orgUserProvider.GetOrganizationUsersForOrganizationIdAsync(org.Id);
                    await WithAdminUserUserAsync(async (admin) => {
                        foreach (var ou in orgUsers)
                        {
                            try
                            {
                                var condition = PermissionCondition.UserId(ou.ApplicationUser_ID);
                                var task = admin.ApplyPermissionsAsync(condition, GetFullRealmUrl(org.Id), AccessLevel.None);
                                await Task.WhenAny(task, Task.Delay(3000));
                                if (!task.IsCompleted && !task.IsFaulted)
                                {
                                    Trace.TraceError("Could not delete organization from realm");
                                    
                                }
                            }
                            catch (Exception e)
                            {
                                Trace.TraceError($"could not delete permission for user - {ou.ApplicationUser_ID}");
                            }
                        }
                    });
                }
            });
            return Task.CompletedTask;
        }

        public void CreatePermissionsToOrganizationRealm(OrganizationApplicationUser oau)
        {
            //make sure the user has been created by logging in and out
            AsyncContext.Run(async () =>
            {
                //here we make sure that a user is created with the given id, but sign out instantly
                await WithBasicUserAsync(oau.ApplicationUser_ID);

                //here we sign in as an admin and set the permission for the user id on the realm url
                await WithAdminUserUserAsync(async (admin) => { 
                    var condition = PermissionCondition.UserId(oau.ApplicationUser_ID);
                    var task = admin.ApplyPermissionsAsync(condition, GetFullRealmUrl(oau.Organization_ID), AccessLevel.Write);
                    await Task.WhenAny(task, Task.Delay(3000));
                    if (!task.IsCompleted && !task.IsFaulted) {
                        Trace.TraceError("Could not set permissions for organization realm");
                    }
                });
            });
        }
        public void DeletePermissionsToOrganizationRealm(OrganizationApplicationUser oau)
        {
            var userId = oau.ApplicationUser_ID;
            var orgId = oau.Organization_ID;
            var condition = PermissionCondition.UserId(userId);
            
            AsyncContext.Run(async () =>
            {
                try
                {
                    await WithAdminUserUserAsync(async (admin) => { 
                        var task = admin.ApplyPermissionsAsync(condition, GetFullRealmUrl(orgId), AccessLevel.None);
                        await Task.WhenAny(task, Task.Delay(3000));
                        if (!task.IsCompleted && !task.IsFaulted)
                        {
                            Trace.TraceError("Could not delete permissions from realm");
                        }
                    });
                }
                catch (Exception e)
                {
                    Trace.TraceError($"Error while deleting permissions for userId: {userId} and orgId: {orgId}");
                }
            });
        }
        public string GetHost()
        {
            return _realmConfiguration.Host;
        }
        public string GetRelativeRealmPath(Guid orgId)
        {
            return $"/orgs/{orgId}";
        }




        private async Task WithBasicUserAsync(string userId, Action<User> action = null)
        {
            var token = await GetUserTokenAsync(userId); // a string representation of a JWT, obtained from your auth server
            var credentials = Credentials.JWT(token);
            var u = await User.LoginAsync(credentials, _authURL);
            try
            {
                action?.Invoke(u);
            }
            catch (Exception e) {
                Trace.TraceError($"error while authenticating basic user - {e.Message ?? ""}");
            }
            await u.LogOutAsync();
        }

        private static int _usingAdmin = 0;
        private async Task WithAdminUserUserAsync(Func<User, Task> action = null)
        {
            _usingAdmin++;
            User u = null;
            try
            {
                u = await GetAdminUserAsync();
                if (action != null)
                {
                    // awaiting action?.Invoke(u) would throw when action is null
                    await action(u);
                }
            }
            catch (Exception e)
            {
                Trace.TraceError($"error while running admin action - {e.Message ?? ""}");
            }
            _usingAdmin--;
            if (_usingAdmin < 0) { _usingAdmin = 0; }
            // u is null when the login failed, so guard the logout
            if (_usingAdmin <= 0 && u != null)
            {
                await u.LogOutAsync();
            }
        }
        private async Task WithAdminUserUserAsync(Action<User> action = null)
        {
            await WithAdminUserUserAsync((u) =>
            {
                action?.Invoke(u);
                return Task.CompletedTask;
            });
        }
        //private static User adminUser = null; 
        private async Task<User> GetAdminUserAsync()
        {
            var adminUser = User.AllLoggedIn.Where(x => x.Identity == Guid.Empty.ToString() && x.IsAdmin && x.ServerUri.Host == _realmConfiguration.Host).FirstOrDefault();
            if (adminUser != null)
            {
                return adminUser;
            }
            
            var token = await GetAdminTokenAsync(); // a string representation of a JWT, obtained from your auth server
            var credentials = Credentials.JWT(token);
            try
            {
                adminUser = await User.LoginAsync(credentials, _authURL);
            }
            catch (Exception e) {
                Trace.TraceError($"error while logging in - {e.Message ?? ""}");
                //throw new Exception("cannot ")
            }
            return adminUser;
        }

        private FullSyncConfiguration GetOrgRealmConfig(Guid orgId, User admin) {

            FullSyncConfiguration configuration = null;
            try
            {

                var serverURL = new Uri(GetFullRealmUrl(orgId), UriKind.Absolute);
                configuration = new FullSyncConfiguration(serverURL, admin);// { IsDynamic = true };
                configuration.ObjectClasses = new[] { typeof(Dummy) };
                
            }
            catch (Exception e)
            {
                Trace.TraceError($"error while logging in - {e.Message ?? ""}");
            }
            return configuration;
        }
        private async Task<string> GetAdminTokenAsync()
        {
            var adminName = "_AuthAdmin_";
            var identity = _jwtFactory.GenerateClaimsIdentity(adminName, Guid.Empty.ToString(), Guid.Empty.ToString(), true);
            return await _jwtFactory.GenerateEncodedTokenAsync(adminName, identity);
        }
        private async Task<string> GetUserTokenAsync(string userId)
        {
            var name = userId;
            var identity = _jwtFactory.GenerateClaimsIdentity(name, userId, userId, false);
            return await _jwtFactory.GenerateEncodedTokenAsync(name, identity);
        }

        private string GetFullRealmUrl(Guid orgId) {
            return $"realms://{_realmConfiguration.Host}{GetRelativeRealmPath(orgId)}";
        }

#4

I’ll take a look tomorrow and post some pointers there. You’re right - logging in as a user and creating their Realm is a bit excessive and there are simpler ways to achieve that via an HTTP API.


#5

Okay, the way I see it, you want to implement CreateOrganizationRealm in a way that doesn’t involve opening the Realm. You can achieve that by hitting the following endpoint:

GET https://YOUR-SERVER-URL/realms/files/%REALM-PATH%

HEADERS:
Authorization: adminUser.RefreshToken

where %REALM-PATH% is the URL-encoded relative realm path, e.g. %2Forgs%2F{orgId} in your app, and adminUser.RefreshToken is the RefreshToken of the user you obtain in WithAdminUserUserAsync. I haven’t translated that into C# code as, judging by the snippet you posted, you seem more than capable of doing it, but if you hit any hiccups, let me know and I’ll do my best to help.
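
For anyone landing here later, a minimal C# sketch of the call described above, assuming HttpClient and that the refresh token is sent as the raw Authorization header value, exactly as in the request shown. RealmFileApi, BuildRealmFileUrl and EnsureRealmExistsAsync are hypothetical names, not part of the Realm SDK, and treating any non-success status as a failure is an assumption, since the thread does not say what the endpoint returns.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class RealmFileApi
{
    // One shared HttpClient for the lifetime of the app.
    private static readonly HttpClient Http = new HttpClient();

    // Builds https://{host}/realms/files/{URL-encoded relative realm path}.
    public static string BuildRealmFileUrl(string host, string relativeRealmPath)
    {
        // EscapeDataString turns "/orgs/123" into "%2Forgs%2F123".
        return $"https://{host}/realms/files/{Uri.EscapeDataString(relativeRealmPath)}";
    }

    // Issues the GET with the admin refresh token as the Authorization header.
    public static async Task EnsureRealmExistsAsync(
        string host, string relativeRealmPath, string adminRefreshToken)
    {
        var url = BuildRealmFileUrl(host, relativeRealmPath);
        using (var request = new HttpRequestMessage(HttpMethod.Get, url))
        {
            // The token carries no scheme such as "Bearer", so skip header validation.
            request.Headers.TryAddWithoutValidation("Authorization", adminRefreshToken);
            using (var response = await Http.SendAsync(request))
            {
                // Assumption: any non-success status means the realm was not created.
                response.EnsureSuccessStatusCode();
            }
        }
    }
}
```

In the manager posted earlier this would be called roughly as RealmFileApi.EnsureRealmExistsAsync(GetHost(), GetRelativeRealmPath(org.Id), admin.RefreshToken). Some older .NET Framework versions may unescape %2F while parsing the Uri, so it is worth checking the path that actually goes out on the wire.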


#6

Great! I’ll add that logic to create the organizations. Are there similar methods for user creation or permission management? I’ll post back here soon.


#7

You can “login” a user via HTTP calls (this is what the SDK essentially does), but I’m not sure this will be an improvement compared to just using the SDK. You can see how this is implemented here: https://github.com/realm/realm-dotnet/blob/master/Realm/Realm.Sync/Helpers/AuthenticationHelper.cs#L106-L119 and Credentials.ToDictionary is here: https://github.com/realm/realm-dotnet/blob/master/Realm/Realm.Sync/Credentials.cs#L221-L230.

We don’t have an HTTP API for granting permissions, unfortunately. For this you need to use the SDK methods.


#8

@nirinchev, I made the change to the CreateOrganizationRealm call and it works great, though I was still able to make the issue happen when applying permissions. I dug deeper into the code and I think I have a better idea of what might be causing the CPU issue. I think it stems from aggressively logging out of the admin user after every operation.

First, Realms.Sync.User creates two realms, _permissionRealm and _managementRealm, but I can’t see where those get disposed?

Second, maybe if the permission task was still running after my timeout and while I was logging out, something goofy was happening under the covers.

I have simply allowed the admin user to stay logged in on Azure and I can’t seem to make the CPU spike anymore, which is great. I’ll let you know if I have any issues in the future, but for now, I seem to have a solution.
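
On the second point, one way to tell a timeout apart from a finished or failed operation before deciding whether to log out is a small helper around Task.WhenAny. This is only a sketch using the BCL; TaskTimeout and CompletesWithinAsync are hypothetical names, and whether logging out mid-operation is what actually triggers the spike is unconfirmed.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskTimeout
{
    // Returns true if the task finished within the timeout,
    // false if it is still running when the timeout elapses.
    // If the task finished with an error, that error is rethrown here
    // instead of being silently dropped.
    public static async Task<bool> CompletesWithinAsync(Task task, TimeSpan timeout)
    {
        var winner = await Task.WhenAny(task, Task.Delay(timeout));
        if (winner != task)
        {
            return false; // timed out; the operation is still in flight
        }

        await task; // propagates the exception of a faulted or canceled task
        return true;
    }
}
```

A caller could then hold off on the logout, or keep waiting, whenever await TaskTimeout.CompletesWithinAsync(task, TimeSpan.FromSeconds(3)) returns false.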


#9

@nirinchev It looks like I spoke too soon. The CPU spiked again yesterday afternoon and stayed spiked until I reset it. I got a diagnostics session which is still pointing to realm. I am no longer logging out of the admin user and I am only creating new realms via the HTTPS call. This makes me think it is something wrong in permissions. Thoughts?


#10

@nirinchev Ok, I think I am making progress on this issue. I found that when I call CreatePermissionsToOrganizationRealm, the CPU releases. The CPU will go up on its own, but calling that method releases it. Could this be an issue with not having a looper thread? I also hacked a fix into the realm manager which just calls GetGrantedPermissionsAsync every 5 seconds for the admin user on a background thread. Below you can see where the CPU takes off, then goes down after the background thread does its work. Obviously this is not a great solution.

       private static Thread thread;
       private static Looper looper;
       public RealmManager(IRealmConfiguration realmConfiguration, IJwtFactory jwtFactory, IServiceProvider serviceProvider) //, IOrganizationApplicationUserProvider organizationApplicationUserProvider) //, IHostingEnvironment hostingEnvironment)
        {
            // ...other constructor work...
            if (looper != null) {
                looper.Cancel();
            }
            looper = new Looper(this);
            thread = new Thread(new ThreadStart(looper.Loop));
            thread.Start();
        }
        private class Looper {
            RealmManager rm = null;
            public Looper(RealmManager rm) {
                this.rm = rm;
            }
            volatile bool running = true; // written by Cancel() on one thread, read by Loop() on another
            public void Cancel() {
                running = false;
            }
            public void Loop() {
                AsyncContext.Run(async () => {
                    while (running)
                    {
                        await Task.Delay(5000);
                        await rm.WithAdminUserAsync(async admin =>
                        {
                            var perms = await admin.GetGrantedPermissionsAsync();
                        });
                    }
                });
            }
        }

#11

Hm… not having a runloop is definitely a smoking gun there, but it’s surprising because all your methods use AsyncContext. It installs a SynchronizationContext on your thread, which in turn should allow things to “just work”, but evidently that is not the case. To be fair, the permission APIs are designed to work on a client where all permission operations happen on the main thread (which is evident from the fact that the Realms used internally by those APIs are never disposed of).

If you’re willing to give it a go, I would suggest redesigning the manager to have a single long-lived loop wrapped in AsyncContext.Run and a thread-safe queue where you could schedule new work for the manager to perform. Something like:

public class RealmManager
{
    private readonly ConcurrentQueue<WorkItem> _queue = new ConcurrentQueue<WorkItem>();

    public RealmManager()
    {
        Run();
    }

    public Task DeleteOrganizationRealm(Organization org)
    {
        var workItem = new DeleteOrganizationRealmWorkItem(org);
        _queue.Enqueue(workItem);
        return workItem.WaitForCompletion();
    }

    private void Run()
    {
        Task.Run(() =>
        {
            AsyncContext.Run(async () =>
            {
                var admin = await GetAdminUserAsync();
                while (true)
                {
                    if (_queue.TryDequeue(out var workItem))
                    {
                        switch (workItem)
                        {
                            case DeleteOrganizationRealmWorkItem dorwi:
                                await this.DeleteOrganizationRealm(dorwi.Organization, admin);
                                dorwi.Complete();
                                break;
                            // rest of the cases
                        }
                    }
                    else
                    {
                        await Task.Delay(50);
                    }
                }
            });
        });
    }

    private async Task DeleteOrganizationRealm(Organization org, User admin)
    {
        using (var orgUserProvider = _serviceProvider.GetService<IOrganizationApplicationUserProvider>())
        {
            var orgUsers = await orgUserProvider.GetOrganizationUsersForOrganizationIdAsync(org.Id);
            foreach (var ou in orgUsers)
            {
                try
                {
                    var condition = PermissionCondition.UserId(ou.ApplicationUser_ID);
                    var task = admin.ApplyPermissionsAsync(condition, GetFullRealmUrl(org.Id), AccessLevel.None);
                    await Task.WhenAny(task, Task.Delay(3000));
                    if (!task.IsCompleted && !task.IsFaulted)
                    {
                        Trace.TraceError("Could not delete organization from realm");
                    }
                }
                catch (Exception e)
                {
                    Trace.TraceError($"could not delete permission for user - {ou.ApplicationUser_ID}");
                }
            }
        }
    }
}

public abstract class WorkItem
{
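    // must be initialized, otherwise Complete() and WaitForCompletion() throw a NullReferenceException
    private readonly TaskCompletionSource<object> _tcs = new TaskCompletionSource<object>();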

    public void Complete()
    {
        _tcs.TrySetResult(null);
    }

    public Task WaitForCompletion()
    {
        return _tcs.Task;
    }
}

public class DeleteOrganizationRealmWorkItem : WorkItem
{
    public Organization Organization { get; }

    public DeleteOrganizationRealmWorkItem(Organization organization)
    {
        Organization = organization;
    }
}

Essentially, this executes all tasks in a serial queue on the same AsyncContext thread, in a way emulating a main thread on a client device. The drawback, obviously, is that this is single-threaded, so it can’t handle really high throughput, but if you don’t expect these operations to be invoked thousands of times per second, I imagine it will be fine.
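
If the switch over work item types grows unwieldy, the same serial-queue idea can be written generically. The sketch below is a dependency-free variation, not the code above: SerialWorker, Enqueue and RunAsync are hypothetical names, TContext stands in for the admin User, and the loop would still need to be started inside AsyncContext.Run to stay on one thread. It also reports failures to the caller, so a work item that throws does not leave its caller waiting forever.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Serial queue: work is executed one item at a time by a single loop.
public sealed class SerialWorker<TContext>
{
    private sealed class Item
    {
        public Func<TContext, Task> Work;

        // RunContinuationsAsynchronously keeps the callers' continuations off the loop.
        public readonly TaskCompletionSource<object> Done =
            new TaskCompletionSource<object>(TaskCreationOptions.RunContinuationsAsynchronously);
    }

    private readonly ConcurrentQueue<Item> _queue = new ConcurrentQueue<Item>();

    // Schedules work and returns a task that completes (or faults) with it.
    public Task Enqueue(Func<TContext, Task> work)
    {
        var item = new Item { Work = work };
        _queue.Enqueue(item);
        return item.Done.Task;
    }

    // The long-lived loop; runs until the token is cancelled.
    public async Task RunAsync(TContext context, CancellationToken stop)
    {
        while (!stop.IsCancellationRequested)
        {
            if (_queue.TryDequeue(out var item))
            {
                try
                {
                    await item.Work(context);
                    item.Done.TrySetResult(null);
                }
                catch (Exception e)
                {
                    // Hand the failure to the caller instead of swallowing it.
                    item.Done.TrySetException(e);
                }
            }
            else
            {
                await Task.Delay(50); // idle poll, same interval as in the code above
            }
        }
    }
}
```

With this shape, each public method of the manager would reduce to something like worker.Enqueue(admin => DeleteOrganizationRealm(org, admin)).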