Isolated dynamic library loading with dlmopen

I am working on the Batsim simulator and want to add a feature to it. My goal is to let scheduling algorithms be implemented as good old shared libraries (.so ELF files) that are loaded dynamically when the Batsim process starts (depending on command-line arguments). Currently, scheduling algorithms must be implemented as separate processes that communicate with Batsim through a network protocol.

This led me to test whether several libraries can be loaded in the same process under the following constraints.

  • Loaded libraries have a common C API, from which the main program (Batsim) will call them. The main program must be able to select a library instance and call functions from the common C API on it.

  • Loaded libraries can have non-disjoint dynamic dependencies. For example, all loaded libraries can have a (NEEDED) libsomedependency.so in the dynamic section of their ELF files.

  • Loaded libraries and the main program can have non-disjoint dynamic dependencies.

  • The loaded libraries, their dependencies, Batsim and its dependencies can all have global variables. Global variables can be mutable.

  • All global variables must be privatized so that each loaded library lives in a separate world that has no side effects on the other libraries or on Batsim.

While reading man dlopen to see whether some flag enables this behavior, I found dlmopen, which looked like a perfect candidate, so I tried it on a toy example.
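To make the difference with plain dlopen concrete, here is a minimal sketch that is not part of the dlmopen-test repository. It assumes a glibc system, and the helper names (namespace_of, same_handle_with_dlopen, distinct_namespaces_with_dlmopen) are made up for illustration. Opening the same library twice with dlopen returns the same handle, whereas each dlmopen(LM_ID_NEWLM, ...) call is expected to put its copy in a new link-map namespace, whose ID can be queried with dlinfo and RTLD_DI_LMID.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Returns the ID of the link-map namespace that `handle` belongs to,
 * or -1 if dlinfo fails. */
long namespace_of(void *handle)
{
    assert(handle != NULL);
    Lmid_t id;
    if (dlinfo(handle, RTLD_DI_LMID, &id) != 0)
        return -1;
    return (long)id;
}

/* Opens lib_path twice with dlopen.
 * Returns 1 if both calls gave the very same handle, 0 otherwise. */
int same_handle_with_dlopen(const char *lib_path)
{
    assert(lib_path != NULL);
    void *a = dlopen(lib_path, RTLD_NOW | RTLD_LOCAL);
    void *b = dlopen(lib_path, RTLD_NOW | RTLD_LOCAL);
    int same = (a != NULL && a == b);
    if (a != NULL) dlclose(a);
    if (b != NULL) dlclose(b);
    return same;
}

/* Opens lib_path twice with dlmopen(LM_ID_NEWLM).
 * Returns 1 if the two copies live in two distinct, non-base namespaces. */
int distinct_namespaces_with_dlmopen(const char *lib_path)
{
    assert(lib_path != NULL);
    void *a = dlmopen(LM_ID_NEWLM, lib_path, RTLD_NOW | RTLD_LOCAL);
    void *b = dlmopen(LM_ID_NEWLM, lib_path, RTLD_NOW | RTLD_LOCAL);
    int distinct = 0;
    if (a != NULL && b != NULL)
    {
        long ns_a = namespace_of(a);
        long ns_b = namespace_of(b);
        /* LM_ID_BASE is the namespace of the main program. */
        distinct = (ns_a > LM_ID_BASE && ns_b > LM_ID_BASE && ns_a != ns_b);
    }
    if (a != NULL) dlclose(a);
    if (b != NULL) dlclose(b);
    return distinct;
}
```

Both helpers can be called with any library path, for example "libm.so.6" on a typical glibc system. On older glibc versions the program may need to be linked with -ldl.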

Test setup

All the code presented here is available on the dlmopen-test git repository.

This repository contains two shared libraries (compiled as .so ELF files) and an executable program (also compiled into an ELF file, but with a main function to directly execute it).

  • The base library.

  • The user library, that uses the base library (dynamic link).

  • The runner program, that uses the base library (dynamic link) and loads user libraries at runtime (via dlmopen) depending on its command-line arguments.

A version number is given to all libraries/programs when they are compiled. All compiled ELFs contain a function that always returns the version that was given to them at compile-time, and a global mutable variable that is initialized with the same value.

More precisely, the ELFs have the following symbols.

  • base: int base_version() function, int base_global_value variable.

  • user: int version() function, int global_value variable and char * fullname() that returns a dynamically allocated string that recaps information about a user.

  • runner: int version() function, int global_value variable and char * fullname() that returns a dynamically allocated string that recaps information about the runner.

The corresponding code is straightforward.

#ifndef VERSION
#define VERSION 42
#endif

int base_version()
{
    return VERSION;
}

int base_global_value = VERSION;

Code of the user library
#define _GNU_SOURCE
#include <stdio.h>

#ifndef VERSION
#define VERSION 51
#endif

int base_version();
extern int base_global_value;

int version()
{
    return VERSION;
}

int global_value = VERSION;

char * fullname()
{
    char * str;
    int ret = asprintf(&str, "my_glob='%d', my_version=%d, base_glob='%d', base_version=%d",
        global_value, version(), base_global_value, base_version()
    );

    if (ret == -1)
        return NULL;

    return str;
}

The build system (Meson here) defines VERSION at build-time.

Part of base’s build definition (meson.build)
project('dlmopen-test-base', 'c', default_options : ['c_std=c11'])

base_shared = shared_library('base', 'base.c',
  install: true,
  c_args: '-DVERSION=@0@'.format(get_option('version'))
)

Several instances of base and user are compiled with various VERSION values.

  • base-0, base-1 and base-2 with VERSION=x for base-x.

  • user-1 with VERSION=1, that uses base-1.

  • user-2 with VERSION=2, that uses base-2.

  • runner-0 with VERSION=0, that uses base-0.

These various ELFs are built with the Nix package manager. Nix makes these combinations simple to define and makes sure that all generated ELFs have fully defined dependencies. At the time of writing, this is done by setting DT_RUNPATH in the compiled ELFs so that they load the right versions of their dependencies at runtime. Here is the Nix code that describes these combinations.

{ pkgs ? import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/21.05.tar.gz") {}
}:

let
  gendrv = v: name: dir: inputs: pkgs.stdenv.mkDerivation rec {
    pname = name;
    version = v;

    buildInputs = [pkgs.meson pkgs.ninja pkgs.pkgconfig] ++ inputs;
    src = pkgs.lib.sourceByRegex dir [
      "^.*\.c"
      "^meson\.build"
      "^meson_options\.txt"
    ];
    mesonFlags = ["-Dversion=${v}"];
  };
  self = rec {
    base-0 = gendrv "0" "base" ./base [];
    base-1 = gendrv "1" "base" ./base [];
    base-2 = gendrv "2" "base" ./base [];
    user-1 = gendrv "1" "user" ./user [base-1];
    user-2 = gendrv "2" "user" ./user [base-2];
    runner-0 = gendrv "0" "runner" ./runner [base-0];
  };
in
  self

Runner code

The full code of the runner is in runner.c. It starts with the same code as user.

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdlib.h>
#include <stdio.h>

#ifndef VERSION
#define VERSION 37
#endif

int base_version();
extern int base_global_value;

int version()
{
    return VERSION;
}

int global_value = VERSION;

char * fullname()
{
    char * str;
    int ret = asprintf(&str, "my_glob='%d', my_version=%d, base_glob='%d', base_version=%d",
        global_value, version(), base_global_value, base_version()
    );

    if (ret == -1)
        return NULL;

    return str;
}

It then defines a struct User that enables the runner to access the variables and functions of a user instance loaded in memory (via pointers and function pointers).

struct User
{
    void * handle;
    int (*version)(void);
    int* global_value;
    char* (*fullname)(void);
    int (*base_version)(void);
    int* base_global_value;
    void (*free)(void*);
};

The code to load a user into a struct User uses dlmopen and dlsym.

void * load_symbol(void * handle, const char * symbol)
{
    void * address = dlsym(handle, symbol);
    if (address == NULL)
    {
        printf("dlsym failed: %s\n", dlerror());
    }
    return address;
}

int populate_user(const char * lib_path, struct User * user)
{
    user->handle = dlmopen(LM_ID_NEWLM, lib_path, RTLD_NOW | RTLD_LOCAL | RTLD_DEEPBIND);
    //user->handle = dlopen(lib_path, RTLD_NOW | RTLD_LOCAL | RTLD_DEEPBIND);
    if (user->handle == NULL)
    {
        printf("dlmopen error: %s\n", dlerror());
        return 0;
    }

    user->version = load_symbol(user->handle, "version");
    user->global_value = load_symbol(user->handle, "global_value");
    user->fullname = load_symbol(user->handle, "fullname");
    user->base_version = load_symbol(user->handle, "base_version");
    user->base_global_value = load_symbol(user->handle, "base_global_value");
    user->free = load_symbol(user->handle, "free");


    return !(user->version == NULL ||
             user->global_value == NULL ||
             user->fullname == NULL ||
             user->base_version == NULL ||
             user->base_global_value == NULL ||
             user->free == NULL);
}
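The load_symbol helper above treats a NULL return from dlsym as a failure, which is fine for this experiment. The dlsym manual page describes a slightly stricter idiom: clear the error state with dlerror() first, then check dlerror() again after the lookup, since a symbol's value may legitimately be NULL. Here is a minimal sketch of that idiom; the name lookup_symbol and its return convention are my own assumptions and are not part of runner.c.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Looks up `symbol` in `handle` (and its dependencies).
 * Returns 1 and stores the address in *out on success.
 * Returns 0 and leaves *out untouched on failure. */
int lookup_symbol(void *handle, const char *symbol, void **out)
{
    assert(symbol != NULL && out != NULL);

    dlerror(); /* clear any error left by a previous dl* call */
    void *address = dlsym(handle, symbol);
    const char *error = dlerror(); /* non-NULL only if dlsym failed */
    if (error != NULL)
    {
        fprintf(stderr, "dlsym(\"%s\") failed: %s\n", symbol, error);
        return 0;
    }

    *out = address;
    return 1;
}
```

With such a helper, populate_user could stop at the first missing symbol instead of checking all pointers at the end, at the price of one extra output parameter per call.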

The rest of runner.c defines a quick experiment to check whether dlmopen fits my needs. First, the main function reads its command-line arguments (which are paths to user ELF files) and loads all of them in memory.

int main(int argc, char ** argv)
{
    char * str = fullname();
    printf("runner fullname: %s\n", str);
    free(str);

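    // Load all user libs (at least one is required: a zero-length array is invalid)
    const int nb_users = argc-1;
    if (nb_users < 1)
    {
        printf("usage: %s USER_LIB...\n", argv[0]);
        return 1;
    }
    struct User users[nb_users];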
    for (int i = 1; i < argc; ++i)
    {
        if (!populate_user(argv[i], &users[i-1]))
        {
            printf("could not populate user %d, aborting.\n", i);
            abort();
        }
    }

Then it prints the various values (by calling the fullname function from the runner’s ELF itself or from user ELFs).

    printf("All users have been loaded.\n");
    str = fullname();
    printf("runner fullname: %s\n", str);
    free(str);

    for (int i = 0; i < nb_users; ++i)
    {
        char * value = users[i].fullname();
        printf("user %d fullname: %s\n", i, value);
        users[i].free(value);
    }
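The loop above releases each string with the free function that was looked up through the user's handle, not with the runner's own free. The reason is that a library loaded with dlmopen(LM_ID_NEWLM, ...) gets its own copy of its dependencies, libc included, so the string returned by fullname was allocated by another copy of the allocator. The following sketch illustrates this on a glibc system; the helper name namespace_has_own_free is hypothetical and not part of runner.c.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Loads lib_path in a new namespace and compares the `free` visible from
 * that namespace with the `free` visible from the calling program.
 * Returns 1 if they are two different functions, 0 if they are the same,
 * and -1 if the library could not be loaded. */
int namespace_has_own_free(const char *lib_path)
{
    assert(lib_path != NULL);

    void *handle = dlmopen(LM_ID_NEWLM, lib_path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        return -1;

    /* dlsym on a handle also searches the dependencies of the library,
     * which is how runner.c finds "free" through a user handle. */
    void *ns_free = dlsym(handle, "free");
    void *base_free = dlsym(RTLD_DEFAULT, "free");
    int own = (ns_free != NULL && base_free != NULL && ns_free != base_free);

    dlclose(handle);
    return own;
}
```

Memory allocated inside one namespace should therefore be released inside the same namespace, which is what the free member of struct User is for.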

During its execution, runner changes the values of all global variables to make sure that the desired ones get updated (and only those).

    printf("Changing global values.\n");
    global_value = 42;
    base_global_value = 420;
    for (int i = 0; i < nb_users; ++i)
    {
        *(users[i].global_value) = (i+1)*10;
        *(users[i].base_global_value) = (i+1)*100;
    }

Values are printed at the following steps.

  • At the main function’s beginning (only for runner).

  • After all user ELFs have been loaded.

  • After all global variables have been modified.

  • At the main function’s ending (only for runner).
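The step that unloads the user libraries before the last printing is not shown in the excerpts above. A minimal sketch of how it could be done with dlclose follows. This is an assumption about the approach rather than the actual content of runner.c, and the helper name unload_all is made up.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Calls dlclose on each of the `count` handles.
 * Returns the number of handles that were closed without error. */
int unload_all(void **handles, int count)
{
    assert(handles != NULL);

    int closed = 0;
    for (int i = 0; i < count; ++i)
    {
        if (handles[i] == NULL)
            continue; /* nothing was loaded in this slot */

        if (dlclose(handles[i]) != 0)
        {
            fprintf(stderr, "dlclose error: %s\n", dlerror());
            continue;
        }
        handles[i] = NULL; /* the handle must not be used anymore */
        ++closed;
    }
    return closed;
}
```

After a library has been unloaded, every pointer that was obtained from it with dlsym (all the members of struct User here) is dangling and must not be dereferenced or called.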

Does it work?

First, the user ELFs and the runner are built with nix-build commands.

#!/usr/bin/env bash
nix-build . -A user-1 -o result-user1
nix-build . -A user-2 -o result-user2
nix-build . -A runner-0 -o result

The following command executes the runner, which loads user-1 and user-2.

#!/usr/bin/env bash
./result/bin/runner $(realpath ./result-user1/lib/libuser.so) $(realpath ./result-user2/lib/libuser.so)

Everything looks great in the output log :). All values are the expected ones once the user ELFs are loaded, and changing the global variables had the expected outcome.

runner fullname: my_glob='0', my_version=0, base_glob='0', base_version=0
All users have been loaded.
runner fullname: my_glob='0', my_version=0, base_glob='0', base_version=0
user 0 fullname: my_glob='1', my_version=1, base_glob='1', base_version=1
user 1 fullname: my_glob='2', my_version=2, base_glob='2', base_version=2
Changing global values.
Printing fullnames again.
runner fullname: my_glob='42', my_version=0, base_glob='420', base_version=0
user 0 fullname: my_glob='10', my_version=1, base_glob='100', base_version=1
user 1 fullname: my_glob='20', my_version=2, base_glob='200', base_version=2
Removing user libs from memory.
runner fullname: my_glob='42', my_version=0, base_glob='420', base_version=0

And everything also looks great when the exact same library is loaded twice :). The two user instances have independent global variables, and so do their base dependencies.

#!/usr/bin/env bash
./result/bin/runner $(realpath ./result-user1/lib/libuser.so) $(realpath ./result-user1/lib/libuser.so)

Output log of run-same-lib.bash
runner fullname: my_glob='0', my_version=0, base_glob='0', base_version=0
All users have been loaded.
runner fullname: my_glob='0', my_version=0, base_glob='0', base_version=0
user 0 fullname: my_glob='1', my_version=1, base_glob='1', base_version=1
user 1 fullname: my_glob='1', my_version=1, base_glob='1', base_version=1
Changing global values.
Printing fullnames again.
runner fullname: my_glob='42', my_version=0, base_glob='420', base_version=0
user 0 fullname: my_glob='10', my_version=1, base_glob='100', base_version=1
user 1 fullname: my_glob='20', my_version=1, base_glob='200', base_version=1
Removing user libs from memory.
runner fullname: my_glob='42', my_version=0, base_glob='420', base_version=0
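One way to double-check the isolation seen in these logs is to list the objects that live in the namespace of each handle. On glibc, dlinfo with RTLD_DI_LINKMAP gives access to the link_map chain of a handle, and each namespace has its own chain. The sketch below is not part of the repository and the function name print_namespace_objects is hypothetical. Applied to each users[i].handle, it should show that every user instance has its own copy of libbase.so (and of libc) mapped at a different address.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <link.h>
#include <stdio.h>

/* Prints the load address and path of every object in the namespace that
 * `handle` belongs to. Returns the number of objects, or -1 on error. */
int print_namespace_objects(void *handle)
{
    assert(handle != NULL);

    struct link_map *map = NULL;
    if (dlinfo(handle, RTLD_DI_LINKMAP, &map) != 0 || map == NULL)
        return -1;

    /* The returned entry is the one of `handle` itself:
     * rewind to the head of the chain before walking it. */
    while (map->l_prev != NULL)
        map = map->l_prev;

    int count = 0;
    for (; map != NULL; map = map->l_next)
    {
        /* l_name can be an empty string (e.g., for the main program). */
        printf("  %#lx %s\n", (unsigned long)map->l_addr, map->l_name);
        ++count;
    }
    return count;
}
```

The same function can be called on a handle of the base namespace (for example one returned by dlopen(NULL, RTLD_NOW)) to compare what the runner itself has loaded.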