Read Copy Update

Task assignment

Imagine you are developing a multithreaded application that concurrently performs many read requests and some write requests on a shared data structure. Your goal is to speed up the application in such a way there are as many read requests as possible while keeping the same amount of write requests.

The following steps should lead you toward the goal:

  1. Download a naive implementation with mutexes from git repository:

    git clone https://gitlab.fel.cvut.cz/esw/rcu.git
    
  2. Follow the build instructions in the README file.

  3. Run the program (./builddir/list_mutex <number of reader threads>) and measure the number of read requests and their scalability.

  4. Replace the mutexes with rwlocks and measure the improvement.

  5. Replace the list implementation with liburcu (userspace RCU library) list and measure the improvement.

  6. For rwlocks and liburcu implementations, measure the number of reads per second for different number of reader threads and plot the results into a graph (for example from one to twenty readers). Use sufficient hardware -- total number of threads must not be larger than physical cores. Ritchie server is sufficient hardware (ssh username@ritchie.ciirc.cvut.cz, KOS password).

  7. Upload the diff of your changes into the upload system. Generate the diff using this command:

    git diff origin/master > rcu.diff.txt
    

Note

Do not modify the output of the program.

Hints

  • Try replacing rand_r(&seed) in generate_random_string() with rand() and measure performance. It's much slower. Why?

  • You can measure the amount of atomic instructions in your program as follows:

    perf stat -e mem_inst_retired.lock_loads timeout 1 ./list_mutex 1
    
  • Try modifying program parameters as follows:

    #define LIST_LENGTH 10
    #define KEY_LEN 3
    #define VALUE_LEN 400
    

    and run ./list_rwlock 4. Sooner or later, you'll see wrong checksum error. Why?

  • You can test your program for presence of potential race conditions by recompiling it with thread sanitizer*:

    meson configure -Db_sanitize=thread ...
    

    The program will run much slower, but will report potential races.

  • You can use Compiler Explorer to investigate the difference between atomic and non-atomic counters (as used for stats.reads and stats.writes).

  • Use the following shell command to get a quick overview of scalability of the list implementation:

    for i in $(seq $(nproc)); do echo $i; timeout 1.1 ./list_mutex $i | grep Reads; done
    

    If you plot the results as a graph, it should look roughly like this:

    Graph comparing requests count to readers count for RCU and
   rwlock implementations

pthread rwlocks

The rwlock is a part of the pthread library. You can be interested in:

Userspace RCU

The liburcu should work on Linux, Cygwin and MacOS X. You can obtain liburcu from your Linux distribution repository or compile it yourself. liburcu is also available as a Nix package.

In case you want more fun, feel free to compile liburcu yourself:

    wget https://lttng.org/files/urcu/userspace-rcu-latest-0.14.tar.bz2
    tar -xjf userspace-rcu-latest-0.14.tar.bz2
    cd userspace-rcu-0.14.0/
    ./configure --prefix=/home/me/somefolder/mybuild/output/target
    make -j8
    make install

See the ./doc/examples/list/ directory. It contains useful examples.

Update your build system to link to the liburcu:

  • add -lurcu or -lurcu-qsbr or other rcu flavour into linker flags
  • add a path to the include directory -I/home/me/somefolder/mybuild/output/target/include
  • add a path to the library directory -L/home/me/somefolder/mybuild/output/target/lib

Examples how to use the liburcu are available also on github.

Supplementary materials