Benchmarking

Task Assignment

Your task is to measure the time performance of three different implementations of matrix multiplication and compare them (the three variants are sketched below the list):

  1. Standard multiplication

  2. Multiplication with the second matrix transposed before the multiplication

  3. Multiplication of the matrices in 1D representation
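
The three variants differ in the memory layout of the operands and in the access pattern of the inner loop. As an illustration only, here is a rough Python sketch of what each variant does; the actual implementations you will benchmark are the Java classes in the repository, and their names and details may differ.

def multiply(a, b):
    """Standard multiplication: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += a[i][k] * b[k][j]  # column-wise access to B
            c[i][j] = s
    return c


def multiply_transposed(a, b):
    """Transpose B first, so both operands are then read row by row."""
    bt = [list(column) for column in zip(*b)]  # B transposed
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += a[i][k] * bt[j][k]  # row-wise access to B transposed
            c[i][j] = s
    return c


def multiply_1d(a, b, n):
    """Square n x n matrices stored as flat 1D arrays in row-major order."""
    c = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i * n + k] * b[k * n + j]
            c[i * n + j] = s
    return c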

You will probably need to do the following steps:

  1. Download the program from the git repository:

    git clone https://gitlab.fel.cvut.cz/esw/benchmarking.git
    
  2. Open the source code in an IDE

  3. Use the Java Microbenchmark Harness (JMH) to rigorously compare the implementations

  4. Do the measurements on each implementation of the matrix multiplication according to the methodology described in [1] (summary in PDF)

    1. Determine the warm-up period for each implementation

      • Visual inspection of a sequence plot is sufficient (a plotting sketch is shown after this step list)
    2. You do not have to calculate the number of repetitions

      • Use a sufficiently large number (e.g., 30 iterations and 10 executions/forks)

      • If the resulting confidence interval is too wide, you need to add more repetitions.

    3. Measure the time performance for each implementation and compute the average performance and 95% confidence interval of the measurements (Section 9.3)

    4. Compute the comparison ratios of the implementations and 95% confidence intervals of the ratios (Section 10.1; see the sketch at the end of the Confidence Interval Calculation section)

  5. Upload the report (PDF) together with your benchmark implementation (MatrixMultiplicationBenchmark.java), the JSON file with the measurements generated by JMH (it has to be named "measurements.json"; do not modify this file), and also a file with the results (named "results.json"). The results file has to follow the format below (the names have to be the ones from the measurements file, without the packages):

   {
    "performance": [
        {
            "impl_name": "measureMultiply",
            "average": 274.45720624999996,
            "cf_lb": 271.6351655689061,
            "cf_ub": 273.27924693109395
        },
        {
            "impl_name": "measureMultiply1D",
            ...
        },
        {
            "impl_name": "measureMultiplyTrans",
            ...
        }
    ],
    "comparisons": [
        {
            "impl_1_name": "measureMultiply",
            "impl_2_name": "measureMultiply1D",
            "ratio": 0.9404346124708851,
            "cf_lb": 0.9374657053490532,
            "cf_ub": 0.943403519592717
        },
        {
            "impl_1_name": "measureMultiply",
            "impl_2_name": "measureMultiplyTrans",
            ...
        },
        {
            "impl_1_name": "measureMultiply1D",
            "impl_2_name": "measureMultiply",
            ...
        },
        {
            "impl_1_name": "measureMultiply1D",
            "impl_2_name": "measureMultiplyTrans",
            ...
        },
        {
            "impl_1_name": "measureMultiplyTrans",
            "impl_2_name": "measureMultiply",
            ...
        },
        {
            "impl_1_name": "measureMultiplyTrans",
            "impl_2_name": "measureMultiply1D",
            ...
        }
    ]
}
  6. Check the automatically evaluated results in the upload system. The system shows OK if the uploaded results differ from the values calculated by the automatic evaluation by less than 0.05%. The automatic evaluation serves only as a check that the results you calculated are correct; the report will be evaluated manually.
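
For the warm-up analysis in step 4.1, one common approach is to run the benchmark with no (or very few) JMH warm-up iterations and a large number of measurement iterations, and then plot the per-iteration scores of every fork as a sequence; the iteration at which the curve flattens marks the end of the warm-up period. The following is a minimal sketch of such a plot, assuming matplotlib is installed; the input file name is a placeholder for a JMH JSON result produced by such a run.

import json

import matplotlib.pyplot as plt
import numpy as np

## Placeholder path to a JMH result JSON with many measurement iterations per fork.
with open('warmup_measurements.json', 'rt', encoding='utf-8') as file:
    jmh_data = json.load(file)

for impl_data in jmh_data:
    name = impl_data['benchmark']
    name = name[name.rindex('.') + 1:]  # remove package names
    raw_data = np.array(impl_data['primaryMetric']['rawData'])  # forks x iterations

    plt.figure()
    for fork_index, fork_scores in enumerate(raw_data):
        plt.plot(range(1, len(fork_scores) + 1), fork_scores, label=f'fork {fork_index + 1}')
    plt.xlabel('iteration')
    plt.ylabel(impl_data['primaryMetric']['scoreUnit'])
    plt.title(f'Sequence plot: {name}')
    plt.legend()

plt.show()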

Warning

The confidence intervals you are supposed to calculate are NOT the confidence intervals shown by JMH.

Report Structure

The report should include the following parts:

  • Machine specification: CPU, memory, OS, Java version (java -version), etc.
  • JVM parameters used (if any)
  • Warm-up period:
    • Brief description of how the warm-up period was determined
    • The results (with graphs) for each implementation
  • Time performance
  • Comparison:
    • Brief description of how the comparison was done
    • Resulting ratios with the confidence intervals
  • Conclusions:
    • Summary of the results and a discussion with reasoning about why the results are as they are

Report templates: doc, LaTeX

JMH

JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks written in Java and other languages targeting the JVM.

https://github.com/openjdk/jmh

There are plenty of tutorials available online.

Confidence Interval Calculation

You can use the following Python 3.11 script as a baseline for processing the measurements. However, you can use any tool or language you want. Please do NOT submit your scripts to the upload system.
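
For reference, the confidence interval that the script computes for one implementation is

    \bar{y} \pm h, \qquad h = t_{1-\alpha/2,\, r_2 - 1} \sqrt{\frac{S_2^2}{r_2}},

with \bar{y} the mean over all measurements, r_2 the number of executions (forks), S_2^2 the sample variance of the per-execution means, and t_{1-\alpha/2,\, r_2 - 1} the corresponding Student t quantile; this is exactly what the last few lines of the script implement.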

import json
import math

import numpy as np
import scipy.stats as stats


## loads the JMH result JSON into a dictionary where keys are benchmark names without packages
## and values are 2D numpy arrays (executions/forks x iterations)
def load_jmh_measurements(path: str) -> dict[str, np.ndarray]:
    measurements = {}
    with open(path, 'rt', encoding='utf-8') as file:
        jmh_data = json.load(file)
    for impl_data in jmh_data:
        name = impl_data['benchmark']
        name = name[name.rindex('.') + 1:]  # remove package names
        measurements[name] = np.array(impl_data['primaryMetric']['rawData'])
    return measurements

confidence = 0.95

all_measurements = load_jmh_measurements('PATH_TO_JMH_RESULT_JSON')
impl_measurements = all_measurements['measureMultiply']

## r2 = number of executions (forks), r1 = number of iterations per execution
execution_count, iteration_count = np.shape(impl_measurements)

mean = np.mean(impl_measurements)

## mean of execution variances
S1 = np.mean(np.var(impl_measurements, axis=1, ddof=1))
## variance of execution means
S2 = np.var(np.mean(impl_measurements, axis=1), ddof=1)

## (1-alpha/2)-quantile of t-distribution with given degrees of freedom
alpha = 1 - confidence
t_quantile = stats.t.ppf(1 - alpha / 2, df=execution_count - 1)

## half-width of the confidence interval
h = t_quantile * math.sqrt(S2 / execution_count)

print(f'{mean} +- {h} ms/op')
print(f'[{mean - h}, {mean + h}]')
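
The baseline script above handles a single implementation. The following sketch extends it to all three benchmarks, computes the pairwise comparison ratios for step 4.4, and writes the results.json file in the format required above; it reuses load_jmh_measurements from the script. It assumes that the confidence interval for a ratio of means described in Section 10.1 of [1] is the usual Fieller-style interval built from the two means and the half-widths of their individual confidence intervals; verify the exact formula against [1] before relying on it. The file names are placeholders.

import itertools
import json
import math

import numpy as np
import scipy.stats as stats

confidence = 0.95
alpha = 1 - confidence


## mean and confidence-interval half-width of one benchmark, as in the baseline script above
def mean_and_halfwidth(measurements: np.ndarray) -> tuple[float, float]:
    execution_count, _ = np.shape(measurements)
    mean = np.mean(measurements)
    S2 = np.var(np.mean(measurements, axis=1), ddof=1)  # variance of execution means
    t_quantile = stats.t.ppf(1 - alpha / 2, df=execution_count - 1)
    return float(mean), float(t_quantile * math.sqrt(S2 / execution_count))


## Fieller-style confidence interval for mean_1 / mean_2 (assumed form of Section 10.1 in [1])
def ratio_interval(mean_1: float, h_1: float, mean_2: float, h_2: float) -> tuple[float, float]:
    root = math.sqrt((mean_1 * mean_2) ** 2 - (mean_1 ** 2 - h_1 ** 2) * (mean_2 ** 2 - h_2 ** 2))
    denominator = mean_2 ** 2 - h_2 ** 2
    return (mean_1 * mean_2 - root) / denominator, (mean_1 * mean_2 + root) / denominator


all_measurements = load_jmh_measurements('measurements.json')  # placeholder path
stats_per_impl = {name: mean_and_halfwidth(data) for name, data in all_measurements.items()}

results = {'performance': [], 'comparisons': []}
for name, (mean, h) in stats_per_impl.items():
    results['performance'].append(
        {'impl_name': name, 'average': mean, 'cf_lb': mean - h, 'cf_ub': mean + h})

for name_1, name_2 in itertools.permutations(stats_per_impl, 2):
    mean_1, h_1 = stats_per_impl[name_1]
    mean_2, h_2 = stats_per_impl[name_2]
    lower, upper = ratio_interval(mean_1, h_1, mean_2, h_2)
    results['comparisons'].append(
        {'impl_1_name': name_1, 'impl_2_name': name_2,
         'ratio': mean_1 / mean_2, 'cf_lb': lower, 'cf_ub': upper})

with open('results.json', 'wt', encoding='utf-8') as file:
    json.dump(results, file, indent=4)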

Materials