B4M36ESW – Efficient Software
Within the course of Efficient software you will get familiar with the area of software and algorithm optimization under limited resources. The course focuses on the efficient use of modern hardware architectures – multi-core and multi-processor systems with shared memory. Students will practically implement and use presented techniques in C/C++/Rust and Java. Main topics are: code optimization, efficient data structures and processor cache usage, data structures in multi-threaded applications, and implementation of efficient network servers.
Assessment
Graded assessment is based on the point evaluation, where the final grade is given by the evaluation scale The study and examination code of CTU, article 15. A student can obtain 60 points for the work during the practical part of the course and 30 points for the written final evaluation. Oral examination is optional and a student can obtain up to 10 points. In order to get zápočet/basic assessment (the condition prior final examination), a student must have at least 30 points for the work during the practical part of the course and all tasks must be successfully submitted. A student can pass examination if he/she has at least 20 points from final examination. The final grade is given by the sum of points according to the evaluation scale.
Example of exam test
Time limit for both parts together is 60 minutes.
Lectures recordings
Recordings from 2024 are available in the YouTube playlist.
Lectures
# | Date | Topics | Slides |
---|---|---|---|
1. | 19. 2. | Introduction, modern computer architecture, C compilers. | slides |
2. | 26. 2. | Bentley's rules, C compiler, profiling. | slides 1, slides 2, code |
3. | 4. 3. | Benchmarking, measurements, metrics, statistics, WCET, timestamping. | slides |
4. | 11. 3. | Scalable synchronization – from mutexes to RCU (read-copy-update), transactional memory, scalable API. | slides |
5. | 18. 3. | Memory access – efficient programming with caches, dynamic memory allocation (malloc, NUMA, ...). | slides, animations |
6. | 25. 3. | Serialization of data structures – JSON, XML, protobufs, AVRO, cap'n'proto, mmap/shared memory. | slides |
1. 4. | Easter | ||
7. | 8. 4. | Program run – virtual machine, byte-code, Java compiler, JIT compiler, relation to machine code, byte-code analysis, dissasembly of Java byte-code, optimization in compilers, program performance analysis, profiling. | slides |
8. | 15. 4. | Data concurrency in JVM – multi-threaded access to data, locks monitoring, atomic operations, lock-less/block-free data structures, non-blocking algorithms (queue, stack, set, dictionary), data races, synchronization. | slides |
9. | 22. 4. | Efficient servers, C10K, non-blocking I/O, efficient networking, threads | slides |
10. | 29. 4. | JVM – Memory analysis (dynamic/static), data structures, collections for performance. | slides |
11. | 6. 5. | JVM – Object allocation, bloom filters, references, effective caching. | slides |
12. | 13. 5. | Virtualization (IOMMU, SR-IOV, PCI pass-through, virtio, …). | slides, code |
13. | 20. 5. | Memory Management in JVM – Memory Layout, Garbage Collectors. | slides |
Labs
You will be given nine tasks during the semester. Eight simpler tasks are scored by 5 points per task. The ninth task is more complex and is scored by 20 points, including functionality and code quality. The order of the tasks and deadlines may change before their assignment.
Date | Task assignment | Language | Max. points | Deadline | Presentation (by us) |
---|---|---|---|---|---|
19. 2. | How to compile C project | C/C++ | |||
26. 2. | T1: Profiling C | C/C++ | 5 | ||
4. 3. | T2: Asynchronous I/O | C/C++ | 5 | ||
11. 3. | T3: Read Copy Update | C/C++ | 5 | T1 | |
18. 3. | T4: Benchmarking | Java | 5 | T2 | |
25. 3. | T5: Serialization | Java/C/C++ | 5 | T3 | |
1. 4. | Easter | ||||
8. 4. | T6: Profiling Java | Java | 5 | T4 | T1 |
15. 4. | T7: Non-blocking Algorithms | Java | 5 | T5 | T2 |
22. 4. | T8: Efficient servers | whatever | 20 | T6 | T3,4 |
29. 4. | T9: Vector API (Java) | Java | 5 | T7 | T5 |
6. 5. | |||||
13. 5. | T9 | T6,7 | |||
20. 5. | T8 | T8 | |||
Total | 60 (+8) |
Tasks are submitted into the upload system (BRUTE) and in some cases also to your tutor. The maximum amount of points will be given when the task is uploaded in time, otherwise, penalty points will be applied – one point per week for simpler tasks and two points per week for the complex task will be deducted from the total task score. Simpler tasks submitted after solution presentation (4-5 weeks after the assignment) will be rewarded by 0 points (you still have to submit it).
All the tasks have to be submitted before the start of the examination period.
Lab attendance is recorded (voluntary). However, the order of consultations and submissions during labs is based on your attendance.
If you have any questions, please ask via GitLab Issues.
In order to obtain zápočet/basic assessment (the condition prior to the final examination), all tasks must be submitted with a sum of at least 30 points.
Contacts
- GitLab Issues (preferred)
- Lecturers:
- Practising:
Literature
- MIT: Performance-engineering-of-software-systems
- Oaks, S.: Java Performance: 2nd Edition. O'Reilly, USA 2020.
- Jones, R., Hosking, A., Moss, E.: The Garbage Collection Handbook – The Art of Automatic Memory Management. CRC Press, USA 2012.
- Herlihy, M., Shavit, N.: The Art of Multiprocessor Programming. Morgan Kaufman, 2008.
- Fog, A.: The microarchitecture of Intel, AMD and VIA CPU, 2016. (online)
- Drepper U.: What every programmer should know about memory, 2007
- Jain, R.: The Art of Computer Systems Performance Evaluation. Wiley, New York 1991. (slides, book)
- Lilja, D. J.: Measuring Computer Performance: A Practitioner's Guide. Cambridge University Press, 2000. (book web site, Supplemental Teaching Materials)