CSE 30341 Spring 2025 at Notre Dame
Please review the general instructions for assignments.
The goals of this project are:
Linear algebra lies at the core of many applications of computing: process optimization, machine learning, and computer graphics all depend on efficient manipulation of large arrays and matrices. Let’s take a simple example of an O(n^2) matrix operation and see how fast we can make it go.
This project will require writing some small amounts of code and evaluating the performance. You will write a short lab report detailing your approaches and the performance achieved.
The key idea here is that threads are most effective when used to operate on large, independent pieces of data, and only occasionally synchronize when necessary. The problem is that the synchronization is the tricky part!
Write a C program called seqperf.c that does the following:
Experiment with the size of the array a little bit until the time to compute the sum takes between ten and twenty seconds. (The specific size doesn’t matter as long as you use it consistently going forward.)
Note 1: You will probably get a different total each time, because the random number generator gives you a different sequence of numbers. Use srand at the beginning of your program to fix the random seed and get consistent results for a given array size.
Note 2: The sum will certainly overflow and wrap around, resulting in negative values some of the time. This is still fine as long as the result is consistent.
Make a copy of the previous program, and call the new one threadperf.c
Modify threadperf to do the same task, but this time, split the work
into N threads. Each thread should work on a disjoint part of the task,
producing a local partial sum, which is then added together at the end to
produce a global sum.
The size of the arrays and the number of threads should be given as command line arguments, and your program should print out the number of threads, the size of the array, the total sum, and the time to compute the sum, something like this: (example numbers are made-up)
% ./threadperf 128000 4
size 128000 threads 4 sum 12345678 time 15.425s
Important: The total sum is your assurance that the computation is being done correctly! If you don’t get the same number each time (and the same as in Part 1) then something is wrong with your code.
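The core of threadperf can be sketched as below; argument parsing, srand, array setup, timing, and the output line carry over from seqperf and are omitted. The struct and function names here are assumptions, not requirements.

```c
#include <stdlib.h>
#include <pthread.h>

struct task {
	int *data;
	long start, end;
	int partial; /* this thread's local sum */
};

/* Each thread sums its own disjoint slice into a local variable;
   no locking is needed because no data is shared while summing. */
static void *sum_range(void *arg)
{
	struct task *t = arg;
	int s = 0; /* int on purpose: wraparound composes consistently */
	for (long i = t->start; i < t->end; i++)
		s += t->data[i];
	t->partial = s;
	return 0;
}

/* Split [0, n) into nthreads disjoint ranges, run them in parallel,
   then combine the partial sums once at the end. */
static int threaded_sum(int *data, long n, int nthreads)
{
	pthread_t thread[nthreads];
	struct task task[nthreads];

	for (int i = 0; i < nthreads; i++) {
		task[i].data = data;
		task[i].start = i * n / nthreads;
		task[i].end = (i + 1) * n / nthreads;
		pthread_create(&thread[i], 0, sum_range, &task[i]);
	}

	int sum = 0;
	for (int i = 0; i < nthreads; i++) {
		pthread_join(thread[i], 0);
		sum += task[i].partial;
	}
	return sum;
}
```

Because integer addition wraps modulo 2^32, adding the wrapped partial sums gives the same wrapped total as the sequential version, which is exactly the consistency check described above.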
Once this is working, run the program with one thread, two threads, etc. and record the performance achieved in each step. Keep going until the performance flattens out. In fact, keep going until the performance gets worse! Keep notes on this for the lab report, below.
Make another copy of threadperf and call it syncperf.c
Modify syncperf so that instead of keeping a local partial sum,
have each thread update the global total on every single addition.
(Protect the total with a mutex, of course.)
This probably won’t perform well. But let’s see just how bad it is.
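The change relative to threadperf is confined to the thread function: every single addition takes a mutex, updates the shared total, and releases the mutex. A sketch, with names assumed:

```c
#include <pthread.h>

/* shared state, assumed declared alongside the array */
static int total = 0;
static pthread_mutex_t total_mutex = PTHREAD_MUTEX_INITIALIZER;

struct task {
	int *data;
	long start, end;
};

static void *sum_range_sync(void *arg)
{
	struct task *t = arg;
	for (long i = t->start; i < t->end; i++) {
		/* one lock/unlock per element: correct, but very slow */
		pthread_mutex_lock(&total_mutex);
		total += t->data[i];
		pthread_mutex_unlock(&total_mutex);
	}
	return 0;
}
```

Each addition now pays for a lock and unlock, and every thread contends for the same mutex and the same cache line holding total, so adding threads can easily make syncperf slower than the one-thread case.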
Measure the performance of syncperf for 1-4 threads, and compare
it against threadperf. Keep your notes for the lab report.
Make another copy of threadperf, and call the new one progressperf.c
This final version should perform the same computation as Part 2, but with one twist: it should produce a (single) progress bar that goes from 0 to 100 percent incrementally as the total work is accomplished.
You can format the progress bar how you like, but something simple like this would be fine:
37% [==============>
This is trickier than it first sounds, because the threads must have a little bit of shared data in order to properly track progress. And of course that shared data must be protected so that you don’t suffer from race conditions. The bar should proceed smoothly, printing each value between 0 and 100 as work is done. However, you don’t want to print out identical values unnecessarily, because that will slow down execution.
This will require some thought to puzzle through, and you may need to try some different approaches until you have a good solution.
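One workable approach (a sketch, not the only solution; the names, chunk granularity, and bar width are arbitrary choices): have each thread add to a shared progress counter once per chunk of work rather than once per element, and print only when the integer percentage actually advances.

```c
#include <stdio.h>
#include <pthread.h>

static long total_work = 0;   /* set to the array size before starting */
static long work_done = 0;    /* shared progress counter */
static int last_percent = -1; /* last value printed */
static pthread_mutex_t progress_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Called by each thread after finishing a chunk of `amount` elements.
   Only the thread that crosses into a new percent prints, so identical
   values are never printed twice. */
static void report_progress(long amount)
{
	pthread_mutex_lock(&progress_mutex);
	work_done += amount;
	int percent = (int)(100 * work_done / total_work);
	if (percent > last_percent) {
		last_percent = percent;
		int ticks = percent / 2; /* 50-character bar */
		printf("\r%3d%% [", percent);
		for (int i = 0; i < ticks; i++)
			putchar('=');
		putchar('>');
		fflush(stdout); /* \r redraws the same line in place */
	}
	pthread_mutex_unlock(&progress_mutex);
}
```

Calling report_progress once per chunk (say, every n/1000 elements) keeps lock traffic negligible compared to the summing itself, which is what lets progressperf stay close to threadperf in performance.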
Repeat your performance measurements in the same way as Part 2.
The performance of progressperf should not be substantially slower than threadperf.
If it is, reconsider your method of synchronization and try a different approach.
Keep your notes for the lab report.
Create a text document labreport.txt that details the performance and behavior of your code:
Answer the following questions:

1. What array size did you choose for seqperf that yields a runtime of 10-20s?
2. How many cells per second does seqperf compute? What is the average time to compute one cell? (As always, put your answer in the form of a small number (1-1024) and a metric suffix with appropriate units.)
3. Report the performance of threadperf for a varying number of threads.
4. Report the performance of syncperf for 1-4 threads.
5. Explain the performance of syncperf.
6. Report the performance of progressperf for the same range of threads as threadperf.
7. Describe how you tracked progress in progressperf.
8. Compare the performance of threadperf and progressperf.

Hints:

Use gettimeofday around lines of code to measure the elapsed time.

Use the pthreads library to manage threads and perform locking. You will need to use mutexes to solve this problem. You will not need to use condition variables. (That's in the next assignment.)

For the progress bar, plain printf is sufficient. Try using \r instead of \n to update the current output line, instead of scrolling.

Review the general instructions on how to submit to your dropbox.
Turn in seqperf.c, threadperf.c, syncperf.c, and progressperf.c, along with a Makefile that builds all four from scratch, and labreport.txt containing your answers to the questions above.

Your grade will be based on:
This assignment is due on Friday, February 21st at 5:00PM.