Processes and Threads

David Liu5/10/26About 4 min

Processes and Threads

This assignment – a simple parallel sorting program – will give you some hands-on experience with concurrency and its impact on application performance, as well as with several basic UNIX primitives. You will need the HW1 handout, as well as some framework code to get started, as described in the handout. You should continue using the ontainer you installed for Homework 0 to test your solution before submission.

Evaluation

Once you have completed your implementation, the final step is to do a small quantitative evaluation.

First,

use the makeinput tool to create data sets of various sizes, and find one that takes roughly ten seconds to sort with no concurrency (-n 1). You can measure the completion time using the UNIX time command, as follows (the first line is the command; the remaining lines are the output):

time ./mysort -n 1 input1.txt input2.txt > /dev/null real 0m10.441s user 0m10.347s sys 0m0.055s

Next,

vary the parameter -n from 1 to 20 and measure the completion times with and without the -t option. You will notice that the values reported as ‘user’ will start to become larger than the values reported as ‘real’; this is because the latter measures wall-clock time, whereas the former measures total time spent in user level on any core. You should write down the ‘real’ values, not the ‘user’ values! If the values do not start dropping substantially for N > 1, you either have a bug in your code, or you are measuring on a singlecore machine. If the latter, please measure on a machine with multiple cores. Please also make sure that during benchmarking, the system is not running any background processes to avoid potential interference with the measurement. To get stable results, you should repeat the measurement for each setting a few times (e.g., at least 3 times) and report the average.

Finally,

draw a graph that has N on the horizontal axis and the measured ‘real’ times on the vertical axis, where N ranges from 1 to 20. The graph should have two lines, one for threads (-t) and one for processes. In addition, please include a table containing the measurement times—there should be two rows (one for the thread case and one for the process case) and 20 columns (one for each value of N). You can include it as part of the graph file or the description file (see below).

Please use the average value of your measurements for each setting in this table. Please describe and explain

any nontrivial differences between the two lines;
any major trend within each line; and
the key reasons for the observed trend and behavior.

You should submit the graph as a file called graph.jpg or graph.pdf, and a text file with your descriptions and explanations as description.txt. The file description.txt should be short and concise.

10.045, 9.890

2.453, 2.424

1.117 1.077

0m0.641s, 0m0.618s

0m0.481s, 0m0.441s

0m0.368s, 0m0.334s

0m0.296s, 0m0.275s

0m0.241s, 0m0.218s

0m0.216s, 0m0.181s

0m0.191s, 0m0.164s

0m0.194s, 0m0.157s

0m0.175s, 0m0.141s

0m0.159s, 0m0.132s

0m0.148s, 0m0.122s

0m0.145s, 0m0.119s

0m0.136s, 0m0.107s

0m0.124s, 0m0.102s

0m0.123s, 0m0.096s

0m0.123s, 0m0.087s

0m0.119s, 0m0.087s

Nontrivial differences between the two lines:

The performance of threads (-t) is consistently better than that of processes, especially as $N$ increases. While both follow a similar decay curve, the thread line remains lower. This is because creating and switching between threads is lighter than processes, and threads share the same memory space, avoiding some of the overhead associated with inter-process communication (IPC) and separate memory management in the process-based approach.

Major trend within each line:

Both lines show a sharp, non-linear decrease in execution time from $N=1$ to $N=4$ (approaching a $1/N^2$ speedup initially due to the $O(n^2)$ complexity of bubble sort). As $N$ moves beyond 8-12, the rate of improvement slows down significantly (diminishing returns), eventually flattening out toward $N=20$ .

Key reasons for the observed trend:

Algorithmic Gain: Since bubble sort is $O(n^2)$ , dividing the dataset into $N$ chunks reduces the work per worker from $S^2$ to $(S/N)^2$ . This leads to a dramatic initial speedup.
Parallelization: The multi-core environment allows multiple chunks to be sorted simultaneously.
Diminishing Returns: As $N$ increases, the overhead of managing more pipes, thread/process creation, and context switching starts to offset the gains. Furthermore, the final K-way merge is a serial bottleneck ( $O(S \log N)$ ), which becomes more prominent as the individual sorting tasks become very short.

Extra credit

The following extra credits are independently of one another, and you can implement one or both of them.

Points obtained from extra credits will be counted towards your total score of the assignment (just like the points obtained from the main part above).

Abitrary input (+4pts)

Extend your sorting program to sort arbitrary lines of text, and text with more than one million lines. Each line of text may contain any arbitrary string, with any arbitrary number of characters. Note that, with this extra credit, you can no longer use static arrays for storing the data. However, you may assume that the input fits into memory – there is no need to implement external sorting – but your solution must not require an upper bound on the number of characters per line. Part of the goal for this extra credit is to give you hands-on experience in handling streams of input data without knowledge of the data size and in learning how to allocate memory only when it is needed, both of which are useful skills in practice (e.g., a mail server needs to be able to handle arbitrary-size email messages).

To maintain compatibility with the main assignment, your solution should still use numeric sorting by default, and only switch to lexicographic sorting when a special -L option is given on the command line. For instance, if the input is a file with four lines: “1”, “10”, “2”, “11”, then the output should be “1”, “2”, “10”, “11”, by default, and “1”, “10”, “11”, “2” when the -L option is given. To be more specific:

When the flag -L is used: Your program should use lexicographic sorting.
When the flag -L is not used: Your program should sort the data numerically; if the data contain text that is not a number, you should output an error message to stderr and exit. For simplicity, you may assume that the numbers are 64-bit integers when numerical sort is used.

In both cases above, your solution should support more inputs with than one million lines combined. Your program should support both threads and processes.

Shared memory (+4pts)

Extend your process-based solution to use shared memory for the communication between the processes. You should add a new command-line option (-s). If the user supplies this option, your program should use shared memory instead of pipes. We did not cover shared memory in class, but you can find many tutorials available on the web; for example, a good introduction with examples to using shared memory in C can be found at https://users.cs.cf.ac.uk/Dave.Marshall/C/node27.

For this extra-credit task, you may make the same simplifying assumptions as described in the main part of the assignment, i.e.,

the input consists of text files that contain signed 64-bit numbers, each on a separate line; and
all the input files together contain at most one million numbers.