Operating Systems Spring 2024
The goals of this project are:
Consider what happens when you have a large amount of work to accomplish, but you cannot simply run them everything at once: downloading lots of large files, sending models to a 3D printer, rendering lots of frames for a movie, processing credit card payments, etc. To manage a large amount of work, you need a job scheduler which will accept submissions, arrange for them to execute, and report back information to ther user. In this project, you will build a job schedule for text-to-speech conversion.
For this project, we will be making use of an existing text to speech system known as Piper. Briefly, Piper reads in a plain text file and generates output in the form of a WAV file containing a voice reading the text. (It does a variety of other things too.)
You will need to do some setup to get piper working in your home directory. It will be easiest to put it inside of the working directory for your project, like this:
cd $HOME
mkdir opsys-project4
cd opsys-project4
curl -L https://github.com/rhasspy/piper/releases/download/v1.2.0/piper_amd64.tar.gz > piper.tar.gz
tar xvzf piper.tar.gz
Then, you will need to install a voice from the voice library.
These instructions install the arctic
voice, but you can pick another
one if you like.
curl -L https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/arctic/medium/en_US-arctic-medium.onnx > arctic.onnx
curl -L https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/arctic/medium/en_US-arctic-medium.onnx.json > arctic.onnx.json
Now, in order to have some text to convert, go to Project Gutenberg
and copy-paste a paragraph from 5-10 different books, each one into a different file. For example, suppose
you have a file called alice.txt
containing:
Alice was beginning to get very tired of sitting by her sister on the
bank, and of having nothing to do: once or twice she had peeped into
the book her sister was reading, but it had no pictures or
conversations in it, “and what is the use of a book,” thought Alice
“without pictures or conversations?”
Use Piper to convert this into a WAV file reading it out load:
piper/piper -f alice.wav -m arctic.onnx < alice.txt
Download that file and play it back. Neat, eh?
If that’s working, you can delete the downloaded tarball:
rm piper.tar.gz
Now, two things you should notice about running Piper are (a) it takes a non-trivial amount of time to generate the output, and (b) the output is much larger than the input text. If you were running a service for performing text-to-speech conversion, it would not be a good idea to simply run every request simultaneously. Instead, it would be better to establish a queue of jobs to execute, and then carry them out in an orderly manner.
In this project, you will build a program called jobsched for managing a large number of text to speech jobs. The basic idea is that the user of the system will submit requests to convert text to speech. jobsched will accept those requests, organize them into a job queue, and then carry them out as resources are available. If there are a very large number of requests to execute, then only a few will be executed at a time, if there is sufficient storage space available.
When started, your program should simply wait for input from the user. It should accept commands like this:
submit <file>
nthreads <n>
list
wait <jobid>
waitall
delete <jobid>
schedule <fifo|sjf|balanced>
quit
help
The submit
command defines a new text-to-speech job, and names the
input text file to convert.
submit
should not perform the conversion itself! Instead, submit
should add the job to the queue,
and display a unique integer job ID generated internally by your program.
(Just start at one and count up.) The job will then run in the background
when selected by the scheduler. When done, it should produce an output file
called jobN.wav
, regardless of the name of the input file.
So, Job 1 will produce job1.wav
, Job 2 will produce job2.wav
, etc.
The nthreads
command should start n
background threads that perform
text-to-speech tasks on the submitted jobs. The nthreads
command may only be given once in
any session. If it is given a second time, it should fail.
The list
command lists all of the jobs currently known,
giving the job id, current state (WAITING, RUNNING, or DONE), input filename,
size of the input file, and size of the output file (if DONE).
It should also display the total size of all input files,
the total size of all output files (for DONE jobs),
the average turnaround time (of DONE jobs), and average response time (of DONE jobs.)
You can format this output in any way that is consistent and easy to read.
The wait
command takes a jobid and pauses until that job is
done running. Once complete, it should display the final status
of the job (success or failure) and the time at which it was submitted,
started running, and completed. (If the job was already complete,
then it should just display the relevant information immediately.)
The waitall
command should block until all jobs in the queue
are in the DONE state.
The delete
command takes a jobid and then removes the job
from the queue, along with its output file. However, a job
cannot be deleted while it is in the RUNNING state.
In this case, display a suitable error and
refuse to delete the job.
The schedule
command should select the scheduling algorithms
used: fcfs
is first-come-first-served,
and sjf
is shortest-job-first, and balanced
should prefer
the shortest job, but make some accomodation to ensure that
no job is starved indefinitely.
The quit
command should immediately exit the program,
regardless of any jobs in the queue. (If end-of-file is detected
on the input, the program should quit in the same way.)
The help
command should display the available commands
in a helpful manner.
A particular challenge with the text-to-speech conversion is that
the output files can be quite large. If the user submits 100 (small)
text files, the resulting sounds files may overflow the disk!
To prevent this from happening, the background threads should decline
to start new jobs whenever the total size of output files (job1.wav
, job2.wav
, etc)
of DONE jobs exceeds a total of 100MB.
Note that the delete
command removes a job as well as its output file.
So, if the system has reached the 100MB limit, the user can delete
a
few jobs to free up space, and the background threads should automatically resume work.
This project will bring together a variety of concepts that you have studied so far: process management, thread synchronization, and scheduling. We aren’t going to talk through every little function you should use; you will need to review prior material and look up documentation as needed.
Here is the basic architecture you should use:
The program should consists of several threads: one main thread and several background threads, which interact through a common job queue. The main thread interacts with the user by reading commands, submitting jobs to the queue, displaying status, and so forth. One or more background threads will carry out the actual text-to-speech tasks. Each background thread should select the next job to run (based on the current scheduling algorithm), run text-to-speech as a separate external process, and then update the job queue when that process is complete.
The tricky part of this assignment is the job queue itself.
The job queue should be implemented as a monitor
as discussed
in class: a data structure that is protected by a mutex
and a condition variable
. The job queue should
only be accessed by functions that take care to use the mutex
for mutual exclusion and the condition variable to sleep and wakeup.
Generally speaking, the main thread should remain responsive to the user by only performing quick actions on the job queue. The user ought to be able to see immediate response to all commands, except those that specifically wait for a job to complete.
Test your scheduler carefully by using it to run a variety of jobs, both short and long, in different orders. Try different combinations of operations: submitting, waiting, removing, etc. Try the different scheduling algorithms, and verify that the statistics reported are correct for the jobs that actually run. Try running with different numbers of background threads, and check that all are kept appropriately busy.
Think critically about unexpected events, such as improper input from the user, incorrect arguments to commands, and so forth. Whenever such an unexpected event occurs, you should display a detailed and helpful error message, and (where possible) continue operation of the program.
A particularly useful way to test is to create small input files
that contain a sequence of operations, for example create test1.txt
containing this:
submit frank.txt
list
nthreads 3
wait 1
list
delete 1
list
quit
Then, just run your scheduler with input redirected from that file:
./jobsched < test1.txt
Please review the general instructions for assignments.
Turn in the following:
jobsched
.jobsched
when the user types make
, and cleans up all executables and object files on make clean
.Do not turn in Piper or voice files or output wave files, since these are large and we don’t need the extra copies.
This assignment is due at 11:59PM on Wednesday, March 27th
To submit your files, just copy them into the project4
subdirectory of your dropbox.
Your grade will be based on: