CSE 40771 - Distributed Systems - Fall 2024

View the Project on GitHub

A3 - Persistence

Overview

Please review the general instructions for preparing assignments.

In this assignment, you will convert your (previously) in-memory spreadsheet into a persistent service by making use of the checkpointing-and-logging approach discussed in class. Again, you will measure the performance of the operations, and evaluate the cost of persistent storage against memory.

Use your prior assignment submission as the starting point, and continue with the same general code organization as before. It’s OK to add additional files and classes to the project, as long as the server main program is still called SpreadSheetServer.py and the test programs are called TestXYZ.py

Checkpoint and Transaction Log

An online storage service must write data to a persistent location (such as disk or non-volatile memory) so that it does not lose information when the server software or hardware crashes. In a production system, a server would ordinarily pass data to a database server to store it safely, but that’s just passing the buck – the database server now has the same problem of storing data persistently. Let’s solve it ourselves directly.

There are two very simple approaches that don’t scale very well. One approach would be to just write out the entire state of the spreadsheet to a file whenever any element of the table changes. That could work (if we are careful to commit atomically) but would become very expensive as the spreadsheet grows. An alternate approach would be to write the contents of each key and value into a separate file, and just update each individual file as it changes, atomically committing the file on each change. Again, this could work, but would be very expensive if the table grows to millions of entries. Each file in a filesystem requires an inode, a directory entry, and at least one data block (typically 4KB), which would become very inefficient at large scales.

A common solution is to strike a balance between the two approaches by using two data structures stored in separate files: a checkpoint and a transaction log.

Together, the checkpoint and transaction log describe the current state of the spreadsheet. To obtain the current state of the system, the server need only load the checkpoint file in its entirety, and then read the transaction log, “playing back” each entry as a series of modifications to the table in memory. Once all entries are played back, the most up-to-date state of the spreadsheet is present in memory.

Requirements

For this assignment, you will modify your server to store data persistently using a checkpoint file and transaction log. Please call your checkpoint file sheet.ckpt and your transaction log sheet.log, and proceed as follows:

Invocation

From the end user’s perspective, there should be no change in how the server and client are deployed: both clients and servers are run in the same way as before. (In fact, the client should not change at all!)

To stop the server, simply use kill or Control-C to stop the process. If you have designed your checkpoint and transaction log correctly, then you should be able to simply restart the server, and observe that the state of the spreadsheet is recovered. This mode of operation is known as “crash only” design: the server has no notion of a “clean shutdown” and is always using/testing its capability of recovering from a crash.

For this assignment, your server should compact when the transaction log has more than 100 entries. That’s artificially small, but will make it easy to test and observe that compaction is working.

Testing and Measurement

Continue to use the two test programs from the prior assignment:

And create a third test program:

What to Turn In

Please review the general instructions for submitting assignments.

Turn in all of your source code, along with a lab report titled REPORT that describes the following in detail: