CSE 40771 - Distributed Systems

CSE 40771 - Distributed Systems - Spring 2023

View the Project on GitHub

Course Project for Distributed Systems

Requirements

The final project in this course will be open ended. You will propose, carry out, and report upon a project in groups of two students. Projects must have the following elements:

Project Ideas

Following are some ideas to get you thinking. You are welcome to modify one of these ideas, or to pick a different problem, but talk to Prof. Thain first if you have something radically different in mind. One rule: please no cryptocurrency implementations! These end up consuming a lot of resources to no productive end.

Peer to Peer Communication Network - Build a system that allows for social interaction (live chat, or news updates, or social networking) between many users, without requiring any centralized server. Each participant in the system should have their own independent node that passes appropriate messages back and forth. Decide how to discover other nodes, pass messages to the rest of the system, put messages in a suitable order, and how to deal with disconnections and outages. Measure the latency and throughput of message dissemination as the system grows in size.

Scalable Filesystem - Build a file storage system that can scale up to a large number of nodes. One approach to this is to create a single “name” node and multiple “storage” nodes. The name node can keep track of the filesystem tree, file names, and the location of the files on the storage nodes. To access a file, a client must interact with the name node to locate it, and then the storage node to access it. Consider how to choose where to put files, where to replicate files, and how to deal with node outages. Measure both the latency and performance of the system as the number of clients and servers increases.

Log Oriented Replication - Build an asynchronous chain-replicated version of your distributed hash table project. Instead of having each node wait for the next node to respond (which would make the system very slow), make use of the transaction log. As a single node applies changes to the transaction log, arrange for those log entries to be passed on to the next system in a row. Careful: When is it safe for one node to compress its own log? Consider how a new node joins the system, and what happens when a node crashes. Evaluate the throughput and latency of the system as the number of clients and servers increases.

Distributed Interactive Game - Create a simple multi-player interactive game, where a centralized server manages the game and players must interact from client nodes. The game could be an old-school text adventure (explore rooms, collect items, etc) or something more graphical if you are so inclined. Design into the game some actions that are mutually exclusive, requiring the server to perform a total ordering on the actions and return the results to the client. For example, if user A roasts a marshmallow on a fire, while user B concurrently puts the fire out with water, what’s the outcome?

Peer-to-Peer Hash Table - Implement a distributed hash table in a circular peer-to-peer fashion, following the design of Chord, as described in the textbook. Note that most of the description focuses on how to find items in the system. This is important, of course, but you must also consider how nodes enter and leave the system, and arrange for data to be copied and preserved.

Distributed Data Preservation System - Design a peer-to-peer preservation system. Suppose that a user wishes to preserve some document forever. If the user delivers it to any one node in the system, then that node should take active steps to communicate with other known peers to make further copies of the document. Be careful to ensure that the system can recover from the loss of multiple nodes, but is also not causing continuous unnecessary network traffic. You may borrow ideas from other systems such as Gnutella Freenet, or Chord, provided that you implement the system yourself. Start by reviewing section 4.5.2 in the textbook.

Reliable Data Delivery System - Build a chained message queueing system like that described in section 4.3. It should consist of a message forwarding process that reads files from a directory and then delivers them to an identical process running on another machine, which stores them in a directory where could be read by another forwarding process. Make sure that your system can handle expected failure modes such as server crashes, network outages, full disks. Evaluate the throughput of your system on a large amount of data.

Distributed Board Game Engine - Pick a common board game – Chess, Othello, Go, etc – that has many configurations and is thus computationally difficult for a computer to play effectively. Take an existing solver for this game that works on a single node, and build a distributed system that can run it on multiple nodes at the same time, playing against a human user. In this way, the difficulty of the “computer” player can be expanded to as many nodes as desired. Measure the time to reach a solution of a given quality, and the amount of the configuration space that can be explored in a fixed amount of time.

Milestones

Project Proposal - Friday, March 10th - Turn in a document that describes the overall shape of your project. This should include the project partners, a high level description of the goals and structure of the system, identification of the key distributed systems problem in the system, what languages and resources will be necessary to carry it out, and your plan for evaluating the system. Be sure to think about what metrics you will use to evaluate the system – throughput, latency, scalability, runtime – and sketch a notional graph of how you think that metric will change as the system size or load increases. (Of course, I don’t expect you to know the actual results, but I want you to think about what you will measure and what orders of magnitude to expect.) The proposal should be about two full pages of text. The instructor will follow up with you to make sure that the project is of appropriate size and difficulty. Submit this via the Assignments tab in Canvas.

Progess Report - Wednesday, April 5th - Turn in a detailed report describing the overall design of your system and your progress towards building it. This will be a substantial report of some length that will require you to think carefully about the details of the system design before writing every bit of code. Your report should have the following sections:

Class Presentation

Final Submission - Wednesday, May 3rd at 5PM - Turn in your code and the final paper. The code should be structured such that the instructor can build and execute it independently. The paper should give an overview of the goal or the problem, a detailed description of the structure of your system, including a good diagram where appropriate, and an evaluation of the correctness and performance of the system. You can and should include material from your progress report regarding the architecture of the system, but of course the material should be updated and extended substantially. There is no specific length requirement; the paper should be long enough to explain all of the necessary details. The said, anything less than five pages is much too short; anything longer than fifteen pages is probably too long.

Submit your final report as a PDF via the Assignments tab in Canvas. The code for your final project should be checked into a (private) github repository that is shared with dthain and tphung3. Clearly indicate the URL of your repository on the first page of your report.