CSE 40771 - Distributed Systems - Fall 2024

Course Web Page

dthain.github.io/distsys-fa24

Overview

A distributed system is any computer system consisting of multiple machines that work together on a common problem. Distributed systems appear in many areas of computing, including cloud computing, mobile computing, edge computing, the internet of things, aerospace systems, and more. Distributed systems are both interesting and difficult to build because their components may be autonomous and highly failure-prone. Students will learn the fundamental principles of distributed systems, study examples of current distributed systems, and build their own distributed systems from scratch. Topics include concurrency, fault tolerance, replication, consistency, and agreement. Students will undertake a final project that involves building and evaluating a custom distributed system. Grading will be based on assignments, exams, and a final project.

This will be a fun and challenging class for students who like to build working software systems. Distributed systems bring together some very practical aspects of software engineering (e.g. how to handle a network disconnection) and the fundamental principles of computers (e.g. whether a partitioned system can reach agreement). The skills that you learn here will apply directly to advanced systems used in industry.

The theoretical aspects of distributed systems will be studied via the course textbook, Maarten van Steen and Andrew Tanenbaum, Distributed Systems, 4th edition, 2024. You can order a physical copy of this book or register online to download a PDF, as you prefer.

Prerequisites

Course Outcomes

Students successfully completing this course will be able to:
  • Describe the architecture and operation of a variety of common distributed systems.
    Presented in course readings, evaluated in exams.
  • Compare the architecture and operation of various distributed systems.
    Presented in class discussions, evaluated in exams.
  • Describe how distributed systems are fundamentally different from standalone systems in matters such as naming, invocation, synchronization, and fault tolerance.
    Presented in course readings, practiced in assignments, evaluated in exams.
  • Construct, test, and evaluate programs in a distributed environment.
    Presented, practiced, and evaluated in the practical assignments.
  • Communicate technical results orally and in writing.
    Practiced in assignment writeups and final project talk.

    Programming Assignments

    Six programming assignments are required, due one per week for the first half of the semester. The assignments together build towards an implementation of a scalable interactive spreadsheet that could run in a cloud service or as a peer-to-peer system:

    1. Measuring Fundamentals - Precisely measure the cost of fundamental operations in the system: function call, hash table read/write, network packet, file system I/O, process creation.
    2. Remote Procedure Call - Build a system in Python for performing remote procedure calls between processes. Carefully measure the performance and throughput of this system with multiple clients.
    3. Persistence - Make the prior system persistent by implementing logging, recovery, and periodic log compression. Measure the performance of the system, observing outliers.
    4. Naming - Improve the prior system by making it more discoverable by a name service, and handle multiple independent tables internally.
    5. Concurrency - Improve the system again by permitting multiple simultaneous clients.
    6. Replication - Improve the prior system by dividing up the storage space among multiple servers, allowing for multiple clients to be served simultaneously. Measure the performance and scalability.
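To make the vocabulary of the Persistence step concrete, here is a minimal sketch of logging, recovery, and log compression for a key-value table. It is only an illustration under stated assumptions: the class name `LoggedTable`, the one-JSON-record-per-line log format, and the file naming are hypothetical and are not the assignment's required design.

```python
import json
import os

class LoggedTable:
    """An in-memory dictionary backed by an append-only log (hypothetical design)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._recover()
        self.log = open(self.log_path, "a")

    def _recover(self):
        """Rebuild the in-memory state by replaying every logged update in order."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as f:
            for line in f:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def set(self, key, value):
        """Force the update to disk before applying it in memory."""
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.data[key] = value

    def get(self, key):
        return self.data[key]

    def compact(self):
        """Compress the log by rewriting it with one record per live key."""
        tmp_path = self.log_path + ".tmp"
        with open(tmp_path, "w") as f:
            for key, value in self.data.items():
                f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.log.close()
        # Atomically swap the compacted file into place, then resume appending.
        os.replace(tmp_path, self.log_path)
        self.log = open(self.log_path, "a")

    def close(self):
        self.log.close()
```

This sketch assumes string keys and JSON-serializable values, and it does not handle a partially written final record after a crash. The `fsync` on every update is also a likely source of the latency outliers that the assignment asks you to observe.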

    What you may find different about these assignments is that they require not just writing code that produces a correct result, but also a quantitative evaluation of the underlying performance. Each will require a short text writeup explaining your measurements and their significance, like a physics lab report.
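As a rough illustration of this kind of measurement, the sketch below times a Python function call and a dictionary (hash table) write and read using `time.perf_counter_ns`. The helper name `measure` and the repetition count are assumptions chosen for illustration, not part of any assignment specification.

```python
import time

def measure(operation, repetitions=100_000):
    """Return the average cost of one call to operation, in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(repetitions):
        operation()
    elapsed = time.perf_counter_ns() - start
    return elapsed / repetitions

def empty_function():
    pass

table = {}

def table_write():
    table["key"] = 1

def table_read():
    return table["key"]

if __name__ == "__main__":
    # The write is measured first so that the key exists before the read.
    for name, operation in [("function call", empty_function),
                            ("hash table write", table_write),
                            ("hash table read", table_read)]:
        print(f"{name}: {measure(operation):.1f} ns per operation")
```

Each reported figure includes the overhead of the loop and of the function call itself, so the hash table numbers are best read relative to the empty function call. Repeating a measurement many times and reporting the average (or the full distribution) is what makes a single noisy timing into a result worth writing up.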

    Final Project

    In the final project, students will propose, build, and measure a distributed system of their own design, which must make use of multiple techniques discussed in class to achieve a system that is robust and performant. Examples might include a distributed filesystem, a parallel programming model, or a peer-to-peer chat system. The final submission will include a project report describing the design of the system.

    How to Get the Most Out of Class

    Read the assigned chapters in the textbook each weekend to prepare for class. The book does a good job of giving an overview of the many approaches and designs for distributed systems, and will give you the vocabulary to engage in class. To encourage you to read regularly, a summary of notes is due every Monday. These notes are for your benefit so you should organize them in whatever way is useful to you: make an outline, define key terms, make a sketch, whatever. We will do periodic "spot checks" that you are taking some reasonable kind of notes, but will not be grading them in detail.

    During most class sessions, I will present one key concept in depth with a prepared talk for about 30 minutes, and then we will shift into Q&A or working out an example. Because much of the class material involves working with system diagrams and communication patterns, I will mostly work on the blackboard instead of presenting slide decks. I strongly recommend that you take notes by sketching along with pen and paper: the simple act of note-taking exercises your mental muscles in a way that passive observation does not. [1]

    If you prefer to take notes on your laptop or tablet, that's fine. However, I do ask that you refrain from using your devices for non-class related tasks during class time. I know it is tempting during a brief lull to respond to messages, check the news, etc, but even one laptop open can be an unavoidable distraction for other people in the class. [2] Please reserve this time for working together.

    Communications

    The course web page will provide all the course schedules, assignments, and materials. Canvas will be used mainly to distribute grades, and for little else.

    Slack (#distsys-fa24) will be used to handle general Q&A for the class. If you have a technical question that could be of interest to others, please post it there, so that others can benefit from the answers. You are welcome to post (or answer) questions anytime, and we will generally monitor and answer questions on weekday afternoons. (Keep in mind that we do go home at night, and so late-night questions will get answered the next day.)

    For questions about grades or anything else that applies specifically to you, email the instructor or TA directly.

    Office hours are a great time to get focused help on a tricky bit of code. We are happy to help you during those times. Just knock, come in, and introduce yourself. However, don't "camp out" in office hours just to do your homework: come when you have a specific question. If you can't make any of the office hours, then send email to see if we can work out another time.

    Assignments and Grading

    Programming assignments are generally due Thursday nights. (You will have an automatic grace period until 9:00AM on Friday mornings, but we don't really mean for you to stay up all night.) Because the programming assignments are cumulative, working up to a larger goal, it's important to stay on top of things and make progress every week -- don't leave the assignment until the last minute.

    Programming assignments will be submitted by copying files to a "dropbox" directory on the student machines. You are free to submit (or resubmit) anytime, so it would be a good idea to submit something (even if incomplete) well before the due date. Writing assignments (weekly readings, project proposal, etc) should be submitted via the Canvas assignments feature.

    Graded assignments will be returned by creating a GRADE file in the dropbox directory with detailed feedback. The numeric grade will be entered into Canvas.

    If you believe that there has been an error in grading, please first contact the TA who graded your work to discuss. We do make mistakes sometimes, and objective errors will be cheerfully corrected. If you aren't satisfied after talking to the TA, then go ahead and contact Prof. Thain. Any concerns about grading must be raised within seven days of receiving the graded item. After that, grades are final.

    Semester grades will be weighted as follows:

    Category                   Weight
    Reading Notes              5%
    Programming Assignments    35%
    Course Project             20%
    Midterm Exams (2)          20%
    Final Exam                 20%

    Late Work and Excused Absences

    Late assignments receive no credit, so get started early and submit your work on time. It's better to submit something imperfect on time than to submit nothing at all. If you get busy and miss something, then let it go, and set your sights on the next assignment.

    Exceptions will be made only for the circumstances given in section 3.1.3 of the undergraduate academic code, such as major illness, death in the family, or participation in an authorized university event. In those cases, the student should confer with the instructor as soon as the conflict is known, and we will work out an alternate schedule.

    Academic Code of Honor

    Notre Dame students are expected to abide by the Academic Code of Honor Pledge:

    "As a member of the Notre Dame community, I acknowledge that it is my responsibility to learn and abide by principles of intellectual honesty and academic integrity, and therefore I will not participate in or tolerate academic dishonesty."

    The purpose of all the assignments is to flex your own mental muscles to develop your own thoughts, understanding, and skills. To that end, programming assignments, reading assignments, and exams are to be completed individually, unless permission otherwise has been given. The final project may be completed in small groups.

    All software development, experimental work, and writing of results must be by your own hands, in your own words. You are encouraged to seek out and consult reference manuals, books, websites, and other documentation that will help you to complete each programming assignment, provided that you indicate what sources you have used. However, the result of such consultation should be an understanding of the material so that you can do the work yourself. For the same reasons, the use of AI assistants such as ChatGPT, Copilot, and similar tools to generate code or text is forbidden. (And frankly, they are likely to produce things that just won't work right.)

    Some Campus Resources

    If you require an accommodation for a disability, please first contact the Sara Bea Center (sarabeadisabilityservices.nd.edu) for a consultation, and we will be happy to work together on a solution.

    If you encounter a difficult life situation and don't know what to do, the University Counseling Center (ucc.nd.edu) or the Care and Wellness Consultants (care.nd.edu) can help and also connect you with other campus resources.

    References