DISC - Data Intensive Scientific Computing - Online Course

This is an online short course in Data Intensive Scientific Computing created by Douglas Thain and Paul Brenner in cooperation with the Office of Digital Learning at the University of Notre Dame.


Topics include cluster architecture, distributed batch systems, workflow systems, concurrent programming, filesystems, and networks. DISC was initially developed for undergraduate students participating in a summer research experience, but it is suitable for any undergraduate or graduate student interested in an introduction to large-scale scientific computing. The course load is approximately that of a one-credit research seminar.

To take the course and record your participation, sign up via EdX Edge. Alternatively, use the links below to browse the course content. Anyone is welcome to view the lectures and use the homework assignments and tutorials, either individually or as part of a formal course offering. However, the instructors will grade only material submitted by students at the University of Notre Dame.

Instructors

Prof. Douglas Thain

Associate Professor
Department of Computer Science and Engineering
University of Notre Dame

Prof. Paul Brenner

Associate Director
Center for Research Computing
University of Notre Dame

Guest Speakers

Prof. Brian Bockelman
Research Assistant Professor
University of Nebraska-Lincoln

Prof. Ewa Deelman
Research Director
USC Information Sciences Institute

Prof. Kevin Lannon
Associate Professor
University of Notre Dame

Prof. Nirav Merchant
Research Director
University of Arizona

Prof. Michela Taufer
Associate Professor
University of Delaware

Dr. Steve Tuecke
Globus Project
University of Chicago

Prof. Frank Wuerthwein
Professor
University of California - San Diego

Course Outline

Lecture 1: Introduction to Data Intensive Scientific Computing

Introduction to DISC, applications in high energy physics and bioinformatics, overview of cluster architecture, challenges of cluster computing, course outline, and guest lecture by Prof. Nirav Merchant.

Lecture 2: Distributed Batch Systems

Purpose and function of distributed batch systems in general, a brief introduction to the HTCondor batch system, best practices for working with batch systems, and guest lecture by Prof. Brian Bockelman.

Tutorial A: HTCondor Distributed Batch System

Lecture 3: Performance Evaluation

Performance evaluation of batch workloads, including job lifetimes, the long tail effect, Amdahl's law, limits to scaling up, and guest lecture by Prof. Michela Taufer.

Lecture 4: Workflow Systems

Principles of workflow management systems, tutorial on the Makeflow workflow system, survey of additional systems including DAGMan, Pegasus, Kepler, Swift, and CWL, performance evaluation of workflows, pilot job systems, and guest lecture by Prof. Ewa Deelman.

Tutorial B: The Makeflow Workflow System

Lecture 5: Concurrent Programming

Applications of highly concurrent programming models, introduction to the Work Queue software, examples of using the Work Queue API, running large programs at scale, and guest lecture by Prof. Kevin Lannon.

Tutorial C: Concurrent Programming with Work Queue

Lecture 6: Networks and Data Movement

Introduction to local area networks, wide area networks, performance evaluation, firewalls and security, and cloud computing. Guest lecture by Dr. Steve Tuecke of the University of Chicago.

Lecture 7: Storage and Filesystems

Storage hardware, local filesystems, distributed filesystems such as NFS and AFS, parallel filesystems such as Lustre, Panasas, and Ceph, and cloud storage systems. Guest lecture by Prof. Frank Wuerthwein of UCSD.

Tutorial D: File Systems and Data Movement