This is an online short course in Data Intensive Scientific Computing (DISC) created by Douglas Thain and Paul Brenner in cooperation with the Office of Digital Learning at the University of Notre Dame.
Topics include cluster architecture, distributed batch systems, workflow systems, concurrent programming, filesystems, and networks. DISC was initially developed for undergraduate students participating in a summer research experience, but it is suitable for any undergraduate or graduate student interested in an introduction to large-scale scientific computing. The course load is approximately that of a one-credit research seminar.
To take the course and record your participation, sign up via EdX Edge. Alternatively, use the links below to browse the course content. Anyone is welcome to view the lectures and use the homeworks and tutorials, either individually or as part of a formal course offering. However, the instructors will only grade material submitted by students at the University of Notre Dame.
Prof. Douglas Thain
Associate Professor
|
Prof. Paul Brenner
Associate Director
|
Prof. Brian Bockelman
|
Prof. Ewa Deelman
|
Prof. Kevin Lannon
|
Prof. Nirav Merchant
|
Prof. Michela Taufer
|
Dr. Steve Tuecke
|
Prof. Frank Wuerthwein
|
Lecture 1: Introduction to Data Intensive Scientific Computing
Introduction to DISC, applications in high energy physics and bioinformatics, overview of cluster architecture, challenges of cluster computing, course outline, and guest lecture from Prof. Nirav Merchant. |
|
Lecture 2: Distributed Batch Systems
Purpose and function of distributed batch systems in general, a brief introduction to the HTCondor batch system, best practices for working with batch systems, and guest lecture by Prof. Brian Bockelman. |
|
Tutorial A: HTCondor Distributed Batch System
|
|
Lecture 3: Performance Evaluation
Performance evaluation of batch workloads, including job lifetimes, the long tail effect, Amdahl's law, limits to scaling up, and guest lecture by Prof. Michela Taufer. |
|
Lecture 4: Workflow Systems
Principles of workflow management systems, tutorial on the Makeflow workflow system, survey of additional systems including DAGMan, Pegasus, Kepler, Swift, and CWL, performance evaluation of workflows, pilot job systems, and guest lecture by Prof. Ewa Deelman. |
|
Tutorial B: The Makeflow Workflow System
|
|
Lecture 5: Concurrent Programming
Applications of highly concurrent programming models, introduction to the Work Queue software, examples of using the Work Queue API, running large programs at scale, and guest lecture by Prof. Kevin Lannon. |
|
Tutorial C: Concurrent Programming with Work Queue
|
|
Lecture 6: Networks and Data Movement
Introduction to local area networks, wide area networks, performance evaluation, firewalls and security, and cloud computing. Guest lecture by Dr. Steve Tuecke, University of Chicago. |
|
Lecture 7: Storage and Filesystems
Storage hardware, local filesystems, distributed filesystems such as NFS and AFS, parallel filesystems such as Lustre, Panasas, and Ceph, and cloud storage systems. Guest lecture by Prof. Frank Wuerthwein of UCSD. |
|
Tutorial D: File Systems and Data Movement
|