Research
In the Cooperative Computing Lab,
my team creates software
that enables the construction of applications that easily run on
thousands of computers drawn from clusters, clouds, and grids.
Our work is highly applied: we design new software systems, run them at
large scale with real applications, and then study the problems that
naturally arise in that context. As a result, we work on a wide variety
of topics, including workflows, filesystems, data management, fault tolerance, and resource management.
My work has three interwoven components:
1. Open Source Software.
We design and build production-quality open source software that is used around the world.
The
Cooperative Computing Tools
include Work Queue, a distributed application framework;
Makeflow, a workflow system for large clusters;
Parrot, a global filesystem for remote data access; and other components as well.
Because our software must operate in many complex and hostile environments,
we maintain a rigorous software engineering process encompassing design, implementation,
testing, and complete documentation.
(see more software)
2. Research Collaborations.
We work closely with colleagues in other fields such as high energy physics,
molecular dynamics, machine learning, and computational agriculture to carefully
understand the computational needs, design novel solutions, and deploy them at scale
in infrastructure spanning airborne mini-clusters, campus infrastructure, and
leadership-class HPC machines. These collaborations are essential to identifying
the right problems, keeping our solutions on track, and ensuring that we solve
all aspects of the problem, whether usability, reliability, or performance.
3. Fundamental Research.
We perform fundamental computer science research that is grounded in working
software deployed to solve real problems. Our software and end users serve
as a "living lab" in which we can evaluate new techniques for making applications
more usable, more reliable, or more performant. For some highlights, see:
Ph.D. Graduates
I am thankful and proud to have worked with the following Ph.D. students at Notre Dame:
- Dr. Tim Shaffer, (2022) Engineer at Seagate.
- Dr. Nathaniel Kremer-Herman, (2021) Faculty at Seattle University.
- Dr. Nicholas Hazekamp, (2019) Engineer at Atomic Object.
- Dr. Chao (Charles) Zheng, (2019) Engineer at Alibaba.
- Dr. Peter Ivie, (2018) Engineer at VidAngel.
- Dr. James Sweet, (2017) Postdoc at ND Center for Research Computing.
- Dr. Haiyan Meng, (2017) Engineer at Google.
- Dr. Patrick Donnelly, (2016) Engineer at Google.
- Dr. Haipeng Cai, (2015) Postdoc at Virginia Tech, then faculty at Washington State University.
- Dr. Peter Sempolinski, (2015) Postdoc at Notre Dame.
- Dr. Dinesh Rajan, (2015) Engineer at Amazon.
- Dr. Li Yu, (2013) Engineer at Bloomberg.
- Dr. Peter Bui, (2012) Faculty at University of Wisconsin - Eau Claire, then Faculty at Notre Dame.
- Dr. Hoang Bui, (2012) Faculty at Western Illinois University.
- Dr. Christopher Moretti, (2010) Faculty at Princeton University.
- Dr. Kyle Wheeler, (2009) Researcher at Sandia National Labs.
- Dr. Jeffrey Hemmes, (2009) Faculty at Air Force Institute of Technology, then Faculty at Auburn University.
Research Funding
- Coupling Sensor Networks and HPC Facilities with Advanced Wireless Networks for Near Real-Time Simulation of Digital Agriculture, PIs: Shantenu Jha and Ozgur Kilic (Brookhaven National Lab), Rich Wolski (University of California - Santa Barbara), Douglas Thain (University of Notre Dame), and Mehmet Can Vuran (University of Nebraska-Lincoln), Department of Energy, 2024-2026.
- Collaborative Research: CSSI Frameworks: From Notebook to Workflow and Back Again, PIs: Douglas Thain, Kevin Lannon, Tanu Malik, and Shaowen Wang. National Science Foundation, 2019-2023.
- CSR: Small: Accelerating Data Intensive Scientific Workflows with Consistency Contracts, PI: Douglas Thain, National Science Foundation, 2023-2026.
- POSE: Phase I: HARMONY: Harmonizing the High Performance Python Workflow Ecosystem, PIs: Douglas Thain, Shantenu Jha, and Kyle Chard. National Science Foundation, 2024.
- CSSI Elements: DataSwarm: A User-Level Framework for Data Intensive Scientific Applications, PI: Douglas Thain, National Science Foundation, 2019-2023.
- VC3: Virtual Clusters for Community Computation, Douglas Thain, Robert Gardner, and John Hover, Department of Energy, 2016-2019.
- SI2-SSE: Scaling up Science with the Cooperative Computing Tools, Douglas Thain, National Science Foundation, April 2016-2019.
- SI2-SSE: Connecting Cyberinfrastructure with the Cooperative Computing Tools, Douglas Thain, National Science Foundation, April 2012-2015.
- dV/dt: Accelerating the Rate of Progress Toward Extreme Scale Collaborative Science, Miron Livny, Ewa Deelman, Douglas Thain, William Allcock, and Frank Wuerthwein, Department of Energy, September 2012-2015.
- REU Site: Data Intensive Scientific Computing, Douglas Thain and Kevin Lannon, National Science Foundation, February 2016-2019.
- DASPOS: Data and Software Preservation for Open Science, Michael Hildreth, Jaroslaw Nabrzyski, Mark Neubauer, Douglas Thain, and Robert Gardner, National Science Foundation, August 2012-2015.
- CDI-Type II: Open Sourcing the Design of Civil Infrastructure, Tracy Kijewski-Correa, Ahsan Kareem, Gregory Madey, Douglas Thain, August 2009-2013.
- CRI: Distributed Research Testbed (DiRT), Douglas Thain, National Science Foundation, August 2009-2012.
- "CSR-AES: Troubleshooting Large Scale Computing Grids with Machine Learning Techniques", Nitesh Chawla, Xiaohui Song, Shaowen Wang, and Douglas Thain, National Science Foundation, August 2007-2008.
- "The Notre Dame Extended Research Community", Mitchell Wayne, Thomas Loughran, Douglas Thain, Daniel Karmgard, Anna Goussiou, National Science Foundation, GK-12 Program, Sep 2007-2012.
- "CAREER: Data Intensive Grid Computing on Active Storage Clusters", Douglas Thain, National Science Foundation, Faculty Early Career Development Program, May 2007-2012.
- "HECURA: Deconstructing Clusters for High End Biometric Applications", Douglas Thain and Patrick Flynn, National Science Foundation, High End Computing University Research Activity, March 2007-2009.
- "SGER: Enabling Electronic Self-Defense with Dynamic Identities", Douglas Thain, National Science Foundation, Cybertrust Program, September 2005-2007.
- "An Experimental Approach to Integrative Research for Sensor-Rich Collaborative Teams", Christian Poellabauer, Nitesh Chawla, and Douglas Thain, Department of Defense, Defense University Research Instrumentation Program, April 2006-2007.
Our work has been generously supported by the U.S. National Science Foundation (NSF),
the U.S. Department of Energy (DOE) Office of Science, the National Aeronautics and Space Administration (NASA), and the Department of Defense University Research
Instrumentation Program (DURIP).