Award Winner: Aditya Ramamoorthy
Award Title: CIF: Small: Towards practical gradient coding
Award Category: Project Funding Award
Award Description: Machine learning systems have made revolutionary advances in several areas, including (but not limited to) automated speech and image recognition, scientific discovery, human health, and national security. These advances have been made possible in large part by the training of high-capacity models that are able to capture and infer complex relationships within enormous amounts of data, such as images, video, and speech. Such training is quite resource-intensive and failure-prone, and it typically requires the deployment of large groups of computers that operate collaboratively to achieve the overall objectives. For instance, by conservative estimates, the training of current state-of-the-art models for language understanding consumes enough energy to power over one thousand average US households for a year. Moreover, a rule of thumb within distributed computing states: “failures are the norm, rather than the exception”. This project will investigate resource-efficient and fault-tolerant schemes for distributed model training within machine learning. Specifically, the training time depends on the reliability and speed of the computers and the speed of communication between them. This project will examine techniques for simultaneously increasing both the reliability and speed of the process. If successful, this will result in significant energy and monetary savings across the board in scenarios where machine learning is routinely deployed. The ability to work with large-scale computing clusters is an essential skill for the workforce, and this project will help train undergraduate and graduate students in such techniques. The team of researchers will volunteer for mathematics tutoring activities as part of the CyMath initiative at Iowa State; CyMath offers free and open-to-all after-school math tutoring for elementary and middle school students in Ames-area schools.
Co-PIs: None
Funding Source(s): NSF
Award Amount: $582,000