College of Engineering News • Iowa State University

Managing big data with efficient algorithms and software

Electrical and computer engineering professor devises technology that helps transform data into usable information.

Over the past 10 to 15 years, the ecosystem of software and hardware for data processing has changed dramatically.

Srikanta Tirthapura, associate professor of electrical and computer engineering, says this change is being driven by favorable economic trends in data storage and processing hardware, which have made it possible to collect and store massive amounts of data.

“There are surveys that say the amount of data collected is doubling every one and a half years or so,” he said. He added that he finds this statistic incredible, but not surprising. “If we can collect more data, we will collect more data.”

Within the emerging area of ‘big data,’ Tirthapura develops methods to analyze extremely large data sets, especially data that changes quickly. “In 2009 and 2010, I spent some time working at a database company called Oracle. That was when big data was taking off, and I got inspired to follow it and see what direction it would take,” said Tirthapura. Since then, he has also worked closely with IBM Research on how to process massive data streams. Part of this research focuses on designing the right structures for storing data, so that they work together with algorithms designed to process that data efficiently.

He applies his knowledge of software and algorithm development to a variety of problems, including cybersecurity. “One problem I’m currently investigating is called ‘insider threat detection,’ where someone within an organization who has authorized access to some parts of the system misuses his or her access.”

Tirthapura is developing an approach to help find these threats by analyzing events in the computer system. He says identifying an insider threat is like “finding a needle in a haystack” because there is so much activity happening in a system, but that is where his skills come into play. “I work with the nitty-gritty of how to store this data for efficient retrieval and how to process it to find patterns, trends and anomalies.”

In another application, Tirthapura is working with colleagues at the Institute for Transportation (InTrans), in collaboration with the Iowa Department of Transportation, to make data collected from roadway sensors useful and actionable.

In addition, he has made large strides in applying his fundamental research on algorithms for big data, including novel approaches to random sampling and a data structuring technique for high-dimensional data called the “space-filling curve.” In joint work with students, he recently presented an analysis of the clustering properties of space-filling curves, resolving a problem that had been open for nearly 20 years.
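To give a rough sense of what “clustering” means for a space-filling curve, here is a minimal illustrative sketch. It is not Tirthapura’s analysis, and the article does not say which curve or which clustering measure his work used. The sketch assumes the Z-order (Morton) curve on a small two-dimensional grid. The curve turns each grid cell into a single position along a line by interleaving the bits of its coordinates. It then counts the “clusters” of a rectangular query, meaning the number of separate runs of consecutive curve positions the rectangle covers. Fewer clusters means fewer separate disk reads when data is stored in curve order. The function names `morton_index` and `count_clusters` are hypothetical, chosen for this example.

```python
def morton_index(x: int, y: int, bits: int) -> int:
    """Z-order (Morton) key: interleave the low `bits` bits of x and y.

    Bit i of x goes to bit 2*i of the key; bit i of y goes to bit 2*i + 1.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def count_clusters(x0: int, y0: int, x1: int, y1: int, bits: int) -> int:
    """Count maximal runs of consecutive Morton keys that cover the
    inclusive rectangle [x0, x1] x [y0, y1] on a 2**bits by 2**bits grid."""
    keys = sorted(
        morton_index(x, y, bits)
        for x in range(x0, x1 + 1)
        for y in range(y0, y1 + 1)
    )
    if not keys:
        return 0  # empty query rectangle
    clusters = 1
    for prev, cur in zip(keys, keys[1:]):
        if cur != prev + 1:  # a gap in curve order starts a new cluster
            clusters += 1
    return clusters


# On a 4 x 4 grid, an aligned 2 x 2 block is one contiguous run,
# while the bottom row {0, 1, 4, 5} splits into two runs.
print(count_clusters(0, 0, 1, 1, bits=2))
print(count_clusters(0, 0, 3, 0, bits=2))
```

Z-order is used here only because it is the simplest space-filling curve to write down. The Hilbert curve is another common choice, and it typically produces fewer clusters for the same query.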

Tirthapura says that one challenge in the big data field is that there are so many opportunities. “It’s a very broad area, so there’s all sorts of room for innovation, and you have to choose carefully what you want to do to make a lasting impact.”

That’s why Tirthapura wants to prepare his students to navigate this growing field. To do so, he created a class called CprE 419: Software Tools for Large Scale Data Analysis. He says that while many universities have classes on the general topic of big data, the course at Iowa State focuses specifically on software development methods for big data, and students who have taken it have gone on to become data scientists in industry.

Student interest and collaboration with other faculty have helped Tirthapura create valuable solutions. “Iowa State University is a terrific place to be since there are opportunities to work with experts in a large variety of application areas,” he said.