Jennifer Davidson’s algorithms seek to lift the veil from crime’s darkest corner
Barely five feet and 100 pounds, Jennifer Davidson may one day be the worst nightmare of one of the worst types of criminal under law enforcement scrutiny today. As an academic—and a parent—Davidson wouldn’t have it any other way.
An unlikely crime fighter, the associate chair of Iowa State’s math department, who also holds a joint appointment in the Department of Electrical and Computer Engineering, has been working with Iowa’s Division of Criminal Investigation (DCI) to develop methods for detecting hidden content in digital image files—specifically, child pornography concealed in seemingly innocent digital photographs.
An imperceptible change
Davidson came to Iowa State in 1989 as a specialist in image processing and content analysis, with an emphasis on stochastic modeling for the detection and recognition of specific objects in digital images. In 1998, a student interested in “covert communications” asked Davidson for help with his studies. The student wrote a class paper on steganography, an emerging field Davidson was unfamiliar with at the time. Nonetheless, student and professor soon co-authored a conference paper on the subject.
Put simply, steganography (from the Greek for “hidden writing”) involves embedding messages within other, seemingly innocent messages. Unlike encryption, which scrambles a message so that it cannot be read, steganography aims to conceal the very existence of the hidden message in the first place.
Although concealed information can be transmitted using any digital file, steganography often involves hiding information inside image files, typically photographs. According to Davidson, this could be as simple as altering the color value of each pixel of a digital photograph by one on a tonal scale of 0 to 255, a change imperceptible to human vision. Given that a digital photo easily contains millions of pixels (a six-megapixel photo has six million pixels), that translates into hundreds of thousands of bytes of information that can be concealed in a single photo—up to and including an image entirely different from the one seen by the naked eye.
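The pixel-value tweak Davidson describes is the idea behind least-significant-bit (LSB) embedding, which can be sketched in a few lines. This is an illustration of the general technique, not a reconstruction of any particular embedding program; the function names are ours.

```python
def embed_lsb(pixels, message_bits):
    """Hide one message bit in the least-significant bit of each pixel
    value (0-255); each value changes by at most 1, which is invisible."""
    stego = list(pixels)
    for i, bit in enumerate(message_bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the low bit, then set it
    return stego

def extract_lsb(pixels, n_bits):
    """Recover the hidden bits by reading back each low-order bit."""
    return [p & 1 for p in pixels[:n_bits]]

# A toy eight-pixel "image" carrying the bits of the letter 'A' (0b01000001):
cover = [120, 121, 122, 123, 124, 125, 126, 127]
bits = [0, 1, 0, 0, 0, 0, 0, 1]
stego = embed_lsb(cover, bits)
assert extract_lsb(stego, 8) == bits
assert all(abs(a - b) <= 1 for a, b in zip(cover, stego))
```

At one bit per pixel value, a six-megapixel photo can carry on the order of 750,000 bytes this way, which is where the "hundreds of thousands of bytes" figure comes from.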
“Comparing an innocent image and a ‘stego’ image side by side, it’s virtually impossible to tell the difference, unless you actually look at the numbers of the color values,” Davidson observes. “If you use a good embedding software, nobody’s going to notice it; nobody’s going to look at it and say, ‘this looks funny.’”
Scanning the scope of crime
Given this focus on the numbers, it’s not surprising that a mathematician would look to statistical markers for teasing steganography from digital images. So, with support from the National Institute of Justice (NIJ) through the Midwest Forensics Research Center at Iowa State, Davidson has been developing state-of-the-art algorithms to quantify changes in the statistical properties of groups of pixel values that would indicate the presence of hidden content.
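One classic example of such a statistical marker, offered here purely as a toy illustration and not as Davidson's algorithm, is the "pairs of values" chi-square test: LSB embedding tends to equalize the counts of each pixel-value pair (2k, 2k+1), so an unnaturally flat histogram within those pairs hints at hidden content.

```python
from collections import Counter

def pov_chi_square(pixels):
    """Toy "pairs of values" statistic over 8-bit pixel values.
    A LOW score (pairs nearly equalized) suggests LSB embedding."""
    counts = Counter(pixels)
    chi2 = 0.0
    for k in range(128):
        a, b = counts[2 * k], counts[2 * k + 1]
        expected = (a + b) / 2
        if expected > 0:
            chi2 += (a - expected) ** 2 / expected + (b - expected) ** 2 / expected
    return chi2

# A natural image rarely splits a value pair evenly...
natural = [100] * 90 + [101] * 10
# ...but embedding random bits in the low-order bits evens the pair out:
after_embedding = [100] * 50 + [101] * 50
assert pov_chi_square(natural) > pov_chi_square(after_embedding)
```

Real steganalysis tools, including Davidson's, rely on far richer feature sets than this single statistic, but the principle is the same: embedding disturbs the numbers in measurable ways even when the picture looks unchanged.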
Davidson’s software does not yet have the capability of determining the nature of the embedded “message,” which may be encrypted in any event and in need of further forensic scrutiny to determine possible criminality. But that’s OK, according to Special Agent Gerard Meyers of the Iowa DCI. For now, he says, law enforcement simply needs tools to determine the scope of the problem in the first place.
“A lot of child pornographers are organized syndicates around the world,” says Meyers. “There’s a lot of communication and trading among them, and distribution is a big piece of that. So from a limited standpoint, we’d just like to run a tool against child pornography to see if we’re missing any communications.”
According to his colleagues in the FBI and other federal agencies, Meyers says, local and state police units such as his are missing a great deal, whether child pornography or other hidden communications embedded within a variety of data streams, including e-mail and plain text files. However, unlike the feds, state and local forces currently lack both the technical and financial resources to uncover much of this activity.
And, Meyers adds, it’s not just child pornography that concerns his office. “There’s an assumption that there’s a lot of terrorism communication being transmitted through graphical data files,” he notes, “and that we’re just missing it because we don’t have the tools to efficiently scan all the data.”
The need for speed
Davidson’s first foray into steganalysis met with only limited success. Undertaken with math department colleague Clifford Bergman, that 2005 effort used artificial neural networks—essentially, electronic “brains”—as pattern classifiers to build a software tool that could detect hidden data in a file; it could not, however, identify the type of data hidden there.
“It wasn’t terribly good,” Davidson acknowledges, “but it wasn’t bad at the time. It was a proof of concept.”
Davidson’s concept was sufficiently proven that, in 2008, the NIJ awarded her funds to develop her initial work into a tool law enforcement could actually use. So, along with computer science graduate student Jaikishan Jalan, Davidson set out to refine her algorithms to work faster and in bulk, then incorporate them into a user-friendly interface that could meet the real-world needs of law-enforcement officials.
Part of the problem, Davidson notes, is that her initial effort required the operator to enter each image into the program separately and then wait while the software scanned it for anomalies that might indicate the existence of data hidden within the host image. That process would then be repeated for every image file in a folder, and for every folder on a given computer. On a computer containing dozens of folders with hundreds of files and potentially thousands of images, such a project could take weeks or even months to complete—before investigators could even begin decrypting the hidden content in a single image.
An exponential improvement
With Davidson’s work in 2009, though, all that changed. “This software will process as many images as you want to write at a time,” she says of her latest iteration. “It’s much more sophisticated.”
Where the 2005 version of the detection algorithm took as long as 60 seconds to flag a file as a probable “stego,” Davidson’s latest version can scan an image in as little as seven seconds. Even better, users now can simply enter the file path of a given folder into the program, which will scan the folder’s contents automatically, and return when the job is done to pull flagged files for further investigation.
“It’s a very simple interface, easy to use,” says Jalan, Davidson’s grad student. “Basically it’s a filter, out of the bottom of which any stego images will drop. The program will store info on the flagged images that show up in the report format.”
That enhanced speed and efficiency is especially valuable from a law-enforcement perspective. “We’re under such pressure with general investigations that we just don’t have the time to scan these files one at a time,” says the DCI’s Meyers. “In a given child exploitation case or cyber case, we may have hundreds of thousands of images we would like to assess.”
And, adds Jalan, while the new version can identify, for forensic purposes, a possible match to five of the roughly 700 known programs used to embed stegos in a given image, it is also capable of sniffing out embedding programs previously unknown to law enforcement.
“The mathematical framework we have in the background does not look for an exact algorithm,” Jalan notes. “If they develop their own software, it looks for certain properties of the stego algorithm. Any algorithm that changes the data of the image changes some characteristics of the image. So we look for those characteristics.”
Into the future
While Davidson’s project has met with some success to date, there are both promise and potential pitfalls in the future, should police and various industries adopt the software package. For example, her algorithm currently cannot detect small hidden payloads. So while it may be useful for rooting out child pornography, it might be of limited use against smaller data sets such as short coded messages between terrorist cells.
“There has to be enough content embedded that it changes the statistics enough for your algorithm to pick it up,” Davidson observes.
Next, although valuable, merely determining that a “host” file contains hidden content is not in itself sufficient: the illicit “payload” has to be extracted and then, in all likelihood, decrypted to be ultimately actionable by law enforcement. Coding and encryption fall outside Davidson’s expertise. However, by identifying the specific algorithm used to embed the contraband image or message, her software might one day allow users to try to reverse the embedding process and extract the hidden payload, saving forensic examiners valuable time and resources.
Yet perhaps one of the most promising applications is one fraught with the greatest potential peril, not so much from a technical as a legal standpoint—namely, the more widespread use of the software as a network watchdog beyond the contents of a given computer’s hard drive.
While firms might legally install filters on company-owned machines and servers to sniff out stego-embedded files to keep confidential information and intellectual property from walking out the door, the use of such software filters on public ISPs may run afoul of the U.S. Constitution, specifically the Fourth Amendment’s protections against “unreasonable searches and seizures.”
Currently, Meyers notes, law enforcement provides ISPs with the “hashes,” or digital signatures, of known pornographic images, which, if not steganographically embedded or otherwise encrypted, can be detected and reported with legal immunity for both law enforcement and the ISP. However, in the absence of specific probable cause, it is less certain that the courts would protect ISPs from the possible legal consequences of reporting suspect Internet traffic to government agencies merely because a file traveling through its servers contained embedded steganography.
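The hash matching Meyers describes can be illustrated with a standard cryptographic hash such as SHA-256 (here via Python's hashlib). The "known contraband" bytes below are an obvious placeholder; the point is that matching works only on byte-identical copies, which is exactly why steganographic or encrypted alterations defeat it.

```python
import hashlib

def file_hash(data: bytes) -> str:
    """Digital signature ("hash") of a file's raw bytes; identical
    files always produce the identical hash."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical database of hashes of known illicit images:
known_bad = {file_hash(b"placeholder for known contraband bytes")}

def flag(data: bytes) -> bool:
    """Flag a file if its hash matches a known-bad signature."""
    return file_hash(data) in known_bad

assert flag(b"placeholder for known contraband bytes")
# Changing even one byte -- e.g. by steganographic embedding --
# changes the hash, so the altered copy slips past the filter:
assert not flag(b"Placeholder for known contraband bytes")
```

This is why a hash-based ISP filter catches only exact copies of known images, while stego-embedded traffic passes through undetected.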
‘The bad guys are ahead’
As with other aspects of the Internet, the transmission of digital contraband in any form has law enforcement and the researchers who support their work racing to keep up with the breathtaking pace of change in the technologies and tactics wrongdoers employ to conceal their activities.
“We’re underequipped in this fight,” Meyers concedes, “and the technology is certainly something that we’re chasing. The bad guys are always one step ahead, and they always seem to be better equipped and organized because of funding channels.”
Ultimately, it is up to the public to determine public priorities, as well as the proper balance between civil liberties and the need to protect the most vulnerable members of society. Financial institutions will spend whatever it takes to protect their assets; industries will guard their intellectual property at all costs. But whether society will commit the resources to protect our most precious asset—our children—from the worst of predators remains to be seen.
For more information about protecting children from sexual exploitation and abuse, please visit: