In today’s world, huge amounts of audio-visual data are collected on a daily basis by a variety of electronic devices. A lot of this is distributed data, such as data from surveillance cameras, “internet of things” (IoT) devices and smartphones. All of this data needs to be stored somewhere, but the size and amount of data is too much to keep as is.
Data needs to be compressed (sketched) before it can be stored at a smaller size — however, when the data is distributed and needs to be compressed in a privacy-preserving and secure fashion while also keeping the sketching algorithm very simple, the design of the overall system (the sketching and the corresponding decompression or recovery approaches), is not straightforward.
Professors Namrata Vaswani and Aditya Ramamoorthy, from Iowa State University‘s Department of Electrical and Computer Engineering, recently received a $564,500 grant from the National Science Foundation (NSF) to solve this type of problem.
The central idea of performing federated (distributed and privacy-preserving) sketching is for the different nodes to compute random “linear projections” of their own signal/image. In a very different class of applications — projection imaging settings like magnetic resonance imaging (MRI), computed tomography (CT) or Fourier ptychography — linear or quadratic projections of the image are the only quantities that can be measured. Moreover, these are measured one projection at a time, making data acquisition very slow. For both federated sketching and projection imaging, there is a need to be able to reconstruct the image sequence using as few projections as possible. In sketching this implies better compression, where as in projection imaging this means that the scan-time per image can be reduced (resulting in accelerated imaging). MRI and CT are frequently used in non-invasive medical diagnostics while ptychography is useful for high resolution microscopy.
“In non-invasive imaging settings like dynamic MRI — for example, the imaging of a cross-section of a beating heart or of the brain as a person responds to external stimuli — the image of the heart or brain is not captured directly,” Vaswani said. “Instead, the MRI scanner measures one linear projection of the cross-section at a time. The only way to reconstruct a high-fidelity image sequence or video (or other signals) from few measurements is to exploit the structural similarities within a given image, as well as across a sequence of similar images. ”
In many cases, individual nodes would like to keep their own data private and only release summaries of them for the overall processing. In current jargon, distributed and iterative processing of information that works by keeping individual data private, but sharing only data summaries from each node, is referred to as “federated learning.”
According to Vaswani: “Our work designs efficient, secure and federated algorithms for accurate recovery from either highly compressed data (first application), or from highly under-sampled MRI measurements (accelerated MR imaging), and provides the first set of conditions under which a certain computational and sample budget is guaranteed to be sufficient with high probability. To achieve this, our work will exploit what is called the low-rank (LR) assumption — the set of images, each arranged as a 1D vector, form a matrix that has a rank much lower than its dimensions. Extensions that exploit the fact that occasional large deviations of the matrix from LR structure can be modeled as approximately sparse matrices will also be developed.”
The LR assumption has been extensively used in other very different settings: most commonly, LR matrix sensing and completion — these key sub-problems that occur in the design of reliable automated recommendation systems.
Vaswani further stated that since the early work on compressed sensing, sparsity assumptions have been exploited very fruitfully for both efficient distributed compression and accelerated dynamic MRI. However, there is limited literature on the use of the LR assumption, and almost none that provides a theoretical guarantee on roughly how many MRI measurements (or data sketches) are sufficient to do this while also making sure that the recovery algorithm is fast. Accelerated imaging along with a fast recovery algorithm are the two components needed to make real-time dynamic MRI for applications like MRI-guided surgery practical.
The distributed aspect of the problem presents a second crucial challenge of being robust to secure against attacks by malicious users and malfunctioning nodes. This is where Ramamoorthy’s expertise becomes critical.
“It is possible that some worker nodes are either malicious or malfunctioning and do not provide accurate information in the data processing step(s),’’ Ramamoorthy said. “In distributed systems, it is very important to ensure that the information processing remains resilient even in the face of unexpected situations, such as malfunctioning or malicious nodes. These can happen because of inevitable component failures, or in certain scenarios, deliberate attacks by adversarial agents who aim at disrupting the functioning of these systems. For instance, a worker node could promise a certain linear projection, but instead provide something very different.’’
Vaswani and Ramamoorthy will collaborate on innovative approaches for mitigating the effect of adversarial attacks on these systems while ensuring that the fidelity of the recovered information is maintained. As noted by them, approaches that respect data privacy of individual nodes and are guaranteed to be secure from online attacks do not exist for the type of problems described above.
This grant also partially supports the new CyMath-Kids initiative, which is run in partnership with the ISU 4U Promise program. CyMath is looking for volunteers (graduate students in ECpE, math or computer science) to serve as tutors; recent graduates are welcome, too. Time commitment is about two hours per week during the school year. See this link for more details.