You can think of the world around us in terms of massive quantities of data—endless connections in social network graphs, biological networks, economic systems, and so on. “All the data is sort of interconnected in some way. The data is in different forms, [but] if you have data that is all linked together, virtually because these links may not physically exist, the question is can you then discover interesting patterns?”
This is the research area of Professor of Computer Science Mohammed Zaki, who has been selected to be one of the 2010 Hewlett Packard Labs Innovation Research awardees for his project titled “Mining and Querying Massive and Complex Graph Data.”
His work focuses on more efficient algorithms for running search queries over these large and complex datasets, and it affects many scientific domains that rely on, or could benefit from, improved information management techniques.
With large amounts of data and complex network connections, data mining lets people pose open-ended queries to search for interesting associations in the data. This differs from a regular keyword search, where you already know what you are looking for.
“Data mining is much more advanced in the sense that I actually do not know what I’m looking for. I just say, this is my dataset, find something interesting.”
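The contrast Zaki draws can be illustrated with a minimal Python sketch. It is not taken from his lab's software: the toy records, the entity names, and the helper functions `keyword_search` and `frequent_pairs` are all hypothetical, and simple pair counting stands in for the far more sophisticated graph mining methods the article describes. The first function needs a known term; the second is handed only the dataset and a support threshold and reports whichever pairs of entities co-occur often.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy dataset: each record is a set of linked entities.
records = [
    {"geneA", "geneB", "diseaseX"},
    {"geneA", "geneB", "diseaseY"},
    {"geneA", "geneC"},
    {"geneB", "geneA", "diseaseX"},
]


def keyword_search(records, term):
    """Keyword search: the caller already knows the term to look for."""
    return [record for record in records if term in record]


def frequent_pairs(records, min_support):
    """Pattern discovery: no target is given, only a support threshold.

    Counts how many records contain each pair of entities and keeps the
    pairs that appear in at least `min_support` records.
    """
    counts = Counter()
    for record in records:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(record), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    print(keyword_search(records, "geneC"))
    print(frequent_pairs(records, min_support=3))
```

With this toy data, the only pair that reaches a support of three is `geneA` and `geneB`, an association nobody asked for by name. A result like that is the kind of lead a researcher might then take to a domain expert for follow-up.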
Because of this characteristic, data mining can help scientists formulate hypotheses based on patterns and relationships discovered in data. One of the areas Zaki has applied his research to is bioinformatics. “I’d talk to a biology professor and say, ‘Look, I found this thing, does it make sense to you? Can you do some follow-up experiments to verify that?’”
Zaki first came to RPI in 1998 and over the past 10 years has worked with his lab studying high-performance data mining and indexing (storage) for large complex datasets. Their research involves looking at existing datasets and studying how to store data for fast information retrieval and how to create efficient algorithms for searching and extracting information using queries.
Zaki’s lab has also released software, the Data Mining Template Library, which is available on his website. He hopes eventually to extend the software with more advanced data mining methods.
“We’re constantly looking for better ways of doing things, better ways of discovering these relationships, and trying to link the different types of information,” said Zaki.
The HP Labs Innovation Research Award gives researchers a one-year cash award to help cover research expenses. Awardees are chosen “based on their alignment with the chosen research topic and expected impact of the proposed research,” according to the HP Labs website. The funding can be renewed for up to three years at HP’s discretion.
Once Zaki begins his research with HP, his lab will exclusively use corporate data supplied by HP to conduct its research. “Within HP there’s a lot of information about how corporate basically works and it tends to be very scattered,” stated Zaki. “The goal of this graph mining would be to actually help them discover certain interesting patterns and relationships within their corporate knowledge network.” Zaki will meet with a business unit at HP next month to discuss the next step of the research.