Blog: The Cutting Edge
As the volume and complexity of data available to researchers increase, so does the potential to extract valuable insights. However, how datasets are structured, organized, and made accessible is a critical consideration before that value can be realized.
Few people are more enthusiastic about the power of network sciences to solve intractable data analysis problems than Dr. Ahmed Abdeen Hamed, Assistant Professor of Data Science and Artificial Intelligence at Norwich University, Northfield, Vermont, USA. We’re talking to Ahmed to get his thoughts on why this approach is proving so valuable in the medical sciences (and elsewhere), and to hear his opinions on the evolution of artificial intelligence (AI).
Can we start by defining what we mean by ‘network science’?
It’s essentially a way of analyzing data relating to complex systems, by thinking of them as a set of distinct players/elements (known as nodes) laid out as a map, with the connections between these elements being the links.
Doing this makes it possible to analyze such systems mathematically and graphically – which in turn enables us to spot patterns in the network and derive useful conclusions.
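To make the idea concrete, here is a minimal Python sketch of a network as nodes and links. The entity names and connections are invented purely for illustration; real network-science work would use far larger data and dedicated tooling.

```python
from collections import defaultdict

# Each pair is a link between two elements (nodes) of the system.
# These names are hypothetical examples, not real data.
links = [
    ("drug_A", "gene_X"), ("drug_A", "protein_Y"),
    ("drug_B", "gene_X"), ("drug_C", "cell_line_Z"),
]

# Build an adjacency map: node -> set of directly connected neighbors.
network = defaultdict(set)
for a, b in links:
    network[a].add(b)
    network[b].add(a)

# Once the map exists, patterns can be read off mathematically.
# The simplest is a node's degree: how connected it is.
degrees = {node: len(neighbors) for node, neighbors in network.items()}
print(degrees["gene_X"])  # gene_X links to drug_A and drug_B, so degree 2
```

Even this toy version shows the shift in perspective: the question "which element matters most?" becomes a computation over the map rather than a manual search through the raw data.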
And that helps you to solve problems, right?
Absolutely, and not just any old problems! I’m particularly interested in intractable real-world problems – the sort that companies and governments spend millions of dollars trying to solve (e.g., drug discovery and cybersecurity), some of which are Millennium Prize Problems, each carrying a one-million-dollar prize for a correct solution.
These problems are often challenging because the answer is hidden amongst vast amounts of multimodal data – records of social media interactions, libraries of medical records from drug trials, databases of chemical properties, or combinations of these. If a problem is truly intractable, no existing algorithm running on today’s computers can be expected to terminate with a solution. A big issue with conventional methods of analyzing such complex datasets (on the computers we possess today) is that the number of potential interactions scales exponentially with the size of the network.
Where I think Turing reduction can change this is by simplifying a problem so that it becomes solvable by known and novel computational algorithms in non-exponential time. This is exciting because our world is essentially made up of large systems of complex networks, so there are masses of applications out there ready for investigation by network analysis.
Like your work on drug repurposing, for example?
Exactly, and this is a great example of network analysis in action, because it shows how the approach can speed up the process of finding a drug to treat a given condition. Designing new drugs based on understanding the biology works well, but it’s a long haul from discovery, through clinical trials, to market approval, whereas finding existing, FDA-approved drugs that can be repurposed for the disease in question has many advantages.
How do you go about creating a network based on drug data?
Well, I thought what if we could search through the literature for every known drug molecule, and rank them based on their potential to treat a particular disease or condition? So, I teamed up with colleagues from pharma and academia (the authors of the paper: TargetAnalytica: A Text Analytics Framework for Ranking Therapeutic Molecules in the Bibliome), and for over two years, we worked on refining a molecule-ranking algorithm. We realized that the more specific a molecule is for targeting a tissue of a certain organ, cell type, cell line or gene, the greater its potential as a possible treatment.
We hit upon drug specificity as the best metric to use and applied it to a dataset of already published biomedical literature. The nodes in our network were the drug molecules – or, more strictly, ‘chemical entities’ – as the main players of the network. We then linked them to other entities such as genes, diseases, proteins, and cell types. A link was created whenever two entities were mentioned together in the abstract of the same publication.
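The co-mention principle can be sketched in a few lines of Python. The publication IDs and entity sets below are invented for illustration; the real pipeline would involve named-entity recognition over tens of thousands of actual abstracts.

```python
from itertools import combinations

# Hypothetical abstracts, each reduced to the set of entities
# (drugs, genes, diseases, proteins, ...) mentioned in it.
abstracts = {
    "pub1": {"drug_A", "gene_X", "disease_D"},
    "pub2": {"drug_A", "protein_Y"},
    "pub3": {"drug_B", "gene_X", "disease_D"},
}

# Link every pair of entities that appear in the same abstract.
edges = set()
for entities in abstracts.values():
    for a, b in combinations(sorted(entities), 2):
        edges.add((a, b))

print(sorted(edges))
```

Note how drug_A and drug_B, never mentioned together, still end up indirectly related through their shared links to gene_X and disease_D; connections like these are exactly what grows richer as more publications are fed in.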
And more recently you’ve moved on to apply the principle to COVID-19, too?
That’s right – with the arrival of the pandemic, suddenly we had a need to find a treatment for this relatively unknown serious illness. I realized this was an ideal application for another network medicine tool, because of the sheer amount of information that has been and will continue to be generated on COVID-19.
Specifically, in the two years since the pandemic hit, there have been well over 150,000 publications related to COVID-19, and these publications contain everything you need to feed into a drug-repurposing algorithm. This biomedical literature is a goldmine of information! The more publications you analyze, the more elements become connected, whether within the same publication or across publications. The complexity of the network evolves and refines dramatically as you increase the size of the data input.
Can you tell us about your research into COVID-19 so far?
Well, we’ve developed a drug-repurposing algorithm called CovidX, which we introduced in August 2020. At the time, we used it to identify and rank 30 possible drug candidates for repurposing, and also validated the ranking outcomes against evidence from clinical trials. This work was published in the Journal of Medical Internet Research.
Post-CovidX, we’ve been working on even bigger datasets, and recently had an article published in MDPI’s Pharmaceutics about the interplay of drugs that may present themselves as a COVID-19 treatment. This study was on a much bigger scale (a set of 114,000 publications) than CovidX. It certainly provides a way to stitch together all the evidence that’s been accumulated about treating the virus over the last two and a half years! I should also mention the important role of clinical trial records, which mirror the map derived from the biomedical literature and validate its findings.
This all sounds very interesting, but on a practical level, how do you extract useful insights from these massively complex networks?
That’s a great question, because it strikes at the heart of what makes network science so powerful. I’ll use a simple analogy.
Imagine you’re in a hospital, and you’re looking at the job roles of all the staff. Some members of staff interact with lots of people, like the receptionist or the ward manager, but others work more often as a close-knit group. For example, an anesthetist, surgeon, and nurse would all be closely involved in operating-theater procedures, pretty much every time. We call such a strongly connected community a ‘clique’, and it tells us that this combination of roles does something important. Now, if we were to look at thousands of hospitals, we might find similar ‘cliques’ in each one, and overlaying them all enables us to pull out the roles that rank most highly in their association with a successful outcome for the patient. It’s basically about removing all the noise.
If we go back to our network on drug literature, our first task is to find all these ‘cliques’ amongst our drugs. We can then work out where these ‘cliques’ overlap, and this enables us to identify the drugs that are most important for a successful outcome. And that’s where you focus your effort.
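The hospital analogy translates into a short overlay sketch: collect the cliques found in each separate network, then count how often each element recurs across them. All of the data below is invented to mirror the analogy, and real clique detection over a large network is a much harder computation than this counting step suggests.

```python
from collections import Counter

# Hypothetical cliques found in three separate hospital networks.
cliques_per_network = [
    [{"anesthetist", "surgeon", "nurse"}, {"receptionist", "ward_manager"}],
    [{"anesthetist", "surgeon", "nurse"}],
    [{"surgeon", "nurse", "radiologist"}],
]

# Overlay the networks: count how often each role appears in a clique.
counts = Counter()
for cliques in cliques_per_network:
    for clique in cliques:
        counts.update(clique)

# Roles that recur across the most cliques float to the top;
# everything else is treated as noise.
ranking = counts.most_common()
print(ranking)  # surgeon and nurse each appear in 3 cliques
```

Swapping hospital roles for drug entities gives the repurposing workflow described above: the drugs appearing in the most overlapping cliques are the ones worth focusing effort on.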
And you suggested earlier that network analysis has lots of other applications too?
Yes indeed! As long as you have a sufficiently large dataset and you’re able to identify the entities that you’re going to use to build your network map, then you can turn it to a myriad of problems. I love it, not just because the mathematics is beautiful, but because it allows us to reduce highly complex, intractable problems down to simpler problems that can be addressed in non-exponential time.
Applications are really broad – from genetics to ecology, and from sociology to economics. But to come back to pharma, what makes it such a fertile area for exploiting the power of network science is that everything is documented – whether that’s in published reports, internal documents, or electronic lab notebooks, the data is all there. And not only that, but the data is high-quality, too. So, in a way, the quality of the data has inspired the quality of the science, and obviously in the medical arena, this can help to save lives.
Coming back to your passion for problem-solving, what’s your opinion about how well the AI part of computer science is doing?
Well, this might be a bit controversial, but to be honest I’m a bit disappointed with the progress that’s been made since the benchmark set by Deep Blue in 1997. If we could teach a computer to play chess 25 years ago, then why – with the vastly greater computing power we’ve now got at our disposal – can we not develop a supercomputer that could solve some genuine problems? If we created an AI that specialized in, let’s say, cancer or Alzheimer’s to the same level as Deep Blue specialized in chess, could we find an effective treatment? Why is that proving so hard?
But do you think there’s light at the end of this tunnel?
Yes, I do, and it’s all to do with the possibilities opening up in quantum computing. Now I’m not a specialist in this area, but it’s clear to me that because it works in a very different way, it should allow us to tackle previously intractable problems – what’s known as ‘quantum supremacy’.
And what’s exciting is that this computing power should be accessible to programmers everywhere, because it’s being interfaced with regular programming languages like Microsoft’s Q# language. I envision that within the next five to ten years, we’ll see this computing revolution become a reality. From my perspective as a network analyst, I’m really hoping it does, because it should allow us to remove the current limit on the number of elements we can have in a network. I would certainly like to get my hands on a quantum computer once the technology has matured!
Ahmed, it’s been a pleasure to talk to you, and thank you for sharing with us the power and potential of network analysis.
It’s clear that there are challenges for pharmaceutical researchers, whether they are using network analysis or machine learning approaches. To reap the benefits of either approach, the fundamentals of data storage, flexible access, and computational/analytical tools must be considered. At Paradigm4, we are tackling the data storage issues through our novel elastic cloud file system, flexFS, for more resource-efficient data storage and sustainable computing, to help accelerate drug discovery efforts. For more information on how our technology can help to transform your data analysis, contact email@example.com.
For more about the work of Dr. Hamed at Norwich University, read this article on the university’s website. Disclaimer: These views are Ahmed Abdeen Hamed’s own views and not those of Norwich University.
Dr. Ahmed Abdeen Hamed is Assistant Professor of Data Science and Artificial Intelligence at Norwich University, Northfield, VT, where since 2019 he has led the university’s data analytics courses and carried out research using algorithms and computational methods to address a wide range of global issues. Prior to this, he worked in the private sector, where he designed network-based drug-ranking algorithms, now patented under his name as the first inventor.