Algorithm Can Predict Behavior of Unknown Molecules to Speed Up Discovery of New Medicine

A new algorithm could use mass spectrometry data to help predict the identity of unknown molecules and the substances that arise from them.

The new process was designed by researchers from the Computational Biology Department at Carnegie Mellon University in the US and St. Petersburg University in Russia. Called "MolDiscovery," it could tell scientists early in their studies whether they have found something previously undiscovered or have simply come across something that has already been identified.

Details of the new algorithm appear in the report "MolDiscovery: learning mass spectrometry fragmentation of small molecules," published in Nature Communications.

Removing Unnecessary Downtime from Random Substances

In a news release from Carnegie Mellon, assistant professor Hosein Mohimani explains that scientists waste time isolating molecules that have already been discovered, a situation he describes as "essentially rediscovering penicillin." He adds that MolDiscovery's ability to determine whether an unknown molecule is actually something new could save researchers time, which equates to millions of dollars in an industrial setting. Mohimani also expressed hope that the new algorithm could help pharmaceutical companies search more effectively for natural products that could be used to create new medicines.

Mohimani's work in CMU's Metabolomics and Metagenomics Lab focuses on searching for new, naturally occurring drug substances. He adds that once a scientist finds a molecule that could potentially be a new drug, often in water or soil-based samples, it can still take a year or more to identify the isolated molecule - without any assurance that it is actually a new substance.

In medicine, physicians devote equally laborious periods of time to finding unknown molecules in human samples - plasma, blood, urine, and fecal samples - that might be new biomarkers uniquely signaling the presence of disease. With MolDiscovery speeding up the process, medical researchers could find new biomarkers without spending as much time cross-referencing them against existing data collections.

How MolDiscovery Uses Mass Spectrometry Data

Mass spectrometry data refers to measurements of the mass-to-charge ratio of the molecules present in a given sample. It is often used to determine the molecular weight of the substances in that sample. The mass spectrometry data for a specific substance serves as its "fingerprint," unique to that substance. However, there is still no single database against which every isolated sample can be matched. MolDiscovery works around this limitation by predicting the identity of a molecule from the substance's mass spectrometry data, even in the absence of a mass spectra database.
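
To make the idea of a mass-to-charge "fingerprint" concrete, here is a minimal Python sketch. It is not the MolDiscovery algorithm, which the report describes as a learned model of fragmentation; it only illustrates how a mass-to-charge ratio is computed for a protonated molecule and how two peak lists might be compared. The function names, the matching tolerance, and the peak lists in the usage example are assumptions made up for illustration. Penicillin G (C16H18N2O4S) is used as the example molecule.

```python
from __future__ import annotations

# Standard monoisotopic masses in daltons (Da) for a few common elements.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.007825,
    "N": 14.003074,
    "O": 15.994915,
    "S": 31.972071,
}
PROTON_MASS = 1.007276  # Da


def monoisotopic_mass(formula: dict[str, int]) -> float:
    """Sum the monoisotopic masses of the atoms in a molecular formula."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def mz_protonated(mass: float, charge: int = 1) -> float:
    """Mass-to-charge ratio of a molecule carrying `charge` extra protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (mass + charge * PROTON_MASS) / charge


def shared_peak_fraction(observed: list[float], reference: list[float],
                         tol: float = 0.01) -> float:
    """Fraction of reference peaks found in the observed peaks within `tol`.

    A deliberately simple, hypothetical similarity measure.
    """
    if not reference:
        return 0.0
    hits = sum(any(abs(o - r) <= tol for o in observed) for r in reference)
    return hits / len(reference)


# Penicillin G, C16H18N2O4S.
penicillin_g = {"C": 16, "H": 18, "N": 2, "O": 4, "S": 1}
mass = monoisotopic_mass(penicillin_g)
precursor_mz = mz_protonated(mass)
print(f"monoisotopic mass: {mass:.4f} Da, [M+H]+ m/z: {precursor_mz:.4f}")

# Hypothetical peak lists, invented purely to exercise the comparison.
observed_peaks = [100.0, 200.005, 300.5]
reference_peaks = [100.0, 200.0, 400.0]
print(shared_peak_fraction(observed_peaks, reference_peaks))
```

A comparison like this only works when a reference spectrum already exists, which is exactly the limitation described above: without a complete database, a tool has to predict what a molecule's spectrum would look like rather than look it up.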

The researchers behind MolDiscovery hope that their work will be useful in laboratories around the world. They also suggest using the new algorithm alongside NRPminer, a machine learning platform also developed by Mohimani's lab to help scientists isolate natural products. They previously presented the capabilities and limitations of NRPminer in the report "Integrating genomics and metabolomics for scalable non-ribosomal peptide discovery" in the May 2021 issue of Nature Communications. Like MolDiscovery, it uses mass spectrometry data, but it is currently geared toward the identification of non-ribosomal peptides (NRPs).