A University of Waterloo researcher has led the development of a software tool that could offer convincing answers to some of life's most intriguing questions.

The tool, which combines supervised machine learning with digital signal processing (ML-DSP), could for the first time make it possible to give definitive answers to questions such as how many different species exist on Earth and in the oceans. How are existing, newly discovered, and extinct species related to one another? What are the bacterial origins of human mitochondrial DNA? Do a parasite and its host share a similar genomic signature?

The software could also benefit personalized medicine by identifying the specific strain of a virus, allowing precise drugs to be developed and prescribed to treat it.

ML-DSP is an alignment-free software tool that transforms a DNA sequence into a digital (numerical) signal and then uses digital signal processing techniques to process these signals and distinguish them from one another.
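To make the idea concrete, here is a minimal Python sketch of turning DNA into a numerical signal and comparing two signals without aligning them. The article does not say which numerical encoding, transform, or distance measure the tool uses, so the purine/pyrimidine mapping, the discrete Fourier transform, and the Pearson-correlation distance below are illustrative assumptions, as are all function names. It should not be read as the authors' implementation.

```python
import cmath
import math

# Hypothetical encoding, assumed for illustration only:
# purines (A, G) -> -1, pyrimidines (C, T) -> +1.
BASE_TO_VALUE = {"A": -1.0, "G": -1.0, "C": 1.0, "T": 1.0}


def dna_to_signal(sequence):
    """Turn a DNA string into a list of numbers, one per base."""
    return [BASE_TO_VALUE[base] for base in sequence.upper()]


def magnitude_spectrum(signal):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coefficient))
    return spectrum


def pearson_distance(u, v):
    """(1 - Pearson correlation) / 2: 0 for identical shape, 1 for opposite."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        raise ValueError("constant spectrum: correlation is undefined")
    return (1.0 - cov / math.sqrt(var_u * var_v)) / 2.0


def signal_distance(seq_a, seq_b):
    """Distance between two equal-length DNA sequences via their spectra."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch assumes equal-length sequences")
    spec_a = magnitude_spectrum(dna_to_signal(seq_a))
    spec_b = magnitude_spectrum(dna_to_signal(seq_b))
    return pearson_distance(spec_a, spec_b)
```

A rotated copy of a sequence, for example `s[3:] + s[:3]`, comes out at a distance of essentially zero from `s`, because a circular shift changes only the phase of the Fourier transform and not its magnitude. That is one reason a spectrum-based comparison needs no alignment step. In a supervised setting, pairwise distances like these could then serve as input features for a standard classifier.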

Lila Kari, a professor in Waterloo's Faculty of Mathematics, said that with this method even small fragments of DNA can be classified, regardless of their origin and of whether they are natural, synthetic, or computer-generated. Another important potential application is in healthcare: in this era of personalized medicine, viruses could be classified and a patient's treatment tailored to the particular strain of the virus affecting them.

Kari co-authored the paper describing the new tool, "ML-DSP: Machine Learning with Digital Signal Processing for ultrafast, accurate, and scalable genome classification at all taxonomic levels," with Gurjit Randhawa, a Ph.D. candidate at Western University, and an associate professor in the Department of Biology.

In the study, the researchers quantitatively compared ML-DSP with other state-of-the-art classification software tools on two small benchmark datasets and one large dataset of 4,322 vertebrate mitochondrial genomes.

Kari added that the results show ML-DSP overwhelmingly outperforms alignment-based software in processing time, with classification accuracy that is comparable on small datasets and superior on large ones. Compared with other alignment-free software, ML-DSP has significantly better classification accuracy and is faster overall.

The authors also conducted preliminary experiments indicating ML-DSP's potential for use on other datasets, classifying 4,271 complete dengue virus genomes into subtypes with 100 percent accuracy and 4,710 bacterial genomes into divisions with 95.5 percent accuracy.