Skoltech Leads Open-Source Project Using Neural Networks to Generate Organic Molecule Nomenclature

A new open-source project employs a neural network to create names for organic compounds that comply with the IUPAC nomenclature system, showing the potential of this technology to efficiently handle exact algorithmic problems.

Organic compounds, which generally contain carbon-hydrogen bonds, are named according to a standardized set of rules established by the International Union of Pure and Applied Chemistry (IUPAC), commonly referred to as the IUPAC nomenclature. Under this standard, however, molecules can end up with long and tedious names that spell out their structure. For example, table sugar, known scientifically as sucrose, has the preferred IUPAC name (2R,3R,4S,5S,6R)-2-{[(2S,3S,4S,5R)-3,4-dihydroxy-2,5-bis(hydroxymethyl)oxolan-2-yl]oxy}-6-(hydroxymethyl)oxane-3,4,5-triol.

More importantly, these names do not tolerate the omission of even a single digit or character, so chemists must pay close attention to reports and notes and have a clear understanding of the nomenclature rules. Off-the-shelf software tools that assist with IUPAC nomenclature are available.

The new neural network for naming organic compounds is detailed in the article "Transformer-based artificial neural networks for the conversion between chemical notations," published in the journal Scientific Reports.

Training a Neural Network for Exact Algorithmic Problems

Researchers from the Skolkovo Institute of Science and Technology (Skoltech) in Moscow, Russia, together with colleagues from Lomonosov Moscow State University and the AI tech startup Syntelly, developed and trained the new neural network to generate names for organic compounds in compliance with IUPAC nomenclature.

"Initially, we wanted to create an IUPAC name generator for Syntelly, our AI chemistry platform. Soon we realized that it would take us more than a year to create an algorithm by digitizing the IUPAC rules, so we decided instead to leverage our experience in neural network solutions," explains Sergey Sosnin, lead author of the study, a Skoltech researcher and co-founder of Syntelly, in a press release from the institute.

To realize this solution, the researchers used the Transformer, a neural network architecture developed by Google and one of today's most powerful models for machine translation. It served as the basis for the new network, which the team trained to convert the structural representation of a molecule into an IUPAC name and vice versa.
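Treating name generation as translation means the molecule's structure must first be written as a sequence of tokens, much like the words of a sentence. The article does not say which structural notation was used; the sketch below assumes SMILES strings, a common line notation for molecules. The token pattern, the special tokens, and the function names are hypothetical and only illustrate how a structure could be prepared as input for a translation model.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical, simplified token pattern (an assumption, not the study's tokenizer):
# bracket atoms such as [NH4+], two-letter halogens, two-digit ring labels,
# and otherwise one character per token.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into the tokens a sequence model would read."""
    return TOKEN_RE.findall(smiles)


def build_vocab(samples: Iterable[str]) -> Dict[str, int]:
    """Assign an integer id to every token seen in the samples."""
    vocab = {"<pad>": 0, "<bos>": 1, "<eos>": 2}  # assumed special tokens
    for smiles in samples:
        for token in tokenize_smiles(smiles):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(smiles: str, vocab: Dict[str, int]) -> List[int]:
    """Turn a SMILES string into the id sequence fed to the model."""
    ids = [vocab[token] for token in tokenize_smiles(smiles)]
    return [vocab["<bos>"]] + ids + [vocab["<eos>"]]


if __name__ == "__main__":
    vocab = build_vocab(["CCO", "ClCBr"])
    print(tokenize_smiles("ClCBr"))
    print(encode("CCO", vocab))  # "CCO" is the SMILES string for ethanol
```

The target side, the IUPAC name, would be tokenized in a similar way, and the reverse direction simply swaps source and target.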

Testing and Demonstrating The Network's Capacity

To test the capability of the new neural network to generate names consistent with IUPAC nomenclature, the Skoltech team used PubChem, the world's largest collection of freely accessible chemical data, which holds more than 100 million compounds. After six weeks of development, the Transformer-based neural network managed to perform the required conversion with almost the same accuracy as rule-based, algorithmic solutions.
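The article does not describe how accuracy was scored. Because an IUPAC name is wrong if even one character differs, one plausible metric, assumed here purely for illustration, is exact string match between the generated name and the reference name. The function names below are hypothetical.

```python
from typing import Sequence


def normalize(name: str) -> str:
    """Trim surrounding whitespace only; every inner character stays significant."""
    return name.strip()


def exact_match_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of predicted names that equal the reference character for character."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predicted, reference))
    return hits / len(reference)


if __name__ == "__main__":
    generated = ["ethanol", "propan-2-ol"]
    expected = ["ethanol", "propan-1-ol"]
    # A single wrong locant ("2" instead of "1") counts as a full miss.
    print(exact_match_accuracy(generated, expected))
```

A metric this strict gives no partial credit, which matches the all-or-nothing nature of chemical names.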

More importantly, the study demonstrates how neural networks can potentially address exact algorithmic problems. Sosnin explains that distinguishing two images, such as one of a cat and one of a dog, is an "equally easy task" for humans and neural networks, even though there is no existing method to generate a "purely algorithmic solution" for it.

Skoltech researchers have already implemented the new neural network on the Syntelly platform, where it is publicly available online. The researchers hope that the new method can be used for converting between chemical notations as well as for other technical and related tasks.
