In the past decade, the American and Chinese governments have launched the Materials Genome Initiative (MGI) and the National Materials Genome Project, respectively. One of the main objectives of these programs is to facilitate the identification of materials data in order to speed up materials discovery and development. The researchers have published their findings in SCIENCE CHINA Chemistry, and the paper is available online.

Current techniques can identify structures effectively, but they have limited ability to handle every structure in a large materials database accurately and automatically, because differences in material sources and various measurement errors lead to variations in bond lengths and bond angles.

Feng Pan and his colleagues from Peking University Shenzhen Graduate School have proposed a new paradigm based on graph theory (the GT scheme) to improve the efficiency and accuracy of material identification. It focuses on the "topological relationship" among atoms rather than on the values of bond lengths and bond angles, which differ from one structure to another.

In the GT scheme, the team first simplifies each crystal structure into a graph that consists only of vertices and edges: atoms become vertices, and adjacent atoms joined by an actual chemical bond are "connected" by an edge. If the simplified graphs of two structures are isomorphic, that is, if their topological connections match, the GT scheme treats them as one structure. Using this technique, the team achieved automatic deduplication of a large materials database for the first time, identifying 626,772 unique structures among 865,458 original structures.
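
The idea in the paragraph above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a small non-periodic cluster of atoms (real crystals are periodic), a single hypothetical distance cutoff for deciding which atoms are bonded, and element symbols as vertex labels. The function names build_graph, is_isomorphic and deduplicate, and the coordinates, are illustrative only.

```python
import math
from itertools import combinations


def build_graph(atoms, cutoff):
    """Turn a list of (element, (x, y, z)) atoms into (labels, adjacency sets).

    Assumption: two atoms are bonded when their distance is <= cutoff.
    """
    labels = [element for element, _ in atoms]
    adjacency = [set() for _ in atoms]
    for i, j in combinations(range(len(atoms)), 2):
        if math.dist(atoms[i][1], atoms[j][1]) <= cutoff:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return labels, adjacency


def is_isomorphic(g1, g2):
    """Exact, element-preserving isomorphism test by backtracking.

    Suitable for small graphs only; the worst case is exponential.
    """
    labels1, adj1 = g1
    labels2, adj2 = g2
    n = len(labels1)
    if n != len(labels2):
        return False
    # Cheap invariant: the multiset of (element, degree) pairs must match.
    sig1 = sorted((labels1[i], len(adj1[i])) for i in range(n))
    sig2 = sorted((labels2[i], len(adj2[i])) for i in range(n))
    if sig1 != sig2:
        return False

    mapping = {}  # vertex of g1 -> vertex of g2
    used = set()  # vertices of g2 already assigned

    def extend(v):
        if v == n:
            return True
        for w in range(n):
            if w in used:
                continue
            if labels1[v] != labels2[w] or len(adj1[v]) != len(adj2[w]):
                continue
            # Every already-mapped vertex must agree on adjacency.
            if all((u in adj1[v]) == (mapping[u] in adj2[w]) for u in mapping):
                mapping[v] = w
                used.add(w)
                if extend(v + 1):
                    return True
                del mapping[v]
                used.discard(w)
        return False

    return extend(0)


def deduplicate(graphs):
    """Keep one representative per isomorphism class."""
    unique = []
    for g in graphs:
        if not any(is_isomorphic(g, u) for u in unique):
            unique.append(g)
    return unique


# Two water-like records with different bond lengths, angles and atom order,
# plus one record with a different connectivity (an O-H-H chain).
water_a = [("O", (0.0, 0.0, 0.0)), ("H", (0.96, 0.0, 0.0)), ("H", (-0.24, 0.93, 0.0))]
water_b = [("H", (1.00, 0.0, 0.0)), ("O", (0.0, 0.0, 0.0)), ("H", (-0.30, 0.95, 0.0))]
chain = [("O", (0.0, 0.0, 0.0)), ("H", (0.96, 0.0, 0.0)), ("H", (1.70, 0.0, 0.0))]

graphs = [build_graph(s, cutoff=1.2) for s in (water_a, water_b, chain)]
print(is_isomorphic(graphs[0], graphs[1]))  # True: same topology despite distortion
print(len(deduplicate(graphs)))             # 2
```

Because the comparison uses only connectivity, the two water-like records collapse into one entry even though their coordinates differ, while the chain is kept as a separate structure. A database-scale version would need a periodic bonding rule and a much faster isomorphism or hashing step than this pairwise backtracking.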

Furthermore, they have modified the GT scheme to solve more advanced problems, such as identifying highly distorted structures, distinguishing structures that are strongly similar, and classifying complex crystal structures in large materials datasets. Compared with traditional structural chemistry methods, the GT scheme can address these issues much more quickly, which enhances the efficiency and reliability of material identification.

The team intends to use this artificial intelligence technique to achieve high-throughput calculation, preparation, and detection for the materials database. The GT scheme departs from traditional materials research methods and accelerates development in the materials research field.

The authors of this study thank Dr. Lin-Wang from Lawrence Berkeley National Laboratory and Dr. Wenfei Fan from the University of Edinburgh for their helpful discussions. This work was supported by the National Key R&D Program of China, the National Natural Science Foundation of China, the Soft Science Research Project of Guangdong Province, and the New Energy Materials Genome Preparation & Test Key Laboratory Project of Shenzhen.