AI researchers at Google's DeepMind published research this week in which they tried to train two kinds of neural networks to solve basic math problems. The work focused on arithmetic, algebra, and calculus, all of which are taught in high school.

Unfortunately, the neural networks didn't do well. Not only did they guess six as the answer to the question "What is the sum of 1+1+1+1+1+1+1?", which is obviously incorrect; they also got only 14 out of 40 questions right on a standard test.
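For reference, the missed question is trivial to check mechanically. The snippet below is an illustrative sketch, not DeepMind's evaluation code:

```python
# The question the network got wrong, checked mechanically.
# Illustrative only; not DeepMind's evaluation code.
question = "What is the sum of 1+1+1+1+1+1+1?"
answer = sum([1] * 7)  # seven ones
print(question, answer)  # prints 7 -- the network answered 6
```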

That score is equivalent to an E grade for a sixteen-year-old student in the British school system. In short, the researchers note, AI has trouble learning even basic math.

The paper, "Analysing Mathematical Reasoning Abilities of Neural Models," was made as a benchmark test that was set so others develop neural networks for math problems. This is similar to how ImageNet was made as an image recognition benchmark test.

The paper, authored by David Saxton, Edward Grefenstette, Felix Hill and Pushmeet Kohli of DeepMind, is posted on the arXiv preprint server. 

Citing noted neural net critic Gary Marcus of NYU, the authors refer to the famous "brittleness" of neural networks, and argue for investigation into why humans are better able to perform "discrete compositional reasoning about objects and entities, that 'algebraically generalize'." 

They propose that a varied set of math problems should push neural networks toward such reasoning, which includes abilities like "Planning (for example, identifying the functions in the correct order to compose)" when a math problem has parts that may or may not be associative, distributive, or commutative.
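As a concrete illustration of that planning step, consider a composition question in the style of the data set. The sketch below is hypothetical, not an example taken from the paper; the functions f and g and the values are made up:

```python
# Hypothetical composition question in the style of the data set:
# "Let f(x) = 2x + 3 and g(x) = x**2. What is f(g(2))?"
def f(x):
    return 2 * x + 3

def g(x):
    return x * x

# The correct plan applies g first, then f.
print(f(g(2)))  # g(2) = 4, then f(4) = 11

# Composing in the wrong order yields a different answer,
# so identifying the order of composition matters:
print(g(f(2)))  # f(2) = 7, then g(7) = 49
```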

"It should be harder for a model to do well across a range of problem types (including generalization, which we detail below)," they write, "without possessing at least some part of these abilities that allow for algebraic generalization." Hence, the data set.

"We are interested here in evaluating general purpose models, rather than ones with their mathematics knowledge already inbuilt," they write. 

"What makes such models (which are invariably neural architectures) so ubiquitous from translation to parsing via image captioning is the lack of bias these function approximators present due to having relatively little (or no) domain-specific knowledge encoded in their design."

The upshot of the failed experiment: given that the test reflects a real high-school curriculum with a string of real-world problems, the neural nets' E grade is, the researchers conclude, moderate at best, and disappointing.

The researchers are now releasing the data set, and, hopefully, AI will one day make it past six.