Same-different visual reasoning is a basic ability that underpins abstract combinatorial thought. For that reason, deep convolutional neural networks (DCNNs) have been tested on same-different classification, which has led to debate over whether these models can perform the task. However, because most such studies test on images drawn from the same pixel-level distribution as the training images, their findings have generally been inconclusive.

In this work, the researchers put relational same-different reasoning in DCNNs to the test. Through a series of simulations, they showed that models built on the ResNet architecture can perform visual same-different classification, but only when the test images are similar to the training images at the pixel level. When the testing distribution shifted in a way that left the relation between the objects in the image unchanged, the performance of the DCNNs declined significantly.
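
To make the distinction between a pixel-level distribution and the same-different relation concrete, here is a minimal sketch in Python. It is not the stimulus set or code used in the study: the names `sample_patch`, `make_pair`, and `relation_from_pixels`, the two styles `"binary"` and `"gray"`, and the image and patch sizes are all hypothetical choices made for illustration. The sketch builds images that contain two items, where the items are either identical or not, and lets the pixel statistics of the items change while the label stays defined by the relation alone.

```python
import random

SIZE, PATCH = 32, 8          # hypothetical image and item sizes
SLOTS = ((4, 4), (20, 20))   # hypothetical top-left corners of the two items


def sample_patch(rng, style):
    """Draw one PATCH x PATCH item from an illustrative pixel-level distribution."""
    if style == "binary":
        draw = lambda: float(rng.randint(0, 1))   # pixels are exactly 0.0 or 1.0
    elif style == "gray":
        draw = lambda: rng.uniform(0.2, 0.8)      # pixels are mid-range gray values
    else:
        raise ValueError(f"unknown style: {style}")
    return [[draw() for _ in range(PATCH)] for _ in range(PATCH)]


def make_pair(rng, same, style):
    """Return (image, label): label is 1 if the two items are identical, else 0."""
    first = sample_patch(rng, style)
    second = [row[:] for row in first]
    if not same:
        # Resample until the second item really differs from the first.
        while second == first:
            second = sample_patch(rng, style)
    image = [[0.0] * SIZE for _ in range(SIZE)]
    for (top, left), patch in zip(SLOTS, (first, second)):
        for r in range(PATCH):
            image[top + r][left:left + PATCH] = patch[r]
    return image, int(same)


def relation_from_pixels(image):
    """Recover the same/different label by comparing the two item regions."""
    crops = [
        [row[left:left + PATCH] for row in image[top:top + PATCH]]
        for top, left in SLOTS
    ]
    return int(crops[0] == crops[1])


rng = random.Random(0)
# Train-like and test-like sets differ in pixel statistics, not in the relation.
train_like = [make_pair(rng, i % 2 == 0, "binary") for i in range(4)]
test_like = [make_pair(rng, i % 2 == 0, "gray") for i in range(4)]
```

In this toy setup the rule that defines the label is identical for both styles, so a system that had learned the relation itself would be unaffected by the switch from `"binary"` to `"gray"` items. The study's simulations probe that kind of shift with their own stimuli and trained networks.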

This result held even when the DCNNs’ training regime was broadened to include images from a variety of other pixel-level distributions, or when the model was trained on the testing distribution but on a different task in a multitask learning setting. The researchers also showed that the relation network, a deep learning architecture designed specifically to address visual relational reasoning problems, suffered from the same kinds of limitations.

Overall, the study’s findings suggest that learning same-different relations is beyond the capabilities of current DCNNs.

Reference: jov.arvojournals.org/article.aspx?articleid=2783637