Distinguishing a mirror from a glass is a tough visual inference since both materials get their look from their environment, but humans seldom have trouble telling them apart. Little research was conducted to study how the visual system distinguishes between reflections and refractions, and no image-computable model that emulates human judgments has existed to date. For a study, researchers aimed to create a deep neural network that can mimic humans’ patterns of visual judgments. 

They trained hundreds of convolutional neural networks on more than 750,000 simulated mirror and glass items and compared their performance to human assessments and alternative classifiers based on “hand-engineered” picture attributes. All classifiers and humans performed with excellent accuracy on randomly picked photos and correlated strongly with one another. However, it was not enough to measure accuracy or correlation on random pictures to determine how similar models are to people. A decent model should also forecast common human mistakes. 

As a result, they carefully compiled a diagnostic image set in which humans made systematic mistakes, allowing them to pinpoint characteristics of the human-like performance. A large-scale, systematic search of feedforward neural architectures indicated that relatively shallow (three-layer) networks predicted human assessments better than any other model studied. The study was the first image-computable model that successfully distinguished mirrors from the glass while simulating human mistakes, indicating that mid-level visual processing may be especially crucial for the job.