Specular highlights are the image feature most essential to the perception of surface gloss. However, it is difficult to determine whether a bright region in an image is caused by specular reflection or by something else (e.g., a texture marking), and it remains unknown how the visual system correctly identifies highlights.

There is currently no image-computable model that mimics human highlight recognition, so the authors set out to build a neural network that reproduces observers’ characteristic successes and failures. They generated 179,085 images of textured, glossy surfaces. A feedforward convolutional neural network was trained to take such an image as input and output an image containing only the specular reflectance component. Participants viewed such images and reported whether or not specific pixels were highlights. The probed pixels were carefully selected to distinguish between ground truth and a simple thresholding of image intensity.
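
To make the probe-selection idea concrete, the following is a minimal sketch of one way to find pixels where an intensity threshold and the ground-truth specular component disagree. The function names, the threshold value, and the toy arrays are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def threshold_highlights(image, threshold):
    """Baseline model: call a pixel a highlight if its intensity exceeds a fixed threshold."""
    return image > threshold

def diagnostic_pixels(image, specular, threshold, specular_eps=0.0):
    """Return (row, col) coordinates where thresholding and ground truth disagree.

    `specular` stands in for the ground-truth specular component; a pixel
    counts as a true highlight when that component exceeds `specular_eps`.
    Both the threshold and the epsilon are hypothetical parameters.
    """
    predicted = threshold_highlights(image, threshold)
    truth = specular > specular_eps
    return np.argwhere(predicted != truth)

# Toy 3x3 example: bright texture markings at (0, 2) and (2, 0),
# and a dim but genuine highlight at (1, 2).
image = np.array([[0.90, 0.20, 0.80],
                  [0.10, 0.95, 0.30],
                  [0.85, 0.40, 0.10]])
specular = np.array([[0.5, 0.0, 0.0],
                     [0.0, 0.6, 0.2],
                     [0.0, 0.0, 0.0]])

probes = diagnostic_pixels(image, specular, threshold=0.7)
print(probes.tolist())
```

Pixels like these are the informative ones: at every other location the two candidate accounts make the same prediction, so human responses there could not tell them apart.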

In predicting human responses, the neural network outperformed both the simple thresholding model and the ground truth. The authors then used a genetic algorithm to selectively remove connections within the neural network, searching for network variants that matched human judgments even more closely. The best-performing network shared 68% of the variance with human judgments, more than the unpruned network did. As a first step toward understanding the network, they next used representational similarity analysis to compare its internal representations with a wide range of hand-engineered image features.
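
The pruning search can be illustrated with a small sketch of a genetic algorithm over binary connection masks. Everything here is an assumption made for illustration: the "network" is a single linear layer, the "human" data are synthetic, and the population size, mutation rate, and crossover scheme are arbitrary choices rather than the settings used in the study. Fitness is the squared Pearson correlation, i.e. the shared variance between the pruned output and the human responses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained network: one linear layer with 20 connections.
n_samples, n_weights = 200, 20
X = rng.normal(size=(n_samples, n_weights))
weights = rng.normal(size=n_weights)
# Synthetic "human" responses that depend on only the first 10 connections.
human = X[:, :10] @ weights[:10] + 0.1 * rng.normal(size=n_samples)

def fitness(mask):
    """Shared variance (squared Pearson r) between pruned output and human data."""
    output = X @ (weights * mask)
    if np.std(output) == 0:  # every connection pruned: no variance to share
        return 0.0
    return np.corrcoef(output, human)[0, 1] ** 2

def evolve(pop_size=30, generations=40, mutation_rate=0.05):
    """Search for a binary mask (1 = keep, 0 = prune) that maximizes fitness."""
    population = rng.integers(0, 2, size=(pop_size, n_weights))
    population[0] = 1  # include the unpruned network as a candidate
    for _ in range(generations):
        scores = np.array([fitness(m) for m in population])
        order = np.argsort(scores)[::-1]
        parents = population[order[: pop_size // 2]]  # keep the fitter half
        children = []
        while len(children) < pop_size - 1:
            a, b = parents[rng.integers(len(parents), size=2)]
            child = np.where(rng.random(n_weights) < 0.5, a, b)  # uniform crossover
            flips = rng.random(n_weights) < mutation_rate        # bit-flip mutation
            children.append(np.where(flips, 1 - child, child))
        # Elitism: carry the best mask over unchanged so fitness never decreases.
        population = np.vstack([population[order[0]]] + children)
    scores = np.array([fitness(m) for m in population])
    return population[np.argmax(scores)], scores.max()

unpruned_fitness = fitness(np.ones(n_weights))
best_mask, best_fitness = evolve()
print(f"unpruned: {unpruned_fitness:.3f}  pruned: {best_fitness:.3f}")
```

Because the unpruned mask is part of the initial population and the best mask is always retained, the search can only match or improve on the unpruned network's fit, which mirrors the comparison reported above.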