Atick et al. proposed three decades ago that human frequency sensitivity may derive from the enhancement required for a more efficient analysis of retinal images. In this study, researchers reassessed the role of low-level vision tasks in explaining the contrast sensitivity functions (CSFs) in light of two developments: the current trend of using artificial neural networks to study vision, and the current understanding of retinal image representations.
As a first contribution, they demonstrated that autoencoders, a popular type of convolutional neural network (CNN), can develop human-like CSFs in the spatiotemporal and chromatic dimensions when trained to perform some basic low-level vision tasks (such as retinal noise and optical blur removal), but not others (such as chromatic adaptation or pure reconstruction after simple bottlenecks).
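The restoration tasks mentioned above (removal of retinal noise and optical blur) amount to training a network to map a degraded signal back to its clean version. The following is a minimal sketch of how one such training pair could be built, not the study's actual pipeline: the 1-D signal, the Gaussian blur kernel, the additive Gaussian noise, the parameter values, and every function name are assumptions chosen for illustration.

```python
import math
import random


def gaussian_kernel(sigma, radius):
    """Return a normalized 1-D Gaussian kernel (assumed stand-in for optical blur)."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, kernel):
    """Convolve the signal with a symmetric kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out


def make_training_pair(clean, sigma=1.5, noise_std=0.05, seed=0):
    """Return (degraded, clean): the network input and the target it should recover."""
    rng = random.Random(seed)
    blurred = blur(clean, gaussian_kernel(sigma, radius=4))
    # Additive Gaussian noise is an assumed, simplified model of retinal noise.
    degraded = [v + rng.gauss(0.0, noise_std) for v in blurred]
    return degraded, clean


# A sinusoidal grating: 4 cycles over 64 samples around a mean level of 0.5.
grating = [0.5 + 0.4 * math.sin(2 * math.pi * 4 * x / 64) for x in range(64)]
degraded, target = make_training_pair(grating)
```

An autoencoder trained on many such pairs would receive `degraded` as input and be penalized for its distance from `target`. By contrast, the pure-reconstruction task would use the clean signal as both input and target.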
As an illustration, the best CNN (among the simple architectures examined for enhancing the retinal signal) reproduced the CSFs with a root mean square error of 11% of the maximal sensitivity.
As a second contribution, they presented experimental evidence that, for some functional goals (at low abstraction levels), deeper CNNs that were better at reaching the quantitative goal were actually poorer at reproducing human-like phenomena (such as the CSFs). This low-level result (for the investigated networks) did not necessarily contradict earlier publications reporting the benefits of deeper nets in modeling higher-level vision goals. However, the findings, consistent with a growing body of research, added another note of caution about CNNs in vision science: the use of simplified units or unrealistic architectures in goal optimization may limit the modeling and understanding of human vision.
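To make the reported error figure concrete, a root mean square error expressed as a fraction of the maximal sensitivity can be computed as sketched below. This is only an illustration under stated assumptions: the summary does not say whether the normalizing maximum is taken from the human or the model CSF (the human peak is assumed here), the function name is hypothetical, and the sensitivity values are made up rather than taken from the study.

```python
import math


def rmse_relative_to_peak(human_csf, model_csf):
    """RMSE between two CSFs, as a fraction of the peak human sensitivity (assumed)."""
    if len(human_csf) != len(model_csf) or not human_csf:
        raise ValueError("CSFs must be non-empty and sampled at the same frequencies")
    squared = [(h - m) ** 2 for h, m in zip(human_csf, model_csf)]
    rmse = math.sqrt(sum(squared) / len(squared))
    return rmse / max(human_csf)


# Made-up sensitivities at five spatial frequencies, for illustration only.
human = [40.0, 90.0, 100.0, 60.0, 10.0]
model = [50.0, 80.0, 90.0, 70.0, 20.0]
error = rmse_relative_to_peak(human, model)
print(f"{error:.0%}")  # prints 10%
```

In this toy case every sample differs by 10 units and the peak is 100, so the relative error is 10%. A value of 11% would indicate a slightly larger average deviation relative to the peak.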