Contour Integration: Big Data Enhances Models' Visual Recognition Capabilities
Despite the enormous success of deep learning in computer vision, models still fall short of human performance when generalizing to new input distributions. Current benchmarks do not thoroughly probe specific failure points by analyzing performance under controlled conditions. Our study addresses this gap with an experiment designed to systematically examine why and where models struggle with contour integration—a hallmark of human vision—by testing object recognition across different levels of object fragmentation. We found that even when objects retained minimal contours, humans (n=50) recognized them with high accuracy. In contrast, the models we tested—over 1,000 in total—were far less sensitive to the amount of object contour present, performing only slightly above chance. Notably, as training dataset size grew very large (around 5 billion samples), model performance began to approach human levels. Humans also exhibited a distinct bias toward recognizing objects composed of directional fragments rather than non-directional ones. Models sharing this bias performed better on our tasks, and the bias strengthened with larger training datasets. Furthermore, training models to perform contour integration increased their shape bias. Overall, our findings underscore that contour integration is a crucial component of object vision, underpinning robust object recognition. This mechanism likely emerges from learning on large-scale data, highlighting the importance of extensive and diverse training sets in developing more human-like computer vision systems.
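The core protocol described above—measuring recognition accuracy as a function of contour fragmentation—can be sketched in a few lines. Everything here is an illustrative placeholder, not the study's actual code: `fragment`, `ChanceModel`, and the specific fragmentation levels are assumptions made for the sketch.

```python
import random

random.seed(0)

def fragment(contour, level):
    """Remove a `level` fraction of points from a contour (a list of (x, y)
    points). Placeholder for however the study fragments object outlines."""
    keep = max(1, int(len(contour) * (1 - level)))
    return contour[:keep]

class ChanceModel:
    """Stand-in classifier that guesses uniformly among known labels,
    mimicking the near-chance behavior reported for many models."""
    def __init__(self, labels):
        self.labels = labels

    def predict(self, stimulus):
        return random.choice(self.labels)

def accuracy_by_fragmentation(model, dataset, levels):
    """Return {fragmentation_level: accuracy} over (stimulus, label) pairs."""
    results = {}
    for level in levels:
        correct = sum(model.predict(fragment(x, level)) == y
                      for x, y in dataset)
        results[level] = correct / len(dataset)
    return results
```

A human-like system would keep accuracy high even at the most aggressive fragmentation levels, whereas a chance-level model stays near 1/len(labels) at every level.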
