Test-time Adaptation vs. Training-time Generalization: A Case Study in Human Instance Segmentation using Keypoints Estimation

We consider the problem of improving the human instance segmentation mask quality for a given test image using keypoints estimation. We compare two alternative approaches. The first approach is a test-time adaptation (TTA) method, where we allow test-time modification of the segmentation network's weights using a single unlabeled test image. In this approach, we do not assume test-time access to the labeled source dataset. More specifically, our TTA method consists of using the keypoint estimates as pseudo-labels and backpropagating them to adjust the backbone weights. The second approach is a training-time generalization (TTG) method, where we permit offline access to the labeled source dataset but not the test-time modification of weights. Furthermore, we do not assume the availability of any images from, or knowledge about, the target domain. Our TTG method consists of augmenting the backbone features with those generated by the keypoints head and feeding the aggregate vector to the mask head. Through a comprehensive set of ablations, we evaluate both approaches and identify several factors limiting the TTA gains. In particular, we show that in the absence of a significant domain shift, TTA may hurt performance and TTG shows only a small gain, whereas for a large domain shift, TTA gains are smaller and dependent on the heuristics used, while TTG gains are larger and robust to architectural choices.
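As a rough illustration of the TTG idea of feeding an aggregate of backbone and keypoints-head features to the mask head, the following minimal sketch fuses two feature maps and applies a toy mask head. The abstract does not state how the features are aggregated, so channel-wise concatenation is an assumption here, and the names `concat_channels` and `mask_head`, the 1x1 weighted sum, and all numeric values are hypothetical stand-ins rather than the architecture used in this work.

```python
import math
from typing import List

# A feature map is indexed as [channel][row][column].
FeatureMap = List[List[List[float]]]


def concat_channels(backbone: FeatureMap, keypoint: FeatureMap) -> FeatureMap:
    """Append keypoints-head channels after backbone channels (assumed fusion)."""
    height, width = len(backbone[0]), len(backbone[0][0])
    for channel in keypoint:
        # Both feature maps must share the same spatial size to be stacked.
        if len(channel) != height or any(len(row) != width for row in channel):
            raise ValueError("feature maps must share the same spatial size")
    return backbone + keypoint


def mask_head(features: FeatureMap, weights: List[float], bias: float = 0.0) -> List[List[float]]:
    """Toy mask head: per-pixel weighted sum over channels, then a sigmoid."""
    if len(weights) != len(features):
        raise ValueError("need exactly one weight per input channel")
    height, width = len(features[0]), len(features[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            logit = bias + sum(w * features[c][y][x] for c, w in enumerate(weights))
            row.append(1.0 / (1.0 + math.exp(-logit)))
        mask.append(row)
    return mask


# Two made-up backbone channels and one made-up keypoints-head channel (2x2 each).
backbone = [[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6], [0.7, 0.8]]]
keypoint = [[[1.0, 0.0], [0.0, 1.0]]]

fused = concat_channels(backbone, keypoint)   # three channels in total
mask = mask_head(fused, [1.0, 1.0, 1.0])      # per-pixel foreground probability
```

In this sketch the mask head sees the keypoint channel as one more input, so pixels with a strong keypoint response receive a higher mask probability; no weights are modified at test time, which is the constraint the TTG setting imposes.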