Is Synthetic Dataset Reliable for Benchmarking Generalizable Person Re-Identification?

Recent studies show that models trained on synthetic datasets achieve better generalizable person re-identification (GPReID) performance than those trained on public real-world datasets. On the other hand, due to the limitations of real-world person ReID datasets, it would also be important and interesting to use large-scale synthetic datasets as test sets to benchmark person ReID algorithms. Yet this raises a critical question: are synthetic datasets reliable for benchmarking generalizable person re-identification? There is no evidence in the literature answering this. To address it, we design a method called Pairwise Ranking Analysis (PRA) to quantitatively measure ranking similarity and perform a statistical test of identical distributions. Specifically, we employ Kendall rank correlation coefficients to evaluate the pairwise similarity between algorithm rankings on different datasets. Then, a non-parametric two-sample Kolmogorov-Smirnov (KS) test is performed to judge whether the algorithm ranking correlations between synthetic and real-world datasets and those between real-world datasets alone lie in identical distributions. We conduct comprehensive experiments with ten representative algorithms, three popular real-world person ReID datasets, and three recently released large-scale synthetic datasets. Through the designed pairwise ranking analysis and comprehensive evaluations, we conclude that a recent large-scale synthetic dataset, ClonedPerson, can be reliably used to benchmark GPReID, statistically the same as real-world datasets. Therefore, this study guarantees the usage of synthetic datasets for both the source training set and the target testing set, with no privacy concerns from real-world surveillance data. Besides, this study might also inspire future designs of synthetic datasets.
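
The PRA procedure described above can be illustrated with a minimal sketch. The snippet below is not the authors' released code; the dataset names and the random scores are hypothetical placeholders standing in for actual GPReID evaluation results (e.g., Rank-1 accuracy of each algorithm on each dataset). It shows the two steps the abstract names: Kendall rank correlations between per-dataset algorithm rankings, followed by a two-sample KS test comparing synthetic-vs-real correlations against real-vs-real correlations.

```python
import itertools
import numpy as np
from scipy.stats import kendalltau, ks_2samp

rng = np.random.default_rng(0)
n_algorithms = 10  # ten representative algorithms, as in the paper

# Hypothetical performance scores of each algorithm on each dataset;
# in practice these would be real evaluation results (e.g., Rank-1 accuracy).
real_scores = {name: rng.random(n_algorithms)
               for name in ["RealA", "RealB", "RealC"]}
syn_scores = {name: rng.random(n_algorithms)
              for name in ["SynA", "SynB", "SynC"]}

def pairwise_taus(group_a, group_b=None):
    """Kendall tau between algorithm rankings on every pair of datasets.

    With one group, correlate all dataset pairs within it; with two
    groups, correlate every cross-group dataset pair.
    """
    if group_b is None:
        pairs = itertools.combinations(group_a.values(), 2)
    else:
        pairs = itertools.product(group_a.values(), group_b.values())
    taus = []
    for a, b in pairs:
        tau, _ = kendalltau(a, b)  # rank correlation of the two rankings
        taus.append(tau)
    return taus

taus_real_real = pairwise_taus(real_scores)             # real vs. real pairs
taus_syn_real = pairwise_taus(syn_scores, real_scores)  # synthetic vs. real pairs

# Two-sample KS test: do the two sets of ranking correlations come from
# identical distributions? A large p-value means we cannot reject that
# hypothesis, i.e., the synthetic dataset ranks algorithms like a real one.
stat, p_value = ks_2samp(taus_syn_real, taus_real_real)
print(f"KS statistic = {stat:.3f}, p-value = {p_value:.3f}")
```

With three real and three synthetic datasets, this yields three real-vs-real correlations and nine synthetic-vs-real correlations per synthetic dataset group, which is the comparison the KS test operates on.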