BYTECOVER2: TOWARDS DIMENSIONALITY REDUCTION OF LATENT EMBEDDING FOR EFFICIENT COVER SONG IDENTIFICATION
Convolutional neural network (CNN)-based methods have dominated recent research on cover song identification (CSI). A typical example is the ByteCover system we proposed, which has achieved state-of-the-art results on all the mainstream CSI datasets. In this paper, we propose an upgraded version of ByteCover, termed ByteCover2, which further improves on ByteCover in both identification performance and efficiency. Compared with ByteCover, ByteCover2 is designed with an additional PCA-FC module, which integrates the capabilities of principal component analysis (PCA) and a fully-connected (FC) neural network for dimensionality reduction of the audio embedding, allowing ByteCover2 to perform CSI more precisely and efficiently. We evaluated ByteCover2 on multiple datasets with different dimension sizes and training settings, where ByteCover2 beat all the compared methods, including ByteCover, even with a dimension size of 128, which is 15 times smaller than that of ByteCover.
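
To make the idea of combining PCA with an FC layer concrete, the following is a minimal NumPy sketch of one plausible reading: a linear layer whose weights are initialised from the top principal directions of a set of embeddings, so that it starts out as a PCA projection and could then be fine-tuned like any other FC layer. The abstract does not specify how the two are integrated, so this is an illustrative assumption rather than the ByteCover2 implementation. The function names `pca_init_fc` and `fc_forward`, the random stand-in embeddings, and the input dimension of 512 are all hypothetical; only the output dimension of 128 comes from the text.

```python
import numpy as np


def pca_init_fc(embeddings, out_dim):
    """Build (weight, bias) for a linear layer that reproduces a PCA projection.

    Hypothetical helper: an assumption about how PCA and an FC layer
    might be combined, not the authors' method.
    """
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    weight = vt[:out_dim]        # shape: (out_dim, in_dim)
    bias = -weight @ mean        # so that W x + b == W (x - mean)
    return weight, bias


def fc_forward(x, weight, bias):
    """Plain fully-connected layer without a nonlinearity: y = x W^T + b."""
    return x @ weight.T + bias


# Random stand-in for audio embeddings; the sizes 600 and 512 are assumed.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(600, 512))

weight, bias = pca_init_fc(embeddings, out_dim=128)
reduced = fc_forward(embeddings, weight, bias)
```

In a trainable setting, `weight` and `bias` would serve only as the starting point for the layer's parameters, with later gradient updates free to move them away from the pure PCA solution.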