Few-shot pixel-precise document layout segmentation via dynamic instance generation and local thresholding
Over the years, the humanities community has increasingly requested the creation of artificial intelligence frameworks to help the study of cultural heritage. Document layout segmentation, which aims at identifying the different structural components of a document page, is a particularly interesting task connected to this trend, specifically when it comes to handwritten texts. While there are many effective approaches to this problem, they all rely on large amounts of data for the training of the underlying models. Such data is rarely available in a real-world scenario, as producing ground-truth segmentations with the required pixel-level precision is very time-consuming and often requires a certain degree of domain knowledge regarding the documents at hand. For this reason, in the present paper, we propose an effective few-shot learning framework for document layout segmentation relying on two novel components, namely a dynamic instance generation module and a segmentation refinement module. This approach achieves performance comparable to the current state of the art on the popular Diva-HisDB dataset, while relying on just a fraction of the available data.