BN-DRISHTI: Bangla Document Recognition through Instance-level Segmentation of Handwritten Text Images

Handwriting recognition remains challenging for some of the most widely spoken languages, such as Bangla, because the curvilinear nature of the writing complicates line and word segmentation and because quality datasets are scarce. This paper addresses the segmentation problem by introducing a state-of-the-art method (BN-DRISHTI) that combines a deep learning-based object detection framework (YOLO) with Hough and affine transformations for skew correction. Because training deep learning models requires large amounts of data, we also present an extended version of the BN-HTRd dataset. It comprises 786 full-page handwritten Bangla document images with line- and word-level annotations for segmentation and corresponding ground truth for word recognition. Evaluation on the test portion of our dataset yielded an F-score of 99.97% for line segmentation and 98% for word segmentation. For comparative analysis, we used three external Bangla handwritten datasets, namely BanglaWriting, WBSUBNdb_text, and ICDAR 2013, where our system outperformed the compared methods by a significant margin, further demonstrating the performance of our approach on completely unseen samples.
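The abstract states that BN-DRISHTI uses Hough and affine transformations for skew correction but gives no further detail. The following is a minimal sketch of that general idea and is not the authors' implementation. It assumes the following:

- The foreground (ink) pixels of a page are already available as a list of (x, y) coordinates.
- Hough voting is restricted to near-horizontal angles (±15° in 0.5° steps).
- The page is deskewed by rotating about the centroid of those points.

The function names `estimate_skew_deg`, `rotation_affine`, `apply_affine` and `deskew` are hypothetical. On a real image, one would typically apply an equivalent 2x3 matrix with OpenCV's `cv2.warpAffine`, after checking its sign convention for the angle.

```python
import math
from collections import Counter


def estimate_skew_deg(points, max_angle=15.0, angle_step=0.5, rho_step=1.0):
    """Estimate the dominant text-line angle (degrees) by Hough voting.

    A straight line with direction angle a (measured from the +x axis toward
    the +y axis) satisfies rho = -x*sin(a) + y*cos(a) for a constant rho.
    At the true skew angle, the points of each text line fall into one rho bin.
    """
    points = list(points)
    if not points:
        return 0.0
    best_angle, best_votes = 0.0, -1
    n_steps = int(round(2 * max_angle / angle_step))
    for i in range(n_steps + 1):
        angle = -max_angle + i * angle_step
        a = math.radians(angle)
        s, c = math.sin(a), math.cos(a)
        # Accumulator for this candidate angle: quantized rho -> vote count.
        acc = Counter(round((-x * s + y * c) / rho_step) for x, y in points)
        votes = max(acc.values())
        if votes > best_votes:  # strict '>' keeps the first maximum on ties
            best_angle, best_votes = angle, votes
    return best_angle


def rotation_affine(angle_deg, center):
    """2x3 affine matrix rotating by angle_deg (from +x toward +y) about center."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    cx, cy = center
    return [[c, -s, cx - c * cx + s * cy],
            [s, c, cy - s * cx - c * cy]]


def apply_affine(m, points):
    """Apply a 2x3 affine matrix to a list of (x, y) points."""
    return [(m[0][0] * x + m[0][1] * y + m[0][2],
             m[1][0] * x + m[1][1] * y + m[1][2]) for x, y in points]


def deskew(points):
    """Return (estimated skew angle, points rotated so text lines are horizontal)."""
    points = list(points)
    angle = estimate_skew_deg(points)
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    # Rotating by -angle maps lines of direction `angle` to direction 0.
    return angle, apply_affine(rotation_affine(-angle, (cx, cy)), points)


if __name__ == "__main__":
    # Synthetic "text lines": three parallel rows tilted by 5 degrees.
    slope = math.tan(math.radians(5.0))
    pts = [(x, 40 * r + x * slope) for r in range(3) for x in range(200)]
    angle, fixed = deskew(pts)
    print(angle)  # 5.0
```

Scoring each angle by its single largest accumulator bin is the simplest choice. Many practical deskewers instead pick the angle whose row-projection profile is sharpest, which is more robust on real handwriting. Either way, the estimated angle feeds the same affine rotation.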