When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition

Recently, most handwritten mathematical expression recognition (HMER) methods adopt encoder-decoder networks, which directly predict markup sequences from formula images with an attention mechanism. However, such methods may fail to accurately read formulas with complicated structures or to generate long markup sequences, as the attention results are often inaccurate due to the large variance of writing styles and spatial layouts. To alleviate this problem, we propose an unconventional network for HMER named Counting-Aware Network (CAN), which jointly optimizes two tasks: HMER and symbol counting. Specifically, we design a weakly-supervised counting module that can predict the number of each symbol class without symbol-level position annotations, and then plug it into a typical attention-based encoder-decoder model for HMER. Experiments on the benchmark datasets for HMER validate that both joint optimization and counting results are beneficial for correcting the prediction errors of encoder-decoder models, and that CAN consistently outperforms the state-of-the-art methods. In particular, the extra time cost caused by the proposed counting module is marginal compared with the underlying encoder-decoder model for HMER. The source code is available at https://github.com/LBH1024/CAN.
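The weakly-supervised counting idea described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names (`count_symbols`, `counts_from_markup`) and the 1x1-projection weights `w` are hypothetical, and the real module would be a learned convolutional branch trained jointly with the recognizer. The sketch shows the two essential points: per-class pseudo density maps are summed spatially to produce counts, and the count supervision comes directly from the markup sequence, requiring no symbol-level position annotations.

```python
import numpy as np

def sigmoid(x):
    # numerically plain logistic; fine for a small illustrative example
    return 1.0 / (1.0 + np.exp(-x))

def count_symbols(features, w):
    """Predict a count per symbol class from an encoder feature map.

    features: (C, H, W) feature map from the image encoder
    w:        (num_classes, C) hypothetical 1x1-conv weights projecting
              channels to one pseudo density map per symbol class
    """
    # (num_classes, H, W): one soft density map per class
    density = sigmoid(np.tensordot(w, features, axes=([1], [0])))
    # spatial sum of each density map gives the predicted count
    return density.reshape(density.shape[0], -1).sum(axis=1)

def counts_from_markup(tokens, vocab):
    """Derive the ground-truth count vector from the markup sequence alone.

    This is why the supervision is 'weak': only the transcription is
    needed, never where each symbol appears in the image.
    """
    counts = np.zeros(len(vocab))
    for t in tokens:
        counts[vocab[t]] += 1.0
    return counts

# Example: the LaTeX markup "x + x" yields count labels without positions.
vocab = {"x": 0, "+": 1}
labels = counts_from_markup(["x", "+", "x"], vocab)   # [2., 1.]
pred = count_symbols(np.random.randn(8, 4, 6), np.random.randn(2, 8))
```

In training, a regression loss between `pred` and `labels` would be added to the usual sequence-prediction loss, and the density maps give the decoder a global, attention-independent signal about which symbols must appear.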