Date

2 years ago

Size

9.83 GB

Publish URL

github.com

Paper URL

arxiv.org

License

CC BY-NC-SA 3.0

Tags

Machine Vision

The Muharaf dataset is a machine learning dataset for handwritten Arabic text recognition, created by Mehreen Saeed et al. in 2024. The associated paper, "Muharaf: Manuscripts of Handwritten Arabic Dataset for Cursive Text Recognition", was accepted at NeurIPS 2024. The dataset contains more than 1.6k images of historical handwritten pages transcribed by experts in archival Arabic. Each document image is accompanied by the polygon coordinates of its text lines and by information on basic page elements.

Muharaf was built to advance the state of the art in handwritten text recognition (HTR), not only for Arabic manuscripts but for cursive text in general. It covers a variety of writing styles and a wide range of document types, including personal letters, diaries, notes, poems, church records, and legal correspondence. In the paper, the authors describe the data acquisition process, summarize the salient features and statistics of the dataset, and report preliminary baseline results obtained by training convolutional neural networks on the data.

The dataset is divided into two parts. The public part contains 1,216 images and is distributed under the CC BY-NC-SA 4.0 license. The restricted part contains 428 images, is distributed under a proprietary license, and can only be obtained by contacting Carlos Younes at the Phoenix Center for Lebanese Studies; it is for research purposes only, and redistribution is not allowed.

Muharaf was created with the ScribeArabic annotation software, whose manual explains how the tool works. The image files and their corresponding annotations, transcriptions, and tags can be viewed with a PAGE-XML viewer.
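Since the description mentions per-line polygon coordinates and PAGE-XML annotations, a minimal sketch of reading text-line polygons and transcriptions from a PAGE-XML file may be useful. It assumes the common PAGE layout in which each `TextLine` has a `Coords` element with a `points="x1,y1 x2,y2 ..."` attribute and a `TextEquiv/Unicode` transcription; the exact schema version and file layout used by Muharaf are not stated here, so treat the element names, the sample document, and the file name `hypothetical_page.jpg` as assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}TextLine'."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points):
    """Convert a PAGE 'points' string 'x1,y1 x2,y2 ...' to [(x, y), ...]."""
    return [tuple(int(float(v)) for v in pair.split(",")) for pair in points.split()]


def read_text_lines(xml_source):
    """Return a list of (polygon, transcription) pairs, one per TextLine."""
    root = ET.fromstring(xml_source)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        polygon, text = [], ""
        for child in elem:
            name = local_name(child.tag)
            if name == "Coords":
                # Assumed layout: polygon stored in a 'points' attribute.
                polygon = parse_points(child.get("points", ""))
            elif name == "TextEquiv":
                for sub in child:
                    if local_name(sub.tag) == "Unicode":
                        text = sub.text or ""
        lines.append((polygon, text))
    return lines


# Hypothetical single-line page, written by hand for this example only.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="hypothetical_page.jpg" imageWidth="1000" imageHeight="1400">
    <TextRegion id="r1">
      <TextLine id="l1">
        <Coords points="100,200 900,200 900,260 100,260"/>
        <TextEquiv><Unicode>سطر تجريبي</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

for polygon, text in read_text_lines(SAMPLE):
    print(len(polygon), "polygon points:", text)
```

Matching on local element names rather than a fixed namespace keeps the sketch usable across PAGE schema versions that share this layout. Files that store polygons as child `Point` elements instead of a `points` attribute would need a different `Coords` branch.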

Muharaf.torrent

Seeding: 1, Downloading: 0, Completed: 168, Total Downloads: 312

Muharaf/
- README.md
  2.27 KB
- README.txt
  4.54 KB

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at [email protected] for prompt review and removal.

Related Datasets

Groundsource Global Flood Events Dataset

3 months ago

THINGS-EEG EEG Dataset

5 months ago

RubricHub_v1 Multi-Domain Generative Task Dataset

5 months ago

X-ray Contraband Detection Dataset

6 months ago
