Can Optical Character Recognition (OCR) Tools be Used to Obtain High-Quality Text Extraction from Handwritten Documentations?

What to know

  • Presentation Day/Time: Wednesday, April 23, 2:55–4:00 pm
  • Presenter: Sifang Kathy Zhao, PhD, MPH, EIS officer assigned to the National Center for Chronic Disease Prevention and Health Promotion, Division of Reproductive Health

What did we do?

  • To reduce maternal mortality, many low-resource countries use a Maternal Death Surveillance and Response (MDSR) system to monitor deaths, review their causes, and implement interventions. MDSR reviews often go unanalyzed because each one generates a large volume of handwritten documentation (more than 30 pages per death review) and resources are limited. We compared the quality of text extraction from handwritten MDSR review documentation across existing Optical Character Recognition (OCR) tools.

What did we find?

  • Using handwritten documentation from MDSR reviews, preliminary results show that text extraction quality is relatively low across all OCR tools evaluated.
  • PaddleOCR produced the highest-quality text extraction, with 30% nonsense overall, followed by Tesseract OCR with 36% nonsense overall and EasyOCR with 42% nonsense overall.
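
The summary does not say how "nonsense" was measured. As a rough illustration only, the sketch below assumes it means the share of extracted word tokens that do not appear in a reference vocabulary. The function name `nonsense_rate`, the tiny vocabulary, and the two sample outputs are all hypothetical and are not taken from the study.

```python
import re

def nonsense_rate(extracted_text, vocabulary):
    """Return the fraction of word tokens not found in `vocabulary`.

    Assumption: "nonsense" is approximated as out-of-vocabulary tokens.
    The study may have used a different definition.
    """
    tokens = re.findall(r"[a-z']+", extracted_text.lower())
    if not tokens:
        return 0.0  # no tokens extracted, so nothing to score
    unknown = [t for t in tokens if t not in vocabulary]
    return len(unknown) / len(tokens)

# Hypothetical vocabulary and OCR outputs, for illustration only.
vocabulary = {"patient", "was", "admitted", "with", "severe", "bleeding"}
outputs = {
    "tool_a": "patient was admitted with severe bleeding",
    "tool_b": "patlent was admtted with severe bleeding",
}

# Score each tool's output with the same metric so the tools can be ranked.
rates = {name: nonsense_rate(text, vocabulary) for name, text in outputs.items()}
for name, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{name}: {rate:.0%} nonsense")
```

A real evaluation would need a much larger vocabulary that includes clinical terms and local names, or a comparison against manually transcribed text, since a word-list check alone cannot catch real words that were misread.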

Why does it matter?

  • The publicly available OCR tools we evaluated are currently inadequate for high-quality text extraction from handwritten documentation. Additional tools need to be developed before text from handwritten documentation can be accurately extracted and reliably analyzed.
  • Alternative methods such as audio transcription may be worth exploring to strengthen MDSR reviews and ultimately reduce maternal mortality in low-resource settings.