A radiology chain processing high volumes of procedures daily faced a critical bottleneck: manually comparing scanned Hebrew medical referrals against insurer obligation forms. Mismatches in procedure codes caused significant financial leakage. We built a multi-engine OCR and verification system, deployed on-premise for data privacy.
The radiology chain processes high volumes of medical procedures daily. Staff manually compare each scanned Hebrew medical referral form against the matching insurer obligation form to check that the procedure codes agree, which creates a critical administrative bottleneck. Discrepancies lead to reimbursement challenges and financial leakage. Because 90% of forms arrive as physical paper, the comparison is tedious, error-prone, and consumes significant staff time.
We built a multi-engine OCR and document verification system with five components:
1. Technology benchmarking — evaluated and compared multiple OCR engines for Hebrew medical forms: Tesseract, Azure AI Vision, ABBYY FineReader, DeepSeek VLM, and Google Gemini.
2. Hebrew OCR extraction — optimized for the specific challenges of medical forms: diverse fonts, stamps, handwriting, and varying scan quality.
3. Automated preprocessing — image denoising and deskewing to improve OCR accuracy on low-quality scans.
4. Procedure code matching — automated comparison logic between referral and obligation forms, identifying matches and mismatches.
5. Per-case reporting — match/mismatch results with confidence indicators and processing time per document pair.
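To make steps 4 and 5 concrete, here is a minimal sketch of code matching and per-case reporting. It is an illustration under stated assumptions, not the deployed logic: the procedure code format (4 to 6 digits), the `PairReport` fields, and the choice of taking the lower of the two OCR confidences are all hypothetical, since the source does not specify them.

```python
import re
import time
from dataclasses import dataclass

# Assumption: procedure codes are standalone runs of 4 to 6 digits.
# The real code format is not given in the source; adjust the pattern to fit.
CODE_PATTERN = re.compile(r"\b\d{4,6}\b")


@dataclass
class PairReport:
    """Hypothetical per-case report for one referral/obligation pair."""
    matched: bool
    referral_codes: frozenset
    obligation_codes: frozenset
    missing_from_obligation: frozenset
    confidence: float
    seconds: float


def extract_codes(ocr_text: str) -> frozenset:
    """Collect every candidate procedure code found in the OCR text."""
    return frozenset(CODE_PATTERN.findall(ocr_text))


def compare_pair(referral_text: str, obligation_text: str,
                 referral_conf: float, obligation_conf: float) -> PairReport:
    """Compare the codes on two forms and time the comparison."""
    start = time.perf_counter()
    referral_codes = extract_codes(referral_text)
    obligation_codes = extract_codes(obligation_text)
    # A pair matches when the referral has at least one code and
    # every referral code also appears on the obligation form.
    missing = referral_codes - obligation_codes
    matched = bool(referral_codes) and not missing
    return PairReport(
        matched=matched,
        referral_codes=referral_codes,
        obligation_codes=obligation_codes,
        missing_from_obligation=missing,
        # Assumption: the weaker of the two OCR confidences bounds the result.
        confidence=min(referral_conf, obligation_conf),
        seconds=time.perf_counter() - start,
    )


report = compare_pair("קוד 71250", "קוד 71260", 0.93, 0.88)
print(report.matched, sorted(report.missing_from_obligation))
```

A digit-only pattern will also pick up other numbers on a form, such as years or ID fragments, so a production version would likely restrict extraction to the form field where the code is expected.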
Privacy-first: the entire system is deployed on-premise. Sensitive medical data never leaves the premises.
- Multi-engine OCR benchmarking (Tesseract, Azure AI Vision, ABBYY FineReader, DeepSeek VLM, Google Gemini)
- Hebrew language optimization for diverse fonts, stamps, and scan quality
- Automatic image preprocessing (denoising, deskewing)
- Automated procedure code matching between form pairs
- Confidence scoring and per-case reporting
- Fully on-premise deployment for medical data privacy
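One common way to score a multi-engine OCR benchmark is character error rate (CER): the edit distance between an engine's output and a hand-checked transcription, divided by the transcription length. The sketch below assumes that metric; the source does not say which metric the benchmark used, and the engine names and sample strings are placeholders, not measured results.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the two-row dynamic programming table."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the reference length (lower is better)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def rank_engines(reference: str, outputs: dict[str, str]) -> list[tuple[str, float]]:
    """Sort engines from lowest to highest CER on a single document."""
    scores = {name: character_error_rate(reference, text)
              for name, text in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Placeholder strings for illustration only, not real engine output.
reference = "קוד 71250"
outputs = {"engine_a": "קוד 71250", "engine_b": "קוד 7125O"}
for name, cer in rank_engines(reference, outputs):
    print(f"{name}: CER={cer:.3f}")
```

For this use case a single misread digit changes the procedure code, so a field-level metric (was the code read exactly right?) may be more informative than CER alone.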
The system achieves 80% accuracy on procedure code matching across 1,000 test document pairs, with end-to-end processing under 13 seconds per pair. On-premise deployment ensures compliance with medical data privacy requirements.
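To put these figures in concrete terms, the sketch below assumes that accuracy means the share of document pairs whose automated verdict agrees with a human-labeled verdict, and that pairs are processed one at a time. Both are assumptions; the source does not define the metric or the processing model, and the function names are illustrative.

```python
def matching_accuracy(predicted: list[bool], labeled: list[bool]) -> float:
    """Fraction of pairs where the automated verdict equals the human label."""
    if not labeled or len(predicted) != len(labeled):
        raise ValueError("need two non-empty lists of equal length")
    agreed = sum(p == l for p, l in zip(predicted, labeled))
    return agreed / len(labeled)


def sequential_batch_hours(pairs: int, seconds_per_pair: float) -> float:
    """Upper bound on wall-clock hours when pairs run one after another."""
    return pairs * seconds_per_pair / 3600


# Reported figures: 80% accuracy on 1,000 pairs, under 13 seconds per pair.
correct_pairs = round(0.80 * 1000)           # 800 pairs judged correctly
review_pairs = 1000 - correct_pairs          # 200 pairs still need a human look
hours = sequential_batch_hours(1000, 13)     # just over 3.6 hours at the 13 s bound
print(correct_pairs, review_pairs, round(hours, 2))
```

Under these assumptions, 80% accuracy corresponds to 800 of the 1,000 test pairs, and the full test set would take at most about 3.6 hours to process sequentially.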