OCR
Andrew Ng's newly released document extraction service shows significant limitations when processing complex financial statements, with high error rates and slow processing times. In tests, more than 50% of extracted values were hallucinated and data was frequently missing from financial tables, highlighting the challenges of using LLMs for document extraction.
Kreuzberg is a Python library offering asynchronous text extraction capabilities from various document formats, including PDFs, images, and office files, with local processing and minimal dependencies. The library provides both single-item and batch processing options, integrating tools like Tesseract OCR and Pandoc for comprehensive format support.
A new benchmark evaluates Vision-Language Models against traditional OCR systems for text recognition in video environments, using a dataset of 1,477 annotated frames from diverse sources. Advanced models like Claude-3, Gemini-1.5, and GPT-4o outperform the traditional OCR systems in many scenarios, though challenges with hallucinations and occluded text persist.
OCR4all is a free, open-source optical character recognition solution with no paywalled features and no closed-source code.