Hello! We’ve generally had good results with Gumloop for extracting data from PDFs. However, we recently noticed that some data wasn’t being found no matter how direct we made the prompt. It turns out that some text wasn’t being captured at the OCR stage. This text is right-aligned but still plainly visible, and local tests with both pdftotext and tesseract find it. We’re not sure why the Gumloop OCR component isn’t capturing it, but capturing it is a necessity for us moving forward.
In particular, we expect “Date Prepared: 03/13/2025” and “Effective Date: 01/01/2025” to be in the OCR output, since they appear in the original PDF, but neither seems to be present.
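For anyone who wants to reproduce the comparison, a check along these lines can confirm whether the two strings are present in a given OCR output. This is only a minimal sketch: the function names are made up for illustration, the `pdftotext_layout` helper assumes the Poppler `pdftotext` binary is on the PATH, and the whitespace normalization is an assumption about how right-aligned text may be padded in the extracted output.

```python
import re
import subprocess

# The two fields we expect to see in the OCR output (taken from the PDF).
EXPECTED = ["Date Prepared: 03/13/2025", "Effective Date: 01/01/2025"]


def normalize(text: str) -> str:
    """Collapse runs of whitespace so layout padding does not break matching."""
    return re.sub(r"\s+", " ", text).strip()


def missing_fields(ocr_text: str, expected=EXPECTED) -> list[str]:
    """Return the expected strings that do not appear in the OCR text."""
    haystack = normalize(ocr_text)
    return [field for field in expected if normalize(field) not in haystack]


def pdftotext_layout(pdf_path: str) -> str:
    """Hypothetical helper: extract text locally with Poppler's pdftotext.

    "-layout" keeps the physical layout; "-" sends the text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Running `missing_fields` on both the local extraction and the text copied from the Gumloop OCR node output would show which fields are dropped by which tool.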
We’d also be happy to share the original PDF that was used if you don’t have access to it on your backend, but we’d prefer to share it privately as it is a business document.
Any help you can provide in solving this OCR issue would be appreciated!
Hey @DamianR! If you’re reporting an issue with a flow or an error in a run, please include the run link and make sure it’s shareable so we can take a look.
Find your run link on the history page. Format: https://www.gumloop.com/pipeline?run_id={{your_run_id}}&workbook_id={{workbook_id}}
Make it shareable by clicking “Share” → ‘Anyone with the link can view’ in the top-left corner of the flow screen.
Provide details about the issue—more context helps us troubleshoot faster.