Extract content from PDF files having the same name but located in different mail inbox

sengsan · February 2, 2025, 1:05pm

I tried to create an agent that extracts data from my payslip emails. Every week, I receive an email containing a PDF file named PaySlip. The flow runs successfully, but all rows contain data from the 1st PDF file only. I tried another approach: saving the PDF files to Google Drive and making sure they have different names before running data extraction. The files are named differently in the Google Drive folder, but the content is still the same: all 20 PDF files return the content of the 1st PDF file. I hope to hear from everyone how to solve this issue. Thanks.

Wasay-Gumloop · February 3, 2025, 3:05am

Hey @sengsan - Could you please share the run link from the history page with share access enabled so I can take a look at the flow and identify the issue?

Wasay-Gumloop · February 4, 2025, 12:44am

Hey @sengsan - This is the workbook link. Could you please share the run link from the history page instead, so I can take a look at this error:

The flow runs successfully but all rows contain data from the 1st PDF file only

The run link would have the run_id in the URL.

sengsan · February 4, 2025, 11:57am

https://www.gumloop.com/pipeline?workbook_id=33MNEBivyuRLsnvT8peQxE&run_id=jVYq7mtRvz6ctsHH3HKBYq

Hope this is the correct one.

Wasay-Gumloop · February 5, 2025, 3:08am

Thank you for sharing! The issue here is that the PDF Reader node does not natively work with Drive File URLs. In this case you can connect the Attachment output from the Gmail Reader node with the File input of the PDF Reader node.

Here’s an example: https://www.gumloop.com/pipeline?workbook_id=mPGufFkyybBkbawdXiSp3B

Let me know if this works for you.

sengsan · February 6, 2025, 11:27am

Thanks for the support, Wasay. I tried the suggested method. It ran successfully, but the content is not accurate. It is still the same problem: all rows (all PDF files) are exactly the same, with no new data or files.

Wasay-Gumloop · February 7, 2025, 12:50am

Could you share the run link for this please? Access to the sheet you’re writing to would be helpful too: wasay@gumloop.com

sengsan · February 11, 2025, 2:02pm

Hi Wasay. Please kindly check this.
https://www.gumloop.com/pipeline?run_id=JncEEAyVX25FdgNM2njfwH&workbook_id=33MNEBivyuRLsnvT8peQxE

The Google Sheet is already shared with your email. I tried this flow with attached PDF files that have different names, and it works perfectly. But when the attached PDF files have the same name, it does not work.
Thank you

Wasay-Gumloop · February 12, 2025, 7:54pm

Thank you for sharing! The file name should be unique when processing in loop mode. Try using a subflow here, example setup: https://www.gumloop.com/pipeline?workbook_id=m3tbg5fjDQ6KoyAm5j5oMw

Let me know if that works for you.
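For anyone who wants to guarantee unique names before files reach a loop, here is a minimal Python sketch of the idea. It is not Gumloop's implementation and does not call any Gumloop API; the function name `uniquify` and the `_1`, `_2` suffix scheme are assumptions made for illustration. It only shows how a list of repeated attachment names such as PaySlip.pdf can be turned into distinct names while keeping the original order.

```python
from pathlib import PurePath


def uniquify(names):
    """Return a copy of `names` in which every entry is distinct.

    Hypothetical helper: the first occurrence keeps its name, and later
    duplicates get a numeric suffix before the extension.
    """
    used = set()
    result = []
    for name in names:
        path = PurePath(name)
        candidate = name
        counter = 1
        # Keep trying suffixes until the candidate has not been used yet.
        while candidate in used:
            candidate = f"{path.stem}_{counter}{path.suffix}"
            counter += 1
        used.add(candidate)
        result.append(candidate)
    return result


if __name__ == "__main__":
    weekly_attachments = ["PaySlip.pdf", "PaySlip.pdf", "PaySlip.pdf"]
    print(uniquify(weekly_attachments))
```

Another option would be to prefix each name with something already unique to the email, such as its received date, which also keeps the payslips easy to tell apart later.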

JS14 · February 12, 2025, 9:57pm

Hey @Wasay-Gumloop, I’m struggling with this issue where the PDF Reader can’t access PDFs stored in Google Drive via URL. Gummie gave me a few workarounds:

Use find and replace to edit the URL so that the file is downloadable (didn’t work after various iterations).
Use the Google Drive File Reader and output the text string straight into the PDF Reader node (also not working, getting a blank output).

I really need to use the Gdrive URL because it’s in a Gsheet that I’m looping through in the larger flow. Is there a known workaround for this issue that you could point me to?
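For reference, the find-and-replace workaround usually means rewriting a Drive share link into the commonly used direct-download form. The Python sketch below shows only that URL rewrite. It is a sketch under assumptions: the function name `drive_download_url` is hypothetical, the file must be shared so that the link is accessible, and nothing in this thread confirms that the PDF Reader node accepts the rewritten URL.

```python
import re

# Patterns for the two common share-link shapes:
#   https://drive.google.com/file/d/<FILE_ID>/view?usp=sharing
#   https://drive.google.com/open?id=<FILE_ID>
_ID_PATTERNS = (
    re.compile(r"/d/([A-Za-z0-9_-]+)"),
    re.compile(r"[?&]id=([A-Za-z0-9_-]+)"),
)


def drive_download_url(share_url):
    """Return a direct-download URL for a Drive share link, or None.

    Hypothetical helper: it only rewrites the string and does not check
    that the file exists or is publicly accessible.
    """
    for pattern in _ID_PATTERNS:
        match = pattern.search(share_url)
        if match:
            file_id = match.group(1)
            return f"https://drive.google.com/uc?export=download&id={file_id}"
    return None


if __name__ == "__main__":
    print(drive_download_url(
        "https://drive.google.com/file/d/abc123_XY-z/view?usp=sharing"
    ))
```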

Wasay-Gumloop · February 12, 2025, 10:23pm

Hey! The PDF Reader node does not work directly with Drive URLs; however, you can use the Drive File Reader node to read the file. Here’s an example: https://www.gumloop.com/pipeline?workbook_id=7A4g4VNto526JcJwguqRAA&run_id=C6khiQAeEgELdgtq9rirfN

Let me know if this works for you.

JS14 · February 13, 2025, 12:39am

Got it, so skip the PDF Reader altogether. Is there any downside to doing it that way? My understanding was that the PDF Reader has an advanced reader mode that makes the PDF more readable to LLMs before the extraction step. I was hoping that would make for a higher-quality output. Could you speak to that?

Wasay-Gumloop · February 13, 2025, 12:47am

You’re right about the advanced reader mode, but since we’re working with Drive URLs in your case, the only option is to use the Drive File Reader node. That said, I’ve added a note to our backlog for Drive URL support in the PDF Reader node.

JS14 · February 13, 2025, 12:56am

Thanks! Is there a way I can follow the ticket/feature request so I get notified if it gets built one day?

Wasay-Gumloop · February 13, 2025, 1:05am

We’re not publicly sharing the timeline or backlog as of now, but you can keep an eye on https://www.gumloop.com/changelog
