I have a flow that takes a URL, scrapes the data, runs AI analysis, and writes the output back into Sheets.
When the URLs are input manually, one at a time, into the flow, everything works perfectly.
When the URLs are read from a Google Sheet, some scrapes fail. Repeated runs fail on different URLs, so it feels random.
I increased the timeout on the Website Scraper, but I wouldn't say that changed the results much.
I reverted to individual input and, again, it works fine.
Hey @BigGummiePlans! If you’re reporting an issue with a flow or an error in a run, please include the run link and make sure it’s shareable so we can take a look.
Find your run link on the history page. Format: https://www.gumloop.com/pipeline?run_id={{your_run_id}}&workbook_id={{workbook_id}}
Make it shareable by clicking "Share" → "Anyone with the link can view" in the top-left corner of the flow screen.
Provide details about the issue—more context helps us troubleshoot faster.
If you click into the subflow runs that failed on the main flow, where URLs are being pulled from the Sheet Reader node, you'll notice that on each failed run the Duplicate node failed. That's because the Extract Data node could not extract relevant data for a specific field and output a blank string, so the Duplicate node had no input to duplicate.
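To make the failure mode concrete, here is a purely illustrative Python sketch; the function names are made up and this is not Gumloop's actual node code, just the logic the failed runs are showing:

```python
# Hypothetical illustration only: not Gumloop's real implementation.
def extract_field(page_text: str, field: str) -> str:
    """Stand-in for the Extract Data node: returns "" when nothing relevant is found."""
    return ""  # simulate the failed extraction seen in the broken runs

def duplicate(value: str, count: int) -> list[str]:
    """Stand-in for the Duplicate node: fan a single value out into a list."""
    if not value:
        # A blank input leaves nothing to duplicate, which is what the
        # failed subflow runs report.
        raise ValueError("empty value: nothing to duplicate")
    return [value] * count

try:
    duplicate(extract_field("<html>...</html>", "price"), 3)
except ValueError as exc:
    print(exc)  # the same dead end the failed runs hit
```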
I'm not exactly sure why a Duplicate node is needed here; I'm assuming it was added to satisfy the List input when Writer Mode on the Sheet Writer node is set to Write to Column. A quick and easy solution would be to simply remove the Duplicate nodes and change the Writer Mode to Add a Single New Row, which will allow you to write empty/blank strings:
A more robust solution, though, would involve two edits:
1. In your subflow, use a Google Sheet Updater node to find and update the existing row, since you're reading and writing with a single spreadsheet. More on the Sheet Updater node here.
2. In your main flow, wrap the subflow in an Error Shield so that if anything fails for one-off URLs, they can be skipped without halting the entire flow (see the sketch after this list).
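Conceptually, the Error Shield behaves like a per-item try/except: a URL that errors out gets logged and skipped while the rest of the list keeps running. A minimal sketch of that pattern in plain Python, where `scrape_and_extract` is a hypothetical stand-in for your per-URL subflow (not a Gumloop API):

```python
# Conceptual equivalent of wrapping the subflow in an Error Shield.
def scrape_and_extract(url: str) -> dict:
    """Hypothetical stand-in for the per-URL subflow (scrape, then AI extract)."""
    raise NotImplementedError("replace with the real subflow call")

def run_all(urls: list[str]) -> list[dict]:
    results = []
    for url in urls:
        try:
            results.append(scrape_and_extract(url))
        except Exception as exc:
            # One-off failure: record it and move on instead of halting the whole run.
            print(f"Skipping {url}: {exc}")
    return results
```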
The first recommendation was correct: I removed the Duplicate node, and all runs now execute.
However, the real underlying issue is still unsolved.
Recapping:
I have URLs as inputs, to be scraped.
If I manually input each URL one at a time into my subflow, each run succeeds and returns results.
When I enter the same URLs as a list and use the primary flow, the runs complete, but the website scrapes randomly collect nothing.
Here is the output list, showing the results of the extracts in both the individual runs and the list runs (x2).
From looking at the runs, the Website Scraper seems to have collected nothing in some instances.
I have adjusted the timeout, but it doesn't seem to help.
One more thing: I just ran the input list via a new-row-added trigger, adding each URL about 2 minutes apart, and got a perfect result. See the sheets in the screenshot, link here: PLAN URL - Google Sheets
Methinks there is an issue with processing website scrapes in parallel at the same time; perhaps it's memory overhead or container time.
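If that hunch is right, the usual fix on the scraping side is to cap concurrency and space the requests out rather than firing every URL at once, which matches the two-minutes-apart test succeeding. A hedged sketch of that idea in plain Python (the delay and worker counts are guesses, and `requests` is just standing in for whatever the Website Scraper does under the hood):

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

def fetch(url: str, delay_s: float = 2.0) -> str:
    """Fetch one page, pausing briefly so requests are spaced out."""
    time.sleep(delay_s)  # crude spacing; tune based on what the target site tolerates
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.text

def fetch_all(urls: list[str], max_workers: int = 2) -> dict[str, str]:
    """Scrape with limited parallelism instead of hitting every URL simultaneously."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))
```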