I have a flow that takes a URL, scrapes the data, runs AI analysis, and writes the output back into Sheets.
When the URLs are input manually, one at a time, into the flow, everything works perfectly.
When the URLs are read from a Google Sheet, some scrapes fail. Repeated runs fail on different URLs, so it feels random.
I increased the timeout on the Website Scraper, but I wouldn't say that changed the results much.
I reverted to individual input and, again, it works fine.
Hey @BigGummiePlans! If you’re reporting an issue with a flow or an error in a run, please include the run link and make sure it’s shareable so we can take a look.
Find your run link on the history page. Format: https://www.gumloop.com/pipeline?run_id={{your_run_id}}&workbook_id={{workbook_id}}
Make it shareable by clicking "Share" → "Anyone with the link can view" in the top-left corner of the flow screen.
Provide details about the issue—more context helps us troubleshoot faster.
If you click into the subflow runs that failed on the main flow, where URLs are being pulled from the Sheet Reader node, you'll notice that on each failed run the Duplicate node failed. That's because the Extract Data node could not extract relevant data for a specific field and output a blank string, so the Duplicate node had no input to duplicate.
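To make the failure mode concrete, here is a purely illustrative Python sketch; the function names are made up and this is not Gumloop's actual node code, just the logic the failed runs are showing:

```python
# Hypothetical illustration only: not Gumloop's real implementation.
def extract_field(page_text: str, field: str) -> str:
    """Stand-in for the Extract Data node: returns "" when nothing relevant is found."""
    return ""  # simulate the failed extraction seen in the broken runs

def duplicate(value: str, count: int) -> list[str]:
    """Stand-in for the Duplicate node: fan a single value out into a list."""
    if not value:
        # A blank input leaves nothing to duplicate, which is what the
        # failed subflow runs report.
        raise ValueError("empty value: nothing to duplicate")
    return [value] * count

try:
    duplicate(extract_field("<html>...</html>", "price"), 3)
except ValueError as exc:
    print(exc)  # the same dead end the failed runs hit
```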
I'm not exactly sure why a Duplicate node is needed here; I'm assuming it was added to satisfy the List input when Writer Mode on the Sheet Writer node is set to Write to Column. A quick and easy solution would be to simply remove the Duplicate nodes and change the Writer Mode to Add a Single New Row, which will allow you to write empty/blank strings:
A more robust solution, though, would involve two edits:
1. In your subflow, use a Google Sheet Updater node to find and update the existing row, since you're reading and writing with a single spreadsheet. More on the Sheet Updater node here.
2. In your main flow, wrap the subflow in an Error Shield so that if anything fails for one-off URLs, they can be skipped without halting the entire flow (see the sketch after this list).
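Conceptually, the Error Shield behaves like a per-item try/except: a URL that errors out gets logged and skipped while the rest of the list keeps running. A minimal sketch of that pattern in plain Python, where `scrape_and_extract` is a hypothetical stand-in for your per-URL subflow (not a Gumloop API):

```python
# Conceptual equivalent of wrapping the subflow in an Error Shield.
def scrape_and_extract(url: str) -> dict:
    """Hypothetical stand-in for the per-URL subflow (scrape, then AI extract)."""
    raise NotImplementedError("replace with the real subflow call")

def run_all(urls: list[str]) -> list[dict]:
    results = []
    for url in urls:
        try:
            results.append(scrape_and_extract(url))
        except Exception as exc:
            # One-off failure: record it and move on instead of halting the whole run.
            print(f"Skipping {url}: {exc}")
    return results
```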
The first recommendation was correct: I removed the Duplicate node, and all runs now execute.
However, the real underlying issue is still unsolved.
Recapping:
I have URLs as inputs, to be scraped.
If I manually input each URL one at a time into my subflow, each run succeeds and returns results.
When I enter the same URLs as a list and use the primary flow, the runs complete, but the website scrapes randomly collect nothing.
Here is the output list, showing the results of the extracts in both the individual runs and the list runs (x2).
From looking at the runs, the Website Scraper seems to have collected nothing in some instances.
I have adjusted the timeout, but it doesn't seem to help.
One more thing: I just ran the input list via a new-row-added trigger, adding each URL about 2 minutes apart, and got a perfect result. See the sheets in the screenshot, link here: PLAN URL - Google Sheets
Methinks there is an issue with processing website scrapes in parallel at the same time; perhaps it's memory overhead or container time.
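If that hunch is right, the usual fix on the scraping side is to cap concurrency and space the requests out rather than firing every URL at once, which matches the two-minutes-apart test succeeding. A hedged sketch of that idea in plain Python (the delay and worker counts are guesses, and `requests` is just standing in for whatever the Website Scraper does under the hood):

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

def fetch(url: str, delay_s: float = 2.0) -> str:
    """Fetch one page, pausing briefly so requests are spaced out."""
    time.sleep(delay_s)  # crude spacing; tune based on what the target site tolerates
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.text

def fetch_all(urls: list[str], max_workers: int = 2) -> dict[str, str]:
    """Scrape with limited parallelism instead of hitting every URL simultaneously."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))
```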