Web Crawler hanging

Hello, I have a Tier 1 Web Crawler node that was working fine before but is now hanging on runs. I can see in the Node details that it's having trouble crawling certain pages and skipping them, but it ends up hanging, which disrupts the Flow.

It was working fine this morning, but now it hangs every time. It doesn't "Fail"; it just hangs mid-run.

I've stopped the Flows, but they are still stuck in "Running".

@Wasay-Gumloop - I’ve gone ahead and given you visibility to the Flow as you usually respond quite quickly.

https://www.gumloop.com/pipeline?workbook_id=86r2apeNCuuomWVSDR9nLF

Thanks

Hey @MarcMPSGC! If you’re reporting an issue with a flow or an error in a run, please include the run link and make sure it’s shareable so we can take a look.

  1. Find your run link on the history page. Format: https://www.gumloop.com/pipeline?run_id={{your_run_id}}&workbook_id={{workbook_id}}

  2. Make it shareable by clicking "Share" → "Anyone with the link can view" in the top-left corner of the flow screen.

  3. Provide details about the issue—more context helps us troubleshoot faster.

You can find your run history here: https://www.gumloop.com/history

Hey @MarcMPSGC – I think this might be happening if the flow takes too long to run for certain websites. Do you have an example site or run link that you can share as well?
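
For illustration only (this is not how the Web Crawler node is implemented, just a general sketch of the failure mode): an HTTP fetch with no timeout can block indefinitely on a slow or unresponsive page, which is what a "hang" on certain sites usually looks like. A minimal Python sketch, assuming the requests library:

```python
import requests

def fetch(url):
    """Fetch one page, skipping it if the server is slow or unresponsive."""
    try:
        # Without timeout=, this call can wait indefinitely on a stalled server;
        # with it, a slow page raises an error instead of hanging the whole run.
        response = requests.get(url, timeout=15)
        response.raise_for_status()
        return response.text
    except requests.RequestException:
        # Skip pages that time out or error rather than blocking the crawl.
        return None
```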

Since you only want Depth 1 URLs, you can try using the 'Web Agent Scraper' node with the Get All URLs action; it would be much faster.
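
For context, the Get All URLs approach is roughly equivalent to fetching one page and collecting its links without following any of them. A minimal Python sketch of that idea, assuming requests and BeautifulSoup (outside of Gumloop, purely for illustration):

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def get_all_urls(page_url):
    """Collect every link found on a single page (depth 1), without crawling further."""
    html = requests.get(page_url, timeout=15).text
    soup = BeautifulSoup(html, "html.parser")
    # Resolve relative hrefs against the page URL and de-duplicate.
    links = {urljoin(page_url, a["href"]) for a in soup.find_all("a", href=True)}
    return sorted(links)
```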

Let me know if that works :slight_smile:

Hi @Wasay-Gumloop , thanks.

It seems to have been a temporary issue, as I had run the same URL through my Flow without a problem both before and after it occurred.

I'm interested in trying the Web Agent Scraper > Get All URLs function. I added it to my Flow but noticed that it doesn't output a List of URLs. I'm sure I can figure out the logic for formatting the output into a list, but before I do, I was wondering what makes the Web Agent Scraper a better choice than the basic Web Crawler for this.

Thanks

Hey @MarcMPSGC – The Web Agent Scraper node is not necessarily a better choice than the Web Crawler; it's just faster, as it only gets the links from the same webpage.

As for your question, the Get All URLs action outputs the URLs separated by commas, so you can use the Split Text node to turn that into a list of URLs. Example setup: https://www.gumloop.com/pipeline?workbook_id=m2AEL4E39RXLMcmqvpi3Fu
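
For anyone reading along, the Split Text step here amounts to splitting one comma-separated string into a list. A minimal Python sketch of that transformation (the comma delimiter is taken from the reply above; the sample URLs are made up):

```python
# The Get All URLs output arrives as a single comma-separated string, e.g.:
raw_output = "https://example.com/a,https://example.com/b,https://example.com/c"

# Splitting on the comma (and trimming stray whitespace) gives a list of URLs,
# which is the same transformation the Split Text node performs in the flow.
urls = [url.strip() for url in raw_output.split(",") if url.strip()]
print(urls)  # ['https://example.com/a', 'https://example.com/b', 'https://example.com/c']
```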

Let me know if this makes sense and works for you :slightly_smiling_face: