Crawling outside the domain


Even though I made sure the node only crawls the same domain, the run log showed it crawling other URLs, and it was taking too long even at a single depth.

Hey @shinobi - can you please share the run link from the https://www.gumloop.com/history page? Please also set the share access to 'anyone with the link can view' under the Share button.

As for speed: the Website Crawler is more thorough and hence slower, while the Web Agent Scraper with the 'Get all URLs' action is faster. It outputs the URLs as a single comma-separated string, so you can use a Split Text node to turn that into a list of URLs.
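If it helps to picture what the Split Text node is doing, here's a minimal Python sketch of the same operation; the `raw_output` string is a made-up example of the scraper's comma-separated output, not real node output.

```python
# Conceptual equivalent of the Split Text node: the scraper's
# "Get all URLs" action returns one comma-separated string, and we
# want a clean list with one URL per element.
raw_output = "https://example.com/a, https://example.com/b,https://example.com/c"

urls = [u.strip() for u in raw_output.split(",") if u.strip()]
print(urls)
# ['https://example.com/a', 'https://example.com/b', 'https://example.com/c']
```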

Here's an example: https://www.gumloop.com/pipeline?workbook_id=qwHFcSusCrk7QMZNgotwND&run_id=QwoDMWNhefEKzhJmQGEkbj


Thanks Wasay. I was actually confused by the "Use only Same domain" flag: it was turned off. Once I turned it on, the crawl worked as expected.
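For anyone curious what a same-domain filter does conceptually, here's a rough Python sketch. The `same_domain` helper and the exact host-matching rule are my assumptions for illustration, not Gumloop's actual implementation of the flag.

```python
from urllib.parse import urlparse

def same_domain(url: str, seed: str) -> bool:
    """Return True when url shares the seed URL's host (assumed matching rule)."""
    return urlparse(url).netloc == urlparse(seed).netloc

seed = "https://www.gumloop.com/"
links = [
    "https://www.gumloop.com/pricing",
    "https://twitter.com/gumloop",
]

# Keep only the links on the seed's own host.
in_scope = [u for u in links if same_domain(u, seed)]
print(in_scope)  # ['https://www.gumloop.com/pricing']
```

With a filter like this turned off, every discovered link is kept, which is why the crawl wandered off-domain and slowed down.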

Since you asked, here's my workbook link anyway:

https://www.gumloop.com/pipeline?workbook_id=2F8UxyKQyMnLAbkmHNgRjp


Awesome, glad you were able to solve it!

I'd recommend looking into subflows + Error Shield to make this flow more robust: https://docs.gumloop.com/core-concepts/subflows
