Crawling outside the domain


Even though I made sure the node only crawls the same domain, the run log showed it crawling other URLs, and it was taking too long even at a single depth.

Hey @shinobi - can you please share the run link from the https://www.gumloop.com/history page? Please also set the share access to 'anyone with the link can view' under the Share button.

As for speed: the Website Crawler is more thorough and hence slower, while the Web Agent Scraper with the 'Get all URLs' action is faster. It outputs the URLs as a single comma-separated string, so you can use a Split Text node to turn that into a list of URLs.
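If it helps to picture what the Split Text node is doing, here's a minimal Python sketch of the same operation; the `raw_output` string is a made-up example of the scraper's comma-separated output, not real node output.

```python
# Conceptual equivalent of the Split Text node: the scraper's
# "Get all URLs" action returns one comma-separated string, and we
# want a clean list with one URL per element.
raw_output = "https://example.com/a, https://example.com/b,https://example.com/c"

urls = [u.strip() for u in raw_output.split(",") if u.strip()]
print(urls)
# ['https://example.com/a', 'https://example.com/b', 'https://example.com/c']
```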

Here's an example: https://www.gumloop.com/pipeline?workbook_id=qwHFcSusCrk7QMZNgotwND&run_id=QwoDMWNhefEKzhJmQGEkbj


Thanks Wasay. I was actually confused by the "Use only Same domain" flag: it was turned off. Once I turned it on, the crawl worked as expected.
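For anyone curious what a same-domain filter does conceptually, here's a rough Python sketch. The `same_domain` helper and the exact host-matching rule are my assumptions for illustration, not Gumloop's actual implementation of the flag.

```python
from urllib.parse import urlparse

def same_domain(url: str, seed: str) -> bool:
    """Return True when url shares the seed URL's host (assumed matching rule)."""
    return urlparse(url).netloc == urlparse(seed).netloc

seed = "https://www.gumloop.com/"
links = [
    "https://www.gumloop.com/pricing",
    "https://twitter.com/gumloop",
]

# Keep only the links on the seed's own host.
in_scope = [u for u in links if same_domain(u, seed)]
print(in_scope)  # ['https://www.gumloop.com/pricing']
```

With a filter like this turned off, every discovered link is kept, which is why the crawl wandered off-domain and slowed down.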

Since you asked, here's my workbook link anyway:

https://www.gumloop.com/pipeline?workbook_id=2F8UxyKQyMnLAbkmHNgRjp


Awesome, glad you were able to solve it!

I'd recommend looking into subflows + Error Shield to make this flow more robust: https://docs.gumloop.com/core-concepts/subflows
