Webscraping multiple Trustpilot pages for reviews

I have no issue scraping the first page of a company’s Trustpilot, but can’t work out how to get Gumloop to click through to the next page and scrape that page too. The goal is to have Gumloop scrape, say, the first 10 pages, and then feed that into an LLM for cleaning and categorisation. Here’s the error I’m getting, which is not solved by the offered solution: "Web Agent Scraper Failed! The element to be clicked is covered by another element.

Consider adding a ‘wait’ action before this click, or add a ‘screenshot’ action to check if there is a popup or overlay blocking the element."

Potentially an easier solution is to update the Trustpilot URL to include ‘?page=2’ once the first page has been scraped, then ‘?page=3’ and so on, but I can’t work out how to do this. Advice massively appreciated!

I have added an input flow which appends my specified ?page=2 input to the URL from the scraper, which works, but then I need to repeat the steps for as many pages as I want to scrape. If there’s a more elegant solution with fewer steps then I’m all ears.

Hey! You can use a Split Text node along with a Combine Text node to dynamically append page numbers to the URL. Here’s an example: https://www.gumloop.com/pipeline?workbook_id=weAk8EFFvjuVJ3M1oWEauS&run_id=3TkgaMoxWqUQTSKAKQxiMC

Let me know if this is what you were looking for.
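
For anyone who wants to see the URL pattern outside of Gumloop, here is a minimal Python sketch of the same idea: take one base review URL and append ?page=1, ?page=2 and so on. The function name build_page_urls and the example.com company URL are made up for illustration, and it assumes the base URL has no query string of its own. It is not how the Split Text and Combine Text nodes are implemented.

```python
def build_page_urls(base_url: str, last_page: int) -> list[str]:
    """Return one URL per page, from page 1 up to and including last_page.

    Assumes base_url carries no existing query string (hypothetical helper).
    """
    base = base_url.rstrip("/")
    return [f"{base}?page={page}" for page in range(1, last_page + 1)]


# Placeholder company URL, used only to show the shape of the output.
urls = build_page_urls("https://www.trustpilot.com/review/example.com", 10)
for url in urls:
    print(url)
```

The resulting list is the equivalent of the "output of URLs with sequentially appended numbers" that the node combination produces.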

You’re a champ - thank you :pray:

So that creates a nice output of URLs with sequentially appended numbers at the end, but as far as I can see I can only feed one URL into the scraper. How would I use this suggested method to scrape all of the required pages?

If you have a flow that works well for a single URL, you can use that as a subflow to loop over a list of URLs.

Subflow tutorial: https://vimeo.com/1052111235/cb7e3a446b
Subflow Docs: https://docs.gumloop.com/core-concepts/subflows
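
Conceptually, looping a subflow over a list is the same as calling a single-URL function once per URL and collecting the results. A rough Python sketch of that pattern, assuming a hypothetical scrape_one function that stands in for the single-page flow (it is not a Gumloop API, and the fake scraper below does no network requests):

```python
from typing import Callable


def scrape_all(urls: list[str], scrape_one: Callable[[str], str]) -> list[str]:
    """Run the single-URL step once per URL and keep results in page order."""
    results = []
    for url in urls:
        results.append(scrape_one(url))
    return results


def fake_scrape_one(url: str) -> str:
    # Stand-in for the working single-page scraper (hypothetical).
    return f"reviews from {url}"


pages = scrape_all(
    [
        "https://www.trustpilot.com/review/example.com?page=1",
        "https://www.trustpilot.com/review/example.com?page=2",
    ],
    fake_scrape_one,
)
# Join everything into one text block to hand to the LLM cleaning step.
combined = "\n\n".join(pages)
```

The combined text plays the role of the single input that then goes to the LLM for cleaning and categorisation.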

Appreciate it, Wasay
