Scraping Trustpilot reviews

Hi,
I am trying to make an automation that would scrape Trustpilot reviews for a brand that I insert into an Interface. I would insert https://www.trustpilot.com/review/brand.com.
The automation should check all the pages with the reviews (pagination). The output should go into a google sheet with predefined columns (Number of stars, Content of the review…)

There are several issues I have come across. I have made a separate subflow just to produce URLs with ?page=x, and I believe this works. However, I cannot figure out how to make the automation work so that I wouldn't have to insert the URL into the Combine page node manually, but could just insert the brand's URL on the Trustpilot site (the URL entered in the Interface should be enough: https://www.trustpilot.com/review/brand.com).
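Outside of Gumloop, the pagination subflow described above can be sketched in plain Python. The function name, page count, and brand URL here are illustrative placeholders, not part of the actual flow:

```python
def build_page_urls(brand_url: str, num_pages: int) -> list[str]:
    """Build paginated Trustpilot review URLs from a single brand URL.

    Page 1 is the bare brand URL; later pages append the ?page=x
    query parameter, matching Trustpilot's pagination scheme.
    """
    base = brand_url.rstrip("/")
    return [
        base if page == 1 else f"{base}?page={page}"
        for page in range(1, num_pages + 1)
    ]

urls = build_page_urls("https://www.trustpilot.com/review/brand.com", 3)
```

The idea is that the only input the subflow needs is the brand URL itself; everything else (the ?page=x suffixes) is derived from it.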

The issue with the main flow, which feeds from the output of the subflow (the ?page=x URLs), is that it writes the same results to the Google Sheet for every URL (?page=x): basically it fills the final Google Sheet with the same content x times.

I have looked into your guidelines for scraping and pagination (https://www.gumloop.com/pipeline?workbook_id=weAk8EFFvjuVJ3M1oWEauS&run_id=3TkgaMoxWqUQTSKAKQxiMC), but I am still stuck 🙂 Any help would be really appreciated.
How can I share my flow with you? It would be easier to understand 🙂
Best

Hey @Sas! If you’re reporting an issue with a flow or an error in a run, please include the run link and make sure it’s shareable so we can take a look.

  1. Find your run link on the history page. Format: https://www.gumloop.com/pipeline?run_id={{your_run_id}}&workbook_id={{workbook_id}}

  2. Make it shareable by clicking “Share” → “Anyone with the link can view” in the top-left corner of the flow screen.
    GIF guide

  3. Provide details about the issue—more context helps us troubleshoot faster.

You can find your run history here: https://www.gumloop.com/history

Hi, here is the link - https://www.gumloop.com/pipeline?workbook_id=51H744KjkgmDkZrGG2DbtW

Hey @Sas – I believe you’re super close and have set up the flow properly. Is the main issue you’re facing this:

The issue with the main flow, which feeds from the output of the subflow (the ?page=x URLs), is that it writes the same results to the Google Sheet for every URL (?page=x): basically it fills the final Google Sheet with the same content x times.

Can you share the run link for this so I can view the inputs/outputs please? You can find the run link on the https://www.gumloop.com/history page or through the Previous Runs tab on the canvas.

Hi, thank you for answering.
Is it this?
https://www.gumloop.com/pipeline?run_id=bqfg8JBk9khQ57WuPtQcp2&workbook_id=51H744KjkgmDkZrGG2DbtW

https://www.gumloop.com/pipeline?run_id=5nxuHZRUusWAuRbLE8oFiX&workbook_id=51H744KjkgmDkZrGG2DbtW

Thanks! If you delete the Extract Data node and add it back, that should resolve the issue. The key is to disable the Extract List option to avoid getting a List of List output.

If you do want to extract a list from the Extract Data node, the best approach is to create a subflow with the Website Scraper, Extract Data, and Google Sheet nodes. Then, wrap that subflow in an error shield—this will make the flow more robust overall.
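As a rough analogy for the error-shield pattern, here is a Python sketch where each page is processed independently and a failure on one page is logged instead of aborting the whole run. The function and its scrape/extract/write parameters are hypothetical stand-ins for the Website Scraper, Extract Data, and Google Sheets steps, not Gumloop APIs:

```python
def process_page(url, scrape, extract, write):
    """Run one page's scrape → extract → write steps inside a guard."""
    try:
        html = scrape(url)      # fetch the page (Website Scraper step)
        rows = extract(html)    # pull out review rows (Extract Data step)
        write(rows)             # append rows to the sheet (Google Sheets step)
        return True
    except Exception as exc:    # the "error shield": contain the failure
        print(f"Skipping {url}: {exc}")
        return False
```

The benefit is the same as wrapping the subflow in an error shield: one bad page (a timeout, a layout change) costs you that page's rows, not the entire run.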

Subflow Tutorial: Subflow Tutorial
Subflow Docs: Subflow Documentation

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.