Hello Gumloop Community,
I’m working on an automation flow in Gumloop to scrape doctor profiles from a paginated listing page, but I’m running into issues with pagination in my custom “Page Navigator” node.
My Setup:
• I have a working automation flow that:
- Extracts all profile links from a listing page.
- Scrapes each profile individually and extracts details (name, specialty, address, phone, email).
- Saves the extracted data into Google Sheets.
• The problem is the listing page URL does not change when navigating pages (AJAX-based pagination).
• I added a “Page Navigator” custom node to click the “Next Page” button and navigate through all pages.
Issues I’m Facing:
Pagination is not working correctly.
• My Page Navigator executes, but it either doesn’t move beyond Page 1 or keeps extracting the same page.
Gumloop expects a list but receives a single URL input.
• The input node provides a single listing page URL, but the Page Navigator expects a list.
Potential bot detection issues (reCAPTCHA appearing randomly).
• Sometimes, instead of loading new profiles, the page returns a Google reCAPTCHA in the extracted content.
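To make the second and third issues concrete, here is a minimal Python sketch of two small guards: one that accepts either a single URL or a list, and one that flags extracted content that looks like a reCAPTCHA page. This is illustrative only and not the actual node code. The function names and the marker strings are assumptions, and the marker check only detects a CAPTCHA page; it does not bypass it.

```python
from typing import List, Optional, Union

# Substrings commonly found in pages that embed Google reCAPTCHA.
# Assumed for illustration; this is not an exhaustive detector.
RECAPTCHA_MARKERS = ("g-recaptcha", "google.com/recaptcha", "recaptcha/api.js")


def ensure_url_list(value: Optional[Union[str, List[str]]]) -> List[str]:
    """Return a clean list of URLs whether the input is one URL or a list."""
    if value is None:
        return []
    if isinstance(value, str):
        value = [value]
    # Drop empty entries and surrounding whitespace.
    return [u.strip() for u in value if u and u.strip()]


def looks_like_recaptcha(content: str) -> bool:
    """Return True if the extracted page content contains a reCAPTCHA marker."""
    lowered = content.lower()
    return any(marker in lowered for marker in RECAPTCHA_MARKERS)
```

With guards like these, a flagged page could be skipped or retried later instead of being passed on to the profile scraper as if it were real data.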
What I’ve Tried So Far:
Converting the input URL into a list using a List Operations node.
Modifying the Page Navigator script to:
• Click “Next Page” only if visible.
• Wait for new content to load before proceeding.
• Stop pagination when no more pages exist.
Checking the output logs to verify if new data loads after clicking “Next”.
Trying to detect AJAX-loaded content before moving to the next step.
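The pagination logic described above can be sketched roughly as follows. This is a simplified Python outline rather than the actual Page Navigator script: `get_links` and `click_next` are hypothetical callables standing in for whatever the browser layer provides, and the timeout values are arbitrary. The idea is to treat a page as "new" only when the extracted links differ from the previous page, and to stop when the "Next Page" button is unavailable or the content never changes.

```python
import time
from typing import Callable, List


def paginate(
    get_links: Callable[[], List[str]],  # hypothetical: returns links on the current page
    click_next: Callable[[], bool],      # hypothetical: clicks "Next Page", False if not visible
    max_pages: int = 50,
    timeout_s: float = 10.0,
    poll_s: float = 0.5,
) -> List[str]:
    """Collect unique links across AJAX pages, stopping when nothing new loads."""
    seen: List[str] = []
    seen_set = set()
    current = get_links()
    for _ in range(max_pages):
        # Record links from the current page, skipping duplicates.
        for link in current:
            if link not in seen_set:
                seen_set.add(link)
                seen.append(link)
        # Stop when the "Next Page" button is missing or hidden.
        if not click_next():
            break
        # Poll until the links differ from the previous page or the timeout expires.
        deadline = time.monotonic() + timeout_s
        fresh = get_links()
        while fresh == current and time.monotonic() < deadline:
            time.sleep(poll_s)
            fresh = get_links()
        if fresh == current:
            break  # content never changed: last page, or the click had no effect
        current = fresh
    return seen
```

Comparing the extracted links, rather than the URL, is what makes this usable when the listing page URL stays the same between pages.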
What I Need Help With:
How can I ensure that my Page Navigator correctly moves through pages and loads new results?
What’s the best way to detect if a new page has loaded (instead of extracting the same data repeatedly)?
How can I prevent reCAPTCHA from blocking my automation?
Should I handle AJAX-based pagination differently in Gumloop?
Any insights, suggestions, or best practices would be greatly appreciated! Thanks in advance for your help.
Link to my workbook: https://www.gumloop.com/pipeline?workbook_id=daLprHVz33GRxZjzM67HJR&run_id=9GLm4gh5eTR7NY4AYNJghJ