Need help to extract data from a website with CAPTCHA

I’m trying to automate the extraction of NAPLAN test results for schools from the MySchool website.

For example, here’s a sample URL:
https://www.myschool.edu.au/school/45423/naplan/results

Current Approach

I’m using a Browser Replay with cookies method to capture the page content.

Problem

  • The captured screenshot shows only the loading page; it never progresses.
  • I suspect the CAPTCHA mechanism is preventing access, despite using valid cookies.
  • I’ve tried increasing the wait time in the recording and revalidating cookies, but no success.

Questions

  • Has anyone successfully bypassed this type of CAPTCHA using cookies?
  • Is there an alternative method to retrieve these data?
  • Any insights on handling CAPTCHAs in Gumloop?

Below is a screenshot of the automation flow and output for reference.

Output:

Workbook Link: https://www.gumloop.com/pipeline?workbook_id=djLB5D5zB89GD72T5RknCJ

Any help would be greatly appreciated! Thanks in advance.

Hey @VJB - Can you share what’s happening within the selected replay? Are you clicking accept and then proceeding to the page after that?

I’d say the browser extension input node would be a better option here:
Doc: https://docs.gumloop.com/nodes/browser_extension/browser_extension_input
Tutorial: https://www.loom.com/share/6b343be195ba4a55a66ce26894b303f9

Let me know if that works for you.

Hi Wasay,

Inside the current replay I’m just browsing the results page and scrolling down to the bottom, with enough wait time for the page to load. I don’t need to click accept because I’ve already accepted once.

The Browser Extension Input node works when run manually, no issues there. But I’d need to do it manually for each school. I want it automated so that I could just provide the links to the schools as inputs. Hope that makes sense.
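For what it’s worth, generating those input links programmatically is simple if the flow can accept a list of URLs. A minimal sketch, assuming the URL pattern from the sample link earlier in this thread (the extra school IDs below are placeholders, not real schools):

```python
# Hypothetical sketch: build MySchool NAPLAN results URLs from school IDs
# so they can be passed as a batch of inputs to an automated flow.
# URL pattern taken from the sample link in this thread.

def naplan_url(school_id: int) -> str:
    """Return the NAPLAN results URL for a given MySchool school ID."""
    return f"https://www.myschool.edu.au/school/{school_id}/naplan/results"

# 45423 is the school ID from the sample link; the others are placeholders.
school_ids = [45423, 10001, 10002]
urls = [naplan_url(sid) for sid in school_ids]

for url in urls:
    print(url)
```

This only builds the list of links; it doesn’t get around the terms/CAPTCHA page itself.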

I see, thank you for the info. Using the Browser Extension Input is the only option I can see right now due to the popup.

Can you share a few sample links of schools that you’d hope to provide as inputs so I can double-check and see if there’s a workaround?

I also don’t see an API option we could use to pull the data without scraping.

Here are a few sample links –

Really appreciate your help.

Thank you! Unfortunately, I don’t see a way to automatically bypass the terms page. Basically, what you see in incognito for these URLs is what the scraper is able to scrape.

The Browser Extension Input would be the most straightforward (although I understand not ideal) solution here.

Thanks Wasay,
I guess I have to rely on the manual method for now.
