Hey, have you tried using the “Website Scraper” node with “advanced” mode toggled? It should be more reliable than the Browser Extension approach (for publicly available websites) for getting the content in the first place. You can then connect this output to the “Extract Data” node.
Hi @Arslan - thanks for taking the time to respond.
Yes, I tried that; it didn't work for some reason. The flow was Scraper → Extract Data node → spreadsheet, but the output was blank.
Hey @learngumloop, no worries, will try to look into this now. I’m new to the flow building as well, so might take some time, but Gumloop is a powerful tool so shouldn’t be too hard. Will update as soon as I have something working
@Arslan - Thank you. It's working; however, it seems to be limited to 18-20 rows max. What can we do to simulate the functionality of Page Down or Show More and extract the full list (not just the top 10 or 20)?
That’s strange @learngumloop, I was able to extract 68 entries with this flow, might be due to the model not being deterministic? How many do you need?
@Arslan - I am just learning the Gumloop tool, so I want to simulate the entire page load of ProductHunt to the end of the list, as PH is a good example of a directory listing. (I am aiming to learn the extraction methods here from a completeness perspective, so the job finishes with the full intended dataset.)
Gotcha @learngumloop. I'm trying to scroll down manually, but it seems almost like an infinite (or very, very large) load, so I'm not sure you'd be able to get all the listings here, just due to the sheer size of the page and the scrolling required. Even if you were to use actions, it would probably time out due to the time it takes to scroll until “the end”.
i believe it’s all the launches for that year on that page if you keep scrolling (40,000+ launches). That would probably time out any agent/web scraper
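For anyone reading along, here's a rough Python sketch of what a LoadMore-style loop has to do under the hood, and why a hard cap (or timeout) matters on a page this big. Everything here is hypothetical: `fetch_batch` stands in for one “Load More” click (a real flow would drive a browser action instead), and the page size and totals are made up for illustration.

```python
import time

TOTAL_LISTINGS = 45  # pretend the page has 45 entries in total (hypothetical)

def fetch_batch(offset, batch_size=20):
    """Hypothetical stand-in for one 'Load More' click.
    A real scraper would click the button / scroll the browser here."""
    end = min(offset + batch_size, TOTAL_LISTINGS)
    return [f"launch-{i}" for i in range(offset, end)]

def scrape_all(max_batches=10, wait_seconds=0.0):
    """Keep 'clicking Load More' until no new rows arrive,
    with a hard cap (max_batches) so a 40,000+ item page
    can't run forever and time out the whole flow."""
    rows = []
    for _ in range(max_batches):
        batch = fetch_batch(len(rows))
        if not batch:
            break  # reached the end of the list
        rows.extend(batch)
        time.sleep(wait_seconds)  # the 'wait clock' between clicks
    return rows

print(len(scrape_all()))  # 45 with the simulated page above
```

The key design point is the cap: without `max_batches` (or a wall-clock timeout), a very long list like PH's yearly launches page would keep the loop running until the agent itself times out.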
@Arslan - I was trying to compare this feature with other tools like RoboMotion or RTILA, which have a NextPage or LoadMore button-click function along with a wait clock; however, those tools are quite hard to work with, so I was just hoping Gumloop would be of some help here.
That said, I consider the original help request to be Solved for sure. Thanks for the collaboration.