Product Hunt yearly leaderboard to Google Sheets

I’m trying to scrape the Product Hunt yearly leaderboard (2023) and save the data to Google Sheets.

My current flow uses Browser Extension Input but keeps timing out. Here’s my flow link: https://www.gumloop.com/pipeline?run_id=2oCFiGm5Wgxt6N3YBNp46B&workbook_id=tizj2qn9JMdg5k3jM4usks

I need help to build a flow:

  1. Either with the Browser Extension approach or with the Website Scraper node
  2. Ensuring proper connection to Extract Data node
  3. Maintaining the current Google Sheets output format

Would appreciate any guidance on getting this flow working; or if you have a flow that is already working, please link below.

Hey, have you tried using the “Website Scraper” node with “advanced” mode toggled? It should be more reliable than the Browser Extension approach (for publicly available websites) for getting the content in the first place. You can then connect this output to the “Extract Data” node.

Hi @Arslan - thanks for taking the time to respond.
Yes, I tried that, but it didn’t work for some reason: Scraper → Extract Data node → spreadsheet, but the output was blank.

Hey @learngumloop, no worries, will try to look into this now. I’m new to the flow building as well, so might take some time, but Gumloop is a powerful tool so shouldn’t be too hard. Will update as soon as I have something working


@learngumloop this is working on my end: https://www.gumloop.com/pipeline?workbook_id=3P7BHKkbj3YFJ3SNx6Bbie


@Arslan - Thank you. It’s working; however, it seems to be limited to 18–20 rows max. What can we do to simulate the functionality of Page Down or Show More and extract the full list (not just the top 10 or 20)?

That’s strange @learngumloop, I was able to extract 68 entries with this flow, might be due to the model not being deterministic? How many do you need?

@Arslan - I am just learning the Gumloop tool, so I want to simulate the entire page load of Product Hunt through to the end of the list, as PH is a good example of a directory listing. (I am aiming to learn the extraction methods here from a completeness perspective, so the job finishes with the full intended dataset.)

hope that explains my learning objective!

Gotcha @learngumloop. I’m trying to scroll down manually, but it seems almost like an infinite (or very, very large) load, so I’m not sure you’d be able to get all the listings here just due to the sheer size of the page and the scrolling required. Even if you were to use actions, it would probably time out given how long it takes to scroll to “the end”.

I believe it’s all the launches for that year on that page if you keep scrolling (40,000+ launches). That would probably time out any agent/web scraper.


@Arslan - I was trying to compare this feature with other tools like RoboMotion or RTILA, which have a NextPage or LoadMore button-click function along with a wait clock; however, those tools are quite hard to work with, so I was hoping Gumloop would be of some help here.

That said, I consider the original help request to be Solved for sure. Thanks for the collaboration.
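For anyone landing on this thread later: the NextPage/LoadMore-with-a-wait-clock pattern mentioned above can be sketched in plain Python. This is a minimal, hypothetical sketch, not Gumloop or RTILA code: `fetch_page` is a stand-in stub for whatever actually retrieves the next batch (a scraper action, a paginated API call, etc.), and the page size, dataset size, and function names are all assumptions for illustration.

```python
import time

PAGE_SIZE = 20
TOTAL_ITEMS = 68  # stub dataset size, mirroring the 68 entries seen earlier in the thread

def fetch_page(cursor):
    """Stub fetch: returns (batch, next_cursor), with next_cursor = None
    when the list is exhausted. A real version would click 'Load More'
    or call a paginated endpoint here."""
    batch = list(range(cursor, min(cursor + PAGE_SIZE, TOTAL_ITEMS)))
    next_cursor = cursor + PAGE_SIZE if cursor + PAGE_SIZE < TOTAL_ITEMS else None
    return batch, next_cursor

def scrape_all(max_pages=50, wait_seconds=0.0):
    """'Load More' loop: keep fetching until there is no next page or a
    hard page cap is hit, sleeping between fetches (the 'wait clock')."""
    items, cursor, pages = [], 0, 0
    while cursor is not None and pages < max_pages:
        batch, cursor = fetch_page(cursor)
        items.extend(batch)
        pages += 1
        time.sleep(wait_seconds)  # give the page time to render / be polite
    return items

print(len(scrape_all()))
```

The `max_pages` cap is the important design choice for a page like this one: with 40,000+ launches behind an effectively infinite scroll, an uncapped loop is exactly what times out, so you bound the run and accept a partial dataset.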

This topic was automatically closed 60 minutes after the last reply. New replies are no longer allowed.