For the web scraping node, can we extract the actual HTML content of the page? I need to scrape the image and button tags to determine some of the links on the page. As the web scraping nodes currently stand, I think they only grab the text content.
Grabbing profile pictures based on img tags on the page
@shawnbuilds do you mind elaborating on what you are trying to do? If this is some therapist directory, they are almost certainly optimizing their SEO with a sitemap. So if you share the root URL, we can probably find a better way of extracting the same data with much simpler techniques. I am guessing you want to grab all the therapist images and URLs?
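To illustrate the sitemap idea: most directories expose a sitemap.xml listing every page URL, which you can filter without touching the HTML at all. A minimal sketch below, assuming a hypothetical directory at example-directory.com with profile pages under a /therapists/ path (in practice you would fetch the XML from the site's root):

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap snippet -- in practice this would be fetched
# from something like https://example-directory.com/sitemap.xml
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example-directory.com/therapists/jane-doe</loc></url>
  <url><loc>https://example-directory.com/therapists/john-smith</loc></url>
  <url><loc>https://example-directory.com/about</loc></url>
</urlset>"""

def profile_urls(sitemap_xml: str, path_hint: str = "/therapists/") -> list[str]:
    """Return the sitemap's <loc> entries whose path matches the hint."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(sitemap_xml)
    locs = [loc.text for loc in root.findall(".//sm:loc", ns)]
    return [u for u in locs if path_hint in u]

print(profile_urls(SITEMAP_XML))
# ['https://example-directory.com/therapists/jane-doe',
#  'https://example-directory.com/therapists/john-smith']
```

From that list of profile URLs you could then scrape each page individually, which is far cheaper than crawling.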
I am able to do it in Python, but I don't understand why we can't do it in the scrape source: https://www.gumloop.com/pipeline?workbook_id=wj9uLit6jYjgPYwK8XaAU4. Ideally we should be able to traverse DOM elements without spending anything on LLM credits. I also think this might be a feature worth adding: don't run the LLM over the whole page source, especially on pages with heavy HTML (that wastes tokens). Let the end user define the scope of the HTML elements, then run the LLM on top of only those to extract the data.
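For reference, the DOM traversal I mean is doable with just the standard library, no LLM involved. A minimal sketch, assuming the raw page HTML is already available (the `PAGE` string here is a made-up stand-in for the real scraped source):

```python
from html.parser import HTMLParser

# Stand-in page source; in practice this would be the raw HTML
# returned by the scrape step.
PAGE = """
<html><body>
  <img src="/media/profile-123.jpg" alt="Jane Doe">
  <button data-href="/therapists/jane-doe">View profile</button>
  <img src="/static/logo.png" alt="logo">
</body></html>
"""

class TagCollector(HTMLParser):
    """Collect img src values and button attributes from raw HTML."""

    def __init__(self):
        super().__init__()
        self.images: list[str] = []
        self.buttons: list[dict] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "src" in attrs:
            self.images.append(attrs["src"])
        elif tag == "button":
            self.buttons.append(attrs)

parser = TagCollector()
parser.feed(PAGE)
print(parser.images)   # src of every <img> on the page
print(parser.buttons)  # attribute dict of every <button>
```

The scoped-extraction feature would then amount to running the LLM only over the elements a collector like this pulls out, instead of the full page source.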