I literally just had this working yesterday, but it looks like they’ve made a recent update.
I’m building a workflow to extract direct .mp4 links from Reddit-hosted videos (e.g., https://v.redd.it/abc123). These videos are served through a DASHPlaylist.mpd file, which contains the signed video URLs in <BaseURL> tags.
However, when I try to scrape https://v.redd.it/{id}/DASHPlaylist.mpd using Gumloop, I often receive an AccessDenied XML error or get blocked entirely. I suspect this is due to Reddit’s bot detection or header requirements.
Has anyone successfully scraped Reddit .mpd files in Gumloop? Specifically:
Can I spoof User-Agent and Referer headers inside a standard content or source scraper node?
If not, can I pass that request to a custom Python node to fetch and return the valid .mp4 link? (A rough sketch of what I mean is below these questions.)
Any best practices for avoiding bot detection when working with Reddit media?
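For reference, here’s roughly what I have in mind for the Python-node route. This is an untested sketch, and I’m assuming the requests library is available in the Run Code environment:

```python
# Untested sketch: fetch the DASH playlist with browser-like headers.
# The header values are guesses; whether Reddit accepts them is exactly my question.
import requests

video_id = "abc123"  # placeholder id from a v.redd.it URL
mpd_url = f"https://v.redd.it/{video_id}/DASHPlaylist.mpd"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Referer": "https://www.reddit.com/",
}

resp = requests.get(mpd_url, headers=headers, timeout=30)
if resp.status_code != 200 or "AccessDenied" in resp.text:
    raise RuntimeError("Blocked: got AccessDenied instead of the playlist")
print(resp.text)  # should be the manifest containing the <BaseURL> tags
```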
Thanks; would love to hear if anyone’s built a clean solution around this.
Hey @GUWLOOP! If you’re reporting an issue with a flow or an error in a run, please include the run link and make sure it’s shareable so we can take a look.
Find your run link on the history page. Format: https://www.gumloop.com/pipeline?run_id={{your_run_id}}&workbook_id={{workbook_id}}
Make it shareable by clicking “Share” → “Anyone with the link can view” in the top-left corner of the flow screen.
Provide details about the issue; more context helps us troubleshoot faster.
Hey @GUWLOOP – Yes, this should be possible. My main question is how you’re actually inputting these URLs. Is it manual, or do you already have a Google Sheet or a database of URLs that you just want to download? If that’s the case, a simpler route than scraping and dealing with bot protection is to first upload the .mpd Reddit link to Drive, then read it back using the Google Drive file writer and Google Drive file reader nodes. That gives you the file object, which you can then use however you want: upload it somewhere else, send it on Slack, or attach it to an email. It all depends on what you’re trying to do and what your inputs are.
If you do want to go down the scraping route, you should also be able to do it with a run code node or a custom node.
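As a rough illustration of the Run Code route (a sketch, not something I’ve tested against Reddit; the header values and the parsing approach are assumptions on my part), something like this would fetch the playlist and pull out the <BaseURL> entries:

```python
# Sketch of a Run Code node body: fetch the DASH playlist and extract
# the <BaseURL> video URLs. Header values are assumptions, not verified.
import re
import requests

def get_video_urls(mpd_url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
        "Referer": "https://www.reddit.com/",
    }
    resp = requests.get(mpd_url, headers=headers, timeout=30)
    resp.raise_for_status()
    # A simple pattern match is enough for a sketch; a real node might
    # parse the manifest properly with xml.etree.ElementTree instead.
    return re.findall(r"<BaseURL>(.*?)</BaseURL>", resp.text)

print(get_video_urls("https://v.redd.it/abc123/DASHPlaylist.mpd"))
```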
I’ve set up an example below using your Reddit video link and then sent it as a Slack message using the Drive approach I mentioned.
To clarify, I’m not inputting URLs manually. I’m pulling Reddit post URLs from a Google Sheet. From there, I download the video file from the corresponding packaged-media.redd.it .mp4 link (with e= and s= parameters), and then write the link to that downloaded file back into the Sheet for reference.
I like your solution and it works well when the .mp4 link is hardcoded. Unfortunately, it breaks when the link is populated dynamically via the Reddit scraper. That’s where I’m running into issues.
This setup worked fine until recently. Reddit now seems to be blocking access to those media links at the network level. I’m getting a “blocked by network security” or Access Denied error, even though the link structure hasn’t changed.
For example, when I used your solution with a dynamically scraped .mp4 link, I got this error:
```xml
<Error>
  <Code>AccessDenied</Code>
  <Message>Access Denied</Message>
  <RequestId>HY7NTQTBWG5RP7P5</RequestId>
  <HostId>qpVKTn7IBfgBDh0Gnmcu92ncwJBzPHGFXb0BarcoEfUN598ylubYMWH5AnjcTGvGfJ7YXs83XC8kh8RyqQYoiFafJFBmF+bkkqGZIH2m6BQ=</HostId>
</Error>
```
I actually got it working again for a short window yesterday, so I’m guessing Reddit pushed a new bot-detection rule or patched something shortly after.
Is there any way to simulate a full browser environment (or pass the right headers/cookies) within a run code node to bypass this? Or do we need to rethink how we’re handling the fetch entirely?
Let me know if it would help to share my data flow or Sheet setup; happy to dive deeper. Thanks again.
Thank you for sharing. I see the issue now. It’s not related to bot protection or anything like that; the video link simply isn’t available in the posts for the AI to extract. If you look at the link it’s extracting, it’s the link to the post, not the link to the video.
Example of video link: https://v.redd.it/{id}/DASHPlaylist.mpd
Example of links extracted by AI in your run: https://v.redd.it/a8oznsc0lvcf1
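Since the id is already present in the extracted post link, one thing worth trying (a sketch; it assumes the trailing path segment of the extracted link is always the video id) is deriving the playlist URL from it:

```python
# Sketch: derive the DASHPlaylist.mpd URL from an extracted v.redd.it link.
# Assumes the trailing path segment is the video id, as in the example above.
def playlist_url(post_link):
    video_id = post_link.rstrip("/").split("/")[-1]
    return f"https://v.redd.it/{video_id}/DASHPlaylist.mpd"

print(playlist_url("https://v.redd.it/a8oznsc0lvcf1"))
# -> https://v.redd.it/a8oznsc0lvcf1/DASHPlaylist.mpd
```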
Wasay, thanks again for your assistance with this matter! I’ve figured out a workaround where I’m using Run Code to generate a list and then manually extracting the URL in a spreadsheet.
I see what you mean about the link, but even instructing the AI differently suggests that Reddit is blocking at some level, unless I’m misunderstanding. I also have these files directed to a shareable Google Drive folder, so that shouldn’t be an issue.
I understand what you mean now, and I appreciate your patience. It does seem like they’re blocking any requests to view the video in the browser, except for the original video you shared. I’m not sure there’s a reliable workaround for this, but if you want to explore bypassing bot protection or try downloading through the browser, you could consider using a Run Code node. For links that are blocked in this way, though, I don’t think the drive uploading method I mentioned earlier would work.
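If you do want to experiment with the browser route in a Run Code node, Playwright is one way to make the request from a real browser context. This is only a sketch; it assumes Playwright and its Chromium binary can be installed in the node’s environment, which I haven’t verified:

```python
# Sketch: fetch the playlist through a real headless browser with Playwright.
# Assumes `pip install playwright` and `playwright install chromium` have run.
from playwright.sync_api import sync_playwright

def fetch_via_browser(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        response = page.goto(url)  # navigates with full browser headers/cookies
        body = response.text()
        browser.close()
        return body

print(fetch_via_browser("https://v.redd.it/abc123/DASHPlaylist.mpd"))
```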