I literally just had this working yesterday but it looks like they’ve made a recent update.
I’m building a workflow to extract direct .mp4 links from Reddit-hosted videos (e.g., https://v.redd.it/abc123). These videos are served through a DASHPlaylist.mpd file, which contains the signed video URLs in <BaseURL> tags.
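For context, the parsing side is the easy part once I have the manifest. Here’s a minimal sketch of what I mean, in plain Python with the standard library (names are just illustrative):

```python
import xml.etree.ElementTree as ET

def extract_base_urls(mpd_xml: str) -> list[str]:
    """Collect every <BaseURL> value from a DASH manifest, ignoring XML namespaces."""
    root = ET.fromstring(mpd_xml)
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag.endswith("BaseURL") and el.text
    ]
```

The hard part is fetching the manifest in the first place.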
However, when I try to scrape https://v.redd.it/{id}/DASHPlaylist.mpd using Gumloop, I often receive an AccessDenied XML error or get blocked entirely. I suspect this is due to Reddit’s bot detection or header requirements.
Has anyone successfully scraped Reddit .mpd files in Gumloop? Specifically:
- Can I spoof User-Agent and Referer headers inside a standard content or source scraper node?
- If not, can I pass that request to a custom Python node to fetch and return the valid .mp4 link?
- Any best practices for avoiding bot detection when working with Reddit media?
Thanks; would love to hear if anyone’s built a clean solution around this.
Hey @GUWLOOP! If you’re reporting an issue with a flow or an error in a run, please include the run link and make sure it’s shareable so we can take a look.
- Find your run link on the history page. Format: https://www.gumloop.com/pipeline?run_id={{your_run_id}}&workbook_id={{workbook_id}}
- Make it shareable by clicking “Share” → “Anyone with the link can view” in the top-left corner of the flow screen.
- Provide details about the issue; more context helps us troubleshoot faster.
You can find your run history here: https://www.gumloop.com/history
Hey @GUWLOOP – Yes, this should be possible. My main question is how you’re actually inputting these URLs. Is it manual, or do you already have a Google Sheet or a database of URLs that you just want to download?

If that’s the case, a simpler way than scraping (and dealing with bot protection) is to first upload the .mpd Reddit link to Drive, then read it back using the Google Drive file writer and Google Drive file reader nodes. That gives you the file object, which you can then use however you want: upload it somewhere else, send it on Slack, or attach it to an email. It all depends on what you’re trying to do and what your inputs are.
If you do want to go down the scraping route, you should also be able to do it with a run code node or a custom node.
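As a rough sketch of what that could look like (assuming the run code node can use the requests library, and treating the header values as illustrative rather than anything Reddit is guaranteed to accept):

```python
import xml.etree.ElementTree as ET
import requests

def fetch_mp4_url(video_id: str) -> str:
    """Fetch a v.redd.it DASH manifest with browser-like headers and return its first <BaseURL>."""
    mpd_url = f"https://v.redd.it/{video_id}/DASHPlaylist.mpd"
    headers = {
        # Placeholder browser-like headers; Reddit may still block these.
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
        "Referer": "https://www.reddit.com/",
        "Accept": "*/*",
    }
    resp = requests.get(mpd_url, headers=headers, timeout=15)
    resp.raise_for_status()
    root = ET.fromstring(resp.text)
    for el in root.iter():
        if el.tag.endswith("BaseURL") and el.text:
            return el.text.strip()
    raise ValueError("No <BaseURL> found in the manifest")
```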
I’ve set up an example below using your Reddit video link; it sends the video as a Slack message using the Drive approach I mentioned.
https://www.gumloop.com/pipeline?workbook_id=mKrNExEv7ZZnSnLoo3eAPE&tab=6
Thanks, Wasay, I really appreciate the help.
To clarify, I’m not inputting URLs manually. I’m pulling Reddit post URLs from a Google Sheet. From there, I download the video file from the corresponding packaged-media.redd.it .mp4 link (with e= and s= parameters), and then write the link to that downloaded file back into the Sheet for reference.
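For reference, I pull that link out of the scraper output with a regex along these lines; the exact pattern is just my approximation of the link shape:

```python
import re

# Rough approximation of a packaged-media.redd.it .mp4 link,
# including its query string (which carries the e= and s= parameters).
PACKAGED_MEDIA_RE = re.compile(
    r"https://packaged-media\.redd\.it/[^\s\"'<>]+\.mp4\?[^\s\"'<>]+"
)

def find_mp4_link(scraped_text: str) -> str | None:
    """Return the first packaged-media .mp4 link found in the scraped text, if any."""
    match = PACKAGED_MEDIA_RE.search(scraped_text)
    return match.group(0) if match else None
```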
I like your solution, and it works well when the .mp4 link is hardcoded. Unfortunately, it breaks when the link is populated dynamically via the Reddit scraper. That’s where I’m running into issues.
This setup worked fine until recently. Reddit now seems to be blocking access to those media links at the network level. I’m getting a “blocked by network security” or Access Denied error, even though the link structure hasn’t changed.
For example, when I used your solution with a dynamically scraped .mp4 link, I got this error:
This XML file does not appear to have any style information associated with it. The document tree is shown below.
<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>HY7NTQTBWG5RP7P5</RequestId>
<HostId>qpVKTn7IBfgBDh0Gnmcu92ncwJBzPHGFXb0BarcoEfUN598ylubYMWH5AnjcTGvGfJ7YXs83XC8kh8RyqQYoiFafJFBmF+bkkqGZIH2m6BQ=</HostId>
</Error>
I actually got it working again for a short window yesterday, so I’m guessing Reddit pushed a new bot-detection rule or patched something shortly after.
Is there any way to simulate a full browser environment (or pass the right headers/cookies) within a run code node to bypass this? Or do we need to rethink how we’re handling the fetch entirely?
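For what it’s worth, this is the kind of thing I have in mind, sketched with Playwright; I honestly don’t know whether the run code node environment can even install it:

```python
from playwright.sync_api import sync_playwright

def fetch_manifest_via_browser(mpd_url: str) -> str:
    """Load the manifest through a real Chromium instance so the request carries normal browser headers."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Visit reddit.com first to pick up any cookies the CDN might expect.
        page.goto("https://www.reddit.com/", wait_until="domcontentloaded")
        resp = page.goto(mpd_url)
        body = resp.text()
        browser.close()
        return body
```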
Let me know if it helps to share my data flow or Sheet setup; happy to dive deeper. Thanks again.