I’m wondering if anyone else is having issues hitting TPM rate limits. Part of one of my flows requires passing quite a large number of tokens through an API call. Even after implementing some token-efficiency measures, I still find myself hitting rate limits, specifically the tokens-per-minute (TPM) limits.
Looking forward, I intend to make my tool available for public use, and when that happens I’ll need a way to mitigate hitting TPM rate limits - something like a queueing system. I haven’t found any node that could do the trick (Gumloop team, this could be really handy!). I may be able to use a Run Code node for this somehow, but before I go down that road I’m wondering if anyone else has the same issue and has found any solutions.
Hey @MarcMPSGC! If you’re reporting an issue with a flow or an error in a run, please include the run link and make sure it’s shareable so we can take a look.
Find your run link on the history page. Format: https://www.gumloop.com/pipeline?run_id={{your_run_id}}&workbook_id={{workbook_id}}
Make it shareable by clicking “Share” → “Anyone with the link can view” in the top-left corner of the flow screen.
Provide details about the issue—more context helps us troubleshoot faster.
Hey @MarcMPSGC – Just to clarify, are you using an Ask AI node with your own API key, and is that where you’re hitting the rate limit? If so, then yes, a Run Code node or a custom node with a Python script to handle batching would be the best route.
OpenAI has some guides that are really helpful for this, and I’ll link one below in case you want to reference it when writing your Python code. They specifically recommend using exponential backoff or batching requests through their batch API.
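In case it helps, here’s roughly what the exponential backoff pattern looks like inside a Run Code node. This is a minimal sketch, assuming the openai v1.x Python SDK and the tenacity library are available; the model name and prompt are placeholders, not your actual flow:

```python
# Minimal sketch: retry an OpenAI call with exponential backoff on rate limits.
# Assumes the openai v1.x SDK and tenacity are installed; swap in your own
# model, prompt, and API key handling.
from openai import OpenAI, RateLimitError
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential

client = OpenAI()  # reads OPENAI_API_KEY from the environment

@retry(
    retry=retry_if_exception_type(RateLimitError),  # only retry rate-limit errors
    wait=wait_random_exponential(min=1, max=60),    # back off 1s -> 60s with jitter
    stop=stop_after_attempt(6),                     # give up after 6 attempts
)
def ask_ai(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model; use whichever you run in the flow
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(ask_ai("Summarize this document in one sentence: ..."))
```

The jittered wait is the key part: when several runs hit the limit at once, the randomness keeps them from all retrying at the same moment.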
Let me know if either of those options works for you or if you run into issues. You can also paste those OpenAI docs into an LLM to help generate the code for your Run Code node.
I agree that a batching node would be really useful; I’ll add that to the roadmap.
Yes, I’m using various Ask AI nodes with my own API keys, predominantly OpenAI but also Anthropic, Grok, and Perplexity.
And yes, this is where I’m hitting rate-limit issues. Currently it’s the tokens-per-minute limit that’s the problem.
Thanks for the resource! I’m going to look into Exponential Backoff.
I’ve come up with an interesting solution I’m going to try out that doesn’t require code: using Google Sheets Read/Write nodes to create a “queue”, where a column acts as a status that gates the flow. For example, before the Ask AI node I write a row to the sheet with a status of “Processing”, which halts other runs. Once the Ask AI node completes, I update the status to “Complete”, which allows the next run to proceed.
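For anyone who would rather approximate the same gate inside a Run Code node, here’s a rough sketch of the logic using the gspread library. It’s purely illustrative; the sheet name, credentials file, column layout, and polling interval are all assumptions, not part of the actual flow:

```python
# Rough sketch of the "status column as a gate" idea, assuming gspread and a
# service-account credentials file. Sheet name and column layout are made up.
import time
import gspread

gc = gspread.service_account(filename="credentials.json")  # hypothetical creds file
ws = gc.open("LLM Queue").sheet1                            # hypothetical sheet name

STATUS_COL = 2  # column B holds the status: "Processing" or "Complete"

def wait_for_turn(poll_seconds: int = 15) -> None:
    """Block until no other run is marked 'Processing'."""
    while "Processing" in ws.col_values(STATUS_COL):
        time.sleep(poll_seconds)

def run_gated_step(run_id: str) -> None:
    wait_for_turn()
    row = len(ws.col_values(1)) + 1
    ws.update_cell(row, 1, run_id)
    ws.update_cell(row, STATUS_COL, "Processing")    # claim the slot before the LLM call
    try:
        pass  # ... call the Ask AI / LLM step here ...
    finally:
        ws.update_cell(row, STATUS_COL, "Complete")  # release the slot for the next run

run_gated_step("run-123")
```

One thing worth flagging about this pattern (in either the no-code or the coded version): it isn’t race-proof, so two runs that start at exactly the same moment could both see an empty queue and proceed together.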