What is API throttling?

API throttling is the process of limiting the number of API requests a user can make in a given period. An application programming interface (API) functions as a gateway between a user and a software application. The user could be a human or another software application. For example, when a user clicks the post button on social media, the click triggers an API call. This API call interacts with the web server of the social media application and performs the action of posting.

Organizations use API throttling with various goals, such as security, scalability, performance, monetization, authentication, and availability.

A real-life example of API throttling in business

Assume that a person is searching for a flight through an online travel agency (OTA) site. The OTA website collects information from the user, including origin, destination, and date of travel. It then uses APIs to fetch the flight information from a global distribution system (GDS) such as Sabre or Amadeus.

Why do businesses need API throttling?

APIs are one of the biggest assets of organizations. APIs help the users of a website or mobile application complete their tasks. As the number of users increases, the website or mobile application starts to show signs of performance degradation. As a result, users with better connections or faster interfaces might get a better experience than others. API throttling helps organizations ensure fair use of their APIs.

API throttling also helps defend against denial-of-service (DoS) attacks, in which a malicious user sends enormous volumes of requests to bring down a website or mobile application. As the number of online users increases, businesses need API throttling mechanisms to ensure fair usage, protect data, and prevent malicious attacks.

How does API throttling work?

While there are various algorithms for API throttling, here are the basic steps in any API throttling algorithm:

  1. A client/user calls an API that interfaces with a web service or application.
  2. The API throttling logic checks if the current request exceeds the allowed number of API calls.
  3. If the request is within limits, the API performs as usual and completes the user’s task.
  4. If the request exceeds the limit, the API returns an error response to the user.
  5. The user will have to wait for a pre-agreed time period, or pay to make any more API calls.
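
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the names handle_request, Response, and calls_made are hypothetical, the per-user count lives in a plain dictionary, and the use of HTTP status 429 (Too Many Requests) is an assumption, since the text above only says "an error response".

```python
from dataclasses import dataclass

# Assumption: HTTP 429 "Too Many Requests" is used as the error response.
TOO_MANY_REQUESTS = 429


@dataclass
class Response:
    status: int
    body: str
    retry_after: int = 0  # seconds the client should wait; 0 when not throttled


def handle_request(user_id, calls_made, limit, wait_seconds):
    """Check the user's call count, then either serve or reject the request.

    calls_made is a hypothetical in-memory dict mapping user_id to the number
    of calls already made in the current period.
    """
    used = calls_made.get(user_id, 0)
    if used >= limit:
        # Steps 4 and 5: over the limit, so return an error and a wait time.
        return Response(TOO_MANY_REQUESTS, "rate limit exceeded", wait_seconds)
    # Step 3: within the limit, so count the call and complete the task.
    calls_made[user_id] = used + 1
    return Response(200, "ok")
```

With a limit of two calls, the first two calls from a user succeed and the third is rejected with a wait time, while other users are unaffected.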

What are the major API throttling algorithms?

Leaky bucket API throttling algorithm

This algorithm uses a first-in, first-out (FIFO) queue to hold the incoming requests. The queue has a fixed size. When a new API call or request is received, it is added to the end of the queue. At regular intervals, the algorithm removes a request from the front of the queue and processes it. If a new request arrives when the queue is already full, the request is discarded. This algorithm is closely related to the token bucket algorithm.
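
A minimal sketch of the queue behavior described above, assuming a single in-memory queue. The class name LeakyBucket and its methods add and leak are illustrative; something outside the class (a timer or scheduler, not shown) is assumed to call leak at regular intervals.

```python
from collections import deque


class LeakyBucket:
    """Bounded FIFO queue that is drained one request at a time."""

    def __init__(self, capacity):
        self.capacity = capacity  # maximum number of queued requests
        self.queue = deque()

    def add(self, request):
        """Queue a request; return False if the bucket is full and it is discarded."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def leak(self):
        """Remove and return the oldest queued request, or None if the queue is empty.

        Assumed to be called at regular intervals by an external timer.
        """
        if self.queue:
            return self.queue.popleft()
        return None
```

With a capacity of two, a burst of three requests leaves the third discarded, and each call to leak hands back the oldest queued request first.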

Advantages of leaky bucket

  • Easy to implement
  • Processes requests at a constant rate. Even if there is a burst in requests, the system isn’t overloaded. In a way, the leaky bucket algorithm smooths out the output flow when the input flow is uneven.

Disadvantages of leaky bucket

  • As the leaky bucket algorithm uses a FIFO queue, there is a chance of starvation. When the queue is full and a request takes a long time to process, newer requests may be discarded. This problem arises from the order in which the requests are processed.

Fixed window API throttling algorithm

The fixed window algorithm allows N API calls from a user in a particular period. For example, suppose a fixed window algorithm allows two requests per minute. Time is divided into fixed windows, each one minute long. At the start of every minute, a counter is reset to zero. With every user request, the counter increases. If the counter reaches the upper limit before the time window ends, new requests are rejected until the next window begins.

In a typical implementation of a fixed window algorithm, each user will have a unique key and a counter associated with the key. At the beginning of the fixed time window, the counter is reset.
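
The per-key counter can be sketched as follows. This is an illustrative version under stated assumptions: the class name FixedWindowLimiter is hypothetical, counters are kept in an in-memory dictionary, and the current time is passed in as a number of seconds so the behavior is easy to follow and test.

```python
class FixedWindowLimiter:
    """Allow at most `limit` calls per key in each fixed window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counters = {}  # key -> (window index, calls counted in that window)

    def allow(self, key, now):
        """Return True if the call at time `now` (in seconds) is within the limit."""
        window = int(now // self.window_seconds)
        stored_window, count = self.counters.get(key, (window, 0))
        if stored_window != window:
            count = 0  # a new window has started, so the counter resets
        if count >= self.limit:
            self.counters[key] = (window, count)
            return False
        self.counters[key] = (window, count + 1)
        return True
```

With a limit of two requests per 60 seconds, calls at seconds 0 and 10 are allowed, a call at second 59 is rejected, and a call at second 60 is allowed again because a new window has begun.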

Advantage of fixed window

  • Unlike the leaky bucket algorithm, a fixed window algorithm will not lead to the starvation of new requests as the counter resets at the beginning of every time window.

Disadvantage of fixed window

  • At the beginning of the time window, there could be a burst of user requests. For example, if there is a limit of 1000 requests per hour, all 1000 requests might be made in the first minute of the window. This might overwhelm the system.

Sliding window API throttling algorithm

This algorithm solves the request-burst issue of the fixed window algorithm by starting the time window when a request is made. For instance, assume that the system allows only two requests per minute for a user. Unlike the fixed window, the time window starts only when the user actually makes the first request. The timestamp of the first request is saved with a counter, and the user is allowed to make one more request within that minute.
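
One common way to realize this idea is a sliding window log, which stores the timestamp of every accepted request instead of a single timestamp with a counter. The sketch below uses that variant, so it differs slightly from the description above; the class name SlidingWindowLimiter is hypothetical, and time is passed in as seconds for clarity.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per key within any trailing window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.logs = {}  # key -> deque of timestamps of accepted requests

    def allow(self, key, now):
        """Return True if the call at time `now` (in seconds) is within the limit."""
        log = self.logs.setdefault(key, deque())
        # Drop timestamps that are at least one full window old.
        while log and now - log[0] >= self.window_seconds:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```

With two requests per 60 seconds, requests at seconds 30 and 50 are accepted, a request at second 70 is rejected, and a request at second 90 is accepted because the request from second 30 has left the window.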

Advantages of sliding window algorithm

The sliding window algorithm combines the advantages of the leaky bucket and fixed window algorithms and addresses the issues with both. In the sliding window, newer requests don’t starve. Unlike the fixed window, a burst of requests doesn’t overwhelm the system.

What are the benefits of API throttling?

API throttling is a technique essential to all organizations that expose their services through APIs.


Performance

API throttling prevents the degradation of system performance by limiting excess usage of an API. If an application has millions of users, the system might receive a huge number of API requests per second. Servicing all those API requests will slow down the system and affect its performance. API throttling helps ensure that every user receives the performance promised in the service level agreement (SLA).


Security

An API throttling system acts as a gateway to an API. It helps to prevent denial-of-service (DoS) attacks. In a DoS attack, an attacker issues a massive number of service requests so that the service becomes unavailable to legitimate users. By limiting the total number of service requests, API throttling helps to prevent DoS attacks.

Reduce unintended/malicious use

If an API gives out sensitive information due to a technical glitch, API throttling limits how much data users can obtain without authorization through the compromised API.

Metering and monetization

APIs are one of the biggest assets of organizations. Monetizing API usage can contribute a significant share of their profit. API throttling helps organizations meter the usage of their APIs. For example, a web service may offer 1000 free API calls per hour, but users who need more requests per hour have to pay for them.


Authentication

API throttling does not necessarily limit only the number of calls. Depending on the access privilege of a user, API throttling logic allows access to selected parts of the API. For example, based on the authority of the requester, some users may be able to look up other users, and others might be able to edit details of users through the API.

What are the challenges of API throttling?

Implementing API throttling in a distributed system is challenging. When the app or service has multiple servers worldwide, throttling must be applied across the whole distributed system. Consecutive requests from the same user could be forwarded to different servers. The API throttling logic resides on each node, and the nodes need to synchronize with each other in real time. This can lead to inconsistency and race conditions.

Consider an example in which the user has already used four requests out of a limit of five requests per second. Assume the user issues two more requests in the same second, and they go to two different servers. The first rate limiter retrieves the current counter from the database, sees count=4, and allows the call. At the same time, the second rate limiter also reads the database and sees count=4, because it reads from the common database before the first rate limiter updates the count. Hence, both requests are serviced, and the user gets six requests in that second.
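
The interleaving in this example can be replayed step by step. The sketch below is a deterministic simulation rather than real concurrent code: a plain dictionary stands in for the common database, and the helper names read_count and write_count are hypothetical.

```python
LIMIT = 5
db = {"count": 4}  # stand-in for the common database: four of five requests used


def read_count():
    return db["count"]


def write_count(value):
    db["count"] = value


# Both rate limiters read the counter before either one writes it back.
seen_by_first = read_count()   # 4
seen_by_second = read_count()  # 4

# Each limiter checks the limit against its own stale copy.
allowed_first = seen_by_first < LIMIT
allowed_second = seen_by_second < LIMIT

# Each limiter writes back "the value I saw, plus one".
if allowed_first:
    write_count(seen_by_first + 1)
if allowed_second:
    write_count(seen_by_second + 1)

# Four earlier requests plus the two just allowed.
served = 4 + allowed_first + allowed_second
```

Both requests are allowed, so six requests are served in total, and the stored counter ends at five rather than six because the second write overwrites the first.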

Solutions: API throttling systems employ several solutions to inconsistency and race conditions. One way to implement API throttling in distributed systems is to use sticky sessions. In this method, all requests from a user are always serviced by the same server. However, this solution does not balance load well and is not fault tolerant.

The second solution to API throttling in distributed systems is to use locks. In the above example, when the first rate limiter accesses the database, it locks the count. Until it unlocks the count, the second rate limiter cannot access it. The first rate limiter unlocks the count once the counter is updated to five. Another popular solution is relaxing the rate limit, which allows a certain percentage of extra requests per second.
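
The locking approach can be illustrated with two threads standing in for the two rate limiters. This is a single-process sketch: threading.Lock plays the role of the lock on the count, whereas a real distributed system would need a lock that all servers share (for example, one provided by the database). The class name LockedCounterLimiter is hypothetical.

```python
import threading


class LockedCounterLimiter:
    """Counter whose check-and-increment step is protected by a lock."""

    def __init__(self, limit, used=0):
        self.limit = limit
        self.count = used
        self._lock = threading.Lock()

    def allow(self):
        # Only one caller at a time may read and update the count.
        with self._lock:
            if self.count >= self.limit:
                return False
            self.count += 1
            return True


# Four of five requests are already used; two more arrive concurrently.
limiter = LockedCounterLimiter(limit=5, used=4)
results = []


def worker():
    results.append(limiter.allow())


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread takes the lock first is allowed and moves the count to five; the other then sees the updated count and is rejected, so exactly one of the two requests is served.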
