The token bucket algorithm implements rate limiting by maintaining a fixed-capacity bucket that holds tokens. Tokens are added at a constant rate, and their count never exceeds the bucket’s capacity. To process a request, the algorithm checks whether the bucket holds enough tokens and, if so, deducts the number the request requires. If too few tokens are available, the request is rejected.
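The behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the class name `TokenBucket`, the method `allow`, and the parameters `rate`, `capacity`, `initial_tokens`, and `clock` are all hypothetical names chosen for this example. It also assumes tokens are topped up lazily on each request, based on elapsed time, rather than by a background timer.

```python
import time


class TokenBucket:
    """Illustrative token bucket (hypothetical names, single-threaded)."""

    def __init__(self, rate, capacity, initial_tokens=None, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum number of tokens the bucket holds
        self.clock = clock        # injectable time source, handy for tests
        # Assumption: the bucket starts full unless told otherwise.
        start = capacity if initial_tokens is None else initial_tokens
        self.tokens = min(start, capacity)
        self.last_refill = clock()

    def _refill(self):
        # Credit the tokens earned since the last call, capped at capacity.
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now

    def allow(self, cost=1):
        """Deduct `cost` tokens and return True, or return False if too few."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A caller would create one bucket per client or endpoint, for example `TokenBucket(rate=1, capacity=2)`, and call `allow()` before handling each request. Refilling lazily keeps the sketch free of timers and threads; a production limiter would additionally need locking if shared between threads.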
The above diagram illustrates the algorithm with a rate limit of one request per second and a bucket capacity of two tokens. When request A arrives, the token bucket is initialized with one token, so request A proceeds successfully. Request B is rejected because the bucket holds no tokens at that moment. Since no requests are made between the first and second seconds, the bucket accumulates two tokens. Consequently, requests C and D are allowed, each consuming one token. Request E is rejected because the bucket has run out of tokens.
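The sequence of outcomes for requests A to E can be replayed with a small self-contained simulation. The arrival timestamps below are hypothetical: they are not taken from the diagram, only chosen so that the idle gap is long enough for the bucket to fill. The function name `replay` and the continuous (elapsed-time) refill are likewise assumptions made for this sketch; a limiter that adds tokens at discrete ticks would give the same outcomes for these arrivals.

```python
def replay(arrivals, rate=1.0, capacity=2.0, initial_tokens=1.0):
    """Replay (name, timestamp) arrivals through a token bucket.

    Returns a dict mapping each request name to "allowed" or "rejected".
    Each request costs one token.
    """
    tokens = initial_tokens
    last = arrivals[0][1]  # the bucket is created when the first request arrives
    outcomes = {}
    for name, t in arrivals:
        # Refill for the time elapsed since the previous request, capped at capacity.
        tokens = min(capacity, tokens + (t - last) * rate)
        last = t
        if tokens >= 1:
            tokens -= 1
            outcomes[name] = "allowed"
        else:
            outcomes[name] = "rejected"
    return outcomes


# Hypothetical timestamps in seconds (not read from the diagram).
arrivals = [("A", 0.0), ("B", 0.5), ("C", 2.5), ("D", 2.5), ("E", 2.5)]
print(replay(arrivals))
```

With these assumed timestamps, A and the pair C and D are allowed while B and E are rejected, matching the walkthrough: the idle period would earn more than two tokens, but the capacity cap limits the burst to two requests.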