Saturday, October 18, 2014

Rate Control/Traffic Policing/Traffic Shaping

Again, nothing new and something quite well known in the networking literature (maybe networking 101 for some) but putting it out here just as a reference because i think this is used pretty commonly in the networking industry and folks should be aware about this.

Let's say the goal is to rate control events (an event can be anything like number of hits, or number of clicks or say number of bytes delivered etc.) so that incoming traffic can be handled without overwhelming one's servers. The most commonly used (in my experience) algorithm for rate control (shaping, policing) is the token-bucket or leaky bucket based on one's requirements. However, one may think about trying to use other algorithms e.g. a sliding window. Here are the pros and cons of each approach:

  1. Sliding Window: In this approach, we specify the rate in the sliding window that we will accept .e.g B events in the window, and anything in excess will be dropped. This approach has the problem as shown below that it cannot handle burst properly. Especially in a pathological case, this algorithm has the risk of shaping the traffic incorrectly over a long duration. As one can see in the picture below, one reason burst could not be handled well was because our sliding window slid 1 slot at a time. If instead we slide a random amount, we have effectively constructed a "randomized algorithm", that would not be 100% accurate rate control but which will be on average expected to be pretty close to the expected rate. Further, one can combine the sliding window approach with the token bucket approach as outlined below. 
  2. Leaky Bucket algorithm: In this algorithm, a "bucket" data structure is created, which can be implemented as a queue and packets are inserted into the queue when they arrive, and are taken out at a predefined fixed rate. If an egress event occurs and there is nothing in the queue, we say that an "underrun" has occurred. If there are too many packets in the queue, specified by a size of the queue, we say that an "overrun" has occurred. The question of how big a burst size can be is configurable and is called as the sustainable cell rate (or sustainable bitrate, or sustainable event rate). [Ref: Wikipedia, Image credits, Wikpedia]. This is useful as a jitter buffer i.e. a buffer to eliminate jitter i.e. packets/events come in at whatever rate , but they will go out at a fixed rate. 
     
  3. Token Bucket algorithm: In this approach, tokens are generated at a constant rate (or whatever rate one wants), and then arriving events/data is queued up and then when a packet is supposed to be dequeued (See *). When the packet is dequeued, number of events associated with that packet are pulled out of the token bucket. [Image credits: this]. * Typically there is some logic for dequeuing, e.g. (A) when trying to dequeue media packets that have a timestamp .e.g. a PTS in a MPEG2 stream, the packets will sit in the queue till the system wall-clock time/timestamp reaches the PTS of the packet (B) Sometimes networking queues are implemented as simply list-driven e.g. reorder queue: one can keep a counter for the expected packet count, and keep egressing the packets that arrive in order. When a packet arrives out of order (greater than the one expected), the packets are queued up in a sorted queue. When the expected packet arrives, elements in the queue that are in order are dequeued (and passed to token check stage). Note that the token bucket algorithm can also run into problems with bursty traffic. One may need to have a token debt- algorithm to deal with certain scenarios (http://www.google.com/patents/US20110075562 where if the token count becomes negative, the egress is stalled till the count becomes positive again. This is good for shaping. What about Policing using token bucket ? That can be handled by having a lower bound for the debt. Our rate control/policer can only work in those bounds of debt. For policing, there could be a long-term debt checker, which if non-zero guarantees that our algorithm is token-non-negative, which means that the given stream is within bounds. If not, we drop. 
  4. Hybrids: one can devise hybrids e.g. leaky bucket + token bucket, where the data incoming out of a leaky bucket comes at a fixed rate, and goes into the token bucket queue at that fixed rate and rate control and policing can be done at the outgoing packet level. For Policing, The Sliding window approach can also be combined with token bucket, where some number of tokens are generated every second, say B tokens/sec, so that way the burst shown in that example will go through, but had there been another burst (3rd), then that would be dropped. Thus, we would need to have negative debt tokens as described in the point 3 above.
Of course, when implementing these, one needs some kind of timer mechanism, and one can use the commonly used libevent to create timers using evtimer_new(), and evtimer_add() etc. Note that libevent's timing mechanism is heap based where insertions and deletions are O(lgn).  (see minheap-internal.h)

One can of course, also implement timers using timing wheels: e.g. this http://25thandclement.com/~william/projects/timeout.c.html, where the insertions and deletions are O(1). 

No comments:

Post a Comment