Sunday, March 11, 2012

On Caches

This is quite a vast topic, because so much of the work done in the Computer Science field is an extension of some sort of caching by some entity. One can in fact look upon a major chunk of the computer science field itself as a search for "better" caching techniques (the meaning of "better" changes with the application).

The simplest way of caching is, of course, to store data in a hash table with the key as the "address" and the value as the data. This kind of caching is called an associative array or hash-based caching.
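As a minimal sketch of the hash-table idea, here is a Python dict sitting in front of a slow store. The names `slow_memory`, `cache`, `stats`, and `read` are made up for illustration, and the "slow memory" is just another dict standing in for the real thing.

```python
# Hypothetical stand-in for a slow memory: address -> data.
slow_memory = {addr: addr * 2 for addr in range(1024)}

cache = {}                          # key: address, value: data
stats = {"hits": 0, "misses": 0}

def read(addr):
    """Return the data at addr, consulting the hash-table cache first."""
    if addr in cache:
        stats["hits"] += 1
        return cache[addr]
    stats["misses"] += 1
    data = slow_memory[addr]        # the slow access
    cache[addr] = data              # remember it for next time
    return data

read(10)
read(10)
read(11)
print(stats)  # {'hits': 1, 'misses': 2}
```

Note that this cache never evicts anything, so it can grow as large as the memory it fronts.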

However, we can generalize this process a bit by trying to map the entire address space into a smaller address space. I will start off the discussion with caches as one finds them in Computer Organization books, i.e. caching of data from slower memories into faster memories. In such types of caching, there are 4 things of interest, and we can arrange the 4 things (line-number, set, tag, and element-offset; to be discussed soon) in 4! or 24 ways, so there are at least 24 types of caching possible. I say "at least 24" because one may use advanced data structures to implement caches, which increases the total number of cache types possible.

Organization:
See figures for a conceptual understanding. Caches are in reality not necessarily flat as shown here. The flat cache as I've drawn it is perhaps the simplest way to explain it. I will briefly discuss 2 alternative ways in which caching of data from CPU addresses can be achieved.

Type 1: Flat Cache/Tags are part of the Cache
See the following picture. Let's say the main memory is 2^A * k bytes. We can envision the memory address as broken up into x, y, z as follows. The "address-chunk" is referred to as an "element" in the literature.

Type 2: Flat Cache/Note the difference in the tag bits.



Storage Benefits:
Caching benefits are achieved by the "tag field" being non-zero (e.g. shown as y > 0 bits). Note that we can potentially store the tag in a separate, faster memory than the cache, and this is often done (see this), but I am for now showing it as part of the cache, as the tag is an important part of the storage in the caching process.

Search:
The search process for caches is as follows:
  1. Address --> find the cache line, or group of lines (if s > 0) associated with this address
  2. Search "associatively" within this set's tags and look for a match.
If each set holds just 1 line, i.e. s is 0 bits, we say that the cache is "direct mapped"; and when there is only 1 set, i.e. we view all lines as part of 1 set, then we call it a "fully associative" cache. This agrees with the definition one finds in the literature that a direct-mapped cache is a set-associative cache with 1 line per set [this]. The terminology only makes sense because of the way "associative" searches are done over the elements that are part of a "set". This is an advancement over the simplistic data structure I've shown, i.e. one may do some sort of fast search over the tag section of a set by keeping the tags in a special memory.
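The two search steps can be sketched in Python as follows. This is only a software illustration: the 16-byte element size, the `split`, `lookup`, and `fill` helpers, and the list-of-lists layout are all assumptions of mine, and the loop over a set stands in for what hardware would do as a parallel tag comparison.

```python
OFFSET_BITS = 4  # assumed element size: 16 bytes

def split(addr, num_sets):
    """Split an address into (tag, set index), dropping the element offset."""
    block = addr >> OFFSET_BITS
    return block // num_sets, block % num_sets

def lookup(sets, addr):
    """Step 1: pick the set. Step 2: compare the tag against every line in it."""
    tag, index = split(addr, len(sets))
    for line_tag, data in sets[index]:
        if line_tag == tag:
            return data   # hit
    return None           # miss

def fill(sets, addr, data):
    """Place data in the set its address maps to (no eviction in this sketch)."""
    tag, index = split(addr, len(sets))
    sets[index].append((tag, data))

# 2 sets here. The same 4 lines shaped as 4 sets x 1 line would be direct
# mapped, and shaped as 1 set x 4 lines would be fully associative.
sets = [[], []]
fill(sets, 0x1000, "A")
fill(sets, 0x1020, "B")       # same set as 0x1000, different tag
hit = lookup(sets, 0x1020)    # "B"
miss = lookup(sets, 0x1010)   # None: maps to the other, still empty, set
```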

Question: What difference do these things (type 1 and type 2) make?
The answer is that in type 1, "close" addresses map to the same line (the higher-order bits in the address decide the cache line); and in type 2, "close" addresses map to different lines (as relatively lower-order bits in the address decide the cache line).

As a result of this, type 2 is preferred, because caching wants to rely on the principle of locality, i.e. we want close addresses to map to different cache lines so that all those lines can sit in the cache together when the processor needs them, effectively saving latency and increasing throughput. In type 1, close addresses would compete for the same line and keep evicting each other.
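To make the difference concrete, here is a small Python sketch that computes the cache line for four neighboring elements under each scheme. The bit widths (16-bit addresses, 16-byte elements, 16 lines) and the function names are assumptions picked only for the example.

```python
ADDR_BITS = 16    # assumed address width
OFFSET_BITS = 4   # assumed: 16-byte elements
INDEX_BITS = 4    # assumed: 16 cache lines

def line_type1(addr):
    """Type 1: the highest-order address bits pick the cache line."""
    return addr >> (ADDR_BITS - INDEX_BITS)

def line_type2(addr):
    """Type 2: the bits just above the element offset pick the cache line."""
    return (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)

# Four consecutive 16-byte elements starting at 0x1000.
addrs = [0x1000 + i * 16 for i in range(4)]
type1_lines = [line_type1(a) for a in addrs]
type2_lines = [line_type2(a) for a in addrs]
print(type1_lines)  # [1, 1, 1, 1]
print(type2_lines)  # [0, 1, 2, 3]
```

Under type 1 all four neighbors land on line 1, while under type 2 they spread over lines 0 to 3.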

Cache Eviction Logic:
In the above discussion the eviction logic was that, in the case of a "direct mapped" cache, i.e. each line is a separate set, if the cache line one is interested in is occupied with a different tag, we kick out that data and load the new data.
If we are talking about a set-associative cache, then we would associatively search for a tag match in all the lines in the "set", and on a miss with the set full we have to decide on some mechanism to kick out an entry. This may be done as LRU (least recently used), LFU (least frequently used), FIFO (first in, first out), etc.
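As one possible illustration, here is LRU eviction within a single set, sketched in Python with `collections.OrderedDict`. The `LRUSet` class and the `load` callback are hypothetical names of mine. Real hardware usually approximates LRU with a few status bits per line rather than keeping a full ordering.

```python
from collections import OrderedDict

class LRUSet:
    """One cache set holding at most `ways` lines, evicting the least recently used."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()   # tag -> data, least recently used first

    def access(self, tag, load):
        """Return (data, evicted_tag); `load` fetches the data on a miss."""
        if tag in self.lines:
            self.lines.move_to_end(tag)                   # mark as most recently used
            return self.lines[tag], None
        evicted = None
        if len(self.lines) >= self.ways:
            evicted, _ = self.lines.popitem(last=False)   # drop the LRU line
        self.lines[tag] = load(tag)
        return self.lines[tag], evicted

s = LRUSet(ways=2)
load = lambda tag: f"data-{tag}"   # placeholder for the slow fetch
s.access(1, load)
s.access(2, load)
s.access(1, load)                  # tag 1 is now the most recently used
_, evicted = s.access(3, load)     # the set is full, so tag 2 is evicted
print(evicted)  # 2
```

With `ways=1` the same class behaves like a direct-mapped line: any new tag kicks out whatever was there.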

Cache Coherency:

Distributed Caches & Partitioning: See this

Extensions:
Web Caching:
One can in principle extend this logic to caching in general, e.g. suppose we want to cache the content from www.google.com/blah1 and www.google.com/blah2. Would it be better to use the higher-order bits (e.g. www.google.com) to map to a cache block, or would we be better off using finer granularities (i.e. blah1, blah2) to store data? I would go with the latter for locality of reference. One can see that there is potential for research here too, i.e. what are the performance benefits of each style depending on the browsing habits of the user?
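A toy Python sketch of the two granularities, assuming (my assumption, not a rule) that each cache key holds exactly one entry, much like a direct-mapped line. The `coarse_key` and `fine_key` helpers are hypothetical, and the "content" is a placeholder string rather than a real fetch.

```python
from urllib.parse import urlsplit

def coarse_key(url):
    """Key on the host only (the "higher-order bits" of the URL)."""
    return urlsplit(url).netloc

def fine_key(url):
    """Key on host plus path (the finer granularity)."""
    parts = urlsplit(url)
    return parts.netloc + parts.path

urls = ["http://www.google.com/blah1", "http://www.google.com/blah2"]

coarse_cache, fine_cache = {}, {}
for url in urls:
    content = f"content of {url}"             # placeholder for a real fetch
    coarse_cache[coarse_key(url)] = content   # second page overwrites the first
    fine_cache[fine_key(url)] = content       # each page keeps its own entry

print(len(coarse_cache), len(fine_cache))  # 1 2
```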

Network Buffering:
Often, in networking systems, one may consider the caches to be made up of a pool of buffers separated into zones. Thus, in the above classification scheme, networking caches may be envisioned with a "zone" meaning a set, and the buffers in a zone meaning the cache lines in a set. The eviction scheme is used to figure out which buffer to reuse in a particular zone.
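A rough Python sketch of such a pool, with FIFO reuse inside each zone. The `ZonedBufferPool` class is hypothetical, FIFO is just one eviction scheme picked for the example, and how a packet is assigned to a zone is left to the caller.

```python
from collections import deque

class ZonedBufferPool:
    """Buffers grouped into zones; within a zone, the oldest buffer is reused first."""

    def __init__(self, num_zones, buffers_per_zone, size):
        # Each zone (a "set") is a FIFO queue of pre-allocated buffers ("lines").
        self.zones = [
            deque(bytearray(size) for _ in range(buffers_per_zone))
            for _ in range(num_zones)
        ]

    def acquire(self, zone):
        """Take the oldest buffer in the zone and queue it again as the newest."""
        buf = self.zones[zone].popleft()
        self.zones[zone].append(buf)
        return buf

pool = ZonedBufferPool(num_zones=2, buffers_per_zone=2, size=64)
a = pool.acquire(0)
b = pool.acquire(0)
c = pool.acquire(0)   # zone 0 has only 2 buffers, so the first one is reused
print(c is a)  # True
```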
