Cache (computing)

Caching is a term used in computer science. The idea behind a cache (pronounced "cash" /ˈkæʃ/ KASH [1][2][3]) is very simple: Very often, obtaining a result for a calculation is very time-consuming, so storing the result is generally a good idea. Two kinds of storage media are used: One is usually quite big, but accessing it is "slow"; the other one can be accessed much faster, but generally it is small. The very basic idea behind caching is to use the medium that is fast to access to have copies of data. There is no difference between the copy, and the original. Accessing the original data may take a long time, or it may be expensive to do (for example: the results of a difficult problem that take a long time to solve). For this reason, it is much "cheaper" to simply use the copy of the data from the cache. Put differently, a cache is a temporary storage area that has copies of data that is used often. When a copy of the data is in this cache, it is faster to use this copy rather than re-fetching or re-calculating the original data. This will make the average time needed to access the data shorter. Putting a new value into a cache often means that an older value needs to be replaced. There are different ideas (usually called "strategies") on how to select the value to replace.

A buffer is very similar to a cache. It is different in that the client accessing the data in a buffer knows there is a buffer; the buffer is managed by the application. With a cache, the client accessing the data need not be aware there is a cache.

Typical computer applications access data in very similar ways. Suppose the data is structured into "blocks", which can be accessed individually. When an application accesses a block it is also very likely to access (or reference) a block that is "close" to the original block. This is known as locality of reference. There are different kinds of such "locality". Locality of reference is one of the reasons why caches work well in many areas of computing.

In order to work well, caches are small, compared to the whole amount of data. The bigger the cache, the longer it takes to lookup an entry. Bigger caches are also more expensive to build.

How caches work

A cache is a block of memory for storing data which is likely used again. The CPU and hard drive often use a cache, as do web browsers and web servers.

A cache is made up of many entries, called a pool. Each entry holds a datum (a bit of data) which is a copy of a datum in another place. Caches usually use what is called a backing store. Backing stores are slow or expensive to access, compared to the cache. A disk cache uses a hard disk as a backing store, for example. Each entry also has a little information attached, called a tag. This tag is used to find the location where the original data is stored.

Caches for reading

A client (a CPU, web browser, operating system) wants to access a bit of data, it believes to be in the backing store, it first checks to see if the datum can be found in the cache. If the data can be found in the cache, the client can use it and does not need to use the main memory. This is known as a cache hit.[4] So, for example, a web browser program might check its local cache on disk to see if it has a local copy of the contents of a web page at a particular URL. In this example, the URL is the tag, and the contents of the web page is the datum.

The other situation that can occur is that the datum with the tag cannot be found in the cache. This is known as cache miss. The datum needs to be fetched from the backing store. Usually, it is copied into the cache, so that the next time, it no longer needs to be fetched from the backing store.

The cache only has a limited size. To make room for the previously uncached entry, another cached entry may need to be deleted from the cache. Special rules are used to find the entry that should best be deleted. These rules are usually called Heuristics. Heuristics used to find the entry are called replacement policy. A very simple rule used is called Least recently used (or LRU). It simply takes the entry that was used the longest time ago. Other heuristics are listed at cache algorithm..

Caches for writing

Caches can also be used for writing data; the benefit of this is that the client can continue its operation once the entry has been written to the cache; it does not have to wait until the entry is written to the backing store.

However, the entry must be written to the backing store at some point in time. The timing when this happens is controlled by the write policy.

In a write-through cache, each entry is written to the backing store immediately, as well as being stored in cache.

The other option is to only write to cache, and write to the backing store later. This is known as write-back (or write-behind) cache. The cache marks the entries that have not yet been written to the backing store; the mark that is used is often referred to as dirty flag. Before the entries are deleted from the cache, they are written to the backing store. This is known as lazy write. A miss in a write-back cache (which requires a block to be replaced by another) will often need two memory accesses: one to get the needed datum, and another to write replaced data from the cache to the store.

The caching policy may also say that a certain datum must be written to cache. The client may have made many changes to the datum in the cache. After it is done, it may explicitly tell the cache to write back the datum.

No-write allocation is a cache policy where only reads are cached. This avoids the need for write-back or write-through caching. Writes are done to the backing store all the time.

The client is not the application that changes data in the backing store. If the data changed in the backing store, the copy in the cache will be out of date, or stale. Alternatively, when the client updates the data in the cache, copies of that data in other caches will become stale. There are special communication protocols that allow cache managers to talk to each other to keep the data meaningful. These are known as coherency protocols.

Selecting the entry to replace

A cache is small, and it will be full, or almost full, most of the time. So when a new value is added, an old one needs to be removed. There are different ways in which this selection can be done:

  • First in First out: Simply replace the entry that was added to the cache the longest time ago
  • Least recently used: This idea is similar to the FIFO above, but when an entry is used, its timestamp/age is updated.
  • Least frequently used: Again, similar to the FIFO case, instead of using a timestamp use a counter, which is incremented each time an entry is used
  • Pick an entry at random

History

The word cache was first used in computing in 1967, when a scientific article was prepared to be published in IBM Systems Journal. The article was about a new improvement of the memory in Model 85. Model 85 was a computer of the IBM System/360 product line. The editor of the Journal wanted a better word for high-speed buffer, used in the article. He got no input, and suggested cache, from the French cacher, meaning "to hide". The article was published in early 1968, and the authors were honored by IBM. Their work was widely welcomed and improved. Cache soon became standard usage in computer literature.[5]

Where caches are used

CPU caches

Small memories on or close to the CPU chip can be made faster than the much larger main memory. Most CPUs since the 1980s have used one or more caches. Modern general-purpose CPUs inside personal computers may have as many as half a dozen. Each cache may be specialised to a different part of the task of executing programs.

Disk caches

CPU caches are generally managed entirely by hardware, other caches are managed by a different kinds of software. The operating system usually manages a page cache in main memory. Users outside computer science usually call this cache virtual memory. It is managed by the kernel of the operating system.

Modern hard drives have disk buffers. These are sometimes called "disk cache", but this is wrong. The main function of these buffers is to order disk writes, and to manage reads. Repeated cache hits are rare, because the buffer is very small compared to the size of the hard drive.

Local hard disks are fast compared to other storage devices, such as remote servers, local tape drives, or optical jukeboxes. Using local hard disks as caches is the main concept of hierarchical storage management.

Web caches

Web browsers and web proxy servers use caches to store previous responses from web servers, such as web pages. Web caches reduce the amount of information that needs to be transmitted over the network. Information previously stored in the cache can often be re-used. This reduces bandwidth and processing requirements of the web server, and helps to improve responsiveness for users of the web.

Modern web browsers use a built-in web cache, but some internet service providers or organizations also use a caching proxy server. This is a web cache that is shared between all users of that network.

Search engines also often make web pages they have indexed available from their cache. For example, Google provides a "Cached" link next to each search result. This is useful when web pages are temporarily inaccessible from a web server.

Caching with unreliable networks

Write-through operation is common in unreliable networks (like an Ethernet LAN). The protocol used to make sure the data in the write cache makes sense when several write caches are used is very complex, in such a case.

For instance, web page caches and client-side network file system caches (like those in NFS or SMB) are typically read-only or write-through to keep the network protocol simple and reliable.

The difference between buffer and cache

Buffer and cache are not mutually exclusive; they are also often used together. The reason why they are used is different, though. A buffer is a location in memory that is traditionally used because CPU instructions cannot directly address data stored in peripheral devices. Computer memory is used as an intermediate store.

Additionally such a buffer may be feasible when a large block of data is assembled or disassembled (as required by a storage device), or when data may be delivered in a different order than that in which it is produced. Also a whole buffer of data is usually transferred sequentially (for example to hard disk), so buffering itself sometimes increases transfer performance. These benefits are present even if the buffered data are written to the buffer once and read from the buffer once.

A cache also increases transfer performance. A part of the increase similarly comes from the possibility that multiple small transfers will combine into one large block. But the main performance gain occurs because there is a good chance that the same datum will be read from cache several times, or that written data will soon be read. The only purpose of Caches is to reduce accesses to the underlying slower storage. Cache is also usually an abstraction layer that is designed to be invisible from the perspective of neighboring layers. That way, the applications or clients may not be aware that there is a cache.

References

  1. "cache noun". Oxford Learner's Dictionaries. Oxford University Press.
  2. "cache definition, meaning". Cambridge Dictionaries Online. Cambridge University Press.
  3. "Cache". Merriam-Webster Online Dictionary. Merriam-Webster, Incorporated.
  4. Bach, Martin; Shaw, Steve (September 2010). Pro Oracle Database 11g RAC on Linux (2nd ed.). Apress. p. 257. ISBN 978-1430229582. Retrieved 10 March 2011.
  5. G. C. Stierhoff and A. G. Davis. A History of the IBM Systems Journal. IEEE Annals of the History of Computing, Vol. 20, No. 1 (Jan. 1998), pages 29-35. [1]