Overview
Caching involves storing data in temporary memory so it can be retrieved faster later on. Examples include a database that caches results to avoid rereading from the hard drive and a web browser that caches images to avoid downloading them again. We will cover two cache replacement policies: least frequently used (LFU) and least recently used (LRU) caching.
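To make the idea concrete, here is a minimal sketch of the general pattern, assuming a hypothetical fetchFromSource callback that stands in for any slow operation such as a disk read or network request:
const cache = new Map();
function getCached(key, fetchFromSource) {
if (cache.has(key)) return cache.get(key); // fast path: value already cached
const value = fetchFromSource(key); // slow path: fetch once and remember it
cache.set(key, value);
return value;
}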
Least Frequently Used Caching
LFU caching involves tracking the number of times a piece of memory is accessed. When the cache reaches its size limit, the item with the lowest access count is deleted to free up space. An example of LFU caching is word suggestion on mobile keyboards: users tend to use the same words often, so those words have a high access frequency and stay in the cache. LFU can be effective, but its efficiency is lost when memory is accessed repeatedly for a short time and never again. The access frequency for that item remains high, forcing the cache to keep it even though it won't be accessed again, while newer items are prone to early deletion because they have a relatively low access frequency.
A doubly linked list is used to implement LFU because it allows nodes to be removed in O(1) time. A frequencyCount property on each node tracks how frequently that node has been accessed.
function LfuNode(key, value) {
this.prev = null;
this.next = null;
this.key = key;
this.data = value;
this.frequencyCount = 1;
}
function LfuDoublyLinkedList() {
this.head = new LfuNode('head', null);
this.tail = new LfuNode('tail', null);
this.head.next = this.tail;
this.tail.prev = this.head;
this.size = 0;
}
Standard doubly linked list insertion is needed, but only at the head; we also need removal at the tail and removal of an arbitrary node:
LfuDoublyLinkedList.prototype.insertAtHead = function(node) {
// link the new node in front of the current first node
node.next = this.head.next;
this.head.next.prev = node;
// attach the new node directly after the head sentinel
this.head.next = node;
node.prev = this.head;
this.size++;
}
LfuDoublyLinkedList.prototype.removeAtTail = function() {
// save the last real node so we can return it
const oldTail = this.tail.prev;
// unlink it: point the node before it at the tail sentinel
oldTail.prev.next = this.tail;
this.tail.prev = oldTail.prev;
this.size--;
return oldTail;
}
LfuDoublyLinkedList.prototype.removeNode = function(node) {
node.prev.next = node.next;
node.next.prev = node.prev;
this.size--;
}
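As a quick sanity check of the list on its own (using the constructors and methods above), nodes enter at the head and the oldest node leaves from the tail:
const list = new LfuDoublyLinkedList();
list.insertAtHead(new LfuNode('a', 1));
list.insertAtHead(new LfuNode('b', 2)); // head <-> b <-> a <-> tail
const removed = list.removeAtTail(); // removes 'a', the oldest node
console.log(removed.key, list.size); // 'a' 1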
The LFU cache itself contains two hash tables: keys and frequency. keys maps each key to its LfuNode, so any node can be looked up in O(1) time. frequency maps each access count to an instance of the doubly linked list; each list instance holds all of the nodes that currently share that frequency.
function LfuCache(capacity) {
this.keys = {}; // stores LfuNode
this.frequency = {}; // stores LfuDoublyLinkedList
this.capacity = capacity;
this.minFrequency = 0; // keeps track of the lowest frequency linked list
this.size = 0;
}
There are two cases to handle when setting a value. The first is inserting a new item: a new node is created, and if the cache is not full, the node is inserted into the linked list instance for frequency 1. If the cache is full, the tail item of the lowest-frequency linked list is deleted before the new node is inserted. The second case is updating a node that already exists: its value is replaced, its frequency count is incremented, it is moved to the head of the linked list for its new frequency, and minFrequency is adjusted accordingly.
LfuCache.prototype.set = function(key, value) {
let node = this.keys[key];
// if node doesn't exist in keys then add it
if (node == undefined) {
// create new node and store in keys
node = new LfuNode(key, value);
this.keys[key] = node;
// if we have space for node then try to add it to linked list with frequency 1
if (this.size !== this.capacity) {
// if linked list for frequency 1 doesn't exist then create it
if (this.frequency[1] == undefined)
this.frequency[1] = new LfuDoublyLinkedList();
// add new node and increment size of frequency 1
this.frequency[1].insertAtHead(node);
this.size++;
} else {
// else the cache is full so we need to evict a node before inserting
// remove the tail node of the lowest-frequency list and delete its key
const oldTail = this.frequency[this.minFrequency].removeAtTail();
delete this.keys[oldTail.key];
// make sure the frequency 1 list exists before adding the new node
if (this.frequency[1] === undefined)
this.frequency[1] = new LfuDoublyLinkedList();
this.frequency[1].insertAtHead(node);
}
// we added a new node so minFrequency needs to be reset to 1
// aka new node was referenced once
this.minFrequency = 1;
} else {
// else node exists so we need to get it and move it to the new linked list
// save the old frequency of the node and increment (also update data)
const oldFrequencyCount = node.frequencyCount;
node.data = value;
node.frequencyCount++;
// remove node from its current frequency's linked list
this.frequency[oldFrequencyCount].removeNode(node);
// if new list doesn't exist then make it now
if (this.frequency[node.frequencyCount] === undefined)
this.frequency[node.frequencyCount] = new LfuDoublyLinkedList();
// now add node to new linked list with the incremented freqCount
this.frequency[node.frequencyCount].insertAtHead(node);
// if the node we incremented was in the minFrequency list of all lists
// and there's nothing left in the old list then we know the new minFrequency
// for any node in any list is in the next freq so increment that now
if (
oldFrequencyCount === this.minFrequency
&& this.frequency[oldFrequencyCount].size === 0
) {
this.minFrequency++;
}
}
}
To get an item from the cache, we need to return an existing node in O(1) time and increment its access counter. If the element doesn't exist in the cache, we return null. If it does exist, its frequency is incremented, the node is brought to the head of the linked list for its new frequency, and the minFrequency variable is adjusted if needed.
LfuCache.prototype.get = function(key) {
const node = this.keys[key];
if (node == undefined) return null;
const oldFrequencyCount = node.frequencyCount;
node.frequencyCount++;
// remove node from old frequency list and create new one if next one doesn't exist
// before adding the node to the next list at the head
this.frequency[oldFrequencyCount].removeNode(node);
if (this.frequency[node.frequencyCount] === undefined)
this.frequency[node.frequencyCount] = new LfuDoublyLinkedList();
this.frequency[node.frequencyCount].insertAtHead(node);
// if old frequency list is empty then update minFrequency
if (
oldFrequencyCount === this.minFrequency
&& this.frequency[oldFrequencyCount].size === 0
) {
this.minFrequency++;
}
return node.data;
}
We can now test out the LFU caching:
const myLFU = new LfuCache(5);
myLFU.set(1, 1); // {1: 1}
myLFU.set(2, 2); // {1: 2<->1}
myLFU.set(3, 3); // {1: 3<->2<->1}
myLFU.set(4, 4); // {1: 4<->3<->2<->1}
myLFU.set(5, 5); // {1: 5<->4<->3<->2<->1}
myLFU.get(1); // return 1, {1: 5<->4<->3<->2, 2: 1}
myLFU.get(1); // return 1, {1: 5<->4<->3<->2, 3: 1}
myLFU.get(1); // return 1, {1: 5<->4<->3<->2, 4: 1}
myLFU.set(6, 6); // {1: 6<->5<->4<->3, 4: 1}
myLFU.get(6); // return 6 {1: 5<->4<->3, 4: 1, 2: 6}
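The comments above show the cache's internal state after each call. A small debug helper, not part of the cache itself, can print that state by walking each frequency list from head to tail:
// hypothetical helper for inspecting the state shown in the comments above
LfuCache.prototype.toString = function() {
const parts = [];
for (const freq of Object.keys(this.frequency)) {
const list = this.frequency[freq];
const keys = [];
for (let node = list.head.next; node !== list.tail; node = node.next)
keys.push(node.key);
if (keys.length > 0) parts.push(freq + ': ' + keys.join('<->'));
}
return '{' + parts.join(', ') + '}';
}
console.log(myLFU.toString()); // {1: 5<->4<->3, 2: 6, 4: 1}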
Least Recently Used Caching
LRU caching works by evicting the least recently used item when a new item is added to a full cache. The oldest item sits at the front of the linked list and is popped off before the new node is added at the tail. A hash table provides O(1) lookup of any node by key, while the doubly linked list keeps the nodes ordered from least to most recently used.
function DllNode(key, data) {
this.key = key;
this.data = data;
this.next = null;
this.prev = null;
}
function LruCache(capacity) {
this.keys = {};
this.capacity = capacity;
this.head = new DllNode('', null);
this.tail = new DllNode('', null);
this.head.next = this.tail;
this.tail.prev = this.head;
}
New nodes are only added at the tail, so we only need implementations for removing a node and adding a node at the tail.
LruCache.prototype.removeNode = function (node) {
const prev = node.prev;
const next = node.next;
prev.next = next;
next.prev = prev;
}
LruCache.prototype.addNode = function (node) {
const realTail = this.tail.prev;
realTail.next = node;
this.tail.prev = node;
node.prev = realTail;
node.next = this.tail;
}
When a node is retrieved, it is moved to the tail of the linked list as the most recently used node: the node is removed from its current position and re-added at the tail. Newly added nodes go to the tail for the same reason.
LruCache.prototype.get = function(key){
const node = this.keys[key];
if(node == undefined) return null;
this.removeNode(node);
this.addNode(node);
return node.data;
}
When setting a node, the keys property keeps retrieval at O(1) time when that node is fetched later. If the cache is over capacity after adding a new node, the node furthest from the tail, i.e. the least recently used one, is removed.
LruCache.prototype.set = function (key, value){
// remove node from 'old' position
const node = this.keys[key];
if(node) this.removeNode(node);
// create new node and add at tail
const newNode = new DllNode(key, value);
this.addNode(newNode);
this.keys[key] = newNode;
// if we are over capacity then remove the oldest node - it's at the head
if(Object.keys(this.keys).length > this.capacity){
const realHead = this.head.next;
this.removeNode(realHead);
delete this.keys[realHead.key];
}
}
We can now test out the LRU caching:
const myLRU = new LruCache(5);
myLRU.set(1, 1); // 1
myLRU.set(2, 2); // 1 <-> 2
myLRU.set(3, 3); // 1 <-> 2 <-> 3
myLRU.set(4, 4); // 1 <-> 2 <-> 3 <-> 4
myLRU.set(5, 5); // 1 <-> 2 <-> 3 <-> 4 <-> 5
myLRU.get(1); // 2 <-> 3 <-> 4 <-> 5 <-> 1
myLRU.get(2); // 3 <-> 4 <-> 5 <-> 1 <-> 2
myLRU.set(6, 6); // 4 <-> 5 <-> 1 <-> 2 <-> 6
myLRU.set(7, 7); // 5 <-> 1 <-> 2 <-> 6 <-> 7
myLRU.set(8, 8); // 1 <-> 2 <-> 6 <-> 7 <-> 8
Summary
In the real world, a mix of these two implementations is used for caching. When comparing LFU to LRU, LFU can perform worse because it doesn't account for the temporal problem mentioned earlier. LFU works on the idea that the most used items should be cached, but it fails when an item is used many times in succession and then never again: the item's high frequency count keeps it in the cache even when it hasn't been used in a while. LRU, on the other hand, works on the idea that the most recently used items should be cached. This can provide much better performance, since items that were used many times but not recently will eventually fall off. The trade-off appears when many items are each used once: they are added to the cache even though their frequency is one, and other items that haven't been used as recently, but may be used again soon, are evicted. If you'd like to read about this topic more in depth, see here for an overview of how caching works at the CPU level.
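To make the trade-off concrete, here is a short sketch using the two caches built above: a key accessed in a quick burst survives in the LFU cache long after the burst ends, while the LRU cache lets it fall off.
const lfu = new LfuCache(2);
lfu.set('burst', 1);
lfu.get('burst'); lfu.get('burst'); lfu.get('burst'); // frequency is now 4
lfu.set('a', 2); // cache is now full
lfu.set('b', 3); // evicts 'a' (frequency 1), not 'burst'
console.log(lfu.get('burst')); // 1 - still cached despite no recent use
const lru = new LruCache(2);
lru.set('burst', 1);
lru.get('burst'); lru.get('burst'); lru.get('burst');
lru.set('a', 2);
lru.set('b', 3); // evicts 'burst', the least recently used item
console.log(lru.get('burst')); // null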