Category: Algorithms

Everything about Cache

I’ll put everything I read about caches in this article.

Write-through, write-around and write-back cache

There are three main caching techniques that can be deployed, each with its own pros and cons.

  • Write-through cache directs write I/O onto cache and through to underlying permanent storage before confirming I/O completion to the host. This ensures data updates are safely stored on, for example, a shared storage array, but has the disadvantage that I/O still experiences latency based on writing to that storage. Write-through cache is good for applications that write and then re-read data frequently, as the data sits in cache and read latency is low.
  • Write-around cache is a similar technique to write-through cache, but write I/O is written directly to permanent storage, bypassing the cache. This keeps the cache from being flooded with write I/O that will not subsequently be re-read, but has the disadvantage that a read request for recently written data will cause a “cache miss”, have to be served from slower bulk storage, and therefore experience higher latency.
  • Write-back cache is where write I/O is directed to cache and completion is immediately confirmed to the host. This results in low latency and high throughput for write-intensive applications, but there is a data-availability risk because the only copy of the written data is in cache. As we will discuss later, suppliers have added resiliency with products that duplicate writes. Users need to consider whether write-back cache solutions offer enough protection, as data is exposed until it is staged to external storage. Write-back cache is the best-performing option for mixed workloads, as both read and write I/O have similar response times. (A minimal code sketch of all three policies follows this list.)
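
A minimal Java sketch of the three policies, assuming a hypothetical BackingStore interface and an in-memory map as the cache. The names are illustrative only, not from any real library:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical backing store for illustration only.
interface BackingStore {
    void write(String key, String value); // slow, durable write
    String read(String key);              // slow read
}

class WritePolicies {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final BackingStore store;

    WritePolicies(BackingStore store) { this.store = store; }

    // Write-through: update cache AND storage before acknowledging.
    void writeThrough(String key, String value) {
        cache.put(key, value);
        store.write(key, value);          // caller waits for the slow write
    }

    // Write-around: bypass the cache; a later read of this key will miss.
    void writeAround(String key, String value) {
        store.write(key, value);
        cache.remove(key);                // drop any stale cached copy
    }

    // Write-back: acknowledge after caching; flush to storage later.
    void writeBack(String key, String value) {
        cache.put(key, value);            // fast acknowledgement
        // A background flusher would call store.write(key, value) later;
        // until then the only copy lives in cache (availability risk).
    }
}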

LRU implementation:

http://openmymind.net/High-Concurrency-LRU-Caching/

1. Hashmap:

A read-write mutex for the hashtable is efficient. Assuming we GET more than we SET, we’ll mostly be acquiring read locks (which multiple threads can hold at once). The only time we need a write lock is when setting an item. If we keep things basic, it ends up looking like the sketch below.
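
Here is a minimal Java sketch of that idea (the linked article uses Go; the class and method names below are my own illustration):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RwLockedCache<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // GETs only take the read lock, so many readers can proceed in parallel.
    V get(K key) {
        lock.readLock().lock();
        try { return map.get(key); }
        finally { lock.readLock().unlock(); }
    }

    // SETs take the exclusive write lock.
    void set(K key, V value) {
        lock.writeLock().lock();
        try { map.put(key, value); }
        finally { lock.writeLock().unlock(); }
    }
}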

If necessary, we could always shard our hashtable to support more write throughput.

http://openmymind.net/Shard-Your-Hash-table-to-reduce-write-locks/
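A hedged sketch of the sharding idea, reusing the RwLockedCache class from the sketch above; the shard count and hashing scheme are arbitrary choices for illustration:

// Illustrative sketch: N shards, each with its own lock, chosen by key hash.
class ShardedCache<K, V> {
    private final RwLockedCache<K, V>[] shards;

    @SuppressWarnings("unchecked")
    ShardedCache(int shardCount) {
        shards = new RwLockedCache[shardCount];
        for (int i = 0; i < shardCount; i++) shards[i] = new RwLockedCache<>();
    }

    private RwLockedCache<K, V> shardFor(K key) {
        // Spread keys across shards so writes to different keys rarely contend.
        return shards[Math.floorMod(key.hashCode(), shards.length)];
    }

    V get(K key)             { return shardFor(key).get(key); }
    void set(K key, V value) { shardFor(key).set(key, value); }
}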

2. List:

Ultimately, our goal was to reduce lock contention against our list. We achieved this in three ways. First, we use a window to limit the frequency of promotion. Second, we use a buffered channel to process promotions in a separate thread. Finally, we can do promotion and GC within the same thread. (A rough Java sketch of the buffered-channel idea follows below.)

(See this implementation from the eBay engineering blog, which is similar to the second option above: http://www.ebaytechblog.com/2011/08/30/high-throughput-thread-safe-lru-caching/ )
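
A rough Java analogue of the buffered-channel approach: GETs push promotions onto a bounded queue and a single worker thread applies them to the list, so reads never block on the list itself. This is my own sketch, not the article’s or eBay’s code:

import java.util.LinkedList;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class AsyncPromoter<K> {
    private final LinkedList<K> lruList = new LinkedList<>();            // most recent at head
    private final BlockingQueue<K> pending = new ArrayBlockingQueue<>(1024);

    AsyncPromoter() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    K key = pending.take();   // only this thread ever touches lruList
                    lruList.remove(key);
                    lruList.addFirst(key);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Called from GET: never blocks; if the queue is full we simply skip this promotion.
    void promote(K key) {
        pending.offer(key);
    }
}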

http://openmymind.net/Back-To-Basics-Hasthables-Part-2/ : Back To Basics: Hashtables Part 2 (And A Peek Inside Redis). This article introduces incremental rehashing in Redis, an interesting idea.

Summary: the magic of keeping a hashtable efficient despite varying size is not that different from the magic behind dynamic arrays: double in size and copy. Redis’ incremental approach ensures that rehashing large hashtables doesn’t cause any performance hiccups. The downside is internal complexity (almost every hashtable operation needs to be rehashing-aware) and a longer rehashing process (which takes extra memory while running). Read about Redis’ hashtable implementation: https://github.com/antirez/redis/blob/unstable/src/dict.c
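
A very rough sketch of the incremental idea in Java, using two plain HashMaps and migrating a few entries per operation. Redis’ actual dict.c rehashes its own chained buckets and is considerably more involved; the threshold and step size here are arbitrary illustrative values:

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class IncrementallyRehashedMap<K, V> {
    private HashMap<K, V> oldTable = new HashMap<>();
    private HashMap<K, V> newTable = null;           // non-null only while rehashing

    V get(K key) {
        rehashStep();
        if (newTable != null && newTable.containsKey(key)) return newTable.get(key);
        return oldTable.get(key);
    }

    void put(K key, V value) {
        rehashStep();
        if (newTable != null) {
            oldTable.remove(key);                    // avoid rehashing a stale copy over the new value
            newTable.put(key, value);
        } else {
            oldTable.put(key, value);
            if (oldTable.size() > 1_000)             // arbitrary trigger: start an incremental rehash
                newTable = new HashMap<>(oldTable.size() * 2);
        }
    }

    // Every operation must be rehashing-aware: move a few entries per call.
    private void rehashStep() {
        if (newTable == null) return;
        Iterator<Map.Entry<K, V>> it = oldTable.entrySet().iterator();
        for (int moved = 0; moved < 4 && it.hasNext(); moved++) {
            Map.Entry<K, V> e = it.next();
            newTable.put(e.getKey(), e.getValue());
            it.remove();
        }
        if (oldTable.isEmpty()) {                    // rehash finished: swap tables
            oldTable = newTable;
            newTable = null;
        }
    }
}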


Recurrence Relations to Remember

Recurrence                   Algorithm                              Big-Oh Solution
T(n) = T(n/2) + O(1)         Binary search                          O(log n)
T(n) = T(n-1) + O(1)         Sequential search                      O(n)
T(n) = 2T(n/2) + O(1)        Tree traversal                         O(n)
T(n) = T(n-1) + O(n)         Selection sort (and other n^2 sorts)   O(n^2)
T(n) = 2T(n/2) + O(n)        Mergesort (average-case quicksort)     O(n log n)
T(n) = T(n/2) + O(n)         Quickselect (k-th smallest element)    O(n)
T(n) = 2T(n/2) + O(log n)    Optimal sorted matrix search           O(n)

Refer to https://en.wikipedia.org/wiki/Master_theorem for solving other recurrences.
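
As a quick worked example of applying it: for mergesort, T(n) = 2T(n/2) + O(n), so a = 2, b = 2 and f(n) = O(n). Since n^(log_b a) = n^(log_2 2) = n grows at the same rate as f(n), case 2 of the Master theorem applies and T(n) = O(n log n), matching the table above.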

 

Java Multithreading

http://stackoverflow.com/questions/10684111/can-notify-wake-up-the-same-thread-multiple-times

http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/Condition.html

Difference between Binary Semaphore and Mutex

http://stackoverflow.com/questions/62814/difference-between-binary-semaphore-and-mutex
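
A small Java illustration of the key distinction discussed in that answer: a mutex has an owner and must be released by the thread that acquired it, while a binary semaphore has no owner and can be released by a different thread, which also makes it usable for signalling. The class and variable names below are my own:

import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

class MutexVsBinarySemaphore {
    public static void main(String[] args) throws InterruptedException {
        // Mutex: only the thread that locked it may unlock it (ownership).
        ReentrantLock mutex = new ReentrantLock();
        mutex.lock();
        mutex.unlock();                       // must happen on the same thread

        // Binary semaphore: a single permit, but any thread may release it,
        // so it can also be used to signal between threads.
        Semaphore binary = new Semaphore(0);  // start with no permit
        Thread worker = new Thread(() -> {
            System.out.println("work done, signalling");
            binary.release();                 // released by a different thread
        });
        worker.start();
        binary.acquire();                     // main thread waits for the signal
        System.out.println("main resumed");
    }
}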

Java concurrency: using read-write Lock and Condition for conditional blocking (repost)

http://www.cnblogs.com/yaowukonga/archive/2012/08/27/2658329.html

 

The H2O problem and Java thread synchronization

http://www.cnblogs.com/lautsie/p/3430356.html

http://ttianzhao.blogspot.com/2014/07/concurrent-h2o.html
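
One common way to solve the H2O problem with semaphores (a sketch, not necessarily the approach taken in the linked posts): two hydrogen threads each take a slot and signal, and the oxygen thread waits for both signals before reopening the slots for the next molecule.

import java.util.concurrent.Semaphore;

class H2O {
    private final Semaphore hSem = new Semaphore(2); // 2 hydrogen slots per molecule
    private final Semaphore oSem = new Semaphore(0); // oxygen waits for both hydrogens

    public void hydrogen(Runnable releaseHydrogen) throws InterruptedException {
        hSem.acquire();            // take one of the two hydrogen slots
        releaseHydrogen.run();     // e.g. print "H"
        oSem.release();            // tell the oxygen that one hydrogen is ready
    }

    public void oxygen(Runnable releaseOxygen) throws InterruptedException {
        oSem.acquire(2);           // wait until two hydrogens have arrived
        releaseOxygen.run();       // e.g. print "O"
        hSem.release(2);           // reopen the slots for the next molecule
    }
}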

 

A Java example of a bounded buffer

http://www.cnblogs.com/yaowukonga/archive/2012/08/27/2658329.html

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class BoundedBuffer {
    final Lock lock = new ReentrantLock();
    final Condition notFull = lock.newCondition();   // signalled when space becomes available
    final Condition notEmpty = lock.newCondition();  // signalled when an item becomes available

    final Object[] items = new Object[100];
    int putptr, takeptr, count;

    public void put(Object x) throws InterruptedException {
        lock.lock();
        try {
            while (count == items.length) notFull.await();   // buffer full: wait
            items[putptr] = x;
            if (++putptr == items.length) putptr = 0;         // wrap around
            ++count;
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    public Object take() throws InterruptedException {
        lock.lock();
        try {
            while (count == 0) notEmpty.await();               // buffer empty: wait
            Object x = items[takeptr];
            if (++takeptr == items.length) takeptr = 0;        // wrap around
            --count;
            notFull.signal();
            return x;
        } finally {
            lock.unlock();
        }
    }
}

http://tutorials.jenkov.com/java-concurrency/blocking-queues.html

import java.util.LinkedList;
import java.util.List;

public class BlockingQueue {
    private List<Object> queue = new LinkedList<>();
    private int limit = 10;

    public BlockingQueue(int limit) {
        this.limit = limit;
    }

    public synchronized void enqueue(Object item) throws InterruptedException {
        while (this.queue.size() == this.limit) {
            wait();                      // queue full: wait for a dequeue
        }
        if (this.queue.size() == 0) {
            notifyAll();                 // queue was empty: wake waiting consumers
        }
        this.queue.add(item);
    }

    public synchronized Object dequeue() throws InterruptedException {
        while (this.queue.size() == 0) {
            wait();                      // queue empty: wait for an enqueue
        }
        if (this.queue.size() == this.limit) {
            notifyAll();                 // queue was full: wake waiting producers
        }
        return this.queue.remove(0);
    }
}

 

B-Tree and B+Tree

References:
http://stackoverflow.com/questions/870218/b-trees-b-trees-difference
http://blog.csdn.net/hguisu/article/details/7786014
http://stackoverflow.com/questions/15485220/advantage-of-b-trees-over-bsts

B-trees and B+trees are the most commonly used data structures for building indexes in databases and file systems. The reason is that they have a much higher branching factor than a balanced BST (such as a red-black tree), and therefore a much lower height: the number of nodes that must be visited on a lookup is about log_m(n), where m is the branching factor (m = 2 for a BST). Lower height means fewer disk accesses to retrieve nodes, which is what matters for database performance. A database usually stores huge amounts of data, and the index can be too large to fit into memory, so it is stored on disk as well; disk access is much slower than memory access. A higher branching factor does mean more time spent finding the key within a node, but that search happens in memory and is almost negligible compared with the disk I/O.

Usually the database creates each node with a size exactly equal to one page, so a node can be fetched with a single read I/O.
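
As a rough illustrative calculation (the numbers are assumptions, not from the references): with 4 KB pages and roughly 16 bytes per key/pointer pair, a node can hold on the order of 250 keys, so a tree of height 3 can index about 250^3 ≈ 15 million entries while needing only three page reads per lookup.
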
B+tree’s advantages over B-tree:

1. Since its internal nodes store only keys, not data, a B+tree can fit more keys than a B-tree into the same amount of space (normally one page). That means a higher branching factor and a lower height.
2. Faster full scan of the data using the linked list of leaf nodes at the bottom: fewer cache misses compared with a full-tree traversal of a B-tree.

Advantages of B-tree over B+tree: data can sit closer to the root, so if some records are accessed much more frequently than others, we gain some performance when they are stored near the root.