http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/
Monthly Archives: July 2014
Big Data
finding-median-of-large-set-of-numbers-too-big-to-fit-into-memory
http://stackoverflow.com/questions/3888036/finding-median-of-large-set-of-numbers-too-big-to-fit-into-memory
http://www.fusu.us/2013/07/median-in-large-set-across-1000-servers.html
Java Multithreading
http://stackoverflow.com/questions/10684111/can-notify-wake-up-the-same-thread-multiple-times
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/Condition.html
Difference between Binary Semaphore and Mutex
http://stackoverflow.com/questions/62814/difference-between-binary-semaphore-and-mutex
Java Concurrency: Using the Read-Write Lock (Lock) and Condition Blocking (Condition) (reposted)
http://www.cnblogs.com/yaowukonga/archive/2012/08/27/2658329.html
H2O and Java Thread Synchronization
http://www.cnblogs.com/lautsie/p/3430356.html
http://ttianzhao.blogspot.com/2014/07/concurrent-h2o.html
A Java example of bounded buffer
http://www.cnblogs.com/yaowukonga/archive/2012/08/27/2658329.html
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class BoundedBuffer {
    final Lock lock = new ReentrantLock();
    final Condition notFull = lock.newCondition();   // signalled when a slot frees up
    final Condition notEmpty = lock.newCondition();  // signalled when an item arrives
    final Object[] items = new Object[100];          // circular buffer
    int putptr, takeptr, count;

    public void put(Object x) throws InterruptedException {
        lock.lock();
        try {
            // Loop rather than if: re-check the condition after every wakeup.
            while (count == items.length) notFull.await();
            items[putptr] = x;
            if (++putptr == items.length) putptr = 0;  // wrap around
            ++count;
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    public Object take() throws InterruptedException {
        lock.lock();
        try {
            while (count == 0) notEmpty.await();
            Object x = items[takeptr];
            if (++takeptr == items.length) takeptr = 0;  // wrap around
            --count;
            notFull.signal();
            return x;
        } finally {
            lock.unlock();
        }
    }
}
http://tutorials.jenkov.com/java-concurrency/blocking-queues.html
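import java.util.LinkedList;
import java.util.List;

public class BlockingQueue {
    private List<Object> queue = new LinkedList<Object>();
    private int limit = 10;

    public BlockingQueue(int limit) {
        this.limit = limit;
    }

    public synchronized void enqueue(Object item) throws InterruptedException {
        // Block while the queue is full.
        while (this.queue.size() == this.limit) {
            wait();
        }
        // The queue is about to become non-empty: wake any waiting consumers.
        if (this.queue.size() == 0) {
            notifyAll();
        }
        this.queue.add(item);
    }

    public synchronized Object dequeue() throws InterruptedException {
        // Block while the queue is empty.
        while (this.queue.size() == 0) {
            wait();
        }
        // The queue is about to stop being full: wake any waiting producers.
        if (this.queue.size() == this.limit) {
            notifyAll();
        }
        return this.queue.remove(0);
    }
}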
Trie
Google, LinkedIn, Facebook interview experience collection
Google:
http://www.yiyome.com/article/view/1003229-google-.html
Given API:
int Read4096(char* buf);
It reads data from a file and records the position, so that the next time it is called it reads the next 4k chars (or the rest of the file, whichever is smaller) from the file.
The return value is the number of chars read.
Todo: use the API above to implement the API
"int Read(char* buf, int n)", which reads any number of chars from the file.
http://www.careercup.com/question?id=14424684
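A minimal sketch of one way to approach the Read4096 question above, written in Java rather than C. The names ChunkReader, StringChunkReader, read4096 and read are hypothetical stand-ins for the C-style API in the question, and the string-backed reader exists only so the sketch can be exercised without a real file. The idea is to keep the unread tail of the last 4k chunk between calls, so that chars fetched by read4096 but not yet requested are not lost.

```java
// Hypothetical Java rendering of the API in the question.
abstract class ChunkReader {
    private final char[] chunk = new char[4096]; // last chunk fetched by read4096
    private int chunkLen = 0;                    // number of valid chars in chunk
    private int chunkPos = 0;                    // next unread char in chunk

    // Assumed contract: fills buf with up to 4096 chars and returns how many
    // were read; returns 0 once the file is exhausted.
    abstract int read4096(char[] buf);

    // Reads up to n chars into buf and returns the number actually read.
    int read(char[] buf, int n) {
        int copied = 0;
        while (copied < n) {
            if (chunkPos == chunkLen) {       // leftover chars used up: refill
                chunkLen = read4096(chunk);
                chunkPos = 0;
                if (chunkLen == 0) break;     // end of file
            }
            int take = Math.min(n - copied, chunkLen - chunkPos);
            System.arraycopy(chunk, chunkPos, buf, copied, take);
            chunkPos += take;
            copied += take;
        }
        return copied;
    }
}

// Test double: serves chars from an in-memory string, 4096 at a time.
class StringChunkReader extends ChunkReader {
    private final String data;
    private int offset = 0;

    StringChunkReader(String data) { this.data = data; }

    @Override
    int read4096(char[] buf) {
        int len = Math.min(4096, data.length() - offset);
        data.getChars(offset, offset + len, buf, 0);
        offset += len;
        return len;
    }
}
```

Because the leftover chars live in the reader object, two consecutive calls such as read(buf, 5000) followed by read(buf, 6000) on a 10000-char file return 5000 chars each, with no gap or overlap.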
*****************************************************************
Linkedin:
http://www.amoduo.com/article/view/1003269-linkedin-.html
http://www.mitbbs.com/article_t/JobHunting/32331973.html
http://www.weiming.info/zhuti/JobHunting/31909473/
http://blog.sina.com.cn/s/blog_696e177d0101c4vv.html
**************************************************************
Linked List collection
flatten a binary tree to linked list leetcode
flatten and unflatten a layered linked list to doubly linked list
http://judebert.com/progress/archives/448-Interview-Day-6-Unflattening-a-Bush.html
vector in c++
Web Cache and HTTP
Caching Tutorial for Web Authors and Webmasters
https://www.mnot.net/cache_docs/#SCRIPT
HTTP Made Really Easy
A Practical Guide to Writing Clients and Servers
http://www.jmarshall.com/easy/http/
http://stackoverflow.com/questions/2092527/what-happens-when-you-type-in-a-url-in-browser
In an extremely rough and simplified sketch, assuming the simplest possible HTTP request, no proxies and IPv4 (this would work similarly for IPv6-only client, but I have yet to see such workstation):
1. browser checks cache; if requested object is in cache and is fresh, skip to #9
2. browser asks OS for server’s IP address
3. OS makes a DNS lookup and replies the IP address to the browser
4. browser opens a TCP connection to server (this step is much more complex with HTTPS)
5. browser sends the HTTP request through TCP connection
6. browser receives HTTP response and may close the TCP connection, or reuse it for another request
7. browser checks if the response is a redirect (3xx result status codes), authorization request (401), error (4xx and 5xx), etc.; these are handled differently from normal responses (2xx)
8. if cacheable, response is stored in cache
9. browser decodes response (e.g. if it’s gzipped)
10. browser determines what to do with response (e.g. is it a HTML page, is it an image, is it a sound clip?)
11. browser renders response, or offers a download dialog for unrecognized types
Again, discussion of each of these points has filled countless pages; take this as a starting point. Also, there are many other things happening in parallel to this (processing typed-in address, adding page to browser history, displaying progress to user, notifying plugins and extensions, rendering the page while it’s downloading, pipelining, connection tracking for keep-alive, etc.).
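Two of the steps in the list above, sending the HTTP request and checking the status of the response, can be sketched without touching the network. This is only an illustration: HttpSketch and its method names are made up for this note, and the grouping of status codes simply follows the rough wording of the list (every 3xx is treated as a redirect, which a real browser does not do).

```java
class HttpSketch {
    // The text a client would write to the TCP connection for a minimal GET.
    static String buildGetRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    // Extracts the numeric code from a status line such as
    // "HTTP/1.1 301 Moved Permanently".
    static int statusCode(String statusLine) {
        String[] parts = statusLine.split(" ", 3);
        return Integer.parseInt(parts[1]);
    }

    // Groups status codes the way the list above does.
    static String classify(int code) {
        if (code == 401) return "authorization request";
        if (code >= 200 && code < 300) return "normal response";
        if (code >= 300 && code < 400) return "redirect";
        if (code >= 400 && code < 600) return "error";
        return "other";
    }
}
```

For example, HttpSketch.classify(HttpSketch.statusCode("HTTP/1.1 301 Moved Permanently")) returns "redirect", and the browser would then issue a new request to the redirect target.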
TCP 3-Way Handshake
http://www.inetdaemon.com/tutorials/internet/tcp/3-way_handshake.shtml
Flaky Test
http://martinfowler.com/articles/nonDeterminism.html
http://martinfowler.com/bliki/SelfInitializingFake.html
http://xunitpatterns.com/Humble%20Object.html
http://googletesting.blogspot.com/2008/04/tott-avoiding-flakey-tests.html
Eradicating Non-Determinism in Tests
An automated regression suite can play a vital role on a software project, valuable both for reducing defects in production and essential for evolutionary design. In talking with development teams I’ve often heard about the problem of non-deterministic tests – tests that sometimes pass and sometimes fail. Left uncontrolled, non-deterministic tests can completely destroy the value of an automated regression suite. In this article I outline how to deal with non-deterministic tests. Initially quarantine helps to reduce their damage to other tests, but you still have to fix them soon. Therefore I discuss treatments for the common causes for non-determinism: lack of isolation, asynchronous behavior, remote services, time, and resource leaks.
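Time is one of the causes of non-determinism listed above. A common way to tame it is to pass the clock in instead of reading the system clock directly, so that a test can pin "now" to a fixed instant. The sketch below uses the standard java.time.Clock; SessionToken is a hypothetical class invented for this illustration.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

// Hypothetical class whose behaviour depends on the current time.
class SessionToken {
    private final Instant expiresAt;

    // The clock is a parameter, so a test decides what "now" is.
    SessionToken(Clock clock, Duration timeToLive) {
        this.expiresAt = clock.instant().plus(timeToLive);
    }

    // Expired once the given clock has reached the expiry instant.
    boolean isExpired(Clock clock) {
        return !clock.instant().isBefore(expiresAt);
    }
}
```

In a test, Clock.fixed(...) supplies the starting instant and Clock.offset(start, Duration.ofMinutes(30)) supplies "30 minutes later", so the result no longer depends on when or how fast the test runs. Production code would pass Clock.systemUTC().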
B-Tree and B+Tree
References:
http://stackoverflow.com/questions/870218/b-trees-b-trees-difference
http://blog.csdn.net/hguisu/article/details/7786014
http://stackoverflow.com/questions/15485220/advantage-of-b-trees-over-bsts
B-trees and B+trees are the most commonly used data structures for building indexes in databases and file systems. The reason is that they have a much higher branching factor than a balanced BST (such as a red-black tree) and therefore a lower height. In a BST and in a B-tree (or B+tree) alike, a lookup visits about log_m(n) nodes, where n is the number of keys and m is the number of branches per node (m = 2 for a BST). Lower height means fewer disk accesses to retrieve nodes, which is what matters for database performance: a database usually stores huge amounts of data, and the index can itself be too large to fit into memory, so it is stored on disk, where access is much slower than memory access. A higher branching factor does mean more time spent finding the key inside a node, but that search is performed in memory and is almost negligible compared with the time spent on disk I/O.
Usually the database makes each node exactly the size of one page, so that a node can be read with a single read I/O.
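To put numbers on the height argument, here is a small sketch that computes how many levels a tree needs before it can hold n keys at branching factor m, i.e. log_m(n) rounded up. The class name TreeHeight and the example figures (one billion keys, branching factor 256) are illustrative assumptions, not measurements from any particular database.

```java
class TreeHeight {
    // Smallest h such that m^h >= n: the number of levels needed to
    // address n keys when every node has m branches.
    static int levels(long n, int m) {
        int h = 0;
        long reach = 1;            // keys addressable with h levels
        while (reach < n) {
            reach *= m;
            h++;
        }
        return h;
    }
}
```

With these assumptions, TreeHeight.levels(1_000_000_000L, 2) is 30 while TreeHeight.levels(1_000_000_000L, 256) is 4, so a lookup touches roughly 4 pages instead of 30.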
Advantages of a B+tree over a B-tree:
1. Since its internal nodes store only keys and no data, they can hold more keys than B-tree nodes in the same amount of space (normally one page). This means a higher branching factor and a lower height.
2. Faster full scans of the data using the linked list of all data items at the bottom: fewer cache misses compared with a full-tree traversal of a B-tree.
Advantage of a B-tree over a B+tree: data can sit closer to the root, so if some keys are accessed much more frequently than others, we can gain some performance when they are stored near the root.
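The first B+tree advantage above can be made concrete with a little arithmetic on how many entries fit in one page. All sizes here are assumptions chosen for illustration (4096-byte page, 8-byte key, 8-byte child pointer, 112-byte record), page headers are ignored, and PageFanout is a made-up name.

```java
class PageFanout {
    // B+tree internal node: each entry is a key plus a child pointer.
    static int bPlusTreeFanout(int pageBytes, int keyBytes, int pointerBytes) {
        return pageBytes / (keyBytes + pointerBytes);
    }

    // B-tree node: each entry also carries the record itself.
    static int bTreeFanout(int pageBytes, int keyBytes, int pointerBytes, int recordBytes) {
        return pageBytes / (keyBytes + pointerBytes + recordBytes);
    }
}
```

Under these assumptions a B+tree internal page holds 4096 / 16 = 256 entries, while a B-tree page holds 4096 / 128 = 32, which is why the B+tree ends up with the higher branching factor and the lower height.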