Month: June 2014


– JOIN: nested loop join, hash join, sort-merge join
– Number: Fibonacci, primes, randomly pick a line from a file
– String: strstr, word count
– Tree: height, LCA, balanced tree
– Heap: find the largest k numbers
– DP: maximum contiguous subarray sum
– Array: find a key in a rotated array, remove duplicate characters
– Linked list: cycle detection, inserting a node, removing duplicate nodes
– Recursion/backtracking: comes in many variations; this area needs a lot of practice
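One item in the list above, picking a random line from a file, is usually solved with reservoir sampling; a minimal sketch in Python (the function name and single-pass approach are my own illustration, not from the notes):

```python
import random

def random_line(path):
    """Return one uniformly random line from a file in a single pass,
    using O(1) extra memory (reservoir sampling with k = 1)."""
    chosen = None
    with open(path) as f:
        for i, line in enumerate(f, start=1):
            # Replace the current choice with probability 1/i; after the
            # loop, every line has been kept with probability 1/n.
            if random.randrange(i) == 0:
                chosen = line
    return chosen
```

This works without knowing the file length in advance, which is the point of the technique.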

Java GC
C++: virtual functions, smart pointers
Databases: know B-trees, indexes
Search engine: inverted index, postings lists, sparse index, vector space model, tf*idf
Large-scale data: hashing, consistent hashing, Bloom filter, bitmap, external sorting
Distributed systems: CAP theorem, gossip, Paxos, GFS design principles
Network: sockets, TCP 3-way handshake, asynchronous I/O, epoll, select, thundering herd
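Consistent hashing from the list above can be sketched as a sorted ring of hash points; this Python toy (class name, replica count, and MD5 choice are my own assumptions for illustration) shows the key property that removing a node only remaps that node's keys:

```python
import bisect
import hashlib

class ConsistentHash:
    """Minimal consistent-hash ring (a sketch, not production code).
    Each node is placed at `replicas` points on the ring so keys spread
    evenly; a key maps to the first node point clockwise from its hash."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas
        self.ring = []              # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get(self, key):
        # First ring point with hash > key's hash, wrapping around.
        idx = bisect.bisect(self.ring, (self._hash(key), "")) % len(self.ring)
        return self.ring[idx][1]
```

When a node is removed, keys whose successor point belonged to another node keep their old assignment, which is exactly what a plain `hash(key) % n` scheme fails to do.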

trie tree



This is essentially a post-order traversal.


1 Given a binary tree, find the lowest common ancestor of two given nodes in the tree.
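The post-order idea for this LCA problem can be sketched as follows (a Python illustration assuming both target nodes are present in the tree; the node class is my own minimal definition):

```python
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def lca(root, p, q):
    """Lowest common ancestor of p and q in a binary tree.
    Post-order: resolve both subtrees before deciding at the root."""
    if root is None or root is p or root is q:
        return root
    left = lca(root.left, p, q)
    right = lca(root.right, p, q)
    if left and right:        # p and q sit in different subtrees
        return root
    return left or right      # both in one subtree (or neither found)
```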






Naive Approach:
A naive approach is to reuse the solution from Determine if a Binary Tree is a Binary Search Tree. Starting from the root, we process the current node, then its left child, then its right child. For each node, we call isBST() to check whether the subtree rooted there is a BST. If it is, we have found the largest BST subtree under that node. If it is not, we continue examining its left and right children. If only one of the two subtrees is a BST, we return that subtree. If both are BSTs, we compare their sizes (number of descendant nodes) and return the larger one.

Assume that we have a complete tree (i.e., all leaves are at the same depth) with n nodes; the naive approach's run-time complexity is O(n lg n). The proof is left as an exercise to the reader.

A Bottom-up Approach:
The naive approach works top-down. It is hardly efficient, simply because we call isBST() over and over again: each call traverses all the way down to the leaves to verify that the subtree is a BST.
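The bottom-up fix is to have each subtree report, in a single post-order pass, whether it is a BST together with its size, minimum, and maximum, so no subtree is ever re-scanned. A Python sketch (the tuple-based tree encoding `(val, left, right)` and the function name are my own assumptions for illustration):

```python
def largest_bst_subtree(root):
    """Size of the largest BST subtree, bottom-up in O(n).
    Trees are nested tuples (val, left, right); None is the empty tree.
    Each call returns (is_bst, size, min, max) for its subtree, so the
    parent can decide in O(1) instead of re-traversing like isBST()."""
    best = 0

    def visit(node):
        nonlocal best
        if node is None:
            # Empty tree is a BST; (min, max) chosen so any parent passes.
            return True, 0, float("inf"), float("-inf")
        val, left, right = node
        lb, ls, llo, lhi = visit(left)
        rb, rs, rlo, rhi = visit(right)
        if lb and rb and lhi < val < rlo:
            size = ls + rs + 1
            best = max(best, size)
            return True, size, min(llo, val), max(rhi, val)
        return False, 0, 0.0, 0.0

    visit(root)
    return best
```

Every node is visited exactly once, giving O(n) total against the naive O(n lg n).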

Counting sort, bucket sort and radix sort

When comparing counting sort, bucket sort and radix sort, the key is the trade-off between memory, time, and simplicity of the data structure. All three can essentially be viewed as forms of bucket sorting.

Radix sort: O(n log_r(k)). Essentially radix sort performs a bucket/counting sort log_r(k) times, so it can deal with a larger key range than a single bucket sort. Here r is the base (radix): the larger r is, the better the time complexity, but also the more memory is needed, since the bucket array in radix sort has length r (smaller than n or the maximum key value k). If we use n as the base, radix sort degenerates into an ordinary bucket sort. Why? Because there is then only one 'digit', so we perform a single bucket/counting sort pass.
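The "counting sort repeated once per digit" view above can be sketched directly; a minimal LSD radix sort for non-negative integers in Python (the parameter name `base` stands for r):

```python
def radix_sort(a, base=10):
    """LSD radix sort for non-negative integers: log_base(max) passes of
    a stable counting sort on one digit. The bucket/count array has
    length `base` (the r from the analysis), not length of the key range."""
    if not a:
        return a
    exp = 1
    m = max(a)
    while m // exp > 0:
        count = [0] * base
        for x in a:                       # count occurrences of each digit
            count[(x // exp) % base] += 1
        for d in range(1, base):
            count[d] += count[d - 1]      # prefix sums = final positions
        out = [0] * len(a)
        for x in reversed(a):             # reversed scan keeps it stable
            count[(x // exp) % base] -= 1
            out[count[(x // exp) % base]] = x
        a = out
        exp *= base
    return a
```

Stability of each per-digit pass is what lets later (more significant) passes preserve the order established by earlier ones.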

Bucket sort: a generalization of counting sort. The number of buckets used is n, while the key range can be larger than n, so it saves some space compared with counting sort. In exchange it loses counting sort's worst-case O(n + k) guarantee, and the data has to be uniformly distributed for the average case to hold.

Counting sort: O(n + k) time and O(k) space.
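For completeness, counting sort itself is only a few lines; a Python sketch for integers in [0, k) (function signature is my own choice):

```python
def counting_sort(a, k):
    """Counting sort for integers in [0, k):
    O(n + k) time, O(k) extra space for the count array."""
    count = [0] * k
    for x in a:
        count[x] += 1                 # tally each key
    out = []
    for v, c in enumerate(count):     # emit each key c times, in order
        out.extend([v] * c)
    return out
```

The O(k) count array is exactly the space cost that bucket sort and radix sort are trading against.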