Month: April 2014

Recursion: defining the subproblem

Recursion is about precisely defining what each level should solve, and trusting that the next level down will be solved recursively (just as the current level correctly solves the problem it is responsible for, the next level will likewise solve its own).

If that definition gets muddled, the current level easily ends up trying to solve the problem that belongs to the level above or below it, which causes unnecessary trouble or leads to an incorrect algorithm.

 

Example 1: Max path sum in a binary tree (the path's start and end points can be anywhere in the tree)

At first I wrote an incorrect version that was only one step away from the correct one. The reason is that in the current call, I was trying to reason about what the call one level up should solve. You can see in the wrong code that I tried to account for a pathFromParent, trying to decide at the current root whether it should join the left and right subtrees into a larger path, or instead attach itself to the path containing its parent. I later realized there is simply no way to pass in this pathFromParent.

The correct approach is for the current call to compute only the length of root + left subtree + right subtree, while returning max(root + left, root + right) to the parent, leaving the parent to decide whether to extend its path down to the current root.

// Wrong answer, but almost there! Compare with the correct answer after this one:
int maxPathSum(TreeNode *root) {
    if (root == NULL) return 0;
    int maxim = INT_MIN;
    calculatePathSum(root, maxim, 0);
    return maxim;
}

int calculatePathSum(TreeNode *root, int &maxim, int pathFromParent) {
    if (root == NULL) return pathFromParent;

    int leftPath = maxPathSumHelper(root->left, maxim);
    int rightPath = maxPathSumHelper(root->right, maxim);
    int tempMaxim;
    if (leftPath < 0) leftPath = 0;
    if (rightPath < 0) rightPath = 0;
    if (pathFromParent < 0) pathFromParent = 0;
    if (pathFromParent >= leftPath && rightPath >= leftPath)
        tempMaxim = pathFromParent + rightPath + root->val;
    if (pathFromParent >= rightPath && leftPath >= rightPath)
        tempMaxim = pathFromParent + leftPath + root->val;
    if (pathFromParent <= rightPath && pathFromParent <= leftPath)
        tempMaxim = rightPath + leftPath + root->val;
    if (tempMaxim > maxim)
        maxim = tempMaxim;

    return tempMaxim;
}

int maxPathSumHelper(TreeNode *root, int &maxim) {
    if (root == NULL) return 0;
    return calculatePathSum(root, maxim, 0);
}

 

// Correct answer:
class Solution {
public:
    int maxPathSum(TreeNode *root) {
        max_sum = INT_MIN;
        dfs(root);
        return max_sum;
    }
private:
    int max_sum;
    int dfs(const TreeNode *root) {
        if (root == nullptr) return 0;
        int left = dfs(root->left);
        int right = dfs(root->right);
        // Best path that bends through this node: root plus any positive arm.
        int sum = root->val;
        if (left > 0) sum += left;
        if (right > 0) sum += right;
        max_sum = max(max_sum, sum);
        // Return only the best downward path ending at root,
        // so the parent can decide whether to extend it.
        return max(left, right) > 0 ? max(left, right) + root->val : root->val;
    }
};

Example 2:

In-order traversal of a binary tree

The first solution is very concise and its logic is clear, precisely because it chooses the right subproblem for the current step.

Version I:

vector<int> inOrderTraversal(TreeNode *root) {
    vector<int> result;
    stack<const TreeNode *> traversal;
    const TreeNode *p = root;
    while (traversal.size() || p != nullptr) {
        if (p != nullptr) {
            traversal.push(p);      // defer this node, keep going left
            p = p->left;
        } else {
            p = traversal.top();    // left subtree done: visit, then go right
            result.push_back(p->val);
            traversal.pop();
            p = p->right;
        }
    }
    return result;
}

 

Look at the code below and find its bug:

vector<int> inorderTraversal(TreeNode *root) {
    vector<int> path;
    stack<TreeNode *> traversal;
    if (root)
        traversal.push(root);
    while (traversal.size()) {
        TreeNode *p = traversal.top();
        if (p->left)
            traversal.push(p->left);
        else {
            path.push_back(p->val);
            traversal.pop();
            if (p->right)
                traversal.push(p->right);
        }
    }
    return path;
}

The code below adds an extra bool variable to indicate that we have just returned from a left subtree, which fixes the bug above: as written, once a node's left child has been popped, the node sees p->left non-null again and re-pushes it, looping forever. The reason we need this flag is that the code makes a decision across the level boundary, deciding at the current level whether to traverse the left and right subtrees.

Version II:

vector<int> inorderTraversal2(TreeNode *root) {
    vector<int> path;
    stack<TreeNode *> traversal;
    bool leftEnd = false;   // true right after returning from a left subtree
    if (root)
        traversal.push(root);
    while (traversal.size()) {
        TreeNode *p = traversal.top();
        if (p->left && !leftEnd)
            traversal.push(p->left);
        else {
            path.push_back(p->val);
            traversal.pop();
            leftEnd = true;
            if (p->right) {
                traversal.push(p->right);
                leftEnd = false;
            }
        }
    }
    return path;
}

Top k frequent items in stream data

I saw this among someone's interview questions. When I started looking into it, I found it became more and more challenging and interesting.

After reading several papers on this topic, I have a better idea of how we may solve it and what issues we need to consider.

There are two categories of approaches to this problem: counter-based and sketch-based.

Now I'm going to try explaining some of the algorithms that I liked and that have also proven efficient according to the papers.

Algorithm 1: the Frequent algorithm (paper: http://erikdemaine.org/papers/NetworkStats_ESA2002/paper.pdf).

This algorithm extends the classic majority algorithm. The classic majority algorithm finds the element that occurs more than n/2 times in an array of size n using a single counter, monitoring only one element at a time. Each occurrence of the monitored element increases the counter by one, while an occurrence of a different element cancels out and decreases the counter by one. When the counter becomes zero, we assign it to the new element being compared. This way, if a majority element exists, it is the one being monitored at the end. We still need a second pass to verify that it is truly a majority element.
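As a sketch of the majority-vote idea above (function names like majorityCandidate are mine, not from the paper), the one-counter scheme plus the verification pass might look like:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One counter, one monitored candidate: matches cancel out mismatches.
int majorityCandidate(const std::vector<int>& a) {
    int candidate = 0, count = 0;
    for (int x : a) {
        if (count == 0) { candidate = x; count = 1; }  // counter hit zero: adopt x
        else if (x == candidate) ++count;              // same element: reinforce
        else --count;                                  // different element: cancel
    }
    return candidate;  // only a candidate; must be verified by a second pass
}

// Second pass: confirm the candidate really occurs more than n/2 times.
bool isMajority(const std::vector<int>& a, int candidate) {
    std::size_t c = 0;
    for (int x : a)
        if (x == candidate) ++c;
    return c > a.size() / 2;
}
```

For example, on {1, 2, 1, 1, 3, 1} the candidate is 1, and the second pass confirms it occurs 4 times out of 6.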

The Frequent algorithm uses m counters to monitor m elements, and returns m candidates that include every element occurring more than n/m times, where n is the size of the data. The basic idea: when a new element comes in, we assign it a counter if any is free; otherwise, we check whether it is already among the m monitored elements. If it is, we increment the associated counter by one; if not, we decrease every counter by one. When a counter reaches 0, it can be assigned to a new element.
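A minimal sketch of this m-counter scheme (the name frequentCandidates and the map-based representation are my choices; the paper describes more efficient counter structures): counters that drop to zero are erased, freeing a slot for a later new element.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Frequent algorithm with m counters: the returned candidates include every
// element occurring > n/m times in the stream (counts are underestimates).
std::unordered_map<int, int> frequentCandidates(const std::vector<int>& stream,
                                                std::size_t m) {
    std::unordered_map<int, int> counters;  // monitored element -> counter
    for (int x : stream) {
        auto it = counters.find(x);
        if (it != counters.end()) {
            ++it->second;                   // already monitored: increment
        } else if (counters.size() < m) {
            counters[x] = 1;                // free counter: start monitoring x
        } else {
            // No free counter and x is not monitored: decrement everything,
            // erasing counters that hit zero so their slots become free.
            for (auto jt = counters.begin(); jt != counters.end();) {
                if (--jt->second == 0) jt = counters.erase(jt);
                else ++jt;
            }
        }
    }
    return counters;
}
```

On the stream {1, 1, 2, 3, 1, 2, 1} with m = 2, the element 1 occurs 4 > 7/2 times and is guaranteed to survive among the candidates; as with the majority algorithm, a second pass is needed to get exact counts.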

To be continued…