Month: August 2016

San Francisco City Hall Wedding Guide

How to Get Married at San Francisco City Hall

You can either get the marriage license on the same day or beforehand. you should at least get the marriage license 30 minutes before the ceremony.

you can get the marriage license in other city hall within California.

Santa Clara City Hall:  https://www.sccgov.org/sites/rec/Marriage%20Licenses/Pages/Applying-for-a-Marriage-License.aspx

 

Everything about Python Dict

This is a  great post I found on stackoverflow.

http://stackoverflow.com/questions/327311/how-are-pythons-built-in-dictionaries-implemented

 

Here is everything about Python dicts that I was able to put together (probably more than anyone would like to know; but the answer is comprehensive).

  • Python dictionaries are implemented as hash tables.
  • Hash tables must allow for hash collisions i.e. even if two distinct keys have the same hash value, the table’s implementation must have a strategy to insert and retrieve the key and value pairs unambiguously.
  • Python dict uses open addressing to resolve hash collisions (explained below) (see dictobject.c:296-297).
  • Python hash table is just a contiguous block of memory (sort of like an array, so you can do an O(1) lookup by index).
  • Each slot in the table can store one and only one entry. This is important.
  • Each entry in the table actually a combination of the three values: < hash, key, value >. This is implemented as a C struct (see dictobject.h:51-56).
  • The figure below is a logical representation of a Python hash table. In the figure below, 0, 1, ..., i, ... on the left are indices of the slots in the hash table (they are just for illustrative purposes and are not stored along with the table obviously!).
    # Logical model of Python Hash table
    -+-----------------+
    0| <hash|key|value>|
    -+-----------------+
    1|      ...        |
    -+-----------------+
    .|      ...        |
    -+-----------------+
    i|      ...        |
    -+-----------------+
    .|      ...        |
    -+-----------------+
    n|      ...        |
    -+-----------------+
  • When a new dict is initialized it starts with 8 slots. (see dictobject.h:49)
  • When adding entries to the table, we start with some slot, i, that is based on the hash of the key. CPython initially uses i = hash(key) & mask (where mask = PyDictMINSIZE - 1, but that’s not really important). Just note that the initial slot, i, that is checked depends on the hash of the key.
  • If that slot is empty, the entry is added to the slot (by entry, I mean, <hash|key|value>). But what if that slot is occupied!? Most likely because another entry has the same hash (hash collision!)
  • If the slot is occupied, CPython (and even PyPy) compares the the hash AND the key (by compare I mean == comparison not the is comparison) of the entry in the slot against the key of the current entry to be inserted (dictobject.c:337,344-345). If both match, then it thinks the entry already exists, gives up and moves on to the next entry to be inserted. If either hash or the key don’t match, it starts probing.
  • Probing just means it searches the slots by slot to find an empty slot. Technically we could just go one by one, i+1, i+2, ... and use the first available one (that’s linear probing). But for reasons explained beautifully in the comments (see dictobject.c:33-126), CPython uses random probing. In random probing, the next slot is picked in a pseudo random order. The entry is added to the first empty slot. For this discussion, the actual algorithm used to pick the next slot is not really important (see dictobject.c:33-126 for the algorithm for probing). What is important is that the slots are probed until first empty slot is found.
  • The same thing happens for lookups, just starts with the initial slot i (where i depends on the hash of the key). If the hash and the key both don’t match the entry in the slot, it starts probing, until it finds a slot with a match. If all slots are exhausted, it reports a fail.
  • BTW, the dict will be resized if it is two-thirds full. This avoids slowing down lookups. (see dictobject.h:64-65)

NOTE: I did the research on Python Dict implementation in response to my own question about how multiple entries in a dict can have same hash values. I posted a slightly edited version of the response here because all the research is very relevant for this question as well.

Everything about Cache

I’ll put every thing I read about cache in this article

Write-through, write-around and write-back cache

There are three main caching techniques that can be deployed, each with their own pros and cons.

  • Write-through cache directs write I/O onto cache and through to underlying permanent storage before confirming I/O completion to the host. This ensures data updates are safely stored on, for example, a shared storage array, but has the disadvantage that I/O still experiences latency based on writing to that storage. Write-through cache is good for applications that write and then re-read data frequently as data is stored in cache and results in low read latency.
  • Write-around cache is a similar technique to write-through cache, but write I/O is written directly to permanent storage, bypassing the cache. This can reduce the cache being flooded with write I/O that will not subsequently be re-read, but has the disadvantage is that a read request for recently written data will create a “cache miss” and have to be read from slower bulk storage and experience higher latency.
  • Write-back cache is where write I/O is directed to cache and completion is immediately confirmed to the host. This results in low latency and high throughput for write-intensive applications, but there is data availability exposure risk because the only copy of the written data is in cache. As we will discuss later, suppliers have added resiliency with products that duplicate writes. Users need to consider whether write-back cache solutions offer enough protection as data is exposed until it is staged to external storage. Write-back cache is the best performing solution for mixed workloads as both read and write I/O have similar response time levels.

LRU implementation:

http://openmymind.net/High-Concurrency-LRU-Caching/

1.Hashmap:

A read-write mutex for the hashtable is efficient. Assuming that we are GETing more than we are SETting, we’ll mostly be acquiring read-locks (which multiple threads can secure). The only time we need a write lock is when setting an item. If we keep things basic, it ends up looking like:

If necessary, we could always shard our hashtable to support more write throughput.

http://openmymind.net/Shard-Your-Hash-table-to-reduce-write-locks/

2.List  Ultimately, our goal was to reduce lock contention against our list. We achieved this in three ways. First, we use a window to limit the frequency of promotion. Second, we use a buffered channel to process promotions in a separate thread. Finally, we can do promotion and GC within the same thread.

(See this implementation by Ebay engineering blog. similar to the second option above. http://www.ebaytechblog.com/2011/08/30/high-throughput-thread-safe-lru-caching/ )

http://openmymind.net/Back-To-Basics-Hasthables-Part-2/ :    Back To Basics: Hashtables Part 2 (And A Peek Inside Redis).  This article introduce the incremental rehashing in Redis. An interesting idea.   Summary:  The magic of keeping hashtable efficient, despite varying size, is not that different than the magic behind dynamic arrays: double in size and copy. Redis’ incremental approach ensures that rehashing large hasthables doesn’t cause any performance hiccups. The downside is internal complexity (almost every hashtable operation needs to be rehashing-aware) and a longer rehashing process (which takes up extra memory while running).  Read about Redis’ hashtable implementation: https://github.com/antirez/redis/blob/unstable/src/dict.c