HashMap的doubleCapacity()方法這樣寫妙在哪？

01-13

Android環境下的HashMap類中，擴容方法doubleCapacity()是這樣寫的：
private HashMapEntry&[] doubleCapacity() { HashMapEntry&[] oldTable = table; int oldCapacity = oldTable.length; if (oldCapacity == MAXIMUM_CAPACITY) { return oldTable; } int newCapacity = oldCapacity * 2; HashMapEntry&[] newTable = makeTable(newCapacity); if (size == 0) { return newTable; }
for (int j = 0; j &< oldCapacity; j++) { /* * Rehash the bucket using the minimum number of field writes. * This is the most subtle and delicate code in the class. */ HashMapEntry& e = oldTable[j]; if (e == null) { continue; } int highBit = e.hash oldCapacity; HashMapEntry& broken = null; newTable[j | highBit] = e; for (HashMapEntry& n = e.next; n != null; e = n, n = n.next) { int nextHighBit = n.hash oldCapacity; if (nextHighBit != highBit) { if (broken == null) newTable[j | nextHighBit] = n; else broken.next = n; broken = e; highBit = nextHighBit; } } if (broken != null) broken.next = null; } return newTable; }

擴容後重新哈希的代碼中，計算index不是像put中的那樣使用hash newCapacity - 1，而是使用hash oldCapacity得到hightBit,再用hightBit | j得到index。二者的結果相同，但是計算步驟上反而是後者多了一步或運算。這樣寫的好處在哪裡？
（Android下的HashMap由谷歌重新優化，與JDK的實現不同，打上Android系統源碼的標籤好像也不太對……嘛算了）

謝邀，Android這麼玩，是基於一個事實：HashMap的長度是2的整數次冪。

$capacity=2^{n}$

HashMap每次擴容，也是長度翻倍：

int newCapacity = oldCapacity * 2;

這樣的好處是，可以非常簡單地用 $capacity -1$ 作為掩碼獲得數組下標。（具體參見另一個回答：JDK 源碼中 HashMap 的 hash 方法原理是什麼？ - 胖胖的回答）。以原始長度16為例：

10100101 11000100 00110101 00000000 00000000 00001111 //掩碼=16-1 ---------------------------------- 00000000 00000000 00000101 //高位全部歸零，只保留末四位

所以，每次擴容，數組的下標的變化其實很微妙：

只有前面加了一位，後面幾位保持不變。

比如長度16擴容到了32：

10100101 11000100 00110101 00000000 00000000 00011111 //掩碼=32-1 ---------------------------------- 00000000 00000000 00010101 //只是第5位變化了

只有低位第5位是可能變化的。

所以真實情況是：只有一半的元素需要搬家。

這個前提下，再來看JDK里HashMap擴容，是不是有點太老實了。transfer( )方法兩層迭代，把Entry一個個搬家。這裡重新取下標indexFor( )方法就是題主說的：hash (newCapacity - 1)。

/** * Transfers all entries from current table to newTable. */ void transfer(Entry[] newTable, boolean rehash) { int newCapacity = newTable.length; for (Entry& e : table) { while(null != e) { Entry& next = e.next; if (rehash) { e.hash = null == e.key ? 0 : hash(e.key); } int i = indexFor(e.hash, newCapacity); e.next = newTable[i]; newTable[i] = e; e = next; } } }

/** * Returns index for hash code h. */ static int indexFor(int h, int length) { return h (length-1); }

Android搞了一個嫁接：

int highBit = e.hash oldCapacity;

以oldCapacity為掩碼，highBit單獨把原先散列值里會變化的那一位切出來。還是以16位到32位擴容為例：

10100101 11000100 00110101 00000000 00000000 00010000 //掩碼=16 ---------------------------------- 00000000 00000000 00010000 //highBit只保留第5位

highBit把第5位單獨切出來，而 $j$ 代表原來的後4位散列值。嫁接到一起（用「|」或操作完成），正好是新的5位散列值。

newTable[j | highBit] = e;

這個做法的優點在於，他是把整棵樹的引用一起搬過去的。在碰撞啊比較嚴重的情況下（這棵樹比較大），再以next子節點高位highBit來判斷，next指向的子樹需不需要搬家。50%的概率是不需要搬的。

if (nextHighBit != highBit) { if (broken == null) newTable[j | nextHighBit] = n; else broken.next = n; broken = e; highBit = nextHighBit; }

像HashMap這種每天要被用無數次的基礎設施，微小的優化都會為世界節省很多時間。