位運算生成按二進位中1個數排序的序列

08-17

位運算生成按二進位中1個數排序的序列

來自專欄 Unwinding the stack5 人贊了文章

問題

在一些動態規劃題目中，我們常常需要按二進位表示中1的個數的順序遍歷所有小於$2^N$的數字。例如，$N=6$時，一種可行的遍歷順序就是：

0 1 2 4 8 16 32 3 5 6 9 10 12 17 18 20 24 33 34 36 40 48 7 11 13 14 19 21 22 25 26 28 35 37 38 41 42 44 49 50 52 56 15 23 27 29 30 39 43 45 46 51 53 54 57 58 60 31 47 55 59 61 62 63

在這種情況下，具有相同個1的數字順序可以是任意的。終止條件是生成了一個>= $2^N$的數字。

我們希望找到一個next方法，來生成我們所需要的序列：

unsigned next(unsigned cur, unsigned max);int main() { for (int i=0; i< 1<<6; i = next(i, 1 << 6)) { cout << i << " "; }}

當然，我們可以預先生成出所有的小於max的數字然後按1的個數排序，但這樣需要額外的空間，效率也不是最優的。

解法

首先考慮一種普通的情況：對於擁有相同個數的1的數字，我們需要確定一種方便的方法確定它們的順序。可以把每個數看作0/1的一個置換，尋找下一個數就是尋找下一個置換。

如果當前所有的1都在最高位，那麼下一個數就是在最低位放上當前1的個數+1個1。例如：48 = 110000，下一個就是 7 = 000111。注意當前0也算作1都在最高位。

怎麼判斷1都在最高位呢？用 ((cur - 1) | cur) >= max - 1。(cur - 1) | cur會將最低位1之後的位全部置為1。對於大於0的、1都在最高位的數字，結果就是max - 1。對於0，結果會是全1。其餘情況會得到一個小於max - 1的數字。因此在這裡可以用>=判斷。

根據以上情況，我們可以寫出一個大致的框架：

unsigned next(unsigned cur, unsigned max) { if (((cur - 1) | cur) >= max - 1) return (1U << (__builtin_popcount(cur) + 1)) -1; return next_perm(cur);}

接下來的問題就是如何快速計算出next_perm。這種常用的位運算技巧，大多數都可以在Bit Twiddling Hacks

這個頁面找到答案（這個解法比網上流傳的用除法的版本儘管多了幾個op，但都屬於較為廉價的操作，在個人PC上會快1倍以上）。直接貼出結果。注意這個對於相同位數的輸出還保有了遞增的順序：

unsigned next_perm(unsigned v) { unsigned int t = v | (v - 1); // t gets vs least significant 0 bits set to 1 // Next set to 1 the most significant bit to change, // set to 0 the least significant ones, and add the necessary 1 bits. return (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));}

簡單介紹一下原理：next_perm需要找出最低的01這個組合，然後把它置為10，之後把其右邊的所有1移到最低位。注意到最低的01組合右邊的位一定是先有若干個1再有若干個0。右邊的1的個數 = 右邊的位數 - 末尾0的個數。

v = 0100110 // to find the least 01t = 0100111 // to find the least 01~t= 1011000 // to find the least 10~t & -~t = 0001000 // the `1` bit of least 10((~t & -~t) - 1) = 0000111 // all `1` on the right of `01`(((~t & -~t) - 1) >> (__builtin_ctz(v) + 1)) = 0000001 // the number of `1` = all right `1`s - trailing `0`s. lo bitst + 1 = 0101000 // bits higher than and equal to `01` in `v`(t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1)) = 0101001 // final results

最終代碼如下：

static unsigned next_perm(unsigned int v){ unsigned int t = v | (v - 1); // t gets vs least significant 0 bits set to 1 // Next set to 1 the most significant bit to change, // set to 0 the least significant ones, and add the necessary 1 bits. return (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));}static unsigned next(unsigned int cur, unsigned int max) { // 這裡和next_perm中重複計算的cur | (cur - 1)會被編譯器自動在內聯時提取出來，不會額外計算一次 return ((cur - 1) | cur) >= max - 1 ? (1U << (__builtin_popcount(cur) + 1)) - 1 : next_perm(cur);}int main() { constexpr unsigned w = 1 << 6; for (volatile unsigned i = 0; i < w; i = next(i, w));}

性能測試

在quick-bench上做了一個簡單的性能測試，結果見[這裡](http://quick-bench.com/7zsyhE4dX_YwwKakZg2h06FJ3p4)

對於<2^30的數可以在4s左右跑完。

展望

是否存在能夠徹底消除這個ifbranch的解法？