stl的sort和手寫快排的運行效率哪個比較高？

01-13

sort函數具體的實現是怎麼樣的？在遞歸進入一定深度的情況下變為堆排序？（只是猜測）
請問二者在什麼情況下哪一個的運行效率比較高？

====== 11.11 日更新

今天上午來用Google的 benchmark 庫測了一下這幾個排序的演算法的漸進複雜度係數，環境是在 VBox debian 虛擬機里，分了兩個核。

這是插入排序的漸進複雜度是 0.16 N^2，以及係數的標準差是 0.07

這是歸併排序的漸進複雜度是 11.58 NlgN，以及係數的標準差是 0.03

這是快速排序的漸進複雜度是 6.98 NlgN，以及係數的標準差是 0.02

這是std::sort 排序的漸進複雜度是 4.87 NlgN，以及係數的標準差是 0.02

可以看出 std::sort 擁有更小的漸進複雜度係數================== 原答案

首先，STL的sort 不僅僅是快排，所以你只是手寫快排，哪怕是尾遞歸式的快排，也不會有它快。

它會先走快排主遞歸邏輯，但是遞歸深度是有限的，而且當子區間夠小時，會走插入排序，我摘了標準庫里幾行代碼，具體直接看代碼就可以了。

std::sort

template& inline void sort(_RandomAccessIterator __first, _RandomAccessIterator __last) { typedef typename iterator_traits&<_RandomAccessIterator&>::value_type _ValueType;


      // concept requirements

      __glibcxx_function_requires(_Mutable_RandomAccessIteratorConcept&<
	    _RandomAccessIterator&>)

      __glibcxx_function_requires(_LessThanComparableConcept&<_ValueType&>)

      __glibcxx_requires_valid_range(__first, __last);

if (__first != __last) { std::__introsort_loop(__first, __last, std::__lg(__last - __first) * 2); std::__final_insertion_sort(__first, __last); } }

STL 里快排尾遞歸主邏輯

/// This is a helper function for the sort routine. template& void __introsort_loop(_RandomAccessIterator __first, _RandomAccessIterator __last, _Size __depth_limit) { while (__last - __first &> int(_S_threshold)) { if (__depth_limit == 0) { _GLIBCXX_STD_A::partial_sort(__first, __last, __last); return; } --__depth_limit; _RandomAccessIterator __cut = std::__unguarded_partition_pivot(__first, __last); std::__introsort_loop(__cut, __last, __depth_limit); __last = __cut; } }

部分排序中使用堆排序

template& inline void partial_sort(_RandomAccessIterator __first, _RandomAccessIterator __middle, _RandomAccessIterator __last) { typedef typename iterator_traits&<_RandomAccessIterator&>::value_type _ValueType;


      // concept requirements

      __glibcxx_function_requires(_Mutable_RandomAccessIteratorConcept&<
	    _RandomAccessIterator&>)

      __glibcxx_function_requires(_LessThanComparableConcept&<_ValueType&>)

      __glibcxx_requires_valid_range(__first, __middle);

      __glibcxx_requires_valid_range(__middle, __last);

std::__heap_select(__first, __middle, __last); std::sort_heap(__first, __middle); }

快排中的partion，這裡用的是三分取中

template& inline _RandomAccessIterator __unguarded_partition_pivot(_RandomAccessIterator __first, _RandomAccessIterator __last) { _RandomAccessIterator __mid = __first + (__last - __first) / 2; std::__move_median_first(__first, __mid, (__last - 1)); return std::__unguarded_partition(__first + 1, __last, *__first); }

template& _RandomAccessIterator __unguarded_partition(_RandomAccessIterator __first, _RandomAccessIterator __last, const _Tp __pivot) { while (true) { while (*__first &< __pivot) ++__first; --__last; while (__pivot &< *__last) --__last; if (!(__first &< __last)) return __first; std::iter_swap(__first, __last); ++__first; } }

最後走優化的插入排序

template& void __final_insertion_sort(_RandomAccessIterator __first, _RandomAccessIterator __last) { if (__last - __first &> int(_S_threshold)) { std::__insertion_sort(__first, __first + int(_S_threshold)); std::__unguarded_insertion_sort(__first + int(_S_threshold), __last); } else std::__insertion_sort(__first, __last); }

另外，你可以看一下我寫的幾個排序演算法和STL里的 sort 的對比數據：

用例中綠色區域的快速排序：

void swap(int *array, int pos_i, int pos_j) { int tmp = array[pos_i]; array[pos_i] = array[pos_j]; array[pos_j] = tmp; } int partition(int *array, int begin, int end); // 尾遞歸版這個可以減少棧空間的使用 void quick_sort_tail(int *array, int begin, int end) { while (begin &< end) { int r = partition(array, begin, end); quick_sort_tail(array, begin, r - 1); begin = r + 1; } } int partition(int *array, int begin, int end) { // 在 begin 到 end 間隨機找一個數當劃分用的 key int rand_location = rand() % (end - begin + 1) + begin; int key = array[rand_location]; // 把這個 key 移到 end 的位置，方便做雙下標前移動 swap(array, rand_location, end); int i = begin - 1; for (int j = begin; j &< end; ++j) { if (array[j] &<= key) { i++; swap(array, i, j); } } swap(array, i + 1, end); return i + 1; }

如果有人手寫的快排任何情況下都比STL sort快，那這個寫出來的程序就會成為新的STL sort……

STL的sort必然要比你自己寫的快排要快，因為你自己手寫一個這麼複雜的sort，那就太閑了。STL的sort是盡量讓複雜度維持在O(N log N)的，因此就有了各種的Hybrid sort algorithm。

題主你提到的先quicksort到一定深度之後就轉為heapsort，這種是introsort。

每種STL實現使用的演算法各有不同，GNU Standard C++ Library的實現就是先introsort，然後再來個insertion sort。

References:

sort (C++)
Introsort
libstdc++: Sorting Algorithms

我自己寫的MSVC中std::sort源碼的分析，可以參考一下

源碼閱讀筆記 - 1 MSVC2015中的std::sort

用STL。

1. C++11起，規定std::sort 的複雜度nlogn，包括worst case。這是GCC源代碼：https://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/a01499_source.html 從 1960到 1971行。可以看到，這裡實現用的是：

- 先用introsort，這個實際上是用來控制quick sort的分支。

- 分枝數達到2logn 之後，對每個分支使用NlogN的演算法。GCC這裡用的是insertion sort。

想自己折騰一個比這更好輪子？浪費那時間幹嘛。

2. 把問題上升一點，一般情況下能用STL就不要自己用手，特別是手動for loop。推薦看一下Scott Meyer這篇文章：http://www.drdobbs.com/stl-algorithms-vs-hand-written-loops/184401446。

總結一下文章表述的原因：

- STL優化好，編譯器一般情況下inline 的好；而且能省掉一些重複計算。比如文章里對list進行sort 的例子，手動for loop裡面很容易不小心就每次循環調用end()函數。如果用list的sort，end()就只調用一次。

- STL的演算法一般都是經過高智商的人精心寫出來的。除非你真想challenge，成功了估計你也該寫proposal進standard了。

- STL的演算法對所有容器類型的特點都進行了考慮，優化。想手動做的像STL一樣對vector，deque，list等等都做到最優，費時費力。

個人覺得很重要的一點是，STL的東西是范型的。比如std::sort，你只要是滿足random access的容器，任何數據類型都可以。再者，可以用不同比較函數，C++11起引入了lambda expression，讓手動DIY各種比較函數非常容易。如果你想自己手動寫一個既范型，又優化好的演算法，那你其實終究是在重複STL了。

STL的qsort在partition，遞歸，第一深度方面都做了優化。小數據下自己寫和用STL基本沒有區別。但是數據量一大，STL的優勢就顯示出來了。

誰手寫，你？

歪個樓...

手寫的基數排序會比std::sort快呀～

前幾天突發奇想把500萬個longlong按2^k進位進行了基數排序。

當256進位以上就比stl的sort 快了

測試環境是vs2015/release

?(? ? ?ω? ? ?)?

好吧。。我把關鍵代碼也貼上來。。

GitHub - BigVan/Toys: Some interesting code

template& class Radix { public: Radix(const vector& _nums) : p(0), max_num(-(1&<&<31)) { bit = (1 &<&< digit) - 1; cnt = 1; for (auto num : _nums) { max_num = max(num, max_num); g[p][num bit].push_back(num); } }; vector& sort() {


		_LONGLONG last = digit;

		cnt++;

		while (max_num &> (_LONGLONG)1&<&< (_LONGLONG)last) {
			for (int i = 0; i &< 1&<&&> last)  bit;

					g[1 - p][idx].push_back(num);

				}

				g[p][i].clear();

			}

			last += digit;

			p = 1 - p;

		}

		vector& ans{};

		for (auto it : g[p])

			ans.insert(ans.end(), it.begin(), it.end());

		return ans;

	}
private:

	_LONGLONG p, bit, cnt;

	T max_num;

	vector& g[2][1&<&

前兩年寫過一個簡單的，在數據量小的時候，手寫比stl確實可能快很多，嗯嗯，近一個數量級吧

 
  推薦閱讀：


※C++輸出hello world，請從電子電路、內存CPU、程序層面解釋一下？
※如何在一個月內提高C++水平?
※如何說服同學在寫C++程序的時候用cstdio而不是stdio.h？
※當你寫幾萬行代碼時，你在寫什麼？
※有什麼現有函數，輸入是一個單詞變式的字元串，如複數，過去式，現在進行時等，輸出這個單詞的原型？
TAG:演算法 | 編程 | STL |