4. Longest Substring Without Repeating Characters

02-09

本題是 Leetcode Top 100 liked questions 中的第四題。

4. Longest Substring Without Repeating Characters

Hard

There are two sorted arrays nums1 and nums2 of size m and n respectively.

Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).
Example 1:
nums1 = [1, 3] nums2 = [2] The median is 2.0
Example 2:
nums1 = [1, 2] nums2 = [3, 4] The median is (2 + 3)/2 = 2.5

題意

給出兩個排序好數列，綜合兩個數列的所有數，給出中位數的值。要求時間複雜度為O(log(m+n))。

分析

抽象一下，只要能找到聯合列表的第k個值即可。但是題目的難點在於對時間複雜度的要求，因為對m+n長度的排序列表找到一個值的時間複雜度就是O(log(m+n))了，如果用二分法將一列merge到另外一列去，時間複雜會是O(mlog(m+n))。所以拿到題目的時候，我是一頭霧水的。難道分開兩列去用二分法？其實問題的關鍵點在於弄清楚什麼是中點，中點實際上是一個分界點，它左邊的數據要小於等於右邊的，且數目相等。當你想通了這一點，似乎就抓住了題目的核心,對兩個列分別拆分也成了可能。

讓我們對兩個列分別做拆分：

那麼如何確保拆分出來中位數呢？其實只需滿足兩個條件。

len(left_part)==len(right_part)
max(left_part)<=max(right_part)

此時的中位數就是(max(left_part)+min(right_part))/2。將兩個條件表達成式子也就是：

i + j == m - i + n - j (or: m - i + n - j + 1) if n >= m, we just need to set: i = 0 ~ m, j = (m + n + 1)/2 - i
B[j-1] <= A[i] and A[i-1] <= B[j]

接下來我們就可以去應用二分法了。

imin=0,imax=m,i=(imin+imax)/2,j=(m+n+1)/2-i（如果m+n為奇數，我們讓左邊比右邊多一個數）
此時我們滿足了第一個條件，但是第二個條件不一定滿足，可能會遇到下面幾種情況：

B[j-1] <= A[i] and A[i-1] <= B[j]：這正是我們想要的。
B[j-1]>A[i]: 此時當然A[i-1]<B[j]，那麼該如何調整二分法的下一個取值呢？如果將i減少，那麼j就加大，其值也變大，右邊的A[i]的值也變小，這樣下去永遠滿足不了。如果將i增大，那麼B[j-1]繼續縮小，右邊的A[i]卻加大了，這種方法可行。所以此處應當調整imin=i+1,imax不變。 *A[i-1]>B[j]: 此時當然A[i]>B[j-1],那麼該如何調整二分法的下一個取值呢？如果將i減小，A[i-1]將會減少，B[j]將會增大，此方法可行。反之如果將i增大，A[i-1]將會增大，B[j]將會減小，這樣將永遠滿足不了條件。

利用二分法最終我們找到了想要的i，那麼中位數就很容易得出了。

當m+n是奇數的時候，中位數就是max(A[i-1],B[j-1])。
當m+n是偶數的時候，中位數就是(max(A[i-1], B[j-1]) + min(A[i], B[j]))/2。

下面我們來考慮一些邊界情況。即i=0,i=m,j=0,j=n的時候，此時A[i-1],B[j-1],A[i],B[j]有可能不存在。其實很簡單，當不存在的時候不去check這個值就行了，因為必然滿足啊。當然對於邊界情況，我們再算中位數的時候也要進行特殊處理。

所以我們可以把判斷條件改成這樣：

Searching i in [0, m], to find an object i that:(j == 0 or i == m or B[j-1] <= A[i]) and(i == 0 or j == n or A[i-1] <= B[j])where j = (m + n + 1)/2 - i

所以我們遇到的所有情況變成：

(j == 0 or i == m or B[j-1] <= A[i]) and (i == 0 or j == n or A[i-1] <= B[j]) where j = (m + n + 1)/2 - i.這是我們期待的。
i>0 and j<n（這個條件可以省略，i>0時，必然j<n）and B[j-1]>A[i],這個時候要增大i
i<m and j>0 (這個條件可以省略，i>0時，必然j<n) and A[i-1]>B[j],這個時候要減小i

一切迎刃而解。

Python 實現

def median(A, B): m, n = len(A), len(B) if m > n: A, B, m, n = B, A, n, m if n == 0: raise ValueError imin, imax, half_len = 0, m, (m + n + 1) / 2 while imin <= imax: i = (imin + imax) / 2 j = half_len - i if i < m and B[j-1] > A[i]: # i is too small, must increase it imin = i + 1 elif i > 0 and A[i-1] > B[j]: # i is too big, must decrease it imax = i - 1 else: # i is perfect if i == 0: max_of_left = B[j-1] elif j == 0: max_of_left = A[i-1] else: max_of_left = max(A[i-1], B[j-1]) if (m + n) % 2 == 1: return max_of_left if i == m: min_of_right = B[j] elif j == n: min_of_right = A[i] else: min_of_right = min(A[i], B[j]) return (max_of_left + min_of_right) / 2.0

Python Tips

抽絲剝繭，不外如是。沒用到什麼花哨的技巧，此題的精妙在於一步一步的分析。二分法想必大家都很熟悉了，我就不贅述了。

正在搶春運的火車票，要不要開一個篇幅說一說爬蟲呢？我曾經入坑爬蟲很久，都是自己玩，沒上框架，也沒用代理，所以爬蟲效率很低，而且經常被反爬蟲策略所限制，scrapy框架是一個很成熟的開源爬蟲框架，省去了你很多自己造輪子的時間，值得一用的。OREILLY系列也有一本《用python寫網路爬蟲》，陪伴你一步一步造輪子的書，如果有人感興趣，那我就試著去寫一些看。

之前講過map,要不今天來看看reduce，雖然這個函數我目前用的不多。 reduce是python的內資函數，功能很像斐波那契數列。

reduce(function, iterable[, initializer]) Apply function of two arguments cumulatively to the items of iterable, from left to right, so as to reduce the iterable to a single value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). The left argument, x, is the accumulated value and the right argument, y, is the update value from the iterable. If the optional initializer is present, it is placed before the items of the iterable in the calculation, and serves as a default when the iterable is empty. If initializer is not given and iterable contains only one item, the first item is returned. Roughly equivalent to:

def reduce(function, iterable, initializer=None): it = iter(iterable) if initializer is None: try: initializer = next(it) except StopIteration: raise TypeError(reduce() of empty sequence with no initial value) accum_value = initializer for x in it: accum_value = function(accum_value, x) return accum_value

從代碼中就很容易看出reduce函數的功能，reduce函數接收三個參數，第一個參數為一個接收兩個參數的函數，第二個參數是一個迭代對象，第三個參數用來生成初始值。function函數會接收初始值和迭代對象的第一個數值作為參數，函數返回值繼續作為新的參數和迭代對象的第二個值一起再次傳入到function中去，以此類推，直到迭代結束。

介紹完了map函數和reduce函數，那麼如今這麼火的Map/Reduce又是什麼呢？這就說來話長了，我還沒組織好語言嗎，請先移步《用通俗易懂的大白話講解Map/Reduce原理》。