10. Regular Expression Matching

02-10

本題是 Leetcode Top 100 liked questions 中的第五題。

10. Regular Expression Matching

Hard

Implement regular expression matching with support for . and *.. Matches any single character. * Matches zero or more of the preceding element.The matching should cover the entire input string (not partial).The function prototype should be: bool isMatch(const char *s, const char *p)Some examples:isMatch("aa","a") → falseisMatch("aa","aa") → trueisMatch("aaa","aa") → falseisMatch("aa", "a*") → trueisMatch("aa", ".*") → trueisMatch("ab", ".*") → trueisMatch("aab", "cab") → true

題意

如果你熟悉正則表達式的話會對題目很熟悉，其實就是讓你實現正則表達式里的一個功能。判斷帶有.,*的正則模板能否匹配目標字元串。

分析

因為Python裡面自帶正則表達式re庫，用這個庫就可以很輕易的滿足題目要求。但這樣做顯然體現不了這道題目的價值，就讓我們試著自己去實現這種正則判斷。

首先分析一下正則匹配的流程，正則表達式的大致匹配過程是：

依次拿出表達式和文本中的字元比較。
如果每一個字元都能匹配，則匹配成功；一旦有匹配不成功的字元則匹配失敗。
如果表達式中有量詞或邊界，這個過程會稍微有一些不同。

我們的題目當中只涉及到.與*的特殊匹配，我們用s代表字元串，p代表用來匹配的正則表達式。因為是依次拿取字元進行正則匹配，所以後面的匹配成功依賴於前面的字元得到成功的匹配(但是具體依賴於前面怎樣的匹配方式我們不確定)，所以不妨用dp[i][j]來表示前i個字元被前p個表達式匹配的結果。

當我們對s[i]和p[j]進行匹配時（此時認為i,j就是最最後需要匹配的項，不用考慮p[j]後面是不是有*）,想要匹配成功，只有以下幾種基本情況：

p[j]==s[i]，兩個字元一樣,匹配成功dp[i][j]=dp[i-1][j-1]。
p[j]==.， .可以匹配一切字元沒問題。此時dp[i][j]=dp[i-1][j-1]
p[j]==*，這時候就要分情況討論了，主要有兩種基本情況：

p[j-1]!=s[i]:dp[i][j]=dp[i][j-2],此時因為 * 的前一項與s[i]不匹配，所以我們把*和它前面的一項當做空值來進行匹配。
p[j-1]==s[i] or p[j-1]==*: 即p[j-1]能匹配上s[i]。那麼dp[i][j]=dp[i-1][j] (此時 * 代表重複大於一次) or dp[i][j]=dp[i][j-1] (此時 * 代表只重複一次) or dp[i][j]=dp[i][j-2] (此時 * 代表重複0次)

這種此刻狀態依賴於前面的細分狀態時所用的演算法叫做DP演算法。

代碼實現

#偷懶用re做正則判斷class Solution(object): def isMatch(self, s, p): """ :type s: str :type p: str :rtype: bool """ return True if re.match(p,s,re.S) and re.match(p,s,re.S).group()==s else False#DP Solution from @cwlclass Solution(object): def isMatch(self, s, p): # The DP table and the string s and p use the same indexes i and j, but # table[i][j] means the match status between p[:i] and s[:j], i.e. # table[0][0] means the match status of two empty strings, and # table[1][1] means the match status of p[0] and s[0]. Therefore, when # refering to the i-th and the j-th characters of p and s for updating # table[i][j], we use p[i - 1] and s[j - 1]. # Initialize the table with False. The first row is satisfied. table = [[False] * (len(s) + 1) for _ in range(len(p) + 1)] # Update the corner case of matching two empty strings. table[0][0] = True # Update the corner case of when s is an empty string but p is not. # Since each * can eliminate the charter before it, the table is # vertically updated by the one before previous. [test_symbol_0] for i in range(2, len(p) + 1): table[i][0] = table[i - 2][0] and p[i - 1] == * for i in range(1, len(p) + 1): for j in range(1, len(s) + 1): if p[i - 1] != "*": # Update the table by referring the diagonal element. table[i][j] = table[i - 1][j - 1] and (p[i - 1] == s[j - 1] or p[i - 1] == .) else: # Eliminations (referring to the vertical element) # Either refer to the one before previous or the previous. # I.e. * eliminate the previous or count the previous. # [test_symbol_1] table[i][j] = table[i - 2][j] or table[i - 1][j] # Propagations (referring to the horizontal element) # If ps previous one is equal to the current s, with # helps of *, the status can be propagated from the left. # [test_symbol_2] if p[i - 2] == s[j - 1] or p[i - 2] == .: table[i][j] |= table[i][j - 1] return table[-1][-1]class TestSolution(unittest.TestCase): def test_none_0(self): s = "" p = "" self.assertTrue(Solution().isMatch(s, p)) def test_none_1(self): s = "" p = "a" self.assertFalse(Solution().isMatch(s, p)) def test_no_symbol_equal(self): s = "abcd" p = "abcd" self.assertTrue(Solution().isMatch(s, p)) def test_no_symbol_not_equal_0(self): s = "abcd" p = "efgh" self.assertFalse(Solution().isMatch(s, p)) def test_no_symbol_not_equal_1(self): s = "ab" p = "abb" self.assertFalse(Solution().isMatch(s, p)) def test_symbol_0(self): s = "" p = "a*" self.assertTrue(Solution().isMatch(s, p)) def test_symbol_1(self): s = "a" p = "ab*" self.assertTrue(Solution().isMatch(s, p)) def test_symbol_2(self): # E.g. # s a b b # p 1 0 0 0 # a 0 1 0 0 # b 0 0 1 0 # * 0 1 1 1 s = "abb" p = "ab*" self.assertTrue(Solution().isMatch(s, p))if __name__ == "__main__": unittest.main()

Python Tips

這個題目用到的經典演算法叫做DP演算法（Dynamic Programming,俗稱動態規劃）。DP演算法的展開需要很大的篇幅。推薦一系列blog《動態規劃（DP）初步》。

題目所涉及的正則表達式是一種強大的字元串匹配工具，適用於多種編碼語言，環境。玩爬蟲的時候經常要用到正則去匹配想去下載的內容或者網址。下面這張圖片是正則匹配的一些基本規則。至於python的re庫的用法，內容比較多，容我挖個坑。