初級演算法—字元串

05-14

刷完leetcode初級演算法中的字元串，寫一下解題的思路和其中主要的演算法~

解題方法：

雙指針；
哈希表；
KMP演算法（用於字元串匹配）；
...

雙指針和哈希表前面已經介紹過了，參考鏈接~

Fei-Fei：初級演算法—數組?

zhuanlan.zhihu.com

字元串匹配：

所謂字元串匹配就是指：在目標字元串中找到與模板字元串匹配的位置

如：目標字元串：「aaabaabbabb」

模板字元串：「abaa」

那麼字元串匹配的結果就是「aaabaabaabb」（加粗部分），返回目標目標字元串中與模板字元串相匹配的第一個字元的下標：2

解法一：暴力搜索（brute force search）

做法：使用雙指針的方法，遍歷目標字元串和模板字元串，如果失配（匹配失敗）：將目標字元串和模板字元串的「指針」都回退（如圖）：

brute force search

代碼：

int BruteForceSearch(string haystack, string needle) { // haystack 為目標字元串 // needle 為模板字元串 if ( needle.empty() ) return 0; int size1 = haystack.size(); int size2 = needle.size(); if ( size1 < size2 ) return -1; // 暴力搜索 BF // i 為目標字元串的「指針」 // j 為模板字元串的「指針」 int j = 0; for ( int i=0; i<size1; i++ ) { while ( i<size1 && j<size2 && haystack[i]==needle[j] ) { // 如果匹配上則查看目標字元串和模板字元串的 // 下個字元是否可以匹配的上 j++; i++; } if ( j==size2 && i<=size1 ) { // 如果「指針」 j 到模板字元串尾端，則匹配成功 return i-j; } else if ( i==size1 && j<size2 ) { // 目標字元串已經遍歷完，沒有找到相匹配的字元串 return -1; } else { // 失配，這i,j回退 i = i - j; j = 0; } } return -1;

總結： 假如目標字元串的長度為 $n$ ，模板字元串的長度為 $m$ ，則暴力搜索演算法的時間複雜度為 $O(nm)$ 。暴力搜索有一個缺點就是失配的情況下， $i$ 和 $j$ 都要回退。KMP利用已經匹配上的字元串存在的一些特徵，使得 $i$ 不回退， $j$ 適當回退。

解法二： KMP演算法（The Knuth-Morris-Pratt Algorithm）

Knuth-Morris-Pratt 字元串查找演算法，簡稱為「KMP演算法」，常用於在一個目標文本串T 內查找一個模式串P 的出現位置，這個演算法由Donald Knuth、Vaughan Pratt、James H. Morris三人於1977年聯合發表，故取這3人的姓氏命名此演算法。

做法：目標字元串T遍歷到 $i$ 模板字元串遍歷到 $j$ 時，兩者失配，此時 $i$ 不動， $j$ 適當回退。

那麼該回退多少呢？

根據模板字元串與目標字元串中「ABCAB」是相互匹配的，並且在「ABCAB」中前綴「AB」（代表模板字元串 $0:2$ 面字元）和後綴「AB」（代表目標字元串 $i$ 前2個字元）是一樣（前綴後綴的公共長度為2）。也就是我們只要將 $j$ 退回到「C」位置處就好。（如下圖所示）

ABCABA 的頭部有 A, AB, ABC, ABCA, ABCAB（不包含最後一個字元。稱之為「前綴」）ABCABA 的尾部有 A, BA, ABA, CABA, BCABA（不包含第一個字元。稱之為「後綴」）

KMP演算法

KMP演算法的步驟：

尋找模板字元串的前綴後綴公共長度：

對於模板字元串 P :

$P = P_{0}, P_{1},...P_{j-1},P_{j}$

如果存在：

$P_{0},P_{1}, ...P_{k} = P_{j-k},...P_{j-1}, P_{j}$

則公共前後綴的長度為 $k+1$

舉個例子，如果給定的模式串為「abab」，那麼它的各個子串（子串：「a」「ab」」aba「」abab「）的前綴後綴的公共元素的最大長度如下表格所示：

模板字元串的公共前後綴長度

以「aba」子串來說（表中的第三列）：前綴： a, ab 後綴： b, ba 所以最大前後綴公共長度為 1

2. 求next數組（prefix表）：

next 數組考慮的是除當前字元外的最長相同前綴後綴，所以通過第①步驟求得各個前綴後綴的公共元素的最大長度後，只要稍作變形即可：將第①步驟中求得的值整體右移一位，然後初值賦為-1，如下表格所示：

next數組

3. 根據next數組與目標字元串進行匹配：

當匹配失敗時，根據next數組，對 $j$ 退回的位置進行確定。然後再從新開始匹配。

註：詳細請見

從頭到尾徹底理解KMP - Chris_z - 博客園?

www.cnblogs.com

KMP演算法代碼：

class Solution {public: int KMPSearch(string haystack, string needle) { if ( needle.empty() ) return 0; int size1 = haystack.size(); int size2 = needle.size(); if ( size1 < size2 ) return -1 // 第二種方法： KMP O(n) + o(m) // 構造前綴表 NEXT 數組 /* int prefix[size2]; prefix[0] = 0; int j = 1; int len = 0; // 生成前綴表/next數組 while ( j<size2 ) { if ( needle[j]==needle[len] ) { len++; prefix[j++] = len; } else { if ( len>0 ) len = prefix[len-1]; else prefix[j++] = len; } } // 將前綴表/next數組右移一位第一位補-1 for ( int i=size2-1; i>0; i-- ) prefix[i] = prefix[i-1]; prefix[0] = -1; // 與目標字元串匹配 j = 0; int i = 0; while ( i<size1 ) { if ( j==size2-1 && haystack[i]==needle[j] ) { // 匹配到返回 return i-j; } if ( j==-1 || haystack[i]==needle[j] ) { // 查看下一個字元是否匹配 i++; j++; } else { // 失配是j回退 j = prefix[j]; } } return -1; }};

題一：顛倒整數

給定一個 32 位有符號整數，將整數中的數字進行反轉。

示例 1:

輸入: 123輸出: 321

示例 2:

輸入: -123輸出: -321

示例 3:

輸入: 120輸出: 21

注意:

假設我們的環境只能存儲 32 位有符號整數，其數值範圍是 [?231, 231 ? 1]。根據這個假設，如果反轉後的整數溢出，則返回 0。

解法： 1、轉成字元串； 2、模十法；

class Solution {public: int reverse(int x) { // 第一種方法：轉換成字元串 O(n) /* stringstream ss; string numString; ss << x; ss >> numString; bool negative = true; if ( numString[0]==- ) { numString.erase(numString.begin()); negative = false; } std::reverse(numString.begin(), numString.end()); ss.clear(); // 一定要清除ss內容 long long out; // long long 防止溢出 ss << numString; ss >> out; if ( out<-pow(2, 31) || out>pow(2, 31)-1 ) return 0; if ( negative ) return out; else return -out; */ // 第二種方法：模十法 long long out = 0; while ( abs(x)>0 ) { int temp = x % 10; out = 10*out + temp; if ( out>pow(2, 31)-1 || out<-pow(2, 31) ) return 0; x = x / 10; } return out; }};

題二：有效的字母異位詞

給定兩個字元串 s 和 t ，編寫一個函數來判斷 t 是否是 s 的一個字母異位詞。

示例 1:

輸入: s = "anagram", t = "nagaram"輸出: true

示例 2:

輸入: s = "rat", t = "car"輸出: false

說明:

你可以假設字元串只包含小寫字母。

進階:

如果輸入字元串包含 unicode 字元怎麼辦？你能否調整你的解法來應對這種情況？

解法： 1、哈希； 2、排序；

class Solution {public: bool isAnagram(string s, string t) { int size1 = s.size(); int size2 = t.size(); if ( size1 != size2 ) return false; // 第一種解法：哈希 /* unordered_map<char, int> hash1; unordered_map<char, int> hash2; for ( int i=0; i<size1; i++ ) { hash1[s[i]]++; hash2[t[i]]++; } for ( int i=0; i<size1; i++ ) { if ( hash1[s[i]] != hash2[s[i]] ) return false; } return true; */ // 第二種解法：排序 O(nlogn) sort(s.begin(), s.end()); sort(t.begin(), t.end()); for ( int i=0; i<size1; i++ ) { if ( s[i]!=t[i] ) return false; } return true; }};

題三：實現strStr()

實現 strStr() 函數。

給定一個 haystack 字元串和一個 needle 字元串，在 haystack 字元串中找出 needle 字元串出現的第一個位置 (從0開始)。如果不存在，則返回 -1。

示例 1:

輸入: haystack = "hello", needle = "ll"輸出: 2

示例 2:

輸入: haystack = "aaaaa", needle = "bba"輸出: -1

說明:

當 needle 是空字元串時，我們應當返回什麼值呢？這是一個在面試中很好的問題。

對於本題而言，當 needle 是空字元串時我們應當返回 0 。這與C語言的 strstr() 以及 Java的 indexOf() 定義相符。

class Solution {public: int strStr(string haystack, string needle) { if ( needle.empty() ) return 0; int size1 = haystack.size(); int size2 = needle.size(); if ( size1 < size2 ) return -1; // 第一種方法：暴力搜索 BF 雙指針 O(mn) /* int j = 0; for ( int i=0; i<size1; i++ ) { while ( i<size1 && j<size2 && haystack[i]==needle[j] ) { j++; i++; } if ( j==size2 && i<=size1 ) { return i-j; } else if ( i==size1 && j<size2 ) { return -1; } else { i = i - j; j = 0; } } return -1; */ // 第二種方法： KMP O(n) + o(m) // 構造前綴表 NEXT 數組 /* int prefix[size2]; prefix[0] = 0; int j = 1; int len = 0; while ( j<size2 ) { if ( needle[j]==needle[len] ) { len++; prefix[j++] = len; } else { if ( len>0 ) len = prefix[len-1]; else prefix[j++] = len; } } for ( int i=size2-1; i>0; i-- ) prefix[i] = prefix[i-1]; prefix[0] = -1; j = 0; int i = 0; while ( i<size1 ) { if ( j==size2-1 && haystack[i]==needle[j] ) { return i-j; } if ( j==-1 || haystack[i]==needle[j] ) { i++; j++; } else { j = prefix[j]; } } return -1; */ // 第二種方法 kmp 另一種寫法 int next[size2+1]; next[0] = -1; int i = 0; int j = -1; while ( i<size2 ) { if ( j==-1 || needle[i]==needle[j] ) { j++; i++; next[i] = j; } else { j = next[j]; } } i = 0; j = 0; while ( i<size1 ) { if ( j==size2-1 && haystack[i]==needle[j] ) return i - j; if ( j==-1 || haystack[i]==needle[j] ) { i++; j++; } else { j = next[j]; } } return -1; }};

題四：數數並說

報數序列是指一個整數序列，按照其中的整數的順序進行報數，得到下一個數。其前五項如下：

1. 12. 113. 214. 12115. 111221

1 被讀作 "one 1" ("一個一") , 即 11。

11 被讀作 "two 1s" ("兩個一"）, 即 21。

21 被讀作 "one 2", "one 1" （"一個二" , "一個一") , 即 1211。

給定一個正整數 n ，輸出報數序列的第 n 項。

注意：整數順序將表示為一個字元串。

示例 1:

輸入: 1輸出: "1"

示例 2:

輸入: 4輸出: "1211"

class Solution {public: string countAndSay(int n) { // 第一種解法遍歷 string s = "1"; int count = 1; while ( count<n ) { int size = s.size(); string temp; int l = 1; for ( int i=0; i<size; i++ ) { if ( i+1<size && s[i+1]==s[i] ) l++; else { temp += to_string(l) + s[i]; //to_string 將int 轉換成 string l = 1; } } s = temp; temp.clear(); count++; } return s; }};

初級演算法—字元串完結！

未完待續~