A Review of Python Dictionary Basics

04-21

Keywords: python, dict, data structure, Python dictionary, python collections, defaultdict, Counter

In Python, a dictionary is a basic data structure that associates keys with values. These key-value pairs let us quickly look up a value by its key.

This article covers the following:

- Basic usage of Python dictionaries
  - Creating a dictionary
  - Assigning values
  - Looking up values
  - Using a dictionary as a simple data structure
- Using two tools from the collections module
  - defaultdict
  - Counter

Basic usage of Python dictionaries

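The sections below introduce the basics of Python dictionaries in three parts: creation, assignment, and lookup. They close by building a simple compound data structure from a dictionary.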

Creating a Python dictionary

Creating a dictionary in Python is simple: { } creates an empty dictionary. Use : to join a key to its value, and , to separate multiple key-value pairs.

    # Create dictionaries
    empty_dict = {}
    member = {"Lilei": 16, "Hanmeimei": 17}
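Besides the literal syntax, the built-in dict() constructor and dict comprehensions can also build dictionaries. The following is a minimal sketch; the names `names`, `ages`, and `name_lengths` are made up for illustration and are not part of the original example.

```python
# Other common ways to build a dictionary (illustrative names only)
names = ["Lilei", "Hanmeimei"]
ages = [16, 17]

# dict() with keyword arguments: keys must be valid Python identifiers
member_from_kwargs = dict(Lilei=16, Hanmeimei=17)

# dict() over (key, value) pairs produced by zip()
member_from_pairs = dict(zip(names, ages))

# Dict comprehension: compute each value from its key
name_lengths = {name: len(name) for name in names}

print(member_from_kwargs == member_from_pairs)
print(name_lengths)
```

All three forms produce ordinary dictionaries, so everything in the following sections applies to them as well.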

Looking up values in a Python dictionary

There are two main ways to look up a value in a Python dictionary: index directly by key with [ ], or pass the key to .get().

    # Lookup
    # Issue 1: get the value directly by key
    print("Issue 1 : ", member["Lilei"])
    # Issue 2: get the value with get()
    print("Issue 2 : ", member.get("Lilei"))
    # Issue 3: get() returns None when the key does not exist
    print("Issue 3 : ", member.get("Mike"))
    # Issue 4: get() accepts a default value, returned when the key does not exist
    print("Issue 4 : ", member.get("Mike", 18))

    >>>>> Program output >>>>>
    Issue 1 :  16
    Issue 2 :  16
    Issue 3 :  None
    Issue 4 :  18

Assigning values in a Python dictionary

Assignment works much like lookup: use [ ] to assign a value to a given key. If the key does not exist, Python creates a new key-value pair.

    # Assignment
    # Issue 1: assign directly with square brackets
    member["Lilei"] = 18
    print("Issue 1 : ", member["Lilei"])
    # Issue 2: assigning to a key that does not exist creates it
    member["Tony"] = 20
    print("Issue 2 : ", member["Tony"])

    >>>>> Program output >>>>>
    Issue 1 :  18
    Issue 2 :  20
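Square-bracket assignment handles one key at a time. As a small sketch that goes slightly beyond the original example, the standard dict methods update() and pop(), together with the del statement, cover bulk assignment and removal. The `member` dictionary is rebuilt here so the snippet runs on its own.

```python
member = {"Lilei": 16, "Hanmeimei": 17}

# update() sets several keys at once: existing keys are overwritten, new ones added
member.update({"Lilei": 18, "Tony": 20})

# pop() removes a key and returns its value; the second argument is a fallback
removed_age = member.pop("Tony")
missing_age = member.pop("Mike", None)

# del removes a key without returning anything (raises KeyError if the key is absent)
del member["Hanmeimei"]

print(member, removed_age, missing_age)
```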

A closer look at lookup

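In practice we may try to access a key that does not exist, in which case Python raises a KeyError. We can handle this by catching the exception with try-except, or we can use in to check whether the key exists.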

    # Lookup
    # Issue 1: indexing with a key that is not in the dictionary raises KeyError
    try:
        mike_member = member["Mike"]
    except KeyError:
        print("Issue 1 : Can not found member named: Mike")
    # Issue 2: use in to check whether a key exists
    print("Issue 2 : Mike in member: ", "Mike" in member)
    print("Issue 2: Lilei in member: ", "Lilei" in member)

    >>>>> Program output >>>>>
    Issue 1 : Can not found member named: Mike
    Issue 2 : Mike in member:  False
    Issue 2: Lilei in member:  True

Using a dictionary as a simple compound data structure

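We usually represent an entity with a class and its objects, but sometimes a dictionary is more convenient. The code below uses a dictionary to model a message entity in a simple SNS application. It holds the user name, the message text, and the user's hashtags.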

    weixin = {
        "user": "Tony",
        "text": "Python is the best language",
        "hashtags": ["#python", "#java", "#data"]
    }
    # Get the keys
    print("Key: ", weixin.keys())
    # Get the values
    print("Value: ", weixin.values())
    # Get the (key, value) tuples
    print("K-V tuple: ", weixin.items())

    >>>>> Program output >>>>>
    Key:  dict_keys(['user', 'text', 'hashtags'])
    Value:  dict_values(['Tony', 'Python is the best language', ['#python', '#java', '#data']])
    K-V tuple:  dict_items([('user', 'Tony'), ('text', 'Python is the best language'), ('hashtags', ['#python', '#java', '#data'])])
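Once an entity is a dictionary, it combines naturally with loops and lists. The sketch below assumes a hypothetical list `messages` whose entries have the same shape as the weixin dictionary above; the second message is invented purely for illustration.

```python
# Hypothetical messages in the same shape as the weixin dictionary
messages = [
    {"user": "Tony", "text": "Python is the best language",
     "hashtags": ["#python", "#java", "#data"]},
    {"user": "Lilei", "text": "Dictionaries are handy",
     "hashtags": ["#python", "#dict"]},
]

# Unpack each (key, value) tuple while looping over items()
for key, value in messages[0].items():
    print(key, "->", value)

# Nested access: index into the list stored under "hashtags"
first_tag = messages[0]["hashtags"][0]

# Collect the users whose message carries a given hashtag
python_users = [m["user"] for m in messages if "#python" in m["hashtags"]]
print(first_tag, python_users)
```

This works well for small scripts. With many fields or attached behavior, a class is probably the clearer choice.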

Using two tools from the collections module

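In real-world work we often need to count things, for example how many times each word appears in a piece of text. Plain dictionary features can do this, but the defaultdict and Counter tools from the collections module are usually more convenient.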

Below is a passage from Wikipedia's introduction to Python:

Python is a widely used high-level programming language for general-purpose programming, created by Guido van Rossum and first released in 1991. An interpreted language, Python has a design philosophy that emphasizes code readability notably using whitespace indentation to delimit code blocks rather than curly brackets or keywords, and a syntax that allows programmers to express concepts in fewer lines of code than might be used in languages such as C++ or Java.The language provides constructs intended to enable writing clear programs on both a small and large scale.

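Our goal is to count how often each distinct word appears in this passage. First we preprocess the text: strip the punctuation, split on spaces, and store the words in a list. Because the passage has no space after "Java.", removing the period leaves JavaThe as a single token, which shows up in the counts below.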

    # Counting with dictionaries
    raw_document = (
        "Python is a widely used high-level programming language for "
        "general-purpose programming, created by Guido van Rossum and first "
        "released in 1991. An interpreted language, Python has a design "
        "philosophy that emphasizes code readability notably using whitespace "
        "indentation to delimit code blocks rather than curly brackets or "
        "keywords, and a syntax that allows programmers to express concepts in "
        "fewer lines of code than might be used in languages such as C++ or "
        "Java.The language provides constructs intended to enable writing clear "
        "programs on both a small and large scale."
    )
    # Remove punctuation
    non_punctuation_document = raw_document.replace(",", "").replace(".", "")
    # Split on spaces into a list of words
    document = non_punctuation_document.split(" ")
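Deleting punctuation before splitting glues two words together wherever the text lacks a space after a period. One possible alternative, shown here as a sketch rather than as the article's method, treats whitespace, commas, and periods all as separators using re.split. The short `sample` string is an assumption chosen to reproduce the quirk, and this pattern would also split tokens such as "3.14", so it only suits simple prose.

```python
import re

# Short sample with the same quirk as the passage: no space after "Java."
sample = "languages such as C++ or Java.The language provides constructs, clearly."

# Article-style cleanup: delete punctuation, then split on single spaces
naive_words = sample.replace(",", "").replace(".", "").split(" ")

# Alternative: treat runs of whitespace, commas and periods as separators,
# then drop the empty string left by the trailing period
split_words = [w for w in re.split(r"[\s,.]+", sample) if w]

print(naive_words)
print(split_words)
```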

Next, we count the words using only the dictionary's built-in methods:

    # Issue 1: count with plain dictionary methods
    word_counts = {}
    for word in document:
        previous_count = word_counts.get(word, 0)
        word_counts[word] = previous_count + 1
    print("Issue 1, count the words in document: ", word_counts)

    >>>>> Program output >>>>>
    Issue 1, count the words in document:  {'Python': 2, 'is': 1, 'a': 4, 'widely': 1, 'used': 2, 'high-level': 1, 'programming': 2, 'language': 3, 'for': 1, 'general-purpose': 1, 'created': 1, 'by': 1, 'Guido': 1, 'van': 1, 'Rossum': 1, 'and': 3, 'first': 1, 'released': 1, 'in': 3, '1991': 1, 'An': 1, 'interpreted': 1, 'has': 1, 'design': 1, 'philosophy': 1, 'that': 2, 'emphasizes': 1, 'code': 3, 'readability': 1, 'notably': 1, 'using': 1, 'whitespace': 1, 'indentation': 1, 'to': 3, 'delimit': 1, 'blocks': 1, 'rather': 1, 'than': 2, 'curly': 1, 'brackets': 1, 'or': 2, 'keywords': 1, 'syntax': 1, 'allows': 1, 'programmers': 1, 'express': 1, 'concepts': 1, 'fewer': 1, 'lines': 1, 'of': 1, 'might': 1, 'be': 1, 'languages': 1, 'such': 1, 'as': 1, 'C++': 1, 'JavaThe': 1, 'provides': 1, 'constructs': 1, 'intended': 1, 'enable': 1, 'writing': 1, 'clear': 1, 'programs': 1, 'on': 1, 'both': 1, 'small': 1, 'large': 1, 'scale': 1}

Counting word occurrences with defaultdict from collections

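A defaultdict behaves like a standard dictionary, except that when you look up a key it does not contain, it calls the zero-argument function you supplied and stores the result as the value for that new key. With int as that function, a missing word starts at 0, so += 1 works right away. defaultdict is used as follows: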

    # Issue 2: count words with defaultdict
    from collections import defaultdict

    word_counts = defaultdict(int)
    for word in document:
        word_counts[word] += 1
    print("Issue 2, count the words in document by defaultdict: ", word_counts)

    >>>>> Program output >>>>>
    Issue 2, count the words in document by defaultdict:  defaultdict(<class 'int'>, {'Python': 2, 'is': 1, 'a': 4, 'widely': 1, 'used': 2, 'high-level': 1, 'programming': 2, 'language': 3, 'for': 1, 'general-purpose': 1, 'created': 1, 'by': 1, 'Guido': 1, 'van': 1, 'Rossum': 1, 'and': 3, 'first': 1, 'released': 1, 'in': 3, '1991': 1, 'An': 1, 'interpreted': 1, 'has': 1, 'design': 1, 'philosophy': 1, 'that': 2, 'emphasizes': 1, 'code': 3, 'readability': 1, 'notably': 1, 'using': 1, 'whitespace': 1, 'indentation': 1, 'to': 3, 'delimit': 1, 'blocks': 1, 'rather': 1, 'than': 2, 'curly': 1, 'brackets': 1, 'or': 2, 'keywords': 1, 'syntax': 1, 'allows': 1, 'programmers': 1, 'express': 1, 'concepts': 1, 'fewer': 1, 'lines': 1, 'of': 1, 'might': 1, 'be': 1, 'languages': 1, 'such': 1, 'as': 1, 'C++': 1, 'JavaThe': 1, 'provides': 1, 'constructs': 1, 'intended': 1, 'enable': 1, 'writing': 1, 'clear': 1, 'programs': 1, 'on': 1, 'both': 1, 'small': 1, 'large': 1, 'scale': 1})
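The factory passed to defaultdict does not have to be int. As a sketch with a made-up word list, using list as the factory groups items under a key, and the second half shows a side effect worth knowing: merely reading a missing key inserts it.

```python
from collections import defaultdict

# Illustrative word list, not taken from the passage's counts
words = ["code", "clear", "curly", "design", "delimit", "enable"]

# list as the factory: a missing key starts out as an empty list
words_by_initial = defaultdict(list)
for word in words:
    words_by_initial[word[0]].append(word)

print(dict(words_by_initial))

# Merely reading a missing key inserts it with the factory's value
counts = defaultdict(int)
_ = counts["absent"]
print("absent" in counts, counts["absent"])
```

If lookups must not add keys, .get() or an in check on the defaultdict avoids the insertion.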

As we can see, the defaultdict version needs less code than the plain dictionary, and the counts are the same.

Counting words with Counter from collections

Beyond raw counts, in practice we often want a filtered or ranked result. Here we use Counter to list the ten most frequent words together with their counts:

    # Issue 3: count words with Counter
    from collections import Counter

    word_counts = Counter(document)
    for word, count in word_counts.most_common(10):
        print("Issue 3, most common word in documents: ", word, count)

    >>>>> Program output >>>>>
    Issue 3, most common word in documents:  a 4
    Issue 3, most common word in documents:  language 3
    Issue 3, most common word in documents:  and 3
    Issue 3, most common word in documents:  in 3
    Issue 3, most common word in documents:  code 3
    Issue 3, most common word in documents:  to 3
    Issue 3, most common word in documents:  Python 2
    Issue 3, most common word in documents:  used 2
    Issue 3, most common word in documents:  programming 2
    Issue 3, most common word in documents:  that 2
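Counter offers a few more conveniences beyond most_common(). The sketch below uses two small made-up word lists to show that a missing key counts as 0 without being inserted, that two Counters can be added, and that update() adds counts in place.

```python
from collections import Counter

# Illustrative inputs, unrelated to the passage above
first = Counter(["code", "code", "code", "python"])
second = Counter(["python", "dict"])

# A missing key counts as 0 and is not inserted
print(first["java"], "java" in first)

# Adding two Counters sums the counts per key
combined = first + second
print(combined.most_common(1))

# update() adds counts in place from another iterable
first.update(["python", "python"])
print(first["python"])
```

Unlike defaultdict(int), reading a missing word from a Counter leaves it unchanged, which helps when the counts are later iterated or printed.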

Summary

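In this article we reviewed the basic usage of Python dictionaries, then worked through a simple example with the defaultdict and Counter tools from collections to see how dictionaries can be used for counting.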

References

[1] Joel Grus. Data Science from Scratch (Chinese edition: 數據科學入門), Chapter 2: A Crash Course in Python. ISBN 978-7-115-41741-1. Posts & Telecom Press (人民郵電出版社).