A Review of Python Dictionary Basics

Keywords: python, dict, data struct, Python dictionary, python collections, defaultdict, Counter

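The dictionary is one of Python's fundamental data structures. It associates keys with values to form key-value pairs, so that a value can be found quickly through its key.

This article covers the following: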

- Basic usage of Python dictionaries
  - Creating a dictionary
  - Assigning values
  - Looking up values
  - Using a dictionary as a simple data structure
- Two tools from the collections module
  - defaultdict
  - Counter

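Basic usage of Python dictionaries

This part introduces the basics of Python dictionaries from three angles: creation, assignment, and lookup. It closes by using a dictionary to build a simple composite data structure.

Creating a dictionary

Creating a dictionary in Python is simple: { } creates an empty dictionary, : joins a key to its value, and , separates multiple key-value pairs.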

    # Create dictionaries
    empty_dict = {}
    member = {"Lilei": 16, "Hanmeimei": 17}
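Besides the literal syntax, the built-in dict() constructor and dict comprehensions can build the same mapping. The following is a minimal sketch; the variable names (names, ages, from_pairs, and so on) are illustrative and do not appear in the original article.

```python
# Alternative ways to build the same dictionary (all names are illustrative).
names = ["Lilei", "Hanmeimei"]
ages = [16, 17]

from_pairs = dict([("Lilei", 16), ("Hanmeimei", 17)])  # list of (key, value) tuples
from_kwargs = dict(Lilei=16, Hanmeimei=17)             # keyword arguments, string keys only
from_zip = dict(zip(names, ages))                      # two parallel sequences
from_comprehension = {name: age for name, age in zip(names, ages)}

# All four spellings produce equal dictionaries.
print(from_pairs == from_kwargs == from_zip == from_comprehension)  # True
```

The literal form stays the most readable choice for fixed data; the zip and comprehension forms are mainly useful when keys and values arrive as separate sequences.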

Looking up values

There are two main ways to look up a value in a Python dictionary: index directly by key with [ ], or call .get() to fetch the value stored under a key.

    # Lookup
    # Issue 1: access the value directly by key
    print("Issue 1 : ", member["Lilei"])
    # Issue 2: access the value with get()
    print("Issue 2 : ", member.get("Lilei"))
    # Issue 3: get() returns None when the key does not exist
    print("Issue 3 : ", member.get("Mike"))
    # Issue 4: get() accepts a default, returned when the key does not exist
    print("Issue 4 : ", member.get("Mike", 18))

    >>>>> Program output >>>>>
    Issue 1 :  16
    Issue 2 :  16
    Issue 3 :  None
    Issue 4 :  18

Assigning values

Assignment works much like lookup: use [ ] to assign a value to a given key. If the key does not exist yet, Python creates a new key-value pair for you.

    # Assignment
    # Issue 1: assign directly with square brackets
    member["Lilei"] = 18
    print("Issue 1 : ", member["Lilei"])
    # Issue 2: assigning to a key that does not exist creates it
    member["Tony"] = 20
    print("Issue 2 : ", member["Tony"])

    >>>>> Program output >>>>>
    Issue 1 :  18
    Issue 2 :  20
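Square brackets are not the only way to write into a dictionary. The sketch below shows three standard dict methods, update(), setdefault() and pop(), on a fresh copy of the member data; the variable names tony_age, mike_age and removed are illustrative.

```python
member = {"Lilei": 16, "Hanmeimei": 17}

# update() sets several keys at once: existing keys are overwritten, new ones added.
member.update({"Lilei": 18, "Tony": 20})

# setdefault() writes only when the key is missing and returns the stored value.
tony_age = member.setdefault("Tony", 99)  # "Tony" exists, so 20 is kept
mike_age = member.setdefault("Mike", 18)  # "Mike" is new, so 18 is stored

# pop() removes a key and returns its value; the second argument avoids a KeyError.
removed = member.pop("Hanmeimei", None)

print(member)  # {'Lilei': 18, 'Tony': 20, 'Mike': 18}
```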

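A closer look at lookup

In real applications we may try to read a key that does not exist. In that case Python raises a KeyError, which we can handle by catching the exception with try - except. Alternatively, we can use in to check whether the key exists.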

    # Lookup
    # Issue 1: indexing with a key that is not in the dictionary raises KeyError
    try:
        mike_member = member["Mike"]
    except KeyError:
        print("Issue 1 : Cannot find member named: Mike")
    # Issue 2: use in to check whether a key exists
    print("Issue 2 : Mike in member: ", "Mike" in member)
    print("Issue 2: Lilei in member: ", "Lilei" in member)

    >>>>> Program output >>>>>
    Issue 1 : Cannot find member named: Mike
    Issue 2 : Mike in member:  False
    Issue 2: Lilei in member:  True
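One case where the choice between get() and in matters is when None is itself a legitimate value. The sketch below uses a hypothetical scores dictionary (not from the original article) in which None means "not recorded yet".

```python
# Hypothetical data: None is a real stored value meaning "not recorded yet".
scores = {"Lilei": None, "Hanmeimei": 17}

# get() returns None both for a stored None and for a missing key ...
stored_none = scores.get("Lilei")
missing_key = scores.get("Mike")
print(stored_none, missing_key)  # None None

# ... while in tells the two cases apart.
print("Lilei" in scores, "Mike" in scores)  # True False
```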

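Using a dictionary as a simple composite data structure

We usually represent an entity with a class and its objects, but sometimes, for convenience, a dictionary will do. The code below uses a dictionary to implement a single message entity of a simple SNS application, holding the user name, the message text, and the user's hashtags.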

    weixin = {
        "user": "Tony",
        "text": "Python is the best language",
        "hashtags": ["#python", "#java", "#data"],
    }
    # Get the keys
    print("Key: ", weixin.keys())
    # Get the values
    print("Value: ", weixin.values())
    # Get a view of (key, value) tuples
    print("K-V tuple: ", weixin.items())

    >>>>> Program output >>>>>
    Key:  dict_keys(['user', 'text', 'hashtags'])
    Value:  dict_values(['Tony', 'Python is the best language', ['#python', '#java', '#data']])
    K-V tuple:  dict_items([('user', 'Tony'), ('text', 'Python is the best language'), ('hashtags', ['#python', '#java', '#data'])])
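Since the values can be any Python object, a message entity like this is usually read by iterating over items() and by chaining lookups into the nested list. A minimal sketch follows; the added "#dict" tag and the names first_tag and tag_count are illustrative.

```python
weixin = {
    "user": "Tony",
    "text": "Python is the best language",
    "hashtags": ["#python", "#java", "#data"],
}

# items() yields (key, value) pairs, which unpack directly in a for loop.
for key, value in weixin.items():
    print(key, "->", value)

# Nested values are reached by chaining lookups: dict key first, then list index.
first_tag = weixin["hashtags"][0]

# The list stored in the dictionary is mutable and can be changed in place.
weixin["hashtags"].append("#dict")
tag_count = len(weixin["hashtags"])
print(first_tag, tag_count)  # #python 4
```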

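Two tools from the collections module

Real-world work often calls for counting things, for example how many times each word occurs in a piece of text. Python's built-in dictionary features can do this, but the defaultdict and Counter tools provided by the collections module are usually more convenient.

The following passage is Wikipedia's introduction to Python: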

Python is a widely used high-level programming language for general-purpose programming, created by Guido van Rossum and first released in 1991. An interpreted language, Python has a design philosophy that emphasizes code readability notably using whitespace indentation to delimit code blocks rather than curly brackets or keywords, and a syntax that allows programmers to express concepts in fewer lines of code than might be used in languages such as C++ or Java.The language provides constructs intended to enable writing clear programs on both a small and large scale.

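Our goal is to count how many times each distinct word appears in this passage. First we need to preprocess the text: remove the punctuation, split on spaces, and store the words in a list.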

    # Word counting with dictionaries
    raw_document = "Python is a widely used high-level programming language for general-purpose programming, created by Guido van Rossum and first released in 1991. An interpreted language, Python has a design philosophy that emphasizes code readability notably using whitespace indentation to delimit code blocks rather than curly brackets or keywords, and a syntax that allows programmers to express concepts in fewer lines of code than might be used in languages such as C++ or Java.The language provides constructs intended to enable writing clear programs on both a small and large scale."
    # Remove commas and periods, then split on single spaces.
    # Note: the text has no space after "Java.", so this step yields the token "JavaThe".
    non_punctuation_document = raw_document.replace(",", "").replace(".", "")
    document = non_punctuation_document.split(" ")
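Deleting punctuation outright glues together words that were separated only by a punctuation mark. One possible variation, sketched here on a short illustrative sample rather than the full passage, is to replace punctuation with a space and call split() without arguments, which also collapses repeated whitespace. The names sample, naive_tokens and tokens are assumptions made for this sketch, and the counts shown elsewhere in the article come from the original approach.

```python
# Short illustrative sample with the same "Java.The" pattern as the passage.
sample = "C++ or Java.The language provides constructs, clearly."

# Original approach: delete punctuation, split on single spaces.
naive_tokens = sample.replace(",", "").replace(".", "").split(" ")
print(naive_tokens)
# ['C++', 'or', 'JavaThe', 'language', 'provides', 'constructs', 'clearly']

# Variation: turn punctuation into spaces, then split on any run of whitespace.
tokens = sample.replace(",", " ").replace(".", " ").split()
print(tokens)
# ['C++', 'or', 'Java', 'The', 'language', 'provides', 'constructs', 'clearly']
```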

Next, we count the words using only the built-in dictionary methods:

    # Issue 1: count words with plain dictionary methods
    word_counts = {}
    for word in document:
        previous_count = word_counts.get(word, 0)
        word_counts[word] = previous_count + 1
    print("Issue 1, count the words in document: ", word_counts)

    >>>>> Program output >>>>>
    Issue 1, count the words in document:  {'Python': 2, 'is': 1, 'a': 4, 'widely': 1, 'used': 2, 'high-level': 1, 'programming': 2, 'language': 3, 'for': 1, 'general-purpose': 1, 'created': 1, 'by': 1, 'Guido': 1, 'van': 1, 'Rossum': 1, 'and': 3, 'first': 1, 'released': 1, 'in': 3, '1991': 1, 'An': 1, 'interpreted': 1, 'has': 1, 'design': 1, 'philosophy': 1, 'that': 2, 'emphasizes': 1, 'code': 3, 'readability': 1, 'notably': 1, 'using': 1, 'whitespace': 1, 'indentation': 1, 'to': 3, 'delimit': 1, 'blocks': 1, 'rather': 1, 'than': 2, 'curly': 1, 'brackets': 1, 'or': 2, 'keywords': 1, 'syntax': 1, 'allows': 1, 'programmers': 1, 'express': 1, 'concepts': 1, 'fewer': 1, 'lines': 1, 'of': 1, 'might': 1, 'be': 1, 'languages': 1, 'such': 1, 'as': 1, 'C++': 1, 'JavaThe': 1, 'provides': 1, 'constructs': 1, 'intended': 1, 'enable': 1, 'writing': 1, 'clear': 1, 'programs': 1, 'on': 1, 'both': 1, 'small': 1, 'large': 1, 'scale': 1}

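Counting words with defaultdict from collections

A defaultdict behaves like a regular dictionary, except that when you look up a key it does not contain, it calls the zero-argument function you supplied and stores the result as the value for that new key. With int as that function, a missing word starts at 0, so the loop can add 1 without first checking whether the key exists. It is used as follows: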

    # Issue 2: count words with defaultdict
    from collections import defaultdict
    word_counts = defaultdict(int)
    for word in document:
        word_counts[word] += 1
    print("Issue 2, count the words in document by defaultdict: ", word_counts)

    >>>>> Program output >>>>>
    Issue 2, count the words in document by defaultdict:  defaultdict(<class 'int'>, {'Python': 2, 'is': 1, 'a': 4, 'widely': 1, 'used': 2, 'high-level': 1, 'programming': 2, 'language': 3, 'for': 1, 'general-purpose': 1, 'created': 1, 'by': 1, 'Guido': 1, 'van': 1, 'Rossum': 1, 'and': 3, 'first': 1, 'released': 1, 'in': 3, '1991': 1, 'An': 1, 'interpreted': 1, 'has': 1, 'design': 1, 'philosophy': 1, 'that': 2, 'emphasizes': 1, 'code': 3, 'readability': 1, 'notably': 1, 'using': 1, 'whitespace': 1, 'indentation': 1, 'to': 3, 'delimit': 1, 'blocks': 1, 'rather': 1, 'than': 2, 'curly': 1, 'brackets': 1, 'or': 2, 'keywords': 1, 'syntax': 1, 'allows': 1, 'programmers': 1, 'express': 1, 'concepts': 1, 'fewer': 1, 'lines': 1, 'of': 1, 'might': 1, 'be': 1, 'languages': 1, 'such': 1, 'as': 1, 'C++': 1, 'JavaThe': 1, 'provides': 1, 'constructs': 1, 'intended': 1, 'enable': 1, 'writing': 1, 'clear': 1, 'programs': 1, 'on': 1, 'both': 1, 'small': 1, 'large': 1, 'scale': 1})
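The factory passed to defaultdict does not have to be int. As a rough sketch, the example below uses list to group words by their first letter, and then shows a caveat that applies to any defaultdict: reading a missing key inserts it. The word list and the names words_by_initial and counts are illustrative.

```python
from collections import defaultdict

words = ["code", "curly", "to", "than", "that"]  # illustrative sample, not the full document

# With list as the factory, a missing key starts out as an empty list.
words_by_initial = defaultdict(list)
for word in words:
    words_by_initial[word[0]].append(word)
print(dict(words_by_initial))  # {'c': ['code', 'curly'], 't': ['to', 'than', 'that']}

# Caveat: merely reading a missing key inserts it with the factory's value.
counts = defaultdict(int)
unused = counts["missing"]
print(dict(counts))  # {'missing': 0}
```

When a lookup should not create keys, get() or in can be used on a defaultdict exactly as on a regular dictionary.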

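As we can see, the defaultdict version needs less code than the plain dictionary version, and the resulting counts are identical.

Counting words with Counter from collections

Beyond raw counts, in practice we often want a filtered result. Here Counter lists the ten most frequent words together with the number of times each occurs: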

    # Issue 3: count words with Counter
    from collections import Counter
    word_counts = Counter(document)
    for word, count in word_counts.most_common(10):
        print("Issue 3, most common word in documents: ", word, count)

    >>>>> Program output >>>>>
    Issue 3, most common word in documents:  a 4
    Issue 3, most common word in documents:  language 3
    Issue 3, most common word in documents:  and 3
    Issue 3, most common word in documents:  in 3
    Issue 3, most common word in documents:  code 3
    Issue 3, most common word in documents:  to 3
    Issue 3, most common word in documents:  Python 2
    Issue 3, most common word in documents:  used 2
    Issue 3, most common word in documents:  programming 2
    Issue 3, most common word in documents:  that 2
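Counter offers a few more conveniences than most_common(). The sketch below, on small illustrative word lists rather than the article's document, shows that a missing key reads as 0, that two counters can be added, and that update() adds counts in place. The names first, second and total are assumptions for this example.

```python
from collections import Counter

first = Counter(["code", "code", "to"])
second = Counter(["code", "in"])

# Adding two counters sums the counts key by key.
total = first + second
# A key that was never counted reads as 0 and is not inserted.
print(total["code"], total["absent"])  # 3 0

# update() adds counts from another iterable in place.
first.update(["to", "to"])
print(first.most_common(1))  # [('to', 3)]
```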

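Summary

In this article we reviewed the basic usage of Python dictionaries and then, through a simple example, tried the defaultdict and Counter classes provided by collections to see how dictionaries can be used for counting.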

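References

[1] Joel Grus. 數據科學入門 (Data Science from Scratch, Chinese edition), Chapter 2: A Crash Course in Python. ISBN 978-7-115-41741-1. 人民郵電出版社 (Posts & Telecom Press).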

