Python Data Science (3) - Python and Data Science Applications (Ⅲ)

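Other posts in this series:

Python Data Science (1) - Python and Data Science Applications (Ⅰ)

Python Data Science (2) - Python and Data Science Applications (Ⅱ)

Python Data Science (3) - Python and Data Science Applications (Ⅲ)

Python Data Science (4) - Data Collection Series

Python Data Science (5) - Data Processing and Data Acquisition

Python Data Science (6) - Data Cleaning (Ⅰ)

Python Data Science (7) - Data Cleaning (Ⅱ)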

1. Counting the words in a text with Python

speech_text = """I love you,
Not for what you are,
But for what I am
When I am with you.
I love you,
Not only for what
You have made of yourself,
But for what
You are making of me.
I love you
For the part of me
That you bring out;
I love you
For putting your hand
Into my heaped-up heart
And passing over
All the foolish, weak things
That you can't help
Dimly seeing there,
And for drawing out
Into the light
All the beautiful belongings
That no one else had looked
Quite far enough to find.
I love you because you
Are helping me to make
Of the lumber of my life
Not a tavern
But a temple;
Out of the works
Of my every day
Not a reproach
But a song.
I love you
Because you have done
More than any creed
Could have done
To make me good
And more than any fate
Could have done
To make me happy.
You have done it
Without a touch,
Without a word,
Without a sign.
You have done it
By being yourself.
Perhaps that is what
Being a friend means,
After all."""

# Split the text on whitespace to get a list of words
speech = speech_text.split()

# Count how many times each word occurs
dic = {}
for word in speech:
    if word not in dic:
        dic[word] = 1
    else:
        dic[word] = dic[word] + 1
dic.items()
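One caveat with split(): it only splits on whitespace, so punctuation stays attached to the words, and "you," and "you." end up as separate entries. A minimal sketch of one way around this, assuming a simple regular expression is good enough for this text. The helper name count_words and the pattern are illustrative and not part of the original code:

```python
import re

def count_words(text):
    """Count words case-insensitively, ignoring surrounding punctuation."""
    # Assumption: a "word" is a run of letters, optionally joined by
    # inner apostrophes or hyphens (e.g. "can't", "heaped-up").
    words = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    counts = {}
    for word in words:
        # dict.get returns 0 for a word that has not been seen yet
        counts[word] = counts.get(word, 0) + 1
    return counts

sample = "I love you, Not for what you are, But for what I am When I am with you."
counts = count_words(sample)
print(counts["you"])  # 3
print(counts["i"])    # 3
```

With plain split(), the same sample would count "you," twice and "you." once as two different words.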

While using nltk I kept getting errors. Running the two lines below opens the NLTK downloader, which installs the nltk data:

import nltk
nltk.download()

A downloader window pops up, from which the nltk data can be downloaded.

[Screenshot: download in progress]

If the download completes this way, skip the next step.

For me the download failed after many attempts, so here is a second method.

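Download a prepackaged archive directly. Download link 1: cloud-drive password znx7. Unzip the downloaded nltk_data.zip to the root of the C: drive; this is the safest location and avoids "package not found" errors. Download link 2: cloud-drive password 4cp3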

Thanks to V_can, whose post "Python and Natural Language Processing, Part 1: Getting Started with NLTK, Environment Setup" provided the package.

Removing stop words
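Removing stop words amounts to filtering the word counts against a list of very common words. A minimal sketch, assuming the NLTK stopwords corpus may or may not be available on the machine. The helpers load_stop_words and remove_stop_words and the small FALLBACK_STOP_WORDS set are illustrative assumptions, not part of NLTK or of the original code:

```python
# Hypothetical fallback list, used only when the NLTK data is unavailable.
FALLBACK_STOP_WORDS = {
    "i", "you", "me", "my", "the", "a", "of", "for", "to", "and",
    "but", "not", "what", "that", "have", "am", "are", "is", "it", "with",
}

def load_stop_words():
    """Return English stop words from NLTK, or the fallback set."""
    try:
        from nltk.corpus import stopwords
        return set(stopwords.words("english"))
    except (ImportError, LookupError):
        # ImportError: nltk itself is not installed.
        # LookupError: nltk is installed but the stopwords corpus is missing.
        return set(FALLBACK_STOP_WORDS)

def remove_stop_words(counts, stop_words):
    """Return a new dict of word counts without the stop words."""
    return {w: n for w, n in counts.items() if w not in stop_words}

counts = {"i": 3, "love": 1, "you": 3, "temple": 1}
filtered = remove_stop_words(counts, load_stop_words())
print(filtered)  # {'love': 1, 'temple': 1}
```

If only the stop-word list is needed, nltk.download('stopwords') fetches that single corpus instead of the full data set.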

2. The second method: using Counter from Python's standard library (the collections module)

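# The code is as follows
from collections import Counter

c = Counter(speech)
c.most_common(10)  # the ten most frequent words
print(c.most_common(10))

# stop_words is the NLTK stop-word list (see the complete code)
for sw in stop_words:
    del c[sw]
c.most_common(10)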

Counter is a subclass of dict designed for convenient counting.
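A short sketch of the Counter behaviours the code above relies on, using a made-up word list rather than the poem:

```python
from collections import Counter

# Illustrative sample data, not taken from the poem
words = "you you you love love python".split()
c = Counter(words)

print(isinstance(c, dict))  # True: Counter is a dict subclass
print(c["love"])            # 2
print(c["missing"])         # 0: unseen keys count as zero, no KeyError
print(c.most_common(2))     # [('you', 3), ('love', 2)]

# Deleting a key that is not present is a no-op for Counter, which is
# why a `del c[sw]` loop is safe even for stop words absent from the text.
del c["not-there"]
del c["you"]
print(c.most_common(2))     # [('love', 2), ('python', 1)]
```

A plain dict would raise KeyError on both c["missing"] and del c["not-there"].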

  • The complete code

import operator
from collections import Counter

from nltk.corpus import stopwords

speech_text = """I love you,
Not for what you are,
But for what I am
When I am with you.
I love you,
Not only for what
You have made of yourself,
But for what
You are making of me.
I love you
For the part of me
That you bring out;
I love you
For putting your hand
Into my heaped-up heart
And passing over
All the foolish, weak things
That you can't help
Dimly seeing there,
And for drawing out
Into the light
All the beautiful belongings
That no one else had looked
Quite far enough to find.
I love you because you
Are helping me to make
Of the lumber of my life
Not a tavern
But a temple;
Out of the works
Of my every day
Not a reproach
But a song.
I love you
Because you have done
More than any creed
Could have done
To make me good
And more than any fate
Could have done
To make me happy.
You have done it
Without a touch,
Without a word,
Without a sign.
You have done it
By being yourself.
Perhaps that is what
Being a friend means,
After all."""

# Handle upper/lower case: lowercase everything before splitting
speech = speech_text.lower().split()
print(speech)

# Method 1: count with a plain dict
dic = {}
for word in speech:
    if word not in dic:
        dic[word] = 1
    else:
        dic[word] = dic[word] + 1

# Sort by count, most frequent first
swd = sorted(dic.items(), key=operator.itemgetter(1), reverse=True)
print(swd)

# Stop-word handling
stop_words = stopwords.words('english')
for k, v in swd:
    if k not in stop_words:
        print(k, v)

# Method 2: count with Counter
c = Counter(speech)
print(c.most_common(10))  # the ten most frequent words

# Drop the stop words, then show the top ten again
for sw in stop_words:
    del c[sw]
print(c.most_common(10))

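These two methods make it easy to see why Python is used more and more in data analysis and scientific computing: beyond the features of the language itself, it has many excellent, easy-to-use libraries.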

Life is short, so why not sing it in Python?

Author: 許勝利, columnist at Python愛好者社區. Please do not repost, thank you.

