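Anyone who has learned Python and wants to work in data analysis knows that pandas is one of the most frequently used libraries, so it is worth learning well.
After a long search I finally found a suitable set of exercises. They are hosted on GitHub; the address is here.
If you are interested, give them a try. As the author says, to learn is to do. So unless you practice you won't learn.
There are 11 exercise chapters in total, progressing from easy to hard, which makes them a good way to check your own pandas level.
OK, let's get started.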
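Part one, Getting & Knowing Your Data: understanding the data (Chipotle)
Since the questions in this chapter are all very basic, I will go through them quickly without going into detail.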
In [ ]: # The first step is simple: import the libraries
import pandas as pd
import numpy as np
Step 2. Import the dataset from this address.
Step 3. Assign it to a variable called chipo.
In [ ]: # These two steps are also simple: load the data from the given address
# The file is tab-separated, so sep must be '\t'
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'
chipo = pd.read_csv(url, sep='\t')
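To see what `sep='\t'` does without downloading anything, here is a minimal sketch that parses a tiny tab-separated string. The two sample rows and the names `sample_tsv` and `sample` are made up for illustration; only the column layout is meant to mirror chipotle.tsv.

```python
import io
import pandas as pd

# Hypothetical two-row sample laid out like chipotle.tsv (tab-separated)
sample_tsv = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\tNULL\t$2.39 \n"
    "1\t1\tIzze\t[Clementine]\t$3.39 \n"
)

# sep="\t" tells read_csv that columns are separated by tabs, not commas
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

print(sample.shape)             # (rows, columns)
print(sample.columns.tolist())  # column names taken from the header row
```

With the default comma separator, each line would instead land in a single column, which is the usual symptom of a wrong `sep`.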
Step 4. See the first 10 entries
In [ ]: chipo.head(10)
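Step 5. What is the number of observations in the dataset?
# This one is also simple: find the number of rows in the dataset, with two methods required
In [1]: chipo.shape[0] # Solution 1
In [2]: chipo.info() # Solution 2
# Both statements give the row count: shape[0] returns it directly, and info() prints it as the number of entries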
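The difference between the two solutions is easier to see on a small frame. The sketch below uses a made-up three-row DataFrame (`df` and its values are assumptions for illustration): `shape[0]` and `len()` return the count as a number you can reuse, while `info()` only prints a summary.

```python
import pandas as pd

# Hypothetical miniature dataset, not the real Chipotle data
df = pd.DataFrame({"order_id": [1, 1, 2], "quantity": [1, 2, 1]})

n_rows_shape = df.shape[0]    # first element of the (rows, columns) tuple
n_rows_len = len(df)          # len() of a DataFrame is its number of rows
n_rows_index = len(df.index)  # same count, taken from the index

# info() prints a summary (including the number of entries) and returns None
info_result = df.info()
```

If the count is needed later in a calculation, `shape[0]` or `len()` is the practical choice.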
# Number of columns; this tests essentially the same thing as Step 5
In [ ]: chipo.shape[1]
In [ ]: chipo.columns
In [ ]: # This one is about the dataset's index
chipo.index
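In [ ]: # Step 9: find the most-ordered item (sum the quantity per item, then sort)
c = chipo.groupby('item_name')[['quantity']].sum()
c.sort_values(['quantity'], ascending=False).head(1)
Step 10. For the most-ordered item, how many items were ordered?
In [ ]: # Building on Step 9, find how many of that item were ordered
c = chipo.groupby('item_name')[['quantity']].sum()
c = c.sort_values(['quantity'], ascending=False)
c.head(1)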
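The group, sum, and rank pattern behind Steps 9 and 10 can be checked by hand on a few rows. This is a minimal sketch on made-up data (the frame `orders` and its values are assumptions); it uses `idxmax()` as a shortcut for "sort descending and take the first row".

```python
import pandas as pd

# Hypothetical order lines, not the real Chipotle data
orders = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Canned Soda", "Chicken Bowl", "Chips"],
    "quantity": [2, 1, 3, 1],
})

# Total quantity per item: one row per distinct item_name
totals = orders.groupby("item_name")["quantity"].sum()

top_item = totals.idxmax()   # label of the item with the largest total
top_quantity = totals.max()  # the total itself

print(top_item, top_quantity)
```

Summing before ranking matters: taking the per-group maximum instead would rank items by their single largest order line, not by how many were ordered overall.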
Step 11
In [ ]: c = chipo.groupby('choice_description')[['quantity']].sum()
c = c.sort_values(['quantity'], ascending=False)
c.head(1)
In [ ]: # Sum the quantity column
total_items_ordered = chipo['quantity'].sum()
total_items_ordered
# First, check what type of data the price column holds
In [ ]: chipo['item_price'].dtype
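In [ ]: # The exercise asks for a lambda function to convert the data type
# The slice [1:-1] drops the leading '$' and the trailing space, leaving only the number
func = lambda x: float(x[1:-1])
chipo.item_price = chipo.item_price.apply(func)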
In [ ]: chipo.item_price.dtype
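# There is also another way to do this
In [ ]: # to_numeric converts a whole column at once, but it cannot parse the '$' prefix, so the column must already be free of it (as it is after the lambda above)
chipo.item_price = pd.to_numeric(chipo.item_price, downcast='float')
chipo.item_price.dtype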
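The two conversion routes can be compared side by side on a few raw price strings. The sketch below assumes prices look like `"$2.39 "` (dollar sign in front, one trailing space); the Series `prices` is made up for illustration, and `str.strip("$ ")` is swapped in here as one way to clean the text before `to_numeric`.

```python
import pandas as pd

# Hypothetical raw price strings in the assumed "$<number><space>" format
prices = pd.Series(["$2.39 ", "$16.98 ", "$10.98 "])

# Route 1: element-wise lambda, slicing off the first and last character
via_lambda = prices.apply(lambda x: float(x[1:-1]))

# Route 2: strip '$' and spaces from both ends, then convert the whole column
via_to_numeric = pd.to_numeric(prices.str.strip("$ "))

print(via_lambda.dtype, via_to_numeric.dtype)
```

The slice depends on every value having exactly one character to drop at each end, whereas stripping by character is a little more tolerant of missing or extra spaces.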
In [ ]: # Total revenue: revenue = quantity * price, summed over all rows
revenue = (chipo['quantity'] * chipo['item_price']).sum()
print('Revenue was: $' + str(revenue))
In [ ]: orders = chipo.order_id.value_counts().count() orders
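In [3]: # Solution 1
# First compute the total revenue
revenue = (chipo['quantity'] * chipo['item_price']).sum()
# and the total number of orders
orders = chipo.order_id.value_counts().count()
avg = revenue / orders
avg
In [4]: # Solution 2
# Use groupby directly; this needs a per-row revenue column first
chipo['revenue'] = chipo['quantity'] * chipo['item_price']
chipo.groupby(by=['order_id'])['revenue'].sum().mean()
# Group by order id, sum the revenue within each order, then average over the orders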
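Both routes to the average revenue per order should agree, which is easy to verify on a frame small enough to compute by hand. This is a sketch on made-up numbers (`df` and its values are assumptions), using `nunique()` for the order count.

```python
import pandas as pd

# Hypothetical data: order 1 has two lines, order 2 has one
df = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 2, 1],
    "item_price": [2.0, 3.0, 10.0],
})

# Per-row revenue: 2.0, 6.0, 10.0
df["revenue"] = df["quantity"] * df["item_price"]

# Route 1: total revenue divided by the number of distinct orders
avg_total = df["revenue"].sum() / df["order_id"].nunique()

# Route 2: revenue per order (8.0 and 10.0), then the mean of those
avg_grouped = df.groupby("order_id")["revenue"].sum().mean()

print(avg_total, avg_grouped)
```

Selecting the `revenue` column before calling `sum()` keeps the aggregation away from text columns, which cannot be averaged.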
Step 17. How many different items are sold?
In [ ]: chipo.item_name.value_counts().count()
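Counting distinct values can be written in a few ways that differ in how they treat missing entries. The sketch below uses a made-up Series containing one missing value (`items` is an assumption for illustration) to show where the counts diverge.

```python
import pandas as pd

# Hypothetical item names with one missing entry
items = pd.Series(["Izze", "Chips", "Izze", None])

# value_counts() drops missing values by default, so this counts 2 items
n_via_value_counts = items.value_counts().count()

# nunique() also ignores missing values by default
n_via_nunique = items.nunique()

# unique() keeps the missing value as its own entry, so the length is 3
n_via_unique = len(items.unique())

print(n_via_value_counts, n_via_nunique, n_via_unique)
```

For a column without missing values all three agree, and `nunique()` states the intent most directly.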
That concludes the first part of chapter one.
On to the next part.
Occupation
Step 2. Import the dataset from this address.
In [ ]: url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/u.user'
users = pd.read_csv(url, sep='|')
In [ ]: # Same import, this time using user_id as the index
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/u.user'
users = pd.read_csv(url, sep='|', index_col='user_id')
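The effect of `index_col` is clearest when the same text is read both ways. This minimal sketch parses a small pipe-separated string; the rows and the names `sample_users`, `plain`, and `indexed` are illustrative assumptions laid out like u.user.

```python
import io
import pandas as pd

# Hypothetical sample in the pipe-separated layout of u.user
sample_users = (
    "user_id|age|gender|occupation|zip_code\n"
    "1|24|M|technician|85711\n"
    "2|53|F|other|94043\n"
)

# Without index_col, user_id is an ordinary column and the index is 0, 1, ...
plain = pd.read_csv(io.StringIO(sample_users), sep="|")

# With index_col, user_id becomes the row labels and is no longer a column
indexed = pd.read_csv(io.StringIO(sample_users), sep="|", index_col="user_id")

print(plain.shape, indexed.shape)
print(indexed.loc[2, "occupation"])  # look a user up by id
```

Because the index column is no longer counted among the columns, `shape[1]` is one smaller for the indexed version.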
In [ ]:users.head(25)
In [ ]: # tail shows the last rows of the dataset
users.tail(10)
In [ ]:users.shape[0]
In [ ]: users.shape[1]
In [ ]:print(users.columns)
In [ ]:users.index
In [ ]: users.dtypes
In [ ]: users.occupation # or users['occupation']
In [ ]:users.occupation.nunique()
In [ ]: users.occupation.value_counts().head(1)
In [ ]: users.describe()
In [ ]: # describe only covers numeric columns by default, so pass include='all'
users.describe(include='all')
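To see what `include='all'` adds, compare the two summaries on a frame with one numeric and one text column. The frame `people` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with one numeric and one text column
people = pd.DataFrame({
    "age": [24, 53, 23],
    "occupation": ["technician", "other", "writer"],
})

# Default: only numeric columns are summarised (count, mean, std, ...)
numeric_summary = people.describe()

# include="all": text columns appear too, with unique/top/freq rows
full_summary = people.describe(include="all")

print(numeric_summary.columns.tolist())
print(full_summary.columns.tolist())
print(full_summary.loc["unique", "occupation"])  # distinct occupations
```

Rows that do not apply to a column's type (for example `mean` for text) are filled with NaN in the combined table.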
In [ ]: users.occupation.describe()
In [ ]: users.age.mean()
users.age.value_counts().tail()
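Quick summary: getting to know a dataset is a fairly routine process. It mostly comes down to the calls used above: head, tail, count, describe, mean, index, columns, info, nunique and so on. These are the main tools for taking a first look at a dataset and understanding what it contains.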
Now for the second chapter: filtering and sorting data
Filtering and Sorting Data
This time we are going to pull data directly from the internet. Special thanks to: https://github.com/justmarkham for sharing the dataset and materials. Step 1. Import the necessary libraries In [ ]:
Step 3. Assign it to a variable called chipo. In [ ]: This was already done above, so it is not repeated here.
In [ ]: chipo.item_name.sort_values() # or chipo.sort_values(by='item_name')['item_name']
In [ ]: chipo.sort_values(by = "item_price", ascending = False).head(1)
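Sorting by price only gives the expected answer once the column is numeric. The sketch below, on made-up data (`menu` and its values are assumptions), shows that sorting the raw text puts "$9.25" ahead of "$44.25" because strings are compared character by character.

```python
import pandas as pd

# Hypothetical items with prices still stored as text
menu = pd.DataFrame({
    "item_name": ["Side of Chips", "Catering Bowl"],
    "item_price": ["$9.25 ", "$44.25 "],
})

# Text sort: '9' > '4', so the cheaper item wrongly comes first
top_as_text = menu.sort_values(by="item_price", ascending=False).iloc[0]["item_name"]

# Convert to float, then sort again: now the larger number comes first
menu["item_price"] = menu["item_price"].str.strip("$ ").astype(float)
top_as_number = menu.sort_values(by="item_price", ascending=False).iloc[0]["item_name"]

print(top_as_text, "|", top_as_number)
```

This is why the price column needs its type converted before any sorting or comparison on it.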
In [ ]:chipo_salad = chipo[chipo.item_name == "Veggie Salad Bowl"]
len(chipo_salad)
chipo_drink_steak_bowl = chipo[(chipo.item_name == "Canned Soda") & (chipo.quantity > 1)] len(chipo_drink_steak_bowl)
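The filter above combines two conditions with `&`. Here is a minimal sketch of the same pattern on made-up rows (`lines` and its values are assumptions), showing the boolean mask on its own before it is used to select rows.

```python
import pandas as pd

# Hypothetical order lines, not the real Chipotle data
lines = pd.DataFrame({
    "item_name": ["Canned Soda", "Canned Soda", "Veggie Salad Bowl"],
    "quantity": [1, 2, 1],
})

# Each comparison yields a boolean Series; & combines them row by row.
# The parentheses are required because & binds tighter than == and >.
mask = (lines["item_name"] == "Canned Soda") & (lines["quantity"] > 1)

matching = lines[mask]        # rows where both conditions hold
n_matching = int(mask.sum())  # True counts as 1, so sum() counts the matches

print(n_matching, len(matching))
```

Python's `and` does not work here, since it tries to reduce a whole Series to a single truth value; `&` (and `|` for "or") operate element-wise.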
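After a full day of typing code, I'll stop here!
If you found this article useful, I would appreciate a like. I write this column mainly to push myself to study every day, but every like is a great encouragement. Thank you!
If you are thinking about changing careers, feel free to add me on WeChat (ggy_0808) and get in touch. I believe a few people travelling together can go further.
TAG: Pandas (Python) | Python for beginners | Data analysis |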