Pandas Practice

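Anyone who has learned Python and wants to work in data analysis knows that pandas is one of the most frequently used libraries, so it is well worth studying properly.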

After a long search I finally found a suitable set of exercises. It lives on GitHub, in this repository:

guipsamora/pandas_exercises

If you're interested, give it a try. As the author says: to learn is to do. So unless you practice you won't learn.

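The overall structure is very clear.

There are 11 exercise chapters in total, progressing from easy to hard, which makes them a good way to check your own pandas level.

OK, let's get started.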

Part 1: Getting & Knowing Your Data, using the Chipotle dataset

The questions in this chapter are all very basic, so I'll go through them quickly without much detail.

Ex2 - Getting and Knowing your Data

This time we are going to pull data directly from the internet. Special thanks to: github.com/justmarkham for sharing the dataset and materials.

Step 1. Import the necessary libraries

In [ ]:# The first step is simple
import pandas as pd
import numpy as np

Step 2. Import the dataset from this address.
Step 3. Assign it to a variable called chipo.
In [ ]:
# These two steps are also simple: load the data from the given address

url = "https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv"
chipo = pd.read_csv(url, sep="\t")  # the file is tab-separated
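
If the URL is not reachable, the same `read_csv` call can be tried on an in-memory string. This is only a small sketch: the two sample rows below are made up for illustration, and only the column names follow the Chipotle file.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the same tab-separated layout
sample = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\tNaN\t$2.39 \n"
    "1\t2\tIzze\t[Clementine]\t$3.39 \n"
)

# sep="\t" tells read_csv that the columns are separated by tabs
demo = pd.read_csv(io.StringIO(sample), sep="\t")
print(demo.shape)
```

Without `sep="\t"`, pandas would look for commas and read each line as a single column.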

Step 4. See the first 10 entries
chipo.head(10)

# Output: the first 10 rows of the chipo dataset

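Step 5. What is the number of observations in the dataset?
# Also simple: find the number of rows, using two different methods
In [1]:chipo.shape[0]
# Solution 1
In [2]:chipo.info()
# Solution 2

# Both give the row count: shape[0] returns it directly, and info() reports it in its summary (the "entries" figure)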

Step 6. What is the number of columns in the dataset?

# This tests essentially the same thing as Step 5
chipo.shape[1]

Step 7. Print the name of all the columns.

chipo.columns

Step 8. How is the dataset indexed?

In [ ]:# This one is about the dataset's index
chipo.index

Step 9. Which was the most-ordered item?

In [ ]:
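# Find the most-ordered item: total the quantity per item, then take the largest
c = chipo.groupby("item_name")["quantity"].sum()
c.sort_values(ascending=False).head(1)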
Step 10. For the most-ordered item, how many items were ordered?
In [ ]:# Building on Step 9, find how many of that item were ordered
c = chipo.groupby("item_name")["quantity"].sum()
c = c.sort_values(ascending=False)
c.head(1)
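
To see what the group-then-sum pattern does, here is a minimal sketch on a tiny made-up table (the rows are hypothetical, not taken from the real dataset). It uses `idxmax` as an alternative to sorting and taking the first row.

```python
import pandas as pd

# Hypothetical mini order table
orders_demo = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Izze", "Chicken Bowl", "Canned Soda"],
    "quantity": [2, 1, 1, 2],
})

# Total quantity per item
totals = orders_demo.groupby("item_name")["quantity"].sum()

top_item = totals.idxmax()    # label of the item with the largest total
top_quantity = totals.max()   # the total itself
print(top_item, top_quantity)
```

Selecting the `quantity` column before calling `sum()` also avoids summing text columns, which either concatenates strings or fails depending on the pandas version.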

Step 11. What was the most ordered item in the choice_description column?

In [ ]:
c = chipo.groupby("choice_description")["quantity"].sum()
c = c.sort_values(ascending=False)
c.head(1)

Step 12. How many items were ordered in total?

In [ ]:
# Sum the quantity column
total_items_ordered = chipo["quantity"].sum()
total_items_ordered

Step 13. Turn the item price into a float

# First, check what type of data the price column holds

Step 13.a. Check the item price type

In [ ]:chipo["item_price"].dtype

Step 13.b. Create a lambda function and change the type of item price

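In [ ]:# The exercise asks for a lambda function to convert the data type
func = lambda x: float(x[1:-1])  # x[1:-1] drops the first and last characters: the leading "$" and the trailing space in this file
chipo.item_price = chipo.item_price.apply(func)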

Step 13.c. Check the item price type

In [ ]:
chipo.item_price.dtype

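# There is also another way to do this
# to_numeric cannot parse the "$" sign, so strip it (and any spaces) first
# (downcast takes a string such as "float", not the float type; it is left out here to keep float64)

chipo.item_price = pd.to_numeric(
chipo.item_price.astype(str).str.strip("$ "))
chipo.item_price.dtype
# to_numeric converts the whole column in one call.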
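
The cleaning step can be checked on a few made-up price strings before touching the real column. This is a rough sketch, assuming the prices look like `"$2.39 "` (a dollar sign in front, a space at the end); the values themselves are hypothetical.

```python
import pandas as pd

# Hypothetical price strings in the assumed "$x.xx " format
prices = pd.Series(["$2.39 ", "$10.98 ", "$1.09 "])

# Remove surrounding whitespace, then the leading "$", then parse as numbers
cleaned = pd.to_numeric(prices.str.strip().str.lstrip("$"))
print(cleaned.dtype)
```

Compared with the lambda version, the string methods do not depend on the exact position of the characters being removed.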

Step 14. How much was the revenue for the period in the dataset?

In [ ]:# revenue here means total sales income
# revenue = quantity * price
revenue = (chipo["quantity"] * chipo["item_price"]).sum()
print("Revenue was: $" + str(revenue))

Step 15. How many orders were made in the period?

In [ ]:
orders = chipo.order_id.value_counts().count()
orders
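
Counting the entries of `value_counts()` is one way to count distinct values; `nunique()` gets the same number more directly. A quick sketch on made-up order ids:

```python
import pandas as pd

# Hypothetical order ids: three distinct orders spread over six rows
order_ids = pd.Series([1, 1, 2, 3, 3, 3])

via_value_counts = order_ids.value_counts().count()  # one entry per distinct id
via_nunique = order_ids.nunique()                     # same count, no intermediate table
print(via_value_counts, via_nunique)
```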

Step 16. What is the average revenue amount per order?

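In [3]:
# Solution 1
# First compute the total revenue
revenue = (chipo["quantity"] * chipo["item_price"]).sum()
# and the total number of orders
orders = chipo.order_id.value_counts().count()
avg = revenue/orders
avg
In [4]:
# Solution 2
# Use groupby directly; this needs a per-row revenue column first
chipo["revenue"] = chipo["quantity"] * chipo["item_price"]
chipo.groupby(by=["order_id"])["revenue"].sum().mean()
# Group by order id, sum the revenue within each order, then average across orders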
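
Both solutions should give the same number, which is easy to confirm on a toy table. The figures below are invented for the check and have nothing to do with the real dataset.

```python
import pandas as pd

# Hypothetical data: order 1 has two rows, order 2 has one
demo = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 2, 1],
    "item_price": [2.0, 3.0, 10.0],
})
demo["revenue"] = demo["quantity"] * demo["item_price"]

# Solution 1: total revenue divided by the number of distinct orders
avg_total = demo["revenue"].sum() / demo["order_id"].nunique()

# Solution 2: revenue per order first, then the mean of those
avg_grouped = demo.groupby("order_id")["revenue"].sum().mean()
print(avg_total, avg_grouped)
```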

Step 17. How many different items are sold?

In [ ]:

chipo.item_name.value_counts().count()

That wraps up the first part of chapter one.

On to the next part.

Occupation

Ex3 - Getting and Knowing your Data

This time we are going to pull data directly from the internet. Special thanks to: github.com/justmarkham for sharing the dataset and materials.

Step 1. Import the necessary libraries

In [ ]:import pandas as pd

Step 2. Import the dataset from this address.

url = "https://raw.githubusercontent.com/justmarkham/DAT8/master/data/u.user"
users = pd.read_csv(url, sep="|")

Step 3. Assign it to a variable called users and use the user_id as index

In [ ]:url = "https://raw.githubusercontent.com/justmarkham/DAT8/master/data/u.user"
users = pd.read_csv(url, sep="|", index_col="user_id")
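
Here is a small offline sketch of what `index_col` changes. The two rows are made up; only the pipe-separated layout and the column names follow the `u.user` file.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the same pipe-separated layout
sample = (
    "user_id|age|gender|occupation|zip_code\n"
    "1|24|M|technician|85711\n"
    "2|53|F|other|94043\n"
)

# index_col moves user_id out of the columns and makes it the row index
demo_users = pd.read_csv(io.StringIO(sample), sep="|", index_col="user_id")
print(demo_users.index.name, demo_users.shape)
```

With `user_id` as the index, it no longer counts as a column, so the shape here is two rows by four columns.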

Step 4. See the first 25 entries

In [ ]:users.head(25)

Step 5. See the last 10 entries

In [ ]:
# tail shows the last n rows, in their original order
users.tail(10)

Step 6. What is the number of observations in the dataset?

In [ ]:users.shape[0]

Step 7. What is the number of columns in the dataset?

In [ ]:
users.shape[1]

Step 8. Print the name of all the columns.

In [ ]:print(users.columns)

Step 9. How is the dataset indexed?

In [ ]:users.index

Step 10. What is the data type of each column?

In [ ]:
users.dtypes

Step 11. Print only the occupation column

In [ ]:users.occupation
#or
users["occupation"]

Step 12. How many different occupations there are in this dataset?

In [ ]:users.occupation.nunique()

Step 13. What is the most frequent occupation?

In [ ]:
users.occupation.value_counts().head(1)

Step 14. Summarize the DataFrame.

In [ ]:
users.describe()

Step 15. Summarize all the columns

In [ ]:
# By default, describe only covers numeric columns
users.describe(include="all")

Step 16. Summarize only the occupation column

In [ ]: users.occupation.describe()
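
The three `describe` variants from Steps 14 to 16 can be compared side by side on a tiny made-up table (the ages and occupations below are hypothetical):

```python
import pandas as pd

# Hypothetical mini version of the users table
demo_users = pd.DataFrame({
    "age": [24, 53, 23, 24],
    "occupation": ["technician", "other", "writer", "technician"],
})

numeric_summary = demo_users.describe()              # numeric columns only
full_summary = demo_users.describe(include="all")    # every column
occupation_summary = demo_users["occupation"].describe()  # count, unique, top, freq

print(list(numeric_summary.columns))
print(list(full_summary.columns))
print(occupation_summary["top"], occupation_summary["freq"])
```

For a text column, `top` is the most frequent value and `freq` is how often it appears, so this also answers Step 13 in a different way.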

Step 17. What is the mean age of users?

In [ ]:
users.age.mean()

Step 18. What is the age with least occurrence?

In [ ]:

users.age.value_counts().tail()
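
`value_counts()` sorts from most to least frequent, so `tail()` shows the rarest values, but several ages can be tied for last place. A possible sketch for picking out every value with the minimum count, on made-up ages:

```python
import pandas as pd

# Hypothetical ages: 30 appears three times, 24 twice, 41 once
ages = pd.Series([24, 24, 30, 30, 30, 41])

counts = ages.value_counts()              # sorted from most to least frequent
rarest = counts[counts == counts.min()]   # keep every age tied for the lowest count
print(rarest.index.tolist())
```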

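Quick summary: getting to know a dataset is a fairly routine process. It mostly comes down to the calls used above: head, tail, count, describe, mean, index, columns, info, nunique and so on. These are the main tools for loading a dataset, taking a first look at it, and understanding what it contains.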

Now for part two: filtering and sorting data

Filtering and Sorting Data

Ex1 - Filtering and Sorting Data

This time we are going to pull data directly from the internet. Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.
Step 1. Import the necessary libraries
In [ ]:

Step 2. Import the dataset from this address.

Step 3. Assign it to a variable called chipo.
In [ ]:
This was already done above, so I won't repeat it here.

Step 5. Sort by the name of the item

In [ ]:
chipo.item_name.sort_values()
#or
chipo.sort_values(by="item_name")["item_name"]

Step 6. What was the quantity of the most expensive item ordered?

In [ ]:
chipo.sort_values(by = "item_price", ascending = False).head(1)

Step 7. How many times were a Veggie Salad Bowl ordered?

In [ ]:chipo_salad = chipo[chipo.item_name == "Veggie Salad Bowl"]

len(chipo_salad)

Step 8. How many times did people order more than one Canned Soda?

In [ ]:

chipo_drink_steak_bowl = chipo[(chipo.item_name == "Canned Soda") & (chipo.quantity > 1)]
len(chipo_drink_steak_bowl)
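
The filter in Step 8 combines two conditions with `&`, and each condition needs its own parentheses. A minimal sketch on a made-up table:

```python
import pandas as pd

# Hypothetical rows: three Canned Soda lines, two of them with quantity above 1
demo = pd.DataFrame({
    "item_name": ["Canned Soda", "Canned Soda", "Izze", "Canned Soda"],
    "quantity": [1, 2, 3, 4],
})

# Each comparison yields a boolean Series; & combines them row by row
mask = (demo["item_name"] == "Canned Soda") & (demo["quantity"] > 1)
multi_soda = demo[mask]
print(len(multi_soda))
```

The parentheses matter because `&` binds more tightly than `==` and `>` in Python, so leaving them out changes how the expression is evaluated.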

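That's a full day of typing code, so I'll stop here!

If you found this post useful, I'd appreciate a like. I write this column mainly to push myself to study every day, but a like from you is a huge encouragement. Thank you!

And if you are thinking about changing careers too, feel free to add me on WeChat (ggy_0808) and get in touch. I believe a few people moving forward together can go further.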

