Exploring Starbucks Stores: A Hands-on Kaggle Data Analysis
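It has been a while since my last post. Work has kept me busy, but in the meantime I studied some slightly more advanced data analysis and learned how to use the dplyr and ggplot2 packages (especially ggplot2). I never found the time to put them into practice, so today I downloaded the Starbucks store dataset from Kaggle and worked through an analysis, modeled on existing examples.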
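This dataset describes Starbucks and its affiliated stores around the world, with one record for every Starbucks or affiliated store location as of February 2017.
Data source (Kaggle): Starbucks Locations Worldwide | Kaggle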
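I. Purpose of the Analysis
There are many dimensions one could analyze. Since my goal was to understand the ggplot2 package, I focused on three questions: How many Starbucks stores are there worldwide? Which cities and countries have the most stores? How many stores does Starbucks actually own and operate?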
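II. Data Preprocessing
Importing the data: I converted the .csv file to .xlsx and imported the .xlsx, fixing the garbled Chinese characters during the conversion. The Korean, Vietnamese and other non-Chinese text in the dataset displays correctly in the .xlsx file.
Problem encountered: I also tried reading the .csv file directly in R, and that fixed the garbled Chinese as well, but some of the Korean, Vietnamese and other text still displayed incorrectly in R, and I could not find a solution.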
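If you want to retry reading the .csv directly, one thing worth testing is telling R the file's encoding explicitly. The sketch below assumes the Kaggle file is UTF-8 encoded; it writes a small hypothetical stand-in file (the two city names are illustrative, not taken from the dataset) and reads it back with `encoding = "UTF-8"`, which marks the strings as UTF-8 instead of re-encoding them into the system locale. On older Windows builds of R, that re-encoding step is a common reason Korean or Vietnamese text turns into `<U+...>` escapes.

```r
# Sketch: read a UTF-8 CSV while keeping non-ASCII text intact.
# Assumption: the source file is UTF-8 encoded. The file below is a
# hypothetical stand-in so the example runs on its own.
tmp <- tempfile(fileext = ".csv")

# "\uc11c\uc6b8" is Seoul in Hangul; "H\u00e0 N\u1ed9i" is Hanoi in Vietnamese.
csv_lines <- c("City,Country", "\uc11c\uc6b8,KR", "H\u00e0 N\u1ed9i,VN")
writeLines(csv_lines, tmp, useBytes = TRUE)  # write the raw UTF-8 bytes

# encoding = "UTF-8" marks the strings as UTF-8 rather than converting
# them to the native locale, which can mangle characters on Windows.
stores <- read.csv(tmp, encoding = "UTF-8", stringsAsFactors = FALSE)
print(stores)
```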
> library(openxlsx)
> readFilePath <- "C:/Users/shuer/Desktop/Starbucks.xlsx"
> starbucks <- read.xlsx(readFilePath, "Sheet1")
III. Understanding the Data
After importing the data, check its structure first. The main variables are:
> str(starbucks)
'data.frame':	25601 obs. of  14 variables:
 $ Brand         : chr  "Starbucks" "Starbucks" "Starbucks" "Starbucks" ...
 $ Store.Number  : chr  "47370-257954" "22331-212325" "47089-256771" "22126-218024" ...
 $ Store.Name    : chr  "Meritxell, 96" "Ajman Drive Thru" "Dana Mall" "Twofour 54" ...
 $ Ownership.Type: chr  "Licensed" "Licensed" "Licensed" "Licensed" ...
 $ Street.Address: chr  "Av. Meritxell, 96" "1 Street 69, Al Jarf" "Sheikh Khalifa Bin Zayed St." "Al Salam Street" ...
 $ City          : chr  "Andorra la Vella" "Ajman" "Ajman" "Abu Dhabi" ...
 $ State/Province: chr  "7" "AJ" "AJ" "AZ" ...
 $ Country       : chr  "AD" "AE" "AE" "AE" ...
 $ Postcode      : chr  "AD500" NA NA NA ...
 $ Phone.Number  : chr  "376818720" NA NA NA ...
 $ Timezone      : chr  "GMT+1:00 Europe/Andorra" "GMT+04:00 Asia/Dubai" "GMT+04:00 Asia/Dubai" "GMT+04:00 Asia/Dubai" ...
 $ Longitude     : chr  "1.53" "55.47" "55.47" "54.38" ...
 $ Latitude      : num  42.5 25.4 25.4 24.5 24.5 ...
 $ X14           : num  NA NA NA NA NA NA NA NA NA NA ...
Brand: brand (Starbucks)
City: city where the store is located
Country: country where the store is located
State/Province: state or province
Ownership Type: ownership type (Company Owned, Franchise, Joint Venture, Licensed)
Timezone: time zone
# Rename column 7 (State/Province); the "/" forces backquotes whenever it is accessed with $
names(starbucks)[7] <- "state.province"
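Renaming by position relies on `State/Province` always being column 7. Below is a sketch that renames by name instead, and also tidies two things visible in the `str()` output above: `Longitude` was read as character, and the extra `X14` column is entirely `NA` (possibly a by-product of the .xlsx conversion, though that is a guess). The data frame `toy` is a hypothetical two-row stand-in for `starbucks`, built from the first values shown by `str()`.

```r
# Sketch: tidy the imported columns by name instead of by position.
# `toy` is a hypothetical two-row stand-in for `starbucks`, using values
# from the str() output.
toy <- data.frame(
  Brand = c("Starbucks", "Starbucks"),
  `State/Province` = c("7", "AJ"),
  Longitude = c("1.53", "55.47"),
  Latitude = c(42.5, 25.4),
  X14 = c(NA, NA),
  check.names = FALSE,        # keep the "/" in the column name
  stringsAsFactors = FALSE
)

# Rename by name, so the code still works if the column order changes.
names(toy)[names(toy) == "State/Province"] <- "state.province"

# Longitude came in as character; convert it to numeric.
toy$Longitude <- as.numeric(toy$Longitude)

# Drop columns that contain nothing but NA (such as X14).
all_na <- vapply(toy, function(col) all(is.na(col)), logical(1))
toy <- toy[, !all_na, drop = FALSE]

str(toy)
```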
IV. Calculations
1. How many Starbucks stores are there worldwide?
> dim(starbucks)[1]
[1] 25600
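`nrow()` returns the same number as `dim(...)[1]` and reads a little more directly. To check that every row really is a separate store, one option is to compare the row count with the number of distinct `Store.Number` values. The data frame below is a hypothetical stand-in whose last row deliberately duplicates the first (the IDs are taken from the `str()` output).

```r
# Sketch: count rows and distinct store IDs.
# `toy` is a hypothetical stand-in; the last row duplicates the first.
toy <- data.frame(
  Store.Number = c("47370-257954", "22331-212325", "47089-256771", "47370-257954"),
  stringsAsFactors = FALSE
)

n_rows   <- nrow(toy)                          # same as dim(toy)[1]
n_stores <- length(unique(toy$Store.Number))   # number of distinct store IDs

cat("rows:", n_rows, " distinct stores:", n_stores, "\n")
```

If the two numbers differ on the real data, some store IDs occur more than once.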
2. How many countries have Starbucks stores?
> length(unique(starbucks$Country))
[1] 73
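One detail to watch when counting countries: `Country` holds two-letter ISO 3166-1 codes, and Namibia's code is `NA`, which R reads as a missing value by default (both `read.csv` and openxlsx's `read.xlsx` default to `na.strings = "NA"`). Whether this dataset includes any Namibian stores would need checking; the sketch only shows the effect on a small hypothetical table, where passing `na.strings = ""` keeps the code as a string.

```r
# Sketch: "NA" is Namibia's ISO code, but R reads it as missing by default.
# The CSV text is a hypothetical example, not taken from the dataset.
csv_text <- "Country\nUS\nNA\nUS"

default_read <- read.csv(text = csv_text, stringsAsFactors = FALSE)
keep_na_code <- read.csv(text = csv_text, na.strings = "", stringsAsFactors = FALSE)

print(default_read$Country)   # "US" NA "US"   (Namibia became a missing value)
print(keep_na_code$Country)   # "US" "NA" "US" (Namibia kept as a code)

# Dropping missing values before counting silently loses Namibia
# when the file was read with the default settings.
length(unique(na.omit(default_read$Country)))
length(unique(na.omit(keep_na_code$Country)))
```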
3. How many cities have Starbucks stores?
> length(unique(starbucks$City))
[1] 5471
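`unique(starbucks$City)` counts cities by name only, so two different cities that share a name (London in the UK and London, Ontario in Canada, for instance) are counted once. A sketch that counts distinct City and Country pairs instead, on a hypothetical toy table:

```r
# Sketch: count distinct cities by (City, Country) pair rather than by name.
# `toy` is hypothetical; London appears in both the UK (GB) and Canada (CA).
toy <- data.frame(
  City    = c("London", "London", "Seoul", "London"),
  Country = c("GB", "CA", "KR", "GB"),
  stringsAsFactors = FALSE
)

by_name <- length(unique(toy$City))                   # merges the two Londons
by_pair <- nrow(unique(toy[, c("City", "Country")]))  # keeps them apart

cat("by name:", by_name, " by city+country:", by_pair, "\n")
```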
4. Which 10 countries have the most Starbucks stores?
# Packages: dplyr (data wrangling), DT (datatable), highcharter (hchart), countrycode (ISO code lookup)
library(dplyr); library(DT); library(highcharter); library(countrycode)
# Top 10 countries by number of Starbucks stores
by_country <- starbucks %>% group_by(Country) %>% summarise(Total = n(), Percentage = round(Total / nrow(starbucks) * 100, 2)) %>% arrange(desc(Total)) %>% head(10)
# Map two-letter ISO codes to country names (the countrycode_data table used originally is deprecated in newer countrycode releases)
by_country$CountryName <- countrycode(by_country$Country, origin = "iso2c", destination = "country.name")
datatable(by_country)
hchart(by_country, "treemap", hcaes(x = Country, value = Total, color = Total)) %>%
 hc_colorAxis(stops = color_stops(n = 10, colors = c("#440154", "#21908C", "#FDE725"))) %>%
 hc_add_theme(hc_theme_google()) %>%
 hc_title(text = "Top 10 Countries with Most Starbucks Stores") %>%
 hc_credits(enabled = TRUE, text = "Sources: Starbucks Store Locator data by Github user chrismeller", style = list(fontSize = "10px")) %>%
 hc_legend(enabled = TRUE)
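For comparison, the same top-N count with percentages can be done in base R with `table()`, without dplyr. This is a sketch on a hypothetical vector of country codes standing in for `starbucks$Country`; it keeps the top 3 because the toy vector is short, and you would use 10 on the real data.

```r
# Sketch: top-N countries by store count, with percentages, in base R.
# `countries` is a hypothetical stand-in for starbucks$Country.
countries <- c("US", "US", "US", "CN", "CN", "CA", "JP", "US", "CN", "KR")

counts <- sort(table(countries), decreasing = TRUE)
top3   <- head(counts, 3)   # use 10 for the real data

top_df <- data.frame(
  Country    = names(top3),
  Total      = as.integer(top3),
  Percentage = round(as.integer(top3) / length(countries) * 100, 2),
  stringsAsFactors = FALSE
)
print(top_df)
```

Ties (here CA, JP and KR with one store each) are not broken in any meaningful way, so the last rows of a top-N list can depend on ordering details.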
5. Which 10 cities have the most Starbucks stores?
# Top 10 cities by number of Starbucks stores
by_city <- starbucks %>% group_by(City) %>% summarise(Total = n(), Percentage = round(Total / nrow(starbucks) * 100, 2)) %>% arrange(desc(Total)) %>% head(10)
datatable(by_city)
hchart(by_city, "treemap", hcaes(x = City, value = Total, color = Total)) %>%
 hc_colorAxis(stops = color_stops(n = 10, colors = c("#440154", "#21908C", "#FDE725"))) %>%
 hc_add_theme(hc_theme_google()) %>%
 hc_title(text = "Top 10 Cities with Most Starbucks Stores") %>%
 hc_credits(enabled = TRUE, text = "Sources: Starbucks Store Locator data by Github user chrismeller", style = list(fontSize = "10px")) %>%
 hc_legend(enabled = TRUE)
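City names are free text, so the same city could appear under slightly different spellings (different capitalization or stray spaces) and then be counted as separate cities. Whether this happens in the Starbucks data is not checked here; the sketch assumes it might and normalizes names with `trimws()` and `tolower()` before counting, using a hypothetical vector. Names written in different scripts for the same city would still need a manual mapping.

```r
# Sketch: normalize city names before counting.
# `cities` is hypothetical; it contains case and whitespace variants.
cities <- c("Seoul", "seoul", " Seoul ", "Tokyo", "TOKYO", "Taipei")

raw_counts   <- table(cities)                  # 6 "different" cities
clean_cities <- tolower(trimws(cities))        # strip spaces, unify case
clean_counts <- sort(table(clean_cities), decreasing = TRUE)

print(clean_counts)
```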
6. How many stores does Starbucks actually own and operate?
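library(dplyr); library(DT); library(ggplot2)
# Plot size for Jupyter/IRkernel notebooks (these options have no effect in RStudio)
options(repr.plot.width = 7, repr.plot.height = 5)
# Count Starbucks-branded stores by ownership type
country_own <- starbucks %>% filter(Brand == "Starbucks") %>% group_by(Ownership.Type) %>% summarize(Count = n())
datatable(country_own)
# col_new was never defined in the original code; this four-colour palette (one per ownership type) is an assumed placeholder
col_new <- c("#440154", "#31688E", "#35B779", "#FDE725")
ggplot(country_own, aes(x = Ownership.Type, y = Count, fill = Ownership.Type)) +
 geom_col() +
 theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 1), legend.position = "bottom") +
 labs(title = "Starbucks Stores by Ownership Type") + scale_fill_manual(values = col_new)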
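The chart shows counts per ownership type; to answer the question as a single number, you can count the rows whose `Ownership.Type` is `Company Owned` and express them as a share of all stores. A sketch on a hypothetical vector standing in for `starbucks$Ownership.Type`:

```r
# Sketch: count and share of company-owned stores.
# `ownership` is a hypothetical stand-in for starbucks$Ownership.Type.
ownership <- c("Company Owned", "Licensed", "Company Owned", "Joint Venture",
               "Company Owned", "Franchise", "Licensed", "Company Owned")

n_owned     <- sum(ownership == "Company Owned")             # TRUE counts as 1
share_owned <- round(100 * mean(ownership == "Company Owned"), 1)

cat(sprintf("company-owned: %d of %d (%.1f%%)\n",
            n_owned, length(ownership), share_owned))
```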
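A few thoughts: the analysis and charts above were built by following other people's code and imitating it while learning the ggplot2 package. The charts ggplot2 produces look great, so next I plan to study ggplot2 in more depth; that should make learning data analysis both more engaging and more useful. Time is tight, so I need to keep working hard!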