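Web Scraping and Visualization in R: What the Data Says About Air Quality in Major Chinese Cities During the First Week of 2017
I recently picked up the basics of the rvest package and can just about scrape some useful data, so I am using the weekend to share the results, a little sheepishly.
This post uses rvest to scrape the daily AQI figures for 367 major cities (January 1 to 7, 2017) published by the environmental monitoring centre of China's Ministry of Environmental Protection. Data for a few cities is missing, so some cities may be absent from the visualizations.
The following packages need to be loaded: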
library(rvest)
library(stringr)
library(plyr)      # loaded before dplyr so that plyr does not mask dplyr's functions
library(dplyr)
library(ggplot2)
library(maptools)
library(ggmap)
library(Hmisc)
library(leafletCN)
library(ggthemes)
Scrape the AQI data for the 367 major cities from January 1 to 7, 2017 with rvest:
url <- "http://datacenter.mep.gov.cn/report/air_daily/air_dairy.jsp?city=&startdate=2017-01-01&enddate=2017-01-07&page="

# Scrape one result page (30 records) and return it as a data frame.
# The index offsets differ by column, presumably because each selector
# also matches a few non-data cells at the top of the page.
scrape_page <- function(m) {
  web <- read_html(paste0(url, m), encoding = "UTF-8")
  # Text of the k-th cell of every table row
  col_text <- function(k) {
    web %>% html_nodes(paste0("tr>td:nth-child(", k, ")")) %>% html_text()
  }
  data.frame(
    Num    = col_text(1)[6:35],
    City   = col_text(2)[6:35],
    Date   = col_text(3)[4:33],
    AQI    = col_text(4)[4:33],
    Level  = col_text(5)[3:32],
    Mainpo = col_text(6)[2:31],
    stringsAsFactors = FALSE
  )
}

# Loop over all 86 result pages and stack them
final <- data.frame()
for (m in 1:86) {
  final <- rbind(final, scrape_page(m))
}
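To make the column selectors easier to follow, here is a minimal sketch that applies the same "tr>td:nth-child(k)" pattern to a small inline table instead of the live site. The table contents (CityA, CityB and their values) are made up for illustration and are not taken from the scraped data.

```r
library(rvest)

# A tiny stand-in for one result page (hypothetical rows)
page <- read_html('<table>
  <tr><td>1</td><td>CityA</td><td>2017-01-01</td><td>80</td></tr>
  <tr><td>2</td><td>CityB</td><td>2017-01-01</td><td>150</td></tr>
</table>')

# nth-child(2) picks the second cell of each row, nth-child(4) the fourth
city <- page %>% html_nodes("tr>td:nth-child(2)") %>% html_text()
aqi  <- page %>% html_nodes("tr>td:nth-child(4)") %>% html_text()

toy <- data.frame(City = city, AQI = as.numeric(aqi), stringsAsFactors = FALSE)
print(toy)
```

Each selector returns one character vector per column, which is why the real scraper has to trim every vector to the same 30 rows before building the data frame.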
Data preprocessing:
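final <- final[1:2569, ]              # 367 cities x 7 days = 2,569 records
final$AQI <- as.numeric(final$AQI)
# Ordered factor running from worst to best: severe, heavy, moderate and light pollution, good, excellent.
# The labels must match the scraped strings exactly.
final$Level <- factor(final$Level, levels = c("嚴重污染", "重度污染", "中度污染", "輕度污染", "良", "優"), ordered = TRUE)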
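The Level column comes straight from the site, but the category can also be derived from the numeric AQI. The sketch below assumes the site follows the standard Chinese AQI bands (0-50, 51-100, 101-150, 151-200, 201-300, above 300); the function name aqi_to_level and the English labels are illustrative only.

```r
# Map numeric AQI values to an ordered category (best to worst)
aqi_to_level <- function(aqi) {
  cut(aqi,
      breaks = c(0, 50, 100, 150, 200, 300, Inf),
      labels = c("Excellent", "Good", "Light", "Moderate", "Heavy", "Severe"),
      include.lowest = TRUE,   # so that an AQI of exactly 0 is not dropped
      ordered_result = TRUE)   # allows comparisons such as level_a < level_b
}

levels_out <- aqi_to_level(c(35, 75, 120, 180, 250, 400))
print(levels_out)
```

Because the result is an ordered factor, comparisons and sorting follow pollution severity rather than alphabetical order.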
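address <- unique(final$City)
# get_geo_position() is not provided by any package loaded above; it is assumed
# here to come from the REmap package, which would need to be loaded first.
add <- get_geo_position(address)      # look up longitude/latitude for each city
final1 <- merge(final, add, by.x = "City", by.y = "city", all.x = TRUE)
final1$day <- substr(final1$Date, 10, 10)   # last digit of the date = day of month (1 to 7)
names(final1)
final1 <- final1[, c("City", "Num", "Date", "day", "AQI", "Level", "Mainpo", "lon", "lat")]
newdata1 <- final1[, c("City", "lon", "lat", "day", "AQI", "Level", "Mainpo")]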
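The left join and the day extraction can be checked on toy data. This is a minimal sketch with invented cities and coordinates; it shows that all.x = TRUE keeps cities whose coordinates were not found (they get NA), which is why some cities can later disappear from the maps.

```r
# Hypothetical observations and a coordinate table missing one city
obs <- data.frame(City = c("CityA", "CityB", "CityC"),
                  Date = c("2017-01-01", "2017-01-02", "2017-01-07"),
                  AQI  = c(80, 150, 60),
                  stringsAsFactors = FALSE)
coords <- data.frame(city = c("CityA", "CityB"),
                     lon  = c(116.4, 121.5),
                     lat  = c(39.9, 31.2),
                     stringsAsFactors = FALSE)

# Left join: every observation is kept, unmatched cities get NA coordinates
merged <- merge(obs, coords, by.x = "City", by.y = "city", all.x = TRUE)

# Single-digit day, as in the post (only valid for days 1 to 9)
merged$day <- substr(merged$Date, 10, 10)
# A more general alternative that also works for two-digit days
merged$day_int <- as.integer(format(as.Date(merged$Date), "%d"))

unmatched <- merged$City[is.na(merged$lon)]
print(unmatched)
```

Listing the unmatched cities right after the merge is a quick way to see how many points will be missing from the plots.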
Importing the map data:
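# readShapePoly() comes from maptools, which has since been retired from CRAN
china_map <- readShapePoly("c:/rstudy/bou2_4p.shp")
x <- china_map@data
xs <- data.frame(id = row.names(x), x)
china_map1 <- fortify(china_map)                        # polygons as a plain data frame
china_map_data <- join(china_map1, xs, type = "full")   # attach the shapefile attributes
mydata <- read.csv("c:/rstudy/geshengzhibiao.csv")
china_data <- join(china_map_data, mydata, type = "full")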
First, look at how the 367 selected cities are distributed across the country:
ggplot() +
  geom_polygon(data = china_data, aes(x = long, y = lat, group = group), fill = "white", colour = "grey60") +
  geom_point(data = newdata1, aes(x = lon, y = lat), colour = "red") +
  coord_map("polyconic") +
  theme_nothing()
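A bubble chart shows the relative AQI levels of the major cities (both bubble size and colour intensity encode the AQI).
(The figures below are each city's mean AQI over January 1 to 7, 2017, for the 367 cities.)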
newdata2 <- newdata1[, c("City", "day", "AQI")]
newdata2$day <- as.factor(newdata2$day)
# Mean AQI per city over the seven days
newdata2 <- tapply(newdata2$AQI, list(newdata2$City), mean, na.rm = TRUE)
newdata2 <- as.data.frame(newdata2)
newdata2$Address <- rownames(newdata2)
names(newdata2) <- c("AQIM", "Address")
newdata2 <- newdata2[, c("Address", "AQIM")]
newdata2 <- na.omit(newdata2)          # drop cities with no valid readings
mynewdata <- merge(newdata2, add, by.x = "Address", by.y = "city", all.x = TRUE)
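The per-city mean can also be computed in one step with base R's aggregate(). This is an alternative sketch on invented data, not the code used for the figures; with the formula interface, rows whose AQI is NA are dropped before averaging, so a city with no valid readings simply does not appear in the result.

```r
# Hypothetical daily readings; CityC has no valid value
obs <- data.frame(City = c("CityA", "CityA", "CityB", "CityB", "CityC"),
                  AQI  = c(100, 200, 50, NA, NA),
                  stringsAsFactors = FALSE)

# Mean AQI per city; NA rows are removed by the default na.action
city_mean <- aggregate(AQI ~ City, data = obs, FUN = mean)
names(city_mean) <- c("Address", "AQIM")
print(city_mean)
```

The result already has the Address/AQIM layout that the merge with the coordinate table expects.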
ggplot() +
  geom_polygon(data = china_data, aes(x = long, y = lat, group = group), fill = "white", colour = "grey60") +
  geom_point(data = mynewdata, aes(x = lon, y = lat, size = AQIM, fill = AQIM), shape = 21, colour = "black") +
  scale_size_area(max_size = 5) +
  scale_fill_gradient(low = "white", high = "#D73434") +
  coord_map("polyconic") +
  theme_nothing()
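Overlaying a centre-radiating density heat map on a scatter plot gives a macro-level view of where the monitored cities cluster across provinces and regions. Note that the density layer is computed from the city locations only and is not weighted by AQI: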
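# china_map1 is the fortified data frame built from china_map above
# (newer ggplot2 versions prefer after_stat(level) over ..level..)
ggplot() +
  geom_polygon(data = china_map1, aes(x = long, y = lat, group = group), fill = "#005A32", col = "white") +
  geom_polygon(data = mynewdata, aes(x = lon, y = lat, fill = ..level..), stat = "density_2d", alpha = .5, color = NA) +
  coord_map("polyconic") +
  geom_point(data = mynewdata, aes(x = lon, y = lat), col = "red", size = 1) +
  scale_fill_gradient2(low = "white", mid = "yellow", high = "red") +
  theme_nothing()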
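The density layer relies on a two-dimensional kernel density estimate, which ggplot2 computes with MASS::kde2d(). The sketch below runs that estimate directly on simulated coordinates (a large cluster and a small one, both invented) to show what the contour levels represent: how densely points are packed, not how high their AQI is.

```r
set.seed(1)
# Simulated city coordinates: 100 points in one cluster, 20 in another
lon <- c(rnorm(100, mean = 116, sd = 0.5), rnorm(20, mean = 104, sd = 0.5))
lat <- c(rnorm(100, mean = 40,  sd = 0.5), rnorm(20, mean = 30,  sd = 0.5))

# Kernel density estimate on a 25 x 25 grid
dens <- MASS::kde2d(lon, lat, n = 25)

# Grid cell with the highest density
peak <- which(dens$z == max(dens$z), arr.ind = TRUE)[1, ]
peak_lon <- dens$x[peak["row"]]
peak_lat <- dens$y[peak["col"]]
print(c(peak_lon, peak_lat))
```

The peak falls in the larger cluster, which mirrors how the map highlights regions with many monitored cities.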
An interactive choropleth map shows the regional pattern of air quality across all cities:
# The popup originally referenced an undefined object `dat`; it should use mynewdata
geojsonMap(mynewdata, "city",
           popup = paste0(mynewdata$Address, ":", mynewdata$AQIM),
           palette = "Reds", legendTitle = "AQI Index")
The 10 cities with the highest AQI:
mynewdata3 <- newdata2[order(-newdata2$AQIM), ][1:10, ]   # ten highest mean AQI values
# The bar fill is a fixed colour, so no fill scale or legend is needed
ggplot(mynewdata3, aes(reorder(Address, AQIM), AQIM)) +
  geom_bar(stat = "identity", position = "dodge", fill = "#D6B869") +
  theme_wsj() +
  coord_flip() +
  geom_text(aes(label = round(AQIM, 1)), position = position_dodge(0.9), hjust = 1.1, colour = "white", size = 5) +
  ggtitle("Top 10 Most Polluted Cities") +
  theme(
    axis.title = element_blank(),
    legend.position = "none",
    panel.grid.major.x = element_line(linetype = "dashed", colour = "grey60"),
    panel.grid.major.y = element_blank(),
    axis.ticks.x = element_blank(),
    axis.ticks.y = element_line(),
    axis.ticks.length = unit(0.3, "cm"),
    axis.line.x = element_blank(),
    axis.line.y = element_line(),
    axis.text.x = element_text(size = 10)
  )
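The administrative names in the collected data do not match those in the available map files, and with this many cities, reconciling them is tedious. For that reason I have not produced a discrete filled map of air quality levels this time. The method has been covered several times in earlier posts, so interested readers can use this as an exercise.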
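As a starting point for that exercise, the name mismatch can be narrowed down before any manual work. The sketch below is only an illustration: the names are English stand-ins, and the rule of stripping a trailing " City" is an assumed normalization, not something taken from the actual data or map files.

```r
# Hypothetical names from the scraped data and from the map file
data_names <- c("Beijing City", "Shanghai City", "Dali Prefecture")
map_names  <- c("Beijing", "Shanghai", "Dali Bai Autonomous Prefecture")

# Assumed normalization rule: drop a trailing " City" suffix
normalize_name <- function(x) sub(" City$", "", x)

cleaned   <- normalize_name(data_names)
unmatched <- setdiff(cleaned, map_names)   # names that still need manual mapping
print(unmatched)
```

Only the names left in unmatched need a hand-written lookup table, which keeps the manual part of the matching small.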
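Contact:
WeChat: ljty1991
Mail: 578708965@qq.com
Personal WeChat public account: 數據小魔方 (datamofang)
Team WeChat public account: EasyCharts
QQ discussion group: [魔方學院] 553270834
Recommended reading:
※ R Visualization: Drawing Centre-Radiating Density Maps with ggplot
※ R Visualization: Discrete Percentage-Filled Data Maps (Bohai Rim)
※ R Analysis Tells You Which Countries to Avoid to Steer Clear of Air Disasters
※ Getting Started with MySQL and Its Interaction with R
※ R Data Visualization: Combining Colours and Sharing Colour Schemes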