Scraping Meme Images with requests
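Last time I used urllib to scrape avatars from the avatar bar (頭像吧) on Baidu Tieba. I assumed that swapping in a different URL would be enough to scrape images from other bars, but that turned out to be naive. So this time I decided to use requests to scrape memes (表情包) instead; I really wish I had found this library sooner. The code is below, and it should work once you change the save path to your own.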
import re
import urllib.error
import urllib.parse
import urllib.request


class Get_Touxiang():
    def __init__(self):
        # Counter used to number the saved files
        self.n = 297
        headers = [("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36")]
        # Install a global opener so every urlopen call sends the browser User-Agent
        opener = urllib.request.build_opener()
        opener.addheaders = headers
        urllib.request.install_opener(opener)

    # Get the thread links from a forum index page
    def get_urls(self, url):
        try:
            self.data1 = urllib.request.urlopen(url).read().decode("utf-8")
            self.pat1 = '<a rel="noreferrer" href="(/.*?)"'
            self.pat1 = re.compile(self.pat1)
            self.urls = self.pat1.findall(self.data1)
            for i in range(len(self.urls)):
                self.urls[i] = "http://tieba.baidu.com" + self.urls[i]
            return self.urls
        except urllib.error.HTTPError as e:
            print(e.reason)
            # Return an empty list so the caller's loop is simply skipped
            return []

    # Get the image links, yielding one list per page of the thread
    def get_pictures(self, url):
        self.data2 = urllib.request.urlopen(url).read().decode("utf-8")
        self.pat2 = r'max-page="(\d*?)"'
        maxpage = int(re.compile(self.pat2).findall(self.data2)[0])
        for i in range(1, maxpage + 1):
            picurl = url + "?pn=" + str(i)
            data = urllib.request.urlopen(picurl).read().decode("utf-8")
            pat3 = r'<img class="BDE_Image".*? src="(https://imgsa\.baidu\.com.*?\.jpg)"'
            yield re.compile(pat3, re.S).findall(data)

    # Save the images
    def save_pictures(self, picurls):
        for picurl in picurls:
            print(picurl)
            # Fill in your save path inside the quotes below
            path = " " + str(self.n) + ".jpg"
            urllib.request.urlretrieve(picurl, path)
            urllib.request.urlcleanup()
            self.n += 1


getit = Get_Touxiang()
# Set the number of index pages you want to fetch
wantpages = 10
# pn is the thread offset (50 threads per index page), so start from 0
for i in range(wantpages):
    urlshou = "http://tieba.baidu.com/f?ie=utf-8&kw=" + urllib.parse.quote("表情包") + "&pn=" + str(50 * i)
    urlties = getit.get_urls(urlshou)
    for url in urlties:
        for page_pictures in getit.get_pictures(url):
            getit.save_pictures(page_pictures)
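The listing above still goes through urllib.request. As a rough sketch of how the same flow might look with requests, the version below reuses the two regular expressions from that listing and swaps the opener for a requests.Session. The helper names (extract_thread_urls, extract_image_urls, fetch_html, download_image, main), the default save_dir, and the 10-second timeout are assumptions made for illustration, not part of the original script. It also reads only the first page of each thread, and Tieba's markup may have changed since these patterns were written.

```python
import re
from pathlib import Path
from urllib.parse import urljoin

BASE_URL = "http://tieba.baidu.com"
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36"
)

# Patterns carried over from the urllib listing; they may no longer match Tieba's HTML.
THREAD_RE = re.compile(r'<a rel="noreferrer" href="(/.*?)"')
IMAGE_RE = re.compile(
    r'<img class="BDE_Image".*? src="(https://imgsa\.baidu\.com.*?\.jpg)"', re.S
)


def extract_thread_urls(html):
    """Return absolute thread URLs found in a forum index page."""
    return [urljoin(BASE_URL, path) for path in THREAD_RE.findall(html)]


def extract_image_urls(html):
    """Return the image URLs found in one page of a thread."""
    return IMAGE_RE.findall(html)


def fetch_html(session, url, params=None):
    """GET a page through the shared session and return its decoded text."""
    response = session.get(url, params=params, timeout=10)
    response.raise_for_status()
    return response.text


def download_image(session, url, dest):
    """Download one image and write its bytes to dest."""
    response = session.get(url, timeout=10)
    response.raise_for_status()
    Path(dest).write_bytes(response.content)


def main(save_dir="memes", pages=1):
    # Third-party dependency (pip install requests); imported here so the
    # parsing helpers above can be used without it.
    import requests

    Path(save_dir).mkdir(parents=True, exist_ok=True)
    count = 0
    with requests.Session() as session:
        session.headers["User-Agent"] = USER_AGENT
        for page in range(pages):
            # requests URL-encodes the query string, including the Chinese keyword
            query = {"ie": "utf-8", "kw": "表情包", "pn": 50 * page}
            index_html = fetch_html(session, BASE_URL + "/f", params=query)
            for thread_url in extract_thread_urls(index_html):
                try:
                    thread_html = fetch_html(session, thread_url)
                    for image_url in extract_image_urls(thread_html):
                        download_image(session, image_url, Path(save_dir) / f"{count}.jpg")
                        count += 1
                except requests.RequestException as exc:
                    # Skip threads that fail instead of aborting the whole run
                    print(f"skipping {thread_url}: {exc}")
```

Calling main(save_dir="memes", pages=1) would fetch one index page and save whatever images it finds. Passing params to session.get replaces the manual quote call and string concatenation, and the session sends the User-Agent header on every request without installing a global opener.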