A program for scraping "mm" (girl) pictures from Baidu Tieba, written by following lesson 60 of the 小甲魚 course (Python 2.7)

# -*- coding: utf-8 -*-
import urllib, urllib2
import re

def url_open(url):
    # Send a browser-like User-Agent so the request is not rejected outright.
    req = urllib2.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36')

    # Access through a proxy (needs `import random` and a filled-in proxies list)
    # proxies = []
    # proxy = random.choice(proxies)
    # proxy_handler = urllib2.ProxyHandler({'http': proxy})
    # opener = urllib2.build_opener(proxy_handler)
    # html = opener.open(req).read()

    # Access without a proxy
    response = urllib2.urlopen(req)
    html = response.read()

    return html

def get_img(html):
    # Capture the src of every post image whose URL ends in .jpg.
    p = r'<img class="BDE_Image" src="([^"]+\.jpg)"'
    imglist = re.findall(p, html)

    for each in imglist:
        # Save each image under the last segment of its URL.
        filename = each.split("/")[-1]
        urllib.urlretrieve(each, filename, None)


if __name__ == '__main__':
    url = 'https://tieba.baidu.com/p/5418369153'
    get_img(url_open(url))
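The matching step in get_img can be tried without touching the network. The following is only a minimal sketch: the helper names extract_image_urls and filename_from_url, and the sample markup with its example.com URLs, are made up for illustration and do not come from the course or from Tieba. It assumes the dot before "jpg" is meant to be a literal dot, so it is escaped here. The snippet uses nothing beyond the re module, so it should run under both Python 2.7 and Python 3.

```python
import re

# Same pattern idea as in get_img, with the dot before "jpg" escaped
# (assumption: a literal ".jpg" extension is intended).
IMG_PATTERN = re.compile(r'<img class="BDE_Image" src="([^"]+\.jpg)"')

def extract_image_urls(html):
    """Return the src of every BDE_Image <img> tag whose URL ends in .jpg."""
    return IMG_PATTERN.findall(html)

def filename_from_url(url):
    """Use the last path segment as the local file name, as get_img does."""
    return url.split("/")[-1]

# Hypothetical sample markup, invented for this example only.
sample_html = (
    '<img class="BDE_Image" src="https://example.com/pic/abc123.jpg" width="560">'
    '<img class="avatar" src="https://example.com/pic/face.jpg">'
    '<img class="BDE_Image" src="https://example.com/pic/def456.png">'
)

urls = extract_image_urls(sample_html)
names = [filename_from_url(u) for u in urls]
print(urls)
print(names)
```

Only the first tag is picked up: the second has a different class and the third is not a .jpg, which is the filtering that get_img relies on.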

TAG: Python crawler