Turning On Picture-Only Mode for Zhihu Collections

01-26

The code lives here (the repository holds the Python 3 version):

wzyonggege/Zhihu-Crawler

---------------------------------------------------------------------------------------

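When browsing Zhihu, we often run into collections like the one below. Well, you know the kind.

I browse them too.

So I wanted to write a small Python program that turns on a "pictures only" mode.

Like this:

A quick crawl: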

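The program is fairly simple and fairly gentle on the site. It first logs in to Zhihu by reusing a browser cookie. It then requests each collection page once to get the links to the ten answers on it, requests each answer link once to get the URLs of the images on that page, and writes the images to local disk.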

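I won't go into simulated login in depth here and will only describe one fairly simple method: log in to Zhihu in your browser, open the developer tools, select the request for the main page, and go to

Network -> Headers -> Request Headers -> Cookie. Copy that entire value; it is what the script sends to act as the logged-in user.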
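The copied value is one long string of `name=value` pairs separated by semicolons. As a minimal sketch (Python 3, standard library only), the helper below splits such a string into a dict, which can be handy for checking what was copied, or for passing to `requests.get(url, cookies=...)` instead of a raw header. The function name `parse_cookie_header` and the sample cookie names and values are made up for illustration; they are not Zhihu's real cookies.

```python
def parse_cookie_header(raw_cookie):
    """Split a copied Cookie header value into a {name: value} dict."""
    cookies = {}
    for pair in raw_cookie.split(";"):
        pair = pair.strip()
        # Skip empty fragments and anything that is not name=value
        if not pair or "=" not in pair:
            continue
        # Split on the first "=" only, since values may contain "="
        name, value = pair.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies


# Hypothetical cookie string, standing in for the one copied from the browser
sample = "session_id=abc123; token=a=b; theme=dark; "
parsed = parse_cookie_header(sample)
print(parsed)
```

Sending the raw string in a `Cookie` header, as the article does, works just as well; the dict form only makes the individual cookies easier to inspect.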

Once the Cookie has been copied, use requests to fetch the page.

(The code in this article targets Python 2.7.)

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36',
    'Cookie': 'cookie',  # paste your own Cookie string here
}
url = 'https://www.zhihu.com/collection/69135664'
# .content is the raw response body (the HTML of the collection page)
response = requests.get(url, headers=headers).content

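That is all it takes to get a simple simulated login.

What follows is fairly simple pagination and tag extraction, which you can explore on your own (this task does not need the Zhihu API to parse dynamic resources such as JSON). I'll just offer a starting point here.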
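The two steps named above can be tried offline before touching the site. The sketch below (Python 3) builds the page URLs and pulls answer links out of a small HTML snippet. Note the substitution: it uses the standard library's `html.parser` rather than the lxml XPath queries used in this article, so that it runs without extra packages. The sample HTML is invented; it only imitates the `zm-item-answer` structure that the article's XPath expects and may not match the live page.

```python
from html.parser import HTMLParser

BASE = "https://www.zhihu.com"


def page_urls(collection_num, pages):
    """Build the URL of every collection page from 1 to `pages`."""
    return ["{}/collection/{}?page={}".format(BASE, collection_num, i)
            for i in range(1, pages + 1)]


class AnswerLinkParser(HTMLParser):
    """Collect href values of <link> tags nested inside answer divs."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # div nesting depth inside an answer div; 0 = outside
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            if self.depth or "zm-item-answer" in classes:
                self.depth += 1
        elif tag == "link" and self.depth and attrs.get("href"):
            self.links.append(BASE + attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


# Invented snippet imitating the structure the article's XPath targets
SAMPLE_HTML = """
<div class="zm-item">
  <div class="zm-item-answer ">
    <link itemprop="url" href="/question/1/answer/2">
    <div class="inner"><p>text</p></div>
  </div>
</div>
<div class="zm-item">
  <div class="zm-item-answer ">
    <link itemprop="url" href="/question/3/answer/4">
  </div>
</div>
<link rel="stylesheet" href="/static/site.css">
"""

parser = AnswerLinkParser()
parser.feed(SAMPLE_HTML)
urls = page_urls(69135664, 2)
print(urls)
print(parser.links)
```

The stylesheet `<link>` at the end is ignored because it sits outside any answer div, which is the same filtering the XPath `//div[@class="zm-item-answer "]/link` performs.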

# coding:utf-8
import os

import requests
from lxml import html

# If you hit encoding problems, uncomment the next three lines
# import sys
# reload(sys)
# sys.setdefaultencoding('utf-8')

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36',
    'Cookie': 'cookie',  # paste your own Cookie string here
}


def get_link_ist(collection_num):
    # In Python 2, input() evaluates what is typed, so page is an int
    page = input('How many pages do you want? (Pace yourself!): ')
    result = []
    collection_title = None
    for i in range(1, page + 1):
        link = 'https://www.zhihu.com/collection/{}?page={}'.format(collection_num, i)
        response = requests.get(link, headers=headers).content
        sel = html.fromstring(response)
        # Create the output folder once, named after the collection
        if collection_title is None:
            # Collection name
            collection_title = sel.xpath('//h2[@class="zm-item-title zm-editable-content"]/text()')[0].strip()
            if not os.path.exists(collection_title):
                os.mkdir(collection_title)
        # Each answer on the page carries a <link> element with its URL
        each = sel.xpath('//div[@class="zm-item"]//div[@class="zm-item-answer "]/link')
        for e in each:
            link = 'https://www.zhihu.com' + e.xpath('@href')[0]
            result.append(link)
    return [collection_title, result]


def get_pic(collection, answer_link):
    response = requests.get(answer_link, headers=headers).content
    sel = html.fromstring(response)
    title = sel.xpath('//h1[@class="QuestionHeader-title"]/text()')[0].strip()
    try:
        author = sel.xpath('//a[@class="UserLink-link"]/text()')[0].strip()
    except IndexError:
        # No author link on the page: anonymous user
        author = u'匿名用戶'
    # Build the folder path: <collection>/<question title>-<author>
    path = collection + '/' + title + '-' + author
    try:
        if not os.path.exists(path):
            os.mkdir(path)
        n = 1
        for i in sel.xpath('//div[@class="RichContent-inner"]//img/@src'):
            # Skip the "whitedot" placeholder links
            if 'whitedot' not in i:
                # print i
                pic = requests.get(i).content
                fname = path + '/' + str(n) + '.jpg'
                with open(fname, 'wb') as p:
                    p.write(pic)
                n += 1
        print u'{} done'.format(title)
    except Exception:
        # Any failure for this answer is skipped silently
        pass


if __name__ == '__main__':
    collection_num = input('Enter the collection number: ')
    r = get_link_ist(collection_num)
    collection = r[0]
    collection_list = r[1]
    for k in collection_list:
        get_pic(collection, k)
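The file-naming part of the download step can be separated from the network calls and checked on its own. The sketch below (Python 3) pairs each real image URL with a numbered `.jpg` path and skips the "whitedot" placeholders, as the loop above does. The helper names `plan_downloads` and `safe_dirname` and the example URLs are hypothetical. `safe_dirname` is an addition of mine, not part of the original script: it guards against question titles containing a slash, which would otherwise be read as a path separator.

```python
import os


def safe_dirname(name):
    """Replace path separators so a title can be used as one folder name."""
    return name.replace("/", "_").replace("\\", "_")


def plan_downloads(image_urls, folder):
    """Return (url, local_path) pairs, numbering files 1.jpg, 2.jpg, ..."""
    plan = []
    n = 1
    for url in image_urls:
        # Placeholder images contain "whitedot" in their URL; skip them
        if "whitedot" in url:
            continue
        plan.append((url, os.path.join(folder, "{}.jpg".format(n))))
        n += 1
    return plan


# Made-up URLs for illustration
example_urls = [
    "https://pic.example/a.jpg",
    "https://pic.example/whitedot.jpg",
    "https://pic.example/b.png",
]
folder = os.path.join("collection", safe_dirname("title/part-author"))
plan = plan_downloads(example_urls, folder)
for url, path in plan:
    print(url, "->", path)
```

Each pair can then be fetched with `requests.get(url).content` and written to `path` in `'wb'` mode, exactly as in `get_pic`.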

Well, that's it. (Leave a like before you go!)