Web Scraping for Beginners, Part 4: Storing the Data You Scrape
From the column 數據分析之路 (The Road of Data Analysis)
1. Storing data with plain Python file operations
To write a file, use the with open() statement:

with open(name, mode, encoding) as file:
    file.write(...)  # note: the statements after with open() are indented

name: a string containing the file name, e.g. 'xiaozhu.txt';
mode: the mode in which the file is opened: read-only / write / append, etc.;
encoding: the encoding of the data we write, usually utf-8 or gbk;
file: the name we give the file object in our code.
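As a minimal sketch of the pattern above (the file name "demo.txt" and its contents are just placeholders for illustration):

```python
# Write a line to a file; the indented block runs while the file is open,
# and the file is closed automatically when the block ends.
with open("demo.txt", "w", encoding="utf-8") as f:
    f.write("hello xiaozhu\n")

# Reading the file back confirms the write succeeded.
with open("demo.txt", encoding="utf-8") as f:
    content = f.read()
```

Because with open() closes the file for you, there is no need to call file.close() manually.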
from lxml import etree
import requests
import time

# Use a raw string (r"...") so backslashes in the Windows path are not
# treated as escape sequences.
with open(r"C:\Users\nanafighting\Desktop\xiaozhu.txt", "w", encoding="utf-8") as f:
    for a in range(1, 6):
        url = "http://cd.xiaozhu.com/search-duanzufang-p{}-0/".format(a)
        data = requests.get(url).text
        s = etree.HTML(data)
        file = s.xpath('//*[@id="page_list"]/ul/li')
        time.sleep(3)  # pause between pages to avoid hammering the site
        for infor in file:
            title = infor.xpath('./div[2]/div/a/span/text()')[0]
            price = infor.xpath('./div[2]/span[1]/text()')[0]
            scribe = infor.xpath('./div[2]/div/em/text()')[0].strip()
            pic = infor.xpath('./a/img/@lazy_src')[0]
            f.write("{} {} {}\n{}\n".format(title, price, scribe, pic))
The result is shown in the screenshot (omitted here): each listing's title, price, description, and image URL is written to xiaozhu.txt.
2. Saving the file in CSV format:
from lxml import etree
import requests
import time

with open(r"C:\Users\nanafighting\Desktop\xzzf.csv", "w", encoding="utf-8") as f:
    for a in range(1, 6):
        url = "http://cd.xiaozhu.com/search-duanzufang-p{}-0/".format(a)
        data = requests.get(url).text
        s = etree.HTML(data)
        file = s.xpath('//*[@id="page_list"]/ul/li')
        time.sleep(3)
        for infor in file:
            title = infor.xpath('./div[2]/div/a/span/text()')[0]
            price = infor.xpath('./div[2]/span[1]/text()')[0]
            scribe = infor.xpath('./div[2]/div/em/text()')[0].strip()
            pic = infor.xpath('./a/img/@lazy_src')[0]
            # Separate the fields with commas so each listing becomes one CSV row
            f.write("{},{},{},{}\n".format(title, price, scribe, pic))
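Instead of assembling each row by hand with f.write(), the standard-library csv module handles field separators and quoting for you. A sketch, using made-up sample listings (the file name and row values here are illustrative, not the scraped data):

```python
import csv

# Sample rows standing in for the scraped (title, price, describe, pic) fields.
rows = [
    ("Cozy loft", "128", "Near metro", "http://example.com/1.jpg"),
    ("Sunny room", "98", "Whole flat", "http://example.com/2.jpg"),
]

# newline="" prevents the csv module from writing blank lines on Windows.
with open("xzzf_demo.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["title", "price", "describe", "pic"])  # header row
    writer.writerows(rows)

with open("xzzf_demo.csv", encoding="utf-8") as f:
    lines = f.read().splitlines()
```

csv.writer also quotes fields that themselves contain commas, which manual string formatting would silently break.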
What if the CSV shows up as garbled text when opened in Excel?
- Open the file in Notepad
- Choose Save As and set the encoding to "ANSI"
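Alternatively, you can avoid the Notepad round-trip entirely by writing the file with Python's "utf-8-sig" codec, which prepends a byte-order mark (BOM) that tells Excel the file is UTF-8. A sketch (the file name and row are placeholders):

```python
# "utf-8-sig" writes the UTF-8 BOM first, so Excel auto-detects the encoding.
with open("demo_sig.csv", "w", encoding="utf-8-sig") as f:
    f.write("title,price\n")

# The file should start with the UTF-8 BOM bytes EF BB BF.
with open("demo_sig.csv", "rb") as f:
    head = f.read(3)
```

With this approach the CSV opens cleanly in Excel without any manual re-encoding step.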