Web Scraping for Beginners, Part 4: Storing the Data You Scrape
From the column 數據分析之路 (The Road of Data Analysis)
1. Storing data with plain Python file operations
To write a file, use the with open() statement:

with open(name, mode, encoding) as file:
    file.write(...)  # note: the statements after with open() are indented

name: a string containing the file name, e.g. 'xiaozhu.txt';
mode: the mode in which the file is opened: read-only / write / append, etc.;
encoding: the encoding of the data we write, usually utf-8 or gbk;
file: the name we give the file object in our code.
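As a minimal sketch of the pattern above (the file name "demo.txt" and its contents are just placeholders for illustration):

```python
# Write a line to a file; the indented block runs while the file is open,
# and the file is closed automatically when the block ends.
with open("demo.txt", "w", encoding="utf-8") as f:
    f.write("hello xiaozhu\n")

# Reading the file back confirms the write succeeded.
with open("demo.txt", encoding="utf-8") as f:
    content = f.read()
```

Because with open() closes the file for you, there is no need to call file.close() manually.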
from lxml import etree
import requests
import time

# Use a raw string (r"...") so backslashes in the Windows path are not
# treated as escape sequences.
with open(r"C:\Users\nanafighting\Desktop\xiaozhu.txt", "w", encoding="utf-8") as f:
    for a in range(1, 6):
        url = "http://cd.xiaozhu.com/search-duanzufang-p{}-0/".format(a)
        data = requests.get(url).text
        s = etree.HTML(data)
        file = s.xpath('//*[@id="page_list"]/ul/li')
        time.sleep(3)  # pause between pages to avoid hammering the site
        for infor in file:
            title = infor.xpath('./div[2]/div/a/span/text()')[0]
            price = infor.xpath('./div[2]/span[1]/text()')[0]
            scribe = infor.xpath('./div[2]/div/em/text()')[0].strip()
            pic = infor.xpath('./a/img/@lazy_src')[0]
            f.write("{} {} {}\n{}\n".format(title, price, scribe, pic))
The result is shown in the screenshot (omitted here): each listing's title, price, description, and image URL is written to xiaozhu.txt.
2. Saving the file in CSV format:
from lxml import etree
import requests
import time

with open(r"C:\Users\nanafighting\Desktop\xzzf.csv", "w", encoding="utf-8") as f:
    for a in range(1, 6):
        url = "http://cd.xiaozhu.com/search-duanzufang-p{}-0/".format(a)
        data = requests.get(url).text
        s = etree.HTML(data)
        file = s.xpath('//*[@id="page_list"]/ul/li')
        time.sleep(3)
        for infor in file:
            title = infor.xpath('./div[2]/div/a/span/text()')[0]
            price = infor.xpath('./div[2]/span[1]/text()')[0]
            scribe = infor.xpath('./div[2]/div/em/text()')[0].strip()
            pic = infor.xpath('./a/img/@lazy_src')[0]
            # Separate the fields with commas so each listing becomes one CSV row
            f.write("{},{},{},{}\n".format(title, price, scribe, pic))
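Instead of assembling each row by hand with f.write(), the standard-library csv module handles field separators and quoting for you. A sketch, using made-up sample listings (the file name and row values here are illustrative, not the scraped data):

```python
import csv

# Sample rows standing in for the scraped (title, price, describe, pic) fields.
rows = [
    ("Cozy loft", "128", "Near metro", "http://example.com/1.jpg"),
    ("Sunny room", "98", "Whole flat", "http://example.com/2.jpg"),
]

# newline="" prevents the csv module from writing blank lines on Windows.
with open("xzzf_demo.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["title", "price", "describe", "pic"])  # header row
    writer.writerows(rows)

with open("xzzf_demo.csv", encoding="utf-8") as f:
    lines = f.read().splitlines()
```

csv.writer also quotes fields that themselves contain commas, which manual string formatting would silently break.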
What if the CSV shows up as garbled text when opened in Excel?
- Open the file in Notepad
- Choose Save As and set the encoding to "ANSI"
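Alternatively, you can avoid the Notepad round-trip entirely by writing the file with Python's "utf-8-sig" codec, which prepends a byte-order mark (BOM) that tells Excel the file is UTF-8. A sketch (the file name and row are placeholders):

```python
# "utf-8-sig" writes the UTF-8 BOM first, so Excel auto-detects the encoding.
with open("demo_sig.csv", "w", encoding="utf-8-sig") as f:
    f.write("title,price\n")

# The file should start with the UTF-8 BOM bytes EF BB BF.
with open("demo_sig.csv", "rb") as f:
    head = f.read(3)
```

With this approach the CSV opens cleanly in Excel without any manual re-encoding step.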