Web Scraping Basics 4: How to Store the Data You Scrape

From the column "The Road to Data Analysis"

1. Saving data with Python statements

To write a file, use a with open() statement:

```python
with open(name, mode, encoding=encoding) as file:
    file.write(...)  # note: the statements after with open() are indented one level
```

name: a string containing the file name, e.g. 'xiaozhu.txt';

mode: the mode in which the file is opened: read-only, write, append, and so on;

encoding: the encoding of the data we are writing, usually utf-8 or gbk;

file: the name the file object goes by in our code.
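A minimal sketch of these parameters in action (the file name demo.txt is made up for illustration):

```python
# "w" truncates any existing file and writes from scratch.
with open("demo.txt", "w", encoding="utf-8") as f:
    f.write("first line\n")

# "a" appends to the end of the existing file.
with open("demo.txt", "a", encoding="utf-8") as f:
    f.write("second line\n")

# "r" (read-only) is the default mode when none is given.
with open("demo.txt", encoding="utf-8") as f:
    print(f.read())
```

Because the with block closes the file automatically when it exits, there is no need to call file.close() yourself.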

```python
from lxml import etree
import requests
import time

# The r-prefix keeps the backslashes in the Windows path literal,
# and the mode and encoding must be passed as strings.
with open(r"C:\Users\nanafighting\Desktop\xiaozhu.txt", "w", encoding="utf-8") as f:
    for a in range(1, 6):
        url = "http://cd.xiaozhu.com/search-duanzufang-p{}-0/".format(a)
        data = requests.get(url).text
        s = etree.HTML(data)
        file = s.xpath('//*[@id="page_list"]/ul/li')
        time.sleep(3)  # pause between pages to avoid hammering the site
        for infor in file:
            title = infor.xpath('./div[2]/div/a/span/text()')[0]
            price = infor.xpath('./div[2]/span[1]/text()')[0]
            scribe = infor.xpath('./div[2]/div/em/text()')[0].strip()
            pic = infor.xpath('./a/img/@lazy_src')[0]
            f.write("{} {} {}\n{}\n".format(title, price, scribe, pic))
```
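The f.write call builds each record with str.format; in isolation (the sample values below are invented) it works like this:

```python
# Four placeholder {} slots are filled in order; "\n" starts a new line.
title, price, scribe, pic = "短租房", "298", "4.5分", "http://example.com/1.jpg"
line = "{} {} {}\n{}\n".format(title, price, scribe, pic)
print(line)
```

The trailing "\n" is what keeps one listing's record from running into the next in the output file.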

The result is shown in the screenshot (image not reproduced here).

2. Saving to a CSV file:

```python
from lxml import etree
import requests
import time

with open(r"C:\Users\nanafighting\Desktop\xzzf.csv", "w", encoding="utf-8") as f:
    for a in range(1, 6):
        url = "http://cd.xiaozhu.com/search-duanzufang-p{}-0/".format(a)
        data = requests.get(url).text
        s = etree.HTML(data)
        file = s.xpath('//*[@id="page_list"]/ul/li')
        time.sleep(3)
        for infor in file:
            title = infor.xpath('./div[2]/div/a/span/text()')[0]
            price = infor.xpath('./div[2]/span[1]/text()')[0]
            scribe = infor.xpath('./div[2]/div/em/text()')[0].strip()
            pic = infor.xpath('./a/img/@lazy_src')[0]
            f.write("{} {} {} {}\n".format(title, price, scribe, pic))
```
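Note that the code above writes space-separated text into a file that merely has a .csv extension. For rows that Excel and other tools parse as real comma-separated values, Python's standard csv module handles the delimiters and quoting for you; a minimal sketch with invented sample rows:

```python
import csv

rows = [
    ("title", "price", "describe", "pic"),  # header row
    ("近地鐵房", "298", "4.8分", "http://example.com/1.jpg"),
]

# newline="" stops csv from inserting extra blank lines on Windows.
with open("xzzf_demo.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)
```

Fields containing commas or quotes are quoted automatically, which naive f.write formatting cannot do.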

What if Excel shows garbled characters when opening the CSV?

  1. Open the file in Notepad
  2. Choose "Save As" and set the encoding to "ANSI"
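Alternatively, you can avoid the manual re-save by writing the file with a UTF-8 byte-order mark, which Excel uses to detect the encoding; Python's "utf-8-sig" codec does exactly that (the file name here is made up):

```python
# "utf-8-sig" prepends a BOM (\xef\xbb\xbf) so Excel recognizes the file as UTF-8.
with open("excel_friendly.csv", "w", encoding="utf-8-sig") as f:
    f.write("標題,價格\n短租房,298\n")
```

Opening this file in Excel shows the Chinese text correctly without any Notepad round-trip.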
