Selenium + PhantomJS: Pagination
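- There are two ways to handle this requirement. The first is to parse the page source:

    import random
    import time

    from lxml import etree
    from selenium import webdriver

    # XPath of the page-number input box and of the "go" button
    PAGE_BOX = '//*[@id="_PageBar_Index_list1"]'
    GO_BUTTON = '//*[@id="PageBarDiv"]/table/tbody/tr/td/table/tbody/tr/td[7]/a/img'


    def extract_content(item):
        pass


    # Note: webdriver.PhantomJS and find_element_by_xpath need Selenium 3.x;
    # both were removed in Selenium 4 releases.
    driver = webdriver.PhantomJS()
    # driver = webdriver.Chrome()
    url = "xxxx"  # placeholder; the URL is elided in the original
    driver.get(url)

    doc = etree.HTML(driver.page_source)
    page = int(doc.xpath('//*[@id="PageTotalSpan"]/text()')[0]) // 10 + 1  # number of pages

    for i in range(1, page + 1):
        # Parse the page currently loaded in the browser (page i)
        response = etree.HTML(driver.page_source)
        contents = response.xpath('//td[@valign="top"]/table/tbody/tr/td')
        extract_content(contents)
        if i == page:
            break  # last page: nothing left to turn to
        driver.find_element_by_xpath(PAGE_BOX).clear()  # clear the page number
        time.sleep(random.uniform(1, 2))  # short random pause
        driver.find_element_by_xpath(PAGE_BOX).send_keys(str(i + 1))  # enter the next page number
        driver.find_element_by_xpath(GO_BUTTON).click()  # turn the page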
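The page-count expression `total // 10 + 1` yields one page too many whenever the total is an exact multiple of 10. A minimal sketch of a ceiling-division alternative follows. It assumes that `PageTotalSpan` holds the total number of items and that each page shows 10 of them, which is what the original expression suggests but the source does not state. The helper names `page_count` and `pages_to_visit` are hypothetical.

```python
def page_count(total_items: int, per_page: int = 10) -> int:
    """Return how many pages are needed to show total_items.

    Assumption: per_page items are shown on every page except possibly the last.
    """
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    # Ceiling division without floats: -(-a // b) == ceil(a / b) for b > 0
    return -(-total_items // per_page)


def pages_to_visit(total_items: int, per_page: int = 10) -> range:
    """Return the page numbers 1..N (inclusive) to iterate over."""
    return range(1, page_count(total_items, per_page) + 1)


# With 100 items there are exactly 10 pages, whereas 100 // 10 + 1 gives 11.
print(page_count(100), 100 // 10 + 1)
```

In the loop, `for i in pages_to_visit(total)` would then visit every page exactly once, with no extra request for a page that does not exist.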
- The second is to read the dynamically loaded content after each page turn. Use this when the DOM shown in the browser's Elements panel differs from the page source:

    import random
    import time

    from selenium import webdriver


    def extract_content(item):
        pass


    url = "xxxx"  # placeholder kept from the original
    driver = webdriver.PhantomJS()
    driver.get(url)

    for i in range(1, 10):
        print(i)
        driver.find_element_by_xpath('//*[@id="_PageBar_Index_list1"]').clear()
        driver.find_element_by_xpath('//*[@id="_PageBar_Index_list1"]').send_keys(str(i))
        driver.find_element_by_xpath(
            '//*[@id="PageBarDiv"]/table/tbody/tr/td/table/tbody/tr/td[7]/a/img').click()
        time.sleep(random.uniform(8, 10))  # give the page time to load
        # Capture everything that was loaded dynamically
        contents = driver.find_elements_by_xpath('//table[@id="illExampleDataTable"]/tbody')
        extract_content(contents)
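The fixed 8 to 10 second sleep waits the full time even when the content arrives sooner, and is still too short when the page is slow. One alternative is to poll until the content appears. The sketch below shows that pattern in plain Python, with a stand-in function in place of the browser so that it runs on its own. `wait_until` and `fake_find_rows` are hypothetical names, not part of Selenium.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(probe: Callable[[], T], timeout: float = 10.0, interval: float = 0.5) -> T:
    """Call probe() until it returns a truthy value, then return that value.

    Raises TimeoutError if nothing truthy comes back within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Stand-in for the browser: the "rows" only show up on the third poll.
attempts = {"n": 0}


def fake_find_rows():
    attempts["n"] += 1
    return ["row"] if attempts["n"] >= 3 else []


rows = wait_until(fake_find_rows, timeout=2.0, interval=0.01)
print(rows, attempts["n"])
```

With a real driver, the probe could be something like `lambda: driver.find_elements_by_xpath('//table[@id="illExampleDataTable"]/tbody')`. Selenium also ships its own version of this pattern as `WebDriverWait(driver, timeout).until(...)`. One caveat: if the previous page's table is still in the DOM, a bare presence check returns immediately, so the probe should test for something that changes with the page, such as the first row's text.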