Python Selenium: how do I view a web page's source code?

01-28

I then want to run regular expressions over it.

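Thanks for the invite. The driver has a page_source attribute, which holds the page's source. Taking the Baidu home page as an example, here is how to fetch and view its source with Selenium: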

from selenium import webdriver

url = "http://www.baidu.com"
driver = webdriver.Firefox()
driver.get(url)
# Page source
page = driver.page_source
print(page)
# Close the browser
driver.close()

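Here are a few more suggestions, which I hope you will take on board.

1. Before asking, use a search engine to check whether the question has already been answered. This one clearly has.

2. Since you want to learn Selenium, its documentation should be the first place you look when you hit a problem. The official py-selenium documentation: 1. Installation. If reading English is difficult, there is a Chinese translation: GitHub - fool2fish/selenium-doc: selenium 中文文檔.

3. Selenium is normally used together with a browser. Since you may not have used Selenium before, I used Firefox here. For performance, we usually use a headless (no-UI) browser such as PhantomJS.

4. Regular expressions are the most basic way to parse a page, but not the simplest. I recommend learning BeautifulSoup or lxml. Some people think BeautifulSoup is slow, but in my view what limits a crawler's performance most is the HTTP requests, not the parsing, so I recommend bs. It is very easy to pick up.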
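To illustrate point 4, here is a minimal sketch of parser-based extraction. It uses the standard library's html.parser as a stand-in so that it runs without extra packages; BeautifulSoup offers the same kind of lookup with less code (roughly soup.find_all("a")). The names LinkCollector, extract_links and sample_html are made up for this example, and the sample string stands in for driver.page_source.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lower-cased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the href values of all <a> tags in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample; in practice this would be driver.page_source.
sample_html = """
<html><body>
  <a href="https://www.baidu.com/more">More</a>
  <a class="nav" href='/settings'>Settings</a>
  <p>no link here</p>
</body></html>
"""

print(extract_links(sample_html))
```

Unlike a regex, the parser does not care about attribute order or quote style, which is most of why this approach tends to be easier to get right.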

Well, that's all I have to say. Best of luck.

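import re
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://www.baidu.com")
source_code = driver.page_source
print(source_code)

# The original pattern did not survive in the post. The one below is an
# assumed example that captures absolute URLs from double-quoted href attributes.
urls = re.findall(r'href="(http[^"]+)"', source_code, re.I)
for url in urls:
    print(url)

driver.close()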

from selenium import webdriver
import re

driver = webdriver.Chrome()
driver.get("https://www.baidu.com")
html_text = driver.page_source

# Assumed pattern (the original is garbled): every run of non-whitespace characters
temp = re.findall(r"\S+", html_text)
print(temp)

driver.quit()
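A regex can be tried offline on a small string before running it against driver.page_source. The sketch below is only an illustration: sample_html, HREF_PATTERN and find_hrefs are hypothetical names, and the href pattern is an assumed example, not a pattern taken from the answers above. It also shows why a broad pattern such as \S+ returns raw tokens rather than usable URLs.

```python
import re

# Hypothetical snippet standing in for driver.page_source.
sample_html = (
    '<a href="https://www.baidu.com/more">More</a> '
    '<a href="/settings">Settings</a>'
)

# Assumed pattern: capture the value of every double-quoted href attribute.
HREF_PATTERN = re.compile(r'href="([^"]+)"', re.I)


def find_hrefs(html):
    """Return the values of all double-quoted href attributes in html."""
    return HREF_PATTERN.findall(html)


# A broad pattern just splits the source on whitespace.
tokens = re.findall(r"\S+", sample_html)

print(find_hrefs(sample_html))
print(tokens)
```

The targeted pattern yields the two href values, while \S+ yields four whitespace-separated chunks that still contain markup.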
