使用Excel+VBA對網頁進行操作
對於Excel和VBA我所知有限,僅能解決自己遇到的一些問題,並不一定適用於所有場景。以下內容建立在了解基本VBA使用以及HTML語言知識的基礎上:
一、前期準備
就我所知,VBA並不能操作任意瀏覽器及網頁,我們所能做的僅僅是對IE進行一些操作,是的,僅僅是IE。不要告訴我電腦上沒有IE,那樣就可以Exit Sub了。就像Python用import、C#用using一樣,VBA也需要引用一些庫才能對IE進行操作,不過好在同屬微軟產品,所以我們能很簡便的利用VBA自帶的一些庫。
首先我們要做的就是在VBA中引用Micorsoft Internet Controls,看這個名字就知道是幫助我們控制IE頁面用的。
二、網頁操作
引用Micorsoft Internet Controls之後,我們就可以對頁面為所欲為了,不過首頁我們要有個頁面,上帝說要有頁面!
1、打開網頁
我們以在百度搜索「扯乎」關鍵詞為例:
With CreateObject("internetexplorer.application") .Visible = True .Navigate "https://www.baidu.com/s?wd=扯乎"關閉網頁 .Quit End With
代碼很簡單,先創建一個IE對象,然後給一些屬性賦值。Visible是可見性,說的是在對網頁進行操作時,這個網頁是不是會被看見。熟練之後可以設置為False,不僅讓程序在跑的時候有種神秘感(並沒有),還能稍微加快一點速度。
不過有一點要記住,這個網頁我們打開之後並沒有關閉,也就是說程序結束後需要手動關閉,如果網頁不可見是無法手動關閉的。代碼中注釋的部分就是關閉網頁用的。Navigate不用多說就是URL。
我們必須要等網頁完全載入完才能開始信息的抓取,這個時候使用到:(從這裡開始,所有的代碼都需要寫在With代碼塊中)
While .ReadyState <> 4 Or .Busy DoEvents Wend
Busy是網頁忙碌狀態,ReadyState是HTTP的5種就緒狀態,對應如下:
- 0:請求未初始化(還沒有調用 open())。
- 1:請求已經建立,但是還沒有發送(還沒有調用 send())。
- 2:請求已發送,正在處理中(通常現在可以從響應中獲取內容頭)。
- 3:請求在處理中;通常響應中已有部分數據可用了,但是伺服器還沒有完成響應的生成。
- 4:響應已完成;您可以獲取並使用伺服器的響應了。
2、獲取信息
我們先把頁面中的所有內容抓下來,後期篩選出有用的部分再慢慢給抓取添加條件。
Set dmt = .Document For i = 0 To dmt.all.Length - 1 Set htMent = dmt.all(i) With ActiveSheet .Cells(i + 2, "A") = htMent.tagName .Cells(i + 2, "B") = TypeName(htMent) .Cells(i + 2, "C") = htMent.ID .Cells(i + 2, "D") = htMent.Name .Cells(i + 2, "E") = htMent.Value .Cells(i + 2, "F") = htMent.Text .Cells(i + 2, "G") = htMent.innerText End With Next i
這塊代碼和JS有些相似,需要從IE.Document.all中把頁面上所有節點找出來。這裡也提供其他幾種方法:
- getElementById("IDName"):返回第一個內部標有IDName的標籤
- getElementsByName("a") :返回所有的<a>標籤,返回值為集合
- getElementsByClassName("css"):返回所有樣式名稱為css的標籤,返回值為集合
這些都是在抓取了全部頁面內容後幫助篩選有效信息時使用起來比較方便的。當然all還是最好用的,因為all也存在all("IDName")以及all.IDName等等用法。
上面代碼部分返回的屬性值都是HTML基本內容,就不一一解釋了。
3、填充信息
網抓神器當然還是Python,大部分人使用Excel的目的還是在於對頁面內容進行自動填充,直接讓表格提交網頁,問卷錄入之類的工作都省心不少。在抓取了頁面內容之後,想填充更加是易如反掌的事情,只需要直接給頁面標籤的Value屬性賦值就可以了。
不過網頁中除了文本框,可能還存在一些其他沒有Value的標籤,比如:下拉菜單、單選框。給這些內容賦值就需要一些基本的HTML知識了。
下拉菜單選擇.all("select")(0).Selected = True單選按鈕選擇.all("radio").Checked = True複選按鈕選擇.all("checkbox").Checked = True
下拉菜單是select標籤,每個選項都在一個option標籤里,所以返回一個集合,需要選中某個選項就要修改對應的Selected屬性為True。單選和複選按鈕都是input標籤,區別在於類型分別是radio和checkbox,要選中某個選項需要修改對應的Checked屬性。
三、數據介面
有時候我們能直接拿到一些API,通過API返回數據當然比打開網頁更方便快捷,所使用的方法也有一些不太一樣。
1、請求介面
比如我從網上得到一個能通過城市查詢免費WIFI的API,通過Excel介面訪問就使用下面的代碼:(雖然是免費的,為了避免麻煩還是把我的AppKey隱去了)
Dim http Set http = CreateObject("Microsoft.XMLHTTP") http.Open "GET", "http://api.avatardata.cn/Wifi/QueryByCity", False http.setRequestHeader "CONTENT-TYPE", "application/x-www-form-urlencoded" http.send "key=[AppKey]&city=北京&page=1"
這時我們創建的對象就不再是IE,而是HTTP對象。這裡用的是ajax的Open方法,GET是數據發送方式,第二個參數是介面地址,第三個參數是指定請求方式是否為非同步。如果這個API有帳號密碼,分別寫在第四第五個參數。
setRequestHeader就是給介面發送一個HTTP協議頭文件,最後send的內容是介面參數。當然,這個QueryString也可以直接寫在URL里,send一個空字元串就可以了。
2、介面返回
介面返回獲取的方式很簡單:
If http.Status = 200 Then Range("A1").Value = http.responseText
這裡的HTTP狀態又變成200了,和之前說好的不一樣啊摔~有興趣可以自己查查具體有哪些。
不過介面返回要麼是JSON要麼是XML,Excel處理起來十分不方便。這裡提供一個處理JSON的方法,是從網上找來的類模塊,具體內容放在附錄里。在添加了這個clsJSON類模塊後,對JSON的處理就變得十分簡單了。
將上面的代碼改成:
If http.Status = 200 Then Dim json$ json = http.responseText Dim objJSON As New clsJSON, dicJSON As Object Set dicJSON = objJSON.parse(json) For i = 1 To dicJSON("result")("data").Count Sheet1.Cells(i + 1, 1) = dicJSON("result")("data")(i)("name") Sheet1.Cells(i + 1, 2) = dicJSON("result")("data")(i)("intro") Sheet1.Cells(i + 1, 3) = dicJSON("result")("data")(i)("address") Next i End If
介面返回的示例我也放在附錄里了,根據介面返回的對象名、數組名去修改dicJSON後面的內容就可以了。這個處理JSON的模塊用的是VBA中字典+集合的原理,所以數據處理後的調用方式也參照字典和集合。
以上是我用Excel+VBA進行網頁操作的一些個人經驗,希望能幫助到一些有需要的人。有什麼錯漏的地方,也希望知乎大牛批評指正。
附錄一:VBA處理JSON的類模塊
Option Explicit================================ VBA處理JSON文件的類模塊 http://www.cnhup.com================================Const INVALID_JSON As Long = 1Const INVALID_OBJECT As Long = 2Const INVALID_ARRAY As Long = 3Const INVALID_BOOLEAN As Long = 4Const INVALID_NULL As Long = 5Const INVALID_KEY As Long = 6Private Sub Class_Initialize()End SubPrivate Sub Class_Terminate()End SubPublic Function parse(ByRef str As String) As Object Dim index As Long index = 1 On Error Resume Next Call skipChar(str, index) Select Case Mid(str, index, 1) Case "{" Set parse = parseObject(str, index) Case "[" Set parse = parseArray(str, index) End SelectEnd FunctionPrivate Function parseObject(ByRef str As String, ByRef index As Long) As Object Set parseObject = CreateObject("Scripting.Dictionary") "{" Call skipChar(str, index) If Mid(str, index, 1) <> "{" Then Err.Raise vbObjectError + INVALID_OBJECT, Description:="char " & index & " : " & Mid(str, index) index = index + 1 Do Call skipChar(str, index) If "}" = Mid(str, index, 1) Then index = index + 1 Exit Do ElseIf "," = Mid(str, index, 1) Then index = index + 1 Call skipChar(str, index) End If Dim key As String add key/value pair parseObject.Add key:=parseKey(str, index), Item:=parseValue(str, index) LoopEnd FunctionPrivate Function parseArray(ByRef str As String, ByRef index As Long) As Collection Set parseArray = New Collection "[" Call skipChar(str, index) If Mid(str, index, 1) <> "[" Then Err.Raise vbObjectError + INVALID_ARRAY, Description:="char " & index & " : " + Mid(str, index) index = index + 1 Do Call skipChar(str, index) If "]" = Mid(str, index, 1) Then index = index + 1 Exit Do ElseIf "," = Mid(str, index, 1) Then index = index + 1 Call skipChar(str, index) End If add value parseArray.Add parseValue(str, index) LoopEnd FunctionPrivate Function parseValue(ByRef str As String, ByRef index As Long) Call skipChar(str, index) Select Case Mid(str, index, 1) Case "{" Set parseValue = parseObject(str, index) Case "[" Set parseValue = parseArray(str, index) Case """", "" parseValue = parseString(str, index) Case "t", "f" parseValue = parseBoolean(str, index) Case "n" parseValue = parseNull(str, index) Case Else parseValue = parseNumber(str, index) End SelectEnd FunctionPrivate Function parseString(ByRef str As String, ByRef index As Long) As String Dim quote As String Dim char As String Dim code As String Call skipChar(str, index) quote = Mid(str, index, 1) index = index + 1 Do While index > 0 And index <= Len(str) char = Mid(str, index, 1) Select Case (char) Case "" index = index + 1 char = Mid(str, index, 1) Select Case (char) Case """", "", "/" parseString = parseString & char index = index + 1 Case "b" parseString = parseString & vbBack index = index + 1 Case "f" parseString = parseString & vbFormFeed index = index + 1 Case "n" parseString = parseString & vbNewLine index = index + 1 Case "r" parseString = parseString & vbCr index = index + 1 Case "t" parseString = parseString & vbTab index = index + 1 Case "u" index = index + 1 code = Mid(str, index, 4) parseString = parseString & ChrW(Val("&h" + code)) index = index + 4 End Select Case quote index = index + 1 Exit Function Case Else parseString = parseString & char index = index + 1 End Select LoopEnd FunctionPrivate Function parseNumber(ByRef str As String, ByRef index As Long) Dim value As String Dim char As String Call skipChar(str, index) Do While index > 0 And index <= Len(str) char = Mid(str, index, 1) If InStr("+-0123456789.eE", char) Then value = value & char index = index + 1 Else If InStr(value, ".") Or InStr(value, "e") Or InStr(value, "E") Then parseNumber = CDbl(value) Else parseNumber = CInt(value) End If Exit Function End If LoopEnd FunctionPrivate Function parseBoolean(ByRef str As String, ByRef index As Long) As Boolean Call skipChar(str, index) If Mid(str, index, 4) = "true" Then parseBoolean = True index = index + 4 ElseIf Mid(str, index, 5) = "false" Then parseBoolean = False index = index + 5 Else Err.Raise vbObjectError + INVALID_BOOLEAN, Description:="char " & index & " : " & Mid(str, index) End IfEnd FunctionPrivate Function parseNull(ByRef str As String, ByRef index As Long) Call skipChar(str, index) If Mid(str, index, 4) = "null" Then parseNull = Null index = index + 4 Else Err.Raise vbObjectError + INVALID_NULL, Description:="char " & index & " : " & Mid(str, index) End IfEnd FunctionPrivate Function parseKey(ByRef str As String, ByRef index As Long) As String Dim dquote As Boolean Dim squote As Boolean Dim char As String Call skipChar(str, index) Do While index > 0 And index <= Len(str) char = Mid(str, index, 1) Select Case (char) Case """" dquote = Not dquote index = index + 1 If Not dquote Then Call skipChar(str, index) If Mid(str, index, 1) <> ":" Then Err.Raise vbObjectError + INVALID_KEY, Description:="char " & index & " : " & parseKey End If End If Case "" squote = Not squote index = index + 1 If Not squote Then Call skipChar(str, index) If Mid(str, index, 1) <> ":" Then Err.Raise vbObjectError + INVALID_KEY, Description:="char " & index & " : " & parseKey End If End If Case ":" If Not dquote And Not squote Then index = index + 1 Exit Do End If Case Else If InStr(vbCrLf & vbCr & vbLf & vbTab & " ", char) Then Else parseKey = parseKey & char End If index = index + 1 End Select LoopEnd FunctionPublic Sub skipChar(ByRef str As String, ByRef index As Long) While index > 0 And index <= Len(str) And InStr(vbCrLf & vbCr & vbLf & vbTab & " ", Mid(str, index, 1)) index = index + 1 WendEnd SubPublic Function toString(ByRef obj As Variant) As String Select Case VarType(obj) Case vbNull toString = "null" Case vbDate toString = """" & CStr(obj) & """" Case vbString toString = """" & encode(obj) & """" Case vbObject Dim bFI, i bFI = True If TypeName(obj) = "Dictionary" Then toString = toString & "{" Dim keys keys = obj.keys For i = 0 To obj.Count - 1 If bFI Then bFI = False Else toString = toString & "," Dim key key = keys(i) toString = toString & """" & key & """:" & toString(obj(key)) Next i toString = toString & "}" ElseIf TypeName(obj) = "Collection" Then toString = toString & "[" Dim value For Each value In obj If bFI Then bFI = False Else toString = toString & "," toString = toString & toString(value) Next value toString = toString & "]" End If Case vbBoolean If obj Then toString = "true" Else toString = "false" Case vbVariant, vbArray, vbArray + vbVariant Dim sEB toString = multiArray(obj, 1, "", sEB) Case Else toString = Replace(obj, ",", ".") End SelectEnd FunctionPrivate Function encode(str) As String Dim i, j, aL1, aL2, c, p aL1 = Array(&H22, &H5C, &H2F, &H8, &HC, &HA, &HD, &H9) aL2 = Array(&H22, &H5C, &H2F, &H62, &H66, &H6E, &H72, &H74) For i = 1 To Len(str) p = True c = Mid(str, i, 1) For j = 0 To 7 If c = Chr(aL1(j)) Then encode = encode & "" & Chr(aL2(j)) p = False Exit For End If Next If p Then Dim a a = AscW(c) If a > 31 And a < 127 Then encode = encode & c ElseIf a > -1 Or a < 65535 Then encode = encode & "u" & String(4 - Len(Hex(a)), "0") & Hex(a) End If End If NextEnd FunctionPrivate Function multiArray(aBD, iBC, sPS, ByRef sPT) Array BoDy, Integer BaseCount, String PoSition Dim iDU, iDL, i Integer DimensionUBound, Integer DimensionLBound On Error Resume Next iDL = LBound(aBD, iBC) iDU = UBound(aBD, iBC) Dim sPB1, sPB2 String PointBuffer1, String PointBuffer2 If Err.Number = 9 Then sPB1 = sPT & sPS For i = 1 To Len(sPB1) If i <> 1 Then sPB2 = sPB2 & "," sPB2 = sPB2 & Mid(sPB1, i, 1) Next multiArray = multiArray & toString(Eval("aBD(" & sPB2 & ")")) multiArray = multiArray & toString(aBD(sPB2)) Else sPT = sPT & sPS multiArray = multiArray & "[" For i = iDL To iDU multiArray = multiArray & multiArray(aBD, iBC + 1, i, sPT) If i < iDU Then multiArray = multiArray & "," Next multiArray = multiArray & "]" sPT = Left(sPT, iBC - 2) End If Err.ClearEnd Function
附錄二:JSON返回示例
{"resultcode":"200","reason":"ReturnSuccessd!","result":{"data":[{"name":"北京市法雨合","intro":"法雨合0層","address":"北京市朝陽區朝陽區三里屯","google_lat":"39.9372423","google_lon":"116.4480615","baidu_lat":"39.942952987502","baidu_lon":"116.45464108129","province":"北京市","city":"北京市"},{"name":"北京朝陽西壩河光熙門北里","intro":"朝陽西壩河光熙門北里34-8號0層","address":"北京市朝陽區朝陽區西壩河光熙門北里34號-8號0層","google_lat":"39.9635121","google_lon":"116.435895","baidu_lat":"39.969407173324","baidu_lon":"116.44243487981","province":"北京市","city":"北京市"},{"name":"北京朝陽三里屯北街","intro":"","address":"北京市朝陽區朝陽三里屯北街8號0層","google_lat":"39.9254286","google_lon":"116.4605935","baidu_lat":"39.931073085771","baidu_lon":"116.46719483818","province":"北京市","city":"北京市"},{"name":"北京大都酒吧街","intro":"","address":"北京市朝陽區元大都酒吧街11號","google_lat":"39.975984","google_lon":"116.424389","baidu_lat":"39.982089966811","baidu_lon":"116.43086831752","province":"北京市","city":"北京市"},{"name":"北京西城前海北沿","intro":"","address":"北京市西城區西城前海北沿10號0層","google_lat":"39.9369032","google_lon":"116.3919335","baidu_lat":"39.943215619704","baidu_lon":"116.39830652238","province":"北京市","city":"北京市"},{"name":"北京市西城後海南沿36號對面","intro":"後海南沿36號對面0層","address":"北京市西城區後海南沿36號","google_lat":"39.9396792","google_lon":"116.389129","baidu_lat":"39.945967638433","baidu_lon":"116.39551153315","province":"北京市","city":"北京市"},{"name":"北京市賽百味","intro":"ok","address":"北京市西城區中關村東路18號","google_lat":"39.9810991","google_lon":"116.3333866","baidu_lat":"39.9867766224","baidu_lon":"116.34001632032","province":"北京市","city":"北京市"},{"name":"北京市光華路數碼01","intro":"","address":"北京市朝陽區光華路數碼01大廈0層","google_lat":"39.9132392","google_lon":"116.4592309","baidu_lat":"39.918885961978","baidu_lon":"116.46583845234","province":"北京市","city":"北京市"},{"name":"北京市盛銘幫逸園會館","intro":"盛銘幫逸園會館0","address":"北京市朝陽區逸園25號","google_lat":"39.8710876","google_lon":"116.4602965","baidu_lat":"39.876744728506","baidu_lon":"116.46693498949","province":"北京市","city":"北京市"},{"name":"北京市地平線酒吧","intro":"","address":"北京市朝陽區朝陽三里屯北街70號","google_lat":"39.9254286","google_lon":"116.4605935","baidu_lat":"39.931073085771","baidu_lon":"116.46719483818","province":"北京市","city":"北京市"}],"pageinfo":{"pnums":20,"current":1 } }}
推薦閱讀:
※Excel+VBA常用功能(一):工作表的拆分
※怎樣在Word中用VBA操作表格
※九九乘法表?三行就夠了!
※vba split函數應用:分離商品與數量
※利用Excel一鍵獲取功效矩陣氣泡圖的奧秘