WeCenter Denial-of-Service Vulnerability Caused by a Regular Expression

02-09

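A year ago, at my previous company, I set up a community site running WeCenter. One day the community's home page suddenly started timing out. Investigation showed that PHP was hitting its time limit. I knew a particular article was the cause, but I had other projects to finish, so I simply deleted the article and left it there. Two weeks ago the same problem came back. I had nothing urgent on my plate this time, so I took the time to trace through the code.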

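Root cause:

The article body was about 65k characters long. Running greedy regular expressions over a string that large made PHP time out. Let's look at the column type of the article field.

Tracing the code: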

Entry file index.php, around line 23:

AWS_APP::run();

Following it into /system/aws_app.inc.php (key code only), around line 104:

$handle_controller->$action_method();
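This line calls the controller action through a variable method name, which is how control reaches index_action() in /app/article/main.php. The sketch below shows that PHP mechanism in isolation. The class name main, the dispatch() helper and the "<action>_action" naming rule are assumptions inferred from index_action(). They are not WeCenter's actual routing code.

```php
<?php
// Hypothetical stand-in for a WeCenter controller; the class name and the
// "<action>_action" naming rule are assumptions inferred from index_action().
class main
{
    public function index_action(): string
    {
        return 'rendering article/index';
    }
}

// Hypothetical helper: resolve the action to a method name, then call it via a
// variable method name, the same PHP feature as $handle_controller->$action_method().
function dispatch(object $handle_controller, string $action): string
{
    $action_method = $action . '_action';
    if (!method_exists($handle_controller, $action_method)) {
        throw new RuntimeException("Unknown action: $action");
    }
    return $handle_controller->$action_method();
}

echo dispatch(new main(), 'index'), PHP_EOL; // rendering article/index
```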

/app/article/main.php, key code around line 36:

public function index_action() {
    // ... some code omitted ...
    // $article_info['message'] holds the article body
    $article_info['message'] = FORMAT::parse_attachs(nl2br(FORMAT::parse_bbcode($article_info['message'])));
    // ... some code omitted ...
    TPL::output('article/index');
}

Let's step into FORMAT::parse_bbcode and see what it does to the article body.

/system/class/cls_format.inc.php, around line 78:

public static function parse_bbcode($text) {
    if (!$text) {
        return false;
    }

    return self::parse_links(load_class('Services_BBCode')->parse($text));
}

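WeCenter's versions feel rather messy.

From here on you can debug by inserting echo rand(); exit; at a given line. If the page prints a number, execution got that far without timing out. If it still times out, the slow code is earlier.

Continuing into self::parse_links: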

public static function parse_links($str) {
    $str = @preg_replace_callback('/(?<!!!\[\]\(|"|\'|\)|>)(https?:\/\/[-a-zA-Z0-9@:;%_\+.~#?\&\/\/=!]+)(?!"|\'|\)|>)/i', 'parse_link_callback', $str);

    if (strpos($str, 'http') === FALSE) {
        $str = @preg_replace_callback('/(www\.[-a-zA-Z0-9@:;%_\+\.~#?&\/\/=]+)/i', 'parse_link_callback', $str);
    }

    // Debugging showed the problem is on this line: the incoming $str is about 60k bytes,
    // and this regex uses greedy quantifiers. This one has since been fixed:
    // https://github.com/wecenter/wecenter/commit/177a9e8bab6aec8725f258df02f8f214e5b2469c
    $str = @preg_replace('/([a-z0-9\+_\-]+[\.]?[a-z0-9\+_\-]+@[a-z0-9\-]+\.+[a-z]{2,6}+(\.+[a-z]{2,6})?)/is', '<a href="mailto:\1">\1</a>', $str);

    echo rand(); exit; // debug probe

    return $str;
}
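The e-mail regex in parse_links() has the same shape that causes trouble later. When the optional dot is skipped, the two [a-z0-9\+_\-]+ classes compete for the same characters. The sketch below shows what the pattern does on ordinary input. The pattern is a reconstruction with backslashes and quotes restored, so treat its exact escaping as an assumption. link_emails() is a hypothetical helper.

```php
<?php
// The e-mail pattern from parse_links(), reconstructed with quotes and
// backslashes restored (exact escaping is an assumption).
const EMAIL_PATTERN = '/([a-z0-9\+_\-]+[\.]?[a-z0-9\+_\-]+@[a-z0-9\-]+\.+[a-z]{2,6}+(\.+[a-z]{2,6})?)/is';

// Hypothetical helper: wrap every e-mail address in a mailto: link.
function link_emails(string $str): ?string
{
    // preg_replace() returns null if PCRE aborts (e.g. on the backtrack limit);
    // the original code only suppresses warnings with @ and never checks for null.
    return preg_replace(EMAIL_PATTERN, '<a href="mailto:\1">\1</a>', $str);
}

echo link_emails('mail me at foo.bar@example.com'), PHP_EOL;
// mail me at <a href="mailto:foo.bar@example.com">foo.bar@example.com</a>
```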

I temporarily replaced the regex in that preg_replace with a trivial w, and PHP still timed out. Something else was slow as well, so I kept tracing.

Back in /app/article/main.php, look at the last line of index_action: TPL::output('article/index');

In the output function of /system/class/cls_template.inc.php, around line 56:

$display_template_filename = 'default/' . $template_filename;
/* ... some code omitted ... */
$output = self::$view->getOutput($display_template_filename);

Where does self::$view come from?

The initialize function in /system/class/cls_template.inc.php:

public static function initialize() {
    if (!is_object(self::$view)) {
        self::$template_path = realpath(ROOT_PATH . 'views/');

        self::$view = new Savant3(
            array(
                'template_path' => array(self::$template_path),
                //'filters' => array('Savant3_Filter_trimwhitespace', 'filter')
            )
        );

        if (file_exists(AWS_PATH . 'config.inc.php') AND class_exists('AWS_APP', false)) {
            self::$in_app = true;
        }
    }

    return self::$view;
}

Step into self::$view->getOutput to see what $output contains.

/system/Savant3.php, around line 1004:

public function getOutput($tpl = null) {
    $output = $this->fetch($tpl);
    if ($this->isError($output)) {
        $text = $this->__config['error_text'];
        return $this->escape($text);
    } else {
        return $output;
    }
}

In $this->fetch you can see that it includes the template and returns the rendered content:

public function fetch($tpl = null) {
    // ... some code omitted ...
    } else {
        // yes. execute the template script. move the script-path
        // out of the local scope, then clean up the local scope to
        // avoid variable name conflicts.
        $this->__config['fetch'] = $result;
        unset($result);
        unset($tpl);

        // are we doing extraction?
        if ($this->__config['extract']) {
            // pull variables into the local scope.
            extract(get_object_vars($this), EXTR_REFS);
        }

        // buffer output so we can return it instead of displaying.
        ob_start();

        // are we using filters?
        if ($this->__config['filters']) {
            // use a second buffer to apply filters. we used to set
            // the ob_start() filter callback, but that would
            // silence errors in the filters. Hendy Irawan provided
            // the next three lines as a "verbose" fix.
            ob_start();
            include $this->__config['fetch'];
            echo $this->applyFilters(ob_get_clean());
        } else {
            // no filters being used.
            include $this->__config['fetch'];
        }

        // reset the fetch script value, get the buffer, and return.
        $this->__config['fetch'] = null;
        return ob_get_clean();
    }
}
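Stripped of Savant3's bookkeeping, fetch() runs a template inside an output buffer and returns the captured text instead of printing it. The sketch below reproduces only that capture step. render_template() is a hypothetical helper, not part of Savant3, and the temporary template file exists only for the demo.

```php
<?php
// Hypothetical helper showing the capture pattern used by Savant3::fetch().
function render_template(string $template_file, array $vars): string
{
    extract($vars, EXTR_SKIP);    // expose template variables without overwriting our own locals
    ob_start();                   // start capturing everything the template echoes
    try {
        include $template_file;
    } finally {
        $output = ob_get_clean(); // stop capturing, even if the template throws
    }
    return $output;
}

// Usage: write a throwaway template to a temp file and render it.
$tpl = tempnam(sys_get_temp_dir(), 'tpl');
file_put_contents($tpl, '<h1><?php echo htmlspecialchars($title); ?></h1>');
$html = render_template($tpl, ['title' => 'Hello']);
unlink($tpl);

echo $html, PHP_EOL; // <h1>Hello</h1>
```

Because the rendered page comes back as a string, the caller can post-process it. The output() function does exactly that with its preg_replace calls.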

Next, let's see how the template content is processed once it has been fetched.

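I lost a lot of time here. At first I debugged with a bare echo rand(); exit; without noticing that output() is called for several templates. So I changed the debug code to:

if ($display_template_filename == 'default/article/index.tpl.htm') { echo rand(); exit; }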

In the output function of /system/class/cls_template.inc.php, around line 134:

// Two greedy regexes; changing them fixes the problem.
$output = preg_replace('/[a-zA-Z0-9]+_?[a-zA-Z0-9]*-__/', '', $output);
$output = preg_replace('/(__)?[a-zA-Z0-9]+_?[a-zA-Z0-9]*-([\'|"])/', '\2', $output);

if ($display_template_filename == 'default/article/index.tpl.htm') { echo rand(); exit; }

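At first I really didn't think of greedy matching. These days all I remember of regex is the dot, the star and the question mark. Later I mentioned the problem to @L3m0n (檸檬) while we were working on some challenges, and he said it was greedy matching. Time to brush up on regex...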

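Why does greedy matching make PHP time out?

References:

Analysis of the three regex quantifier modes: greedy, reluctant and possessive (正則表達式的三種模式【貪婪、勉強、侵佔】的分析)

Regex basics: how the NFA engine matches (正則基礎之——NFA引擎匹配原理)

Regex basics: greedy vs. non-greedy matching (正則基礎之——貪婪與非貪婪模式)

<Advanced-1> How regular expressions match (<進階-1> 正則表達式的匹配原理)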

[Figure: greedy-mode matching diagram]

Take one of the regexes above as an example:

[a-zA-Z0-9]+_?[a-zA-Z0-9]*-__

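Split the regex into its subexpressions: "[a-zA-Z0-9]+", "_?", "[a-zA-Z0-9]*", "-" and "__".

With the greedy-mode diagram in mind, suppose the input string is: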

[img]abc

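Number the characters of the string by position: "[" is 0, "i" 1, "m" 2, "g" 3, "]" 4, "a" 5, "b" 6, "c" 7.

Online regex debugger: https://regex101.com

The matching proceeds roughly as follows. My description isn't necessarily exact; if you're interested, read up on how regex engines match.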

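Step 1: starting at position 0, the subexpression "[a-zA-Z0-9]+" tries "[" and fails, so the engine moves on to the next position.

Step 2: starting at position 1, "[a-zA-Z0-9]+" matches "i". Because it is greedy, it keeps going through "m" and "g" and stops at "]".

Step 3: at position 4, "_?" tries "]" and fails. Since "_?" is optional, it falls back to matching zero times.

Step 4: at position 4, "[a-zA-Z0-9]*" tries "]" and fails, so it also matches zero times.

Step 5: at position 4, "-" tries "]" and fails. The engine backtracks to the last state that still has alternatives. "[a-zA-Z0-9]+" gives back the "g" it had consumed, so it now matches "im", and matching resumes with "_?" at position 3.

Step 6: at position 3, "_?" tries "g", fails, and matches zero times.

Step 7: at position 3, "[a-zA-Z0-9]*" matches "g" and stops at "]".

Step 8: at position 4, "-" tries "]" and fails. Backtracking again, "[a-zA-Z0-9]*" gives back "g", and "-" then fails against "g" at position 3. Next "[a-zA-Z0-9]+" gives back "m", so it now matches only "i". Matching resumes with "_?" at position 2.

Step 9: at position 2, "_?" tries "m", fails, and matches zero times.

Step 10: at position 2, "[a-zA-Z0-9]*" matches "m". Being greedy, it continues through "g" and stops at "]".

Step 11: at position 4, "-" tries "]" and fails. "[a-zA-Z0-9]*" gives back "g" and then "m", and "-" fails against each. "[a-zA-Z0-9]+" cannot give back its only character, so every possibility starting at "i" is exhausted. The engine restarts one position later, at "m".

Step 12: starting at position 2, "[a-zA-Z0-9]+" matches "m". Being greedy, it continues through "g" and stops at "]".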

This repeats until the regex has tried every starting position without finding a match. Only then does the overall match fail.
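To make the walkthrough concrete, here is a hand-written backtracking matcher for this one pattern. It counts how often the engine reaches the "-" test. It is a simplified model, not PCRE: real PCRE adds optimizations such as auto-possessive quantifiers and required-character checks. The counts therefore illustrate the size of the search space, not PCRE's exact work. The function names are hypothetical.

```php
<?php
// Simplified backtracking model of  [a-zA-Z0-9]+_?[a-zA-Z0-9]*-__
// counting how many times the "-" test is reached.
function is_alnum_at(string $s, int $i): bool
{
    if ($i >= strlen($s)) {
        return false;
    }
    $o = ord($s[$i]);
    return ($o >= 48 && $o <= 57) || ($o >= 65 && $o <= 90) || ($o >= 97 && $o <= 122);
}

function count_dash_checks(string $s): array
{
    $n = strlen($s);
    $checks = 0;
    for ($start = 0; $start < $n; $start++) {     // try every start position
        $end = $start;
        while (is_alnum_at($s, $end)) {           // [a-zA-Z0-9]+ grabs the whole run...
            $end++;
        }
        for ($p = $end; $p > $start; $p--) {      // ...then gives back one char at a time
            // _? is greedy: try one underscore first, then zero.
            $options = ($p < $n && $s[$p] === '_') ? [$p + 1, $p] : [$p];
            foreach ($options as $q) {
                $r = $q;
                while (is_alnum_at($s, $r)) {     // [a-zA-Z0-9]* grabs the next run...
                    $r++;
                }
                for ($t = $r; $t >= $q; $t--) {   // ...and gives back one char at a time
                    $checks++;                    // reached the "-" test
                    if (substr($s, $t, 3) === '-__') {
                        return ['match' => $start, 'checks' => $checks];
                    }
                }
            }
        }
    }
    return ['match' => null, 'checks' => $checks];
}

foreach ([10, 20, 40] as $len) {
    $result = count_dash_checks(str_repeat('a', $len));
    printf("%d chars: %d checks\n", $len, $result['checks']);
}
// 10 chars: 220 checks
// 20 chars: 1540 checks
// 40 chars: 11480 checks
```

In this model "[img]abc" costs 20 "-" tests and still fails. Each doubling of the run length multiplies the work by roughly eight.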

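Why does it time out?

The regex is built from repeated subexpressions that can match the same characters, all of them greedy, and on this input it can never succeed. On a very large string the engine tries every way of dividing each run of letters and digits between "[a-zA-Z0-9]+" and "[a-zA-Z0-9]*", from every starting position. The number of attempts explodes and PHP exceeds its time limit.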
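The blow-up can be counted under the simplified model of the walkthrough: no underscores, no "-", and no PCRE optimizations. Suppose a start position leaves m letters/digits in the run. "[a-zA-Z0-9]+" can keep k = m, m-1, ..., 1 of them, and for each k, "[a-zA-Z0-9]*" tries m-k+1 lengths before "-" fails:

$$C(m) = \sum_{k=1}^{m} (m - k + 1) = \frac{m(m+1)}{2}$$

Summing over every start position in an unbroken run of length L:

$$T(L) = \sum_{m=1}^{L} \frac{m(m+1)}{2} = \frac{L(L+1)(L+2)}{6}$$

For "img" (L = 3) this gives 10 tests, so "[img]abc" costs 20. The growth is cubic. A single run of 3,000 characters already needs about 4.5 × 10^9 tests, and a 65,000-character run about 4.6 × 10^13.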

[Screenshot: denial-of-service effect]

[Screenshot: after the fix]

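Fix:

In the output function of /system/class/cls_template.inc.php, change the regexes at around lines 134-135 to:

$output = preg_replace('/[a-zA-Z0-9_?]+-__/', '', $output);
$output = preg_replace('/(__)?[a-zA-Z0-9_?]+-([\'|"])/', '\2', $output);

Merging the two classes into one means each character can be consumed by only one quantifier. A failed attempt now gives characters back one at a time instead of retrying every split. Note that inside brackets ? is a literal character, so this class also matches question marks. Unlike the original, it also accepts more than one underscore. If question marks are not meant to match, [a-zA-Z0-9_]+ is the tighter class.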
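The sketch below compares three patterns: the original template regex, the author's fix, and a possessive variant. The possessive variant is an alternative I'm adding for comparison, not the author's fix. "++" and "*+" never give characters back, so PCRE cannot retry every split of a run. As far as I can tell, it accepts exactly the same matches as the original. Both it and the author's fix cut the work per start position from quadratic to linear in the run length. strip_with() is a hypothetical helper.

```php
<?php
// Original template regex, the author's fix, and a possessive alternative.
const ORIGINAL   = '/[a-zA-Z0-9]+_?[a-zA-Z0-9]*-__/';
const AUTHOR_FIX = '/[a-zA-Z0-9_?]+-__/';
const POSSESSIVE = '/[a-zA-Z0-9]++_?[a-zA-Z0-9]*+-__/';

// Hypothetical helper: delete every match of $pattern from $subject.
function strip_with(string $pattern, string $subject): ?string
{
    return preg_replace($pattern, '', $subject);
}

foreach (['foo_bar-__baz', 'a_b_c-__x'] as $input) {
    printf(
        "%s: original=%s fix=%s possessive=%s\n",
        $input,
        strip_with(ORIGINAL, $input),
        strip_with(AUTHOR_FIX, $input),
        strip_with(POSSESSIVE, $input)
    );
}
// foo_bar-__baz: original=baz fix=baz possessive=baz
// a_b_c-__x: original=a_x fix=x possessive=a_x
```

The second input shows the semantic difference. The merged class also swallows "a_" because it accepts any number of underscores, while the original and the possessive variant allow at most one.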

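How to avoid this kind of problem:

1. Don't chain repeated subexpressions that can match the same characters while all of them are greedy;

2. After writing a regex, debug it, including against long inputs that do not match.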
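For the second point, a small timing probe can help. It runs a pattern against growing inputs that do not match and records whether PCRE gave up. probe_regex() is a hypothetical helper, and the pattern in the usage line is only an example.

```php
<?php
// Hypothetical helper: time a pattern on growing non-matching inputs and
// record whether PCRE aborted. Use lengths your real data can reach.
function probe_regex(string $pattern, string $unit, array $lengths): array
{
    $results = [];
    foreach ($lengths as $len) {
        $subject = str_repeat($unit, $len);
        $t0 = microtime(true);
        $out = preg_replace($pattern, '', $subject);
        $results[$len] = [
            'seconds' => microtime(true) - $t0,
            'failed'  => $out === null,    // null means PCRE aborted the match
            'error'   => preg_last_error(),
        ];
    }
    return $results;
}

foreach (probe_regex('/[a-zA-Z0-9_]+-__/', 'a', [500, 1000, 2000]) as $len => $r) {
    printf("%5d chars: %.4fs failed=%s error=%d\n", $len, $r['seconds'], $r['failed'] ? 'yes' : 'no', $r['error']);
}
```

Swap in the pattern under test. A pattern like the original template regex may show times growing much faster than the input. It may instead fail with an error such as PREG_BACKTRACK_LIMIT_ERROR, depending on pcre.backtrack_limit and whether the PCRE JIT is enabled.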

This article was written by Mosuan_ and published by 嘶吼 (4hou) with the author's permission. When reprinting, please credit the original URL: http://www.4hou.com/technology/7374.html