Fixing Garbled Text in Page Redirects

06-01

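Problem: redirects between different websites produce garbled text (mojibake), and so do parameters passed between pages that use different encodings.

Two things need to be clear first:

The character encoding of the URL text itself. It usually matches the encoding of your web page; for Chinese that is typically UTF-8, GBK, or GB2312.
What URL encoding is.

URL encoding rules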

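Uppercase and lowercase letters and digits are left unchanged.
The four characters ".", "-", "*", and "_" are left unchanged.
A space is encoded as a plus sign "+".
All other characters are treated as unsafe. They are converted to bytes using the specified character encoding (if none is specified, the platform default applies, as determined by the browser or operating system), and each byte is written in hexadecimal as "%xy", where xy is two hexadecimal digits describing one 8-bit byte.
The "%xy" escape is defined in RFC 1738; encoding a space as "+" comes from the application/x-www-form-urlencoded convention used by HTML forms.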
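The rules above can be sketched in a few lines of JavaScript for Node.js. This is only an illustration under stated assumptions: the function name formUrlEncode is made up for this example, and UTF-8 is assumed as the "specified character encoding".

```javascript
// Characters the rules above leave untouched: letters, digits, and . - * _
const SAFE = /^[A-Za-z0-9.\-*_]$/;

// Hypothetical helper: encodes text following the four rules listed above,
// assuming UTF-8 as the character encoding.
function formUrlEncode(text) {
  let out = '';
  for (const byte of Buffer.from(text, 'utf8')) {
    const ch = String.fromCharCode(byte);
    if (byte === 0x20) {
      out += '+'; // a space becomes "+"
    } else if (byte < 0x80 && SAFE.test(ch)) {
      out += ch; // safe ASCII characters are copied as-is
    } else {
      // every other byte becomes %XY (two hexadecimal digits)
      out += '%' + byte.toString(16).toUpperCase().padStart(2, '0');
    }
  }
  return out;
}

console.log(formUrlEncode('中 a*b&c')); // %E4%B8%AD+a*b%26c
```

The same input would produce different %xy bytes under GBK or GB2312, which is exactly where the garbled text comes from.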

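So the URL encoding rules are the same whatever platform a website runs on. When garbled text appears on a redirect between websites, the cause is that the two sites use different character encodings.

The fix is a three-step pipeline: URL-decode -> convert the string's character encoding -> URL-encode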
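As a rough sketch of that pipeline in JavaScript (Node.js): the function names are hypothetical, and Latin-1 stands in for the sender's charset because Node's Buffer only understands a handful of encodings. A GBK or GB2312 source would need a decoder that supports those charsets, which is not shown here.

```javascript
// Step 1: URL-decode a percent-encoded string into raw bytes.
// No charset is assumed yet; we only undo the %xy and "+" escapes.
function percentDecodeToBytes(s) {
  const bytes = [];
  for (let i = 0; i < s.length; i++) {
    const c = s[i];
    if (c === '%' && /^[0-9a-fA-F]{2}$/.test(s.slice(i + 1, i + 3))) {
      bytes.push(parseInt(s.slice(i + 1, i + 3), 16));
      i += 2; // skip the two hex digits just consumed
    } else if (c === '+') {
      bytes.push(0x20); // form-style encoding writes a space as "+"
    } else {
      bytes.push(c.charCodeAt(0) & 0xff);
    }
  }
  return Buffer.from(bytes);
}

// Step 2: interpret the bytes in the sender's charset.
// Step 3: URL-encode the resulting text again, this time as UTF-8.
function reencodeToUtf8(s, sourceCharset) {
  const text = percentDecodeToBytes(s).toString(sourceCharset);
  return encodeURIComponent(text);
}

console.log(reencodeToUtf8('caf%E9+au+lait', 'latin1')); // caf%C3%A9%20au%20lait
```

The important point is that the middle step needs to know the sender's charset; the percent-encoded string alone does not carry it.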

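1. The PHP solution:

PHP functions for converting string encodings

The iconv() function

Description: string iconv ( string in_charset, string out_charset, string str )

Note: besides naming the target encoding, the second parameter can carry one of two suffixes:

//TRANSLIT replaces a character that cannot be converted directly with one or more similar-looking characters;
//IGNORE silently drops characters that cannot be converted. Without either suffix, the default is to truncate the string at the first illegal character.

e.g. $str = iconv("UTF-8", "GB2312//TRANSLIT", $str);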
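To make the three behaviours concrete, here is a small JavaScript sketch of the idea. It is not iconv's actual algorithm: the function toLatin1, the policy names, and the tiny APPROX table are all invented for illustration, and Latin-1 is used as the target charset only because it needs no lookup table.

```javascript
// Hypothetical approximation table; real iconv uses its own, much larger rules.
const APPROX = { '€': 'EUR', '…': '...' };

// Converts text to characters representable in Latin-1 (code points <= 0xFF),
// handling unrepresentable characters according to the chosen policy.
function toLatin1(text, policy) {
  let out = '';
  for (const ch of text) {
    if (ch.codePointAt(0) <= 0xff) {
      out += ch; // representable: copy as-is
    } else if (policy === 'translit') {
      out += APPROX[ch] || '?'; // substitute something similar
    } else if (policy === 'ignore') {
      continue; // drop the character and carry on
    } else {
      break; // 'truncate': stop at the first unconvertible character
    }
  }
  return out;
}

console.log(toLatin1('5€ now', 'translit')); // 5EUR now
console.log(toLatin1('5€ now', 'ignore'));   // 5 now
console.log(toLatin1('5€ now', 'truncate')); // 5
```

The truncating behaviour is the one that tends to surprise people, since everything after the first problem character disappears.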

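The mb_convert_encoding() function

Description: string mb_convert_encoding ( string str, string to-encoding [, mixed from-encoding] )

Note: the mbstring extension must be enabled.

Difference between the two: mb_convert_encoding can detect the source encoding from the content when from-encoding is given as a list of candidate encodings. It is the more capable function, but in the author's experience it runs much slower than iconv.

Summary: use iconv in general, and fall back to mb_convert_encoding only when the source encoding cannot be determined in advance.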

URL encoding and decoding functions

The urlencode function

This function URL-encodes the string passed in. Its declaration:

string urlencode (string str)

The urldecode function

This function URL-decodes the string passed in and returns the decoded string. Its declaration:

string urldecode (string str)

Others (these follow RFC 3986 and encode a space as %20 instead of "+"):

rawurlencode()

rawurldecode()
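The practical difference between the two families is how a space is written. JavaScript has a similar split, which the sketch below uses as an analogy: URLSearchParams produces form-style output, while encodeURIComponent produces %20. The helper names formStyle and rawStyle are made up here, and the exact set of untouched characters is not identical to PHP's.

```javascript
// Form-style encoding (space -> "+"), comparable in spirit to PHP's urlencode().
function formStyle(value) {
  // Serialize as "v=<encoded>" and strip the leading "v=".
  return new URLSearchParams({ v: value }).toString().slice(2);
}

// RFC 3986 style encoding (space -> "%20"), comparable in spirit to rawurlencode().
function rawStyle(value) {
  return encodeURIComponent(value);
}

console.log(formStyle('a b')); // a+b
console.log(rawStyle('a b'));  // a%20b
console.log(formStyle('a+b')); // a%2Bb
console.log(rawStyle('a+b'));  // a%2Bb
```

Both styles escape a literal "+" as %2B, so a decoder that follows the same convention as the encoder always gets the original text back.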

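Example:

<?php
// PHP has already URL-decoded the values in $_GET, so urldecode() is not called again.
$url = $_GET["url"];
// Convert the decoded string from UTF-8 to GB2312.
$keyword = iconv("UTF-8", "GB2312//TRANSLIT", $url);
// URL-encode the converted string (the original encoded $url here by mistake).
$keyword = urlencode($keyword);
// $keyword can now be placed in the query string of the target URL.
header("Location: " . $url);
?>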

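2. The JavaScript solution

When parameters are passed through a URL, they often include Chinese names or whole URLs, and the back end may convert them incorrectly. If the sending page uses GB2312 and the receiving page uses UTF-8, the received parameter may differ from the original. A URL encoded with a server-side urlEncode function also comes out different from one encoded with the client-side JavaScript encodeURI function.

Encoding methods in JavaScript:

The escape() method:

Encodes the given string using the ISO Latin character set. All spaces, punctuation, special characters, and other non-ASCII characters are converted to %xx form, where xx is the hexadecimal code of the character in the character set table; characters above 0xFF are written as %uXXXX instead. For example, a space becomes %20. The unescape method does the reverse. Characters not encoded by this method: letters, digits, and @ * _ + - . /

English reference: MSDN JScript Reference: The escape method returns a string value (in Unicode format) that contains the contents of [the argument]. All spaces, punctuation, accented characters, and any other non-ASCII characters are replaced with %xx encoding, where xx is equivalent to the hexadecimal number representing the character. For example, a space is returned as "%20."

Mozilla Developer Core JavaScript Guide: The escape and unescape functions let you encode and decode strings. The escape function returns the hexadecimal encoding of an argument in the ISO Latin character set. The unescape function returns the ASCII string for the specified hexadecimal encoding value.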

The encodeURI() method:

Encodes a URI string as UTF-8 and writes the resulting bytes as %xx escapes. Characters not encoded by this method: letters, digits, and ! @ # $ & * ( ) = : / ; ? + ' , - _ . ~

English reference: MSDN JScript Reference: The encodeURI method returns an encoded URI. If you pass the result to decodeURI, the original string is returned. The encodeURI method does not encode the following characters: ":", "/", ";", and "?". Use encodeURIComponent to encode these characters.

Mozilla Developer Core JavaScript Guide: Encodes a Uniform Resource Identifier (URI) by replacing each instance of certain characters by one, two, or three escape sequences representing the UTF-8 encoding of the character.

The encodeURIComponent() method:

Encodes a URI string as UTF-8 and writes the resulting bytes as %xx escapes. Compared with encodeURI(), it encodes more characters, such as "/". A string that contains several parts of a URI must therefore not be passed through this method, or the "/" characters will be encoded and the URL will no longer work. Characters not encoded by this method: letters, digits, and ! * ( ) ' - _ . ~

English reference: MSDN JScript Reference: The encodeURIComponent method returns an encoded URI. If you pass the result to decodeURIComponent, the original string is returned. Because the encodeURIComponent method encodes all characters, be careful if the string represents a path such as /folder1/folder2/default.html. The slash characters will be encoded and will not be valid if sent as a request to a web server. Use the encodeURI method if the string contains more than a single URI component.

Mozilla Developer Core JavaScript Guide: Encodes a Uniform Resource Identifier (URI) component by replacing each instance of certain characters by one, two, or three escape sequences representing the UTF-8 encoding of the character.
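A quick side-by-side run makes the differences between the three methods easier to see. The URL below is a made-up example (example.com is an assumption, not from the original article).

```javascript
// Hypothetical sample URL containing a space and a Chinese character.
const sample = 'http://example.com/a b?q=中';

// escape(): leaves "/" alone and writes non-Latin-1 characters as %uXXXX.
const escaped = escape('a b/中');

// encodeURI(): keeps URI delimiters such as ":", "/", "?" and "=" intact.
const wholeUri = encodeURI(sample);

// encodeURIComponent(): also encodes the delimiters, so use it on single values only.
const component = encodeURIComponent(sample);

console.log(escaped);   // a%20b/%u4E2D
console.log(wholeUri);  // http://example.com/a%20b?q=%E4%B8%AD
console.log(component); // http%3A%2F%2Fexample.com%2Fa%20b%3Fq%3D%E4%B8%AD
```

Note how only escape() produces the %u form, which many server-side URL decoders do not understand.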

So for Chinese strings: if you do not want the string converted to UTF-8 (for example, when the source page and the target page share the same charset), escape is enough, though it emits %uXXXX escapes that the receiving side must know how to decode. If your page is GB2312 or another encoding and the page receiving the parameter is UTF-8, use encodeURI or encodeURIComponent.

Also, encodeURI and encodeURIComponent were introduced in JavaScript 1.5, whereas escape has existed since JavaScript 1.0.

English note: The escape() method does not encode the + character which is interpreted as a space on the server side as well as generated by forms with spaces in their fields. Due to this shortcoming, you should avoid use of escape() whenever possible. The best alternative is usually encodeURIComponent().

Use of the encodeURI() method is a bit more specialized than escape() in that it encodes for URIs [REF] as opposed to the querystring, which is part of a URL. Use this method when you need to encode a string to be used for any resource that uses URIs and needs certain characters to remain un-encoded. Note that this method does not encode the ' character, as it is a valid character within URIs.

Lastly, the encodeURIComponent() method should be used in most cases when encoding a single component of a URI. This method will encode certain chars that would normally be recognized as special chars for URIs so that many components may be included. Note that this method does not encode the ' character, as it is a valid character within URIs.
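The "+" problem described in the note can be reproduced in a few lines. In this sketch, URLSearchParams plays the role of a server-side form decoder that treats "+" as a space; the helper name asServerSeesIt is invented for the example.

```javascript
// Hypothetical helper: decodes a query value the way a form decoder would,
// where "+" is read back as a space.
function asServerSeesIt(encodedValue) {
  return new URLSearchParams('q=' + encodedValue).get('q');
}

// escape() leaves "+" untouched, so the server reads it as a space.
console.log(asServerSeesIt(escape('1+1=2')));             // 1 1=2

// encodeURIComponent() writes "+" as %2B, so the value survives the round trip.
console.log(asServerSeesIt(encodeURIComponent('1+1=2'))); // 1+1=2
```

This is the main reason to prefer encodeURIComponent() for individual query values.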

3. The JSP and Servlet solution

In a Servlet that passes parameters, you normally set the encoding for both the received parameters and the response, with these two lines:

request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("utf-8");

Most people think of these two calls, but their position is easy to get wrong. Both must come before PrintWriter out = response.getWriter();. Once the out object has been initialized, setting the encoding no longer has any effect, so the encoding must be set before out is created.