In python3, the encoding of an unknown file can't be determined before reading it. How do I solve this?

06-11

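What do you do in python3 when opening a file of unknown encoding? In python2 you could just open it regardless, run chardet on the contents to work out the encoding, and then decode. In python3 the encoding has to be fixed before fp.read(), so there is no chance to call chardet first.
I do know what the file's encoding is, but I want my code to be more robust, because the file path is passed in as a command-line argument.
(I'm in the middle of moving from 2 to 3. The whole point was to get away from encoding trouble, and the very first problem I hit is an encoding problem..)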

Thanks for the invite. Have a read of the docs, and try passing mode='rb':

As mentioned in the Overview, Python distinguishes between binary and text I/O. Files opened in binary mode (including 'b' in the mode argument) return contents as bytes objects without any decoding. In text mode (the default, or when 't' is included in the mode argument), the contents of the file are returned as str, the bytes having been first decoded using a platform-dependent encoding or using the specified encoding if given.
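To make the quoted passage concrete, here is a minimal sketch of the binary-mode route: read bytes first, decode later. The helper name read_raw_then_decode and the GBK demo file are assumptions made for illustration, not part of the original answer.

```python
import os
import tempfile


def read_raw_then_decode(path, encoding):
    """Hypothetical helper: read bytes first, decode once the encoding is known."""
    with open(path, "rb") as f:  # binary mode: open() does no decoding here
        raw = f.read()
    return raw, raw.decode(encoding)


# Demo file: two Chinese characters stored as GBK bytes (an assumed encoding).
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("编码".encode("gbk"))
    demo_path = tmp.name

raw, text = read_raw_then_decode(demo_path, "gbk")
os.remove(demo_path)

print(type(raw).__name__, type(text).__name__)
```

Because nothing is decoded at open() time, the encoding can be chosen after looking at the bytes, which is the python2-style workflow the question asks for.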

Read it in with rb and convert it yourself (the same as in py2).

Or keep wrapping decode attempts in try/except until one of them works...
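The try/except route could look roughly like the sketch below. The function name decode_with_fallback and the candidate list are assumptions; the candidates should be whatever encodings are plausible for your own files.

```python
def decode_with_fallback(raw, candidates=("utf-8", "gb18030")):
    """Hypothetical helper: try each candidate encoding in order.

    Returns (text, encoding_used). If every candidate fails, falls back to
    latin-1, which maps every byte to a character and so never raises.
    """
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1"), "latin-1"


text, used = decode_with_fallback("中文".encode("gbk"))
print(text, used)
```

Order matters: a successful decode only shows that the bytes are valid in that encoding, not that the encoding is the right one. Legacy multi-byte encodings often accept each other's bytes, so stricter encodings such as UTF-8 are best tried first.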

Encoding is a pit you can't get away from...

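import chardet

# A .txt file is used as the example; other file types work the same way.
secret_encoding_file = "some_file_of_unknown_encoding.txt"

# To save memory, read only the first line with readline() instead of
# loading everything with read(). A short sample can make the guess less reliable.
with open(secret_encoding_file, "rb") as f:
    bytes_data = f.readline()

# detect() returns a dict; the "encoding" key is the guessed encoding (it may be None).
encoding_info = chardet.detect(bytes_data)["encoding"]

# Show the detected encoding.
print(encoding_info)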
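Once an encoding has been guessed from the bytes, the file can be reopened in text mode with that encoding. The sketch below assumes a hypothetical helper, open_with_detected_encoding, that takes the detector as a parameter. The sample size, the UTF-8 default, and errors="replace" are illustrative choices. A stub detector stands in for chardet so that the example runs without third-party packages.

```python
import os
import tempfile


def open_with_detected_encoding(path, detect, sample_size=64 * 1024, default="utf-8"):
    """Hypothetical helper: sniff a byte sample, then reopen in text mode."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)  # bounded sample instead of the whole file
    encoding = detect(sample) or default  # a detector may return None
    return open(path, "r", encoding=encoding, errors="replace")


# With chardet installed, the detector could be:
#   detect = lambda sample: chardet.detect(sample)["encoding"]
def stub_detect(sample):
    """Stand-in detector for this demo; always answers 'gbk'."""
    return "gbk"


with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("第一行\n第二行\n".encode("gbk"))
    demo_path = tmp.name

with open_with_detected_encoding(demo_path, stub_detect) as fp:
    lines = fp.read().splitlines()
os.remove(demo_path)

print(lines)
```

Passing the detector in keeps the file handling separate from the guessing step, so the same code works with chardet, a fixed encoding, or any other heuristic.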

import chardet

with open(path, "rb") as f:
    s = f.read()
charset = chardet.detect(s)["encoding"]
print(charset)

Read the file in binary 'rb' mode, then detect the encoding.

Just open it with rb so it is read as binary.