A Deep Dive into Clang (5): Notes on Reading the Clang Lexer Code (Lexer)

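The source code of Clang's Lexer (lexical analyzer) lives mainly in the following places:

clang/lib/Lex contains the main Lexer implementation;

clang/include/clang/Lex contains the Lexer header files.

The Lexer also depends on the clangBasic library, so analyzing the Lexer code involves some of the code in clangBasic (clang/lib/Basic) as well.

Let's start with the Lexer class itself.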

clang/include/clang/Lex/Lexer.h

clang::Lexer:

    //===--------------------------------------------------------------------===//
    // Context-specific lexing flags set by the preprocessor.
    //

    /// ExtendedTokenMode - The lexer can optionally keep comments and whitespace
    /// and return them as tokens.  This is used for -C and -CC modes, and
    /// whitespace preservation can be useful for some clients that want to lex
    /// the file in raw mode and get every character from the file.
    ///
    /// When this is set to 2 it returns comments and whitespace.  When set to 1
    /// it returns comments, when it is set to 0 it returns normal tokens only.
    unsigned char ExtendedTokenMode;

    //===--------------------------------------------------------------------===//

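This member variable holds one piece of the lexer's state. Its value is 0, 1, or 2: with 0 the lexer returns only normal tokens; with 1 it returns comments and normal tokens; with 2 it returns whitespace, comments, and normal tokens.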
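To make the three values concrete, here is a minimal toy scanner. It is not Clang's algorithm: the `Kind`, `Tok`, and `lexAll` names are hypothetical, and the splitting rule (whitespace runs, `//` line comments, runs of everything else) is an assumption chosen only to show how the mode value filters what is returned.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy token kinds; Clang's real token kinds live in clang::tok.
enum class Kind { Normal, Comment, Whitespace };

struct Tok {
  Kind kind;
  std::string text;
};

// Toy scanner, NOT Clang's implementation. It splits the input into
// whitespace runs, "//" line comments, and runs of everything else,
// then keeps or drops each piece according to `mode`:
//   0: normal tokens only
//   1: comments + normal tokens
//   2: whitespace + comments + normal tokens
std::vector<Tok> lexAll(const std::string &src, unsigned char mode) {
  auto isSpace = [](char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
  };
  std::vector<Tok> out;
  std::size_t i = 0;
  while (i < src.size()) {
    const std::size_t start = i;
    Kind kind;
    if (isSpace(src[i])) {
      kind = Kind::Whitespace;
      while (i < src.size() && isSpace(src[i])) ++i;
    } else if (src.compare(i, 2, "//") == 0) {
      kind = Kind::Comment;
      while (i < src.size() && src[i] != '\n') ++i;
    } else {
      kind = Kind::Normal;
      while (i < src.size() && !isSpace(src[i]) &&
             src.compare(i, 2, "//") != 0)
        ++i;
    }
    const bool keep = kind == Kind::Normal ||
                      (kind == Kind::Comment && mode > 0) ||
                      (kind == Kind::Whitespace && mode > 1);
    if (keep) out.push_back({kind, src.substr(start, i - start)});
  }
  return out;
}
```

The comparisons `mode > 0` and `mode > 1` are the same tests that the real accessors `inKeepCommentMode()` and `isKeepWhitespaceMode()` perform on `ExtendedTokenMode`.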

Below are the functions that operate on this member variable. They simply query, set, and reset the value, and the code is straightforward:

    /// isKeepWhitespaceMode - Return true if the lexer should return tokens for
    /// every character in the file, including whitespace and comments.  This
    /// should only be used in raw mode, as the preprocessor is not prepared to
    /// deal with the excess tokens.
    bool isKeepWhitespaceMode() const {
      return ExtendedTokenMode > 1;
    }

    /// SetKeepWhitespaceMode - This method lets clients enable or disable
    /// whitespace retention mode.
    void SetKeepWhitespaceMode(bool Val) {
      assert((!Val || LexingRawMode || LangOpts.TraditionalCPP) &&
             "Can only retain whitespace in raw mode or -traditional-cpp");
      ExtendedTokenMode = Val ? 2 : 0;
    }

    /// inKeepCommentMode - Return true if the lexer should return comments as
    /// tokens.
    bool inKeepCommentMode() const {
      return ExtendedTokenMode > 0;
    }

    /// SetCommentRetentionMode - Change the comment retention mode of the lexer
    /// to the specified mode.  This is really only useful when lexing in raw
    /// mode, because otherwise the lexer needs to manage this.
    void SetCommentRetentionState(bool Mode) {
      assert(!isKeepWhitespaceMode() &&
             "Can't play with comment retention state when retaining whitespace");
      ExtendedTokenMode = Mode ? 1 : 0;
    }

    /// Sets the extended token mode back to its initial value, according to the
    /// language options and preprocessor. This controls whether the lexer
    /// produces comment and whitespace tokens.
    ///
    /// This requires the lexer to have an associated preprocessor. A standalone
    /// lexer has nothing to reset to.
    void resetExtendedTokenMode();
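The way these accessors interact can be tried out in isolation. The sketch below copies only the accessor bodies from the listing; the class name `TokenModeModel`, its constructor, and the plain `TraditionalCPP` flag (standing in for `LangOpts.TraditionalCPP`) are assumptions made so the snippet compiles without Clang.

```cpp
#include <cassert>

// Hypothetical stand-in for the mode-related part of clang::Lexer.
// Only the four accessor bodies follow the quoted header; everything
// else (class name, constructor, bool flags) is an assumption.
class TokenModeModel {
  unsigned char ExtendedTokenMode = 0;
  bool LexingRawMode;
  bool TraditionalCPP;  // stands in for LangOpts.TraditionalCPP

public:
  explicit TokenModeModel(bool rawMode, bool traditionalCPP = false)
      : LexingRawMode(rawMode), TraditionalCPP(traditionalCPP) {}

  bool isKeepWhitespaceMode() const { return ExtendedTokenMode > 1; }

  void SetKeepWhitespaceMode(bool Val) {
    // Whitespace retention is only legal in raw mode or -traditional-cpp.
    assert((!Val || LexingRawMode || TraditionalCPP) &&
           "Can only retain whitespace in raw mode or -traditional-cpp");
    ExtendedTokenMode = Val ? 2 : 0;
  }

  bool inKeepCommentMode() const { return ExtendedTokenMode > 0; }

  void SetCommentRetentionState(bool Mode) {
    // Comment retention cannot be toggled while whitespace is retained.
    assert(!isKeepWhitespaceMode() &&
           "Can't play with comment retention state when retaining whitespace");
    ExtendedTokenMode = Mode ? 1 : 0;
  }
};
```

Two consequences follow from the bodies alone: keep-whitespace mode implies keep-comment mode (2 is also greater than 0), and `SetKeepWhitespaceMode(false)` drops the mode to 0 rather than 1, so any earlier comment retention setting is lost.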

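About raw mode:

Keep-whitespace mode (ExtendedTokenMode = 2), in which the Lexer returns every character of the file as whitespace, comment, or normal tokens, is meant to be used in raw mode: the assert in SetKeepWhitespaceMode permits it only in raw mode or with -traditional-cpp. Raw mode itself is tracked by a separate member variable in Lexer's parent class, clang::PreprocessorLexer: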

    /// \brief True if in raw mode.
    ///
    /// Raw mode disables interpretation of tokens and is a far faster mode to
    /// lex in than non-raw-mode.  This flag:
    ///  1. If EOF of the current lexer is found, the include stack isn't popped.
    ///  2. Identifier information is not looked up for identifier tokens.  As an
    ///     effect of this, implicit macro expansion is naturally disabled.
    ///  3. "#" tokens at the start of a line are treated as normal tokens, not
    ///     implicitly transformed by the lexer.
    ///  4. All diagnostic messages are disabled.
    ///  5. No callbacks are made into the preprocessor.
    ///
    /// Note that in raw mode that the PP pointer may be null.
    bool LexingRawMode;

This flag indicates whether the Lexer is in raw mode, and the comment above it describes what raw mode does.
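Point 2 of that comment (no identifier lookup, hence no macro expansion) can be illustrated with a deliberately tiny model. This is a toy, not Clang code: the function `lexWords`, the macro table passed as a `std::map`, and the whitespace-only splitting are all assumptions made for the illustration.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy, not Clang code. It splits the input on whitespace and,
// outside raw mode, looks each word up in `macros` and substitutes the
// replacement text. In raw mode the lookup is skipped entirely, so words
// come back exactly as written.
std::vector<std::string> lexWords(
    const std::string &src,
    const std::map<std::string, std::string> &macros,
    bool rawMode) {
  std::vector<std::string> out;
  std::istringstream in(src);
  std::string word;
  while (in >> word) {
    if (!rawMode) {
      auto it = macros.find(word);
      if (it != macros.end()) {
        out.push_back(it->second);  // "expanded" form
        continue;
      }
    }
    out.push_back(word);  // raw spelling
  }
  return out;
}
```

With a table that maps `N` to `42`, lexing `x = N` in raw mode yields `x`, `=`, `N`, while non-raw mode yields `x`, `=`, `42`. Skipping the lookup is also part of why the comment calls raw mode a far faster way to lex.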

As the definition of clang::Lexer shows, it is a subclass of clang::PreprocessorLexer, and the raw mode discussion above already quoted code from that class. Let's now look at clang::PreprocessorLexer.

clang/include/clang/Lex/PreprocessorLexer.h

    namespace clang {

    class FileEntry;
    class Preprocessor;

This shows that clang::PreprocessorLexer uses the two classes forward-declared above. They appear in the header at the following places:

    class PreprocessorLexer {
      virtual void anchor();
    protected:
      Preprocessor *PP;              // Preprocessor object controlling lexing.

and

    /// getFileEntry - Return the FileEntry corresponding to this FileID.  Like
    /// getFileID(), this only works for lexers with attached preprocessors.
    const FileEntry *getFileEntry() const;

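As the code shows, one of the two classes is used as the type of a member variable and the other as the return type of a member function. Tracing into their implementations, FileEntry is fairly simple and easy to read, whereas Preprocessor is more complex and touches a lot of other code, so it is not analyzed here and is left for a later post.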
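The header gets away with only forward-declaring the two classes because a pointer member and a pointer return type do not need the complete type. The sketch below shows the same pattern with hypothetical stand-ins: `FileInfo`, `Driver`, and `LexerBase` are made-up names playing the roles of FileEntry, Preprocessor, and PreprocessorLexer, and returning `nullptr` when no driver is attached is an assumption of this sketch, not a claim about Clang's behavior.

```cpp
#include <string>

// Forward declarations: enough for pointer members and pointer return types.
struct FileInfo;  // hypothetical stand-in for clang::FileEntry
struct Driver;    // hypothetical stand-in for clang::Preprocessor

class LexerBase {  // hypothetical stand-in for clang::PreprocessorLexer
protected:
  Driver *PP;  // pointer to an incomplete type is fine here

public:
  explicit LexerBase(Driver *pp) : PP(pp) {}

  // Declared with an incomplete return type; the body needs the full
  // definitions, so it is written further down.
  const FileInfo *getFileInfo() const;
};

// Full definitions, which in a real project would live in other headers.
struct FileInfo {
  std::string name;
};

struct Driver {
  FileInfo current{"main.c"};
};

// Only this out-of-line body needs to see the complete types.
const FileInfo *LexerBase::getFileInfo() const {
  // Assumption of this sketch: no attached driver means no file info.
  return PP ? &PP->current : nullptr;
}
```

Keeping the heavy includes out of the header this way shortens compile times and avoids circular include dependencies between the lexer and preprocessor headers.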

2014-11-20

