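Drawing Stanford CoreNLP Parser constituency parses with Graphviz
The post "constituent parsing & dependency parsing" introduced constituent parsing, and "Using the Stanford CoreNLP API" showed how to obtain a constituency parse with the Stanford CoreNLP Parser. The raw output is hard to read, which makes it difficult to reason about while debugging or building on top of it, so this post uses GraphViz to visualize the parse.
Printing the result returned by the Stanford CoreNLP Parser gives: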
(ROOT (IP (PP (P 在) (NP (DNP (NP (NN 秋天)) (DEC 的)) (NP (NN 時候)))) (PU ,) (NP (NN 陶喆)) (VP (ADVP (AD 很)) (VP (VV 愛吃) (NP (NN 蘋果))))))
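This output is not very readable, so we visualize it with GraphViz in the following steps:
- First, convert the parse result into a uniform tree of GraphVizNode objects, writing one conversion function for each kind of parse result.
- Traverse the resulting tree and write a .dot file following the dot syntax.
- Run dot to generate the corresponding .png file.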
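The last two steps can be sketched as follows. This is only an illustration, not the code used in the post: DotRenderer and its methods are hypothetical names, and the sketch assumes the Graphviz dot executable is on the PATH. The only external call it relies on is the standard `dot -Tpng input.dot -o output.png` command line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not from the original post) for the "write .dot, run dot" steps. */
final class DotRenderer {

    /** Builds the command line: dot -Tpng <dotFile> -o <pngFile>. */
    static List<String> dotCommand(Path dotFile, Path pngFile) {
        return Arrays.asList("dot", "-Tpng", dotFile.toString(), "-o", pngFile.toString());
    }

    /** Writes the DOT source as UTF-8 so that Chinese labels are stored intact. */
    static void writeDot(String dotSource, Path dotFile) throws IOException {
        Files.write(dotFile, dotSource.getBytes(StandardCharsets.UTF_8));
    }

    /** Runs dot and returns its exit code. Assumes Graphviz is installed. */
    static int render(Path dotFile, Path pngFile) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(dotCommand(dotFile, pngFile))
                .inheritIO() // show dot's warnings and errors on the console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        Path dotFile = Paths.get(args.length > 0 ? args[0] : "tree.dot");
        Path pngFile = Paths.get(args.length > 1 ? args[1] : "tree.png");
        // Tiny stand-in graph; in the post this text would come from the tree traversal.
        writeDot("digraph G{\na1 -> a2[label=\"\"];\na1[label=\"ROOT\"] ;\na2[label=\"IP\"] ;\n}",
                dotFile);
        try {
            System.out.println("dot exited with code " + render(dotFile, pngFile));
        } catch (IOException e) {
            System.out.println("Could not start dot (is Graphviz installed?): " + e.getMessage());
        }
    }
}
```

Depending on the fonts installed, Chinese labels may also need a fontname attribute in the .dot file to render correctly in the PNG; that detail is outside this sketch.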
1. GraphVizNode
We represent the constituency parse as a tree of GraphVizNode objects:
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class GraphVizNode {
        public String value;
        public List<GraphVizNode> children;
        // relations.get(i) is the label of the edge to children.get(i)
        public List<String> relations;
        // unique id; the node is named "a<index>" in the DOT output
        public int index;
        private static int indexer = 1;

        public GraphVizNode(String value) {
            this.value = value;
            this.children = new ArrayList<GraphVizNode>();
            this.relations = new ArrayList<String>();
            this.index = indexer++;
        }

        public void addChildren(GraphVizNode child, String relation) {
            this.children.add(child);
            this.relations.add(relation);
        }

        @Override
        public String toString() {
            return this.value;
        }

        /**
         * Recursively traverses the graph to collect all nodes and edges.
         *
         * @param node      the node currently being visited
         * @param relations accumulates the DOT edge statements
         * @param nodes     accumulates the visited nodes
         */
        private static void iter(GraphVizNode node, List<String> relations, Set<GraphVizNode> nodes) {
            nodes.add(node);
            // Add one edge statement per child.
            for (int i = 0; i < node.children.size(); i++) {
                GraphVizNode child = node.children.get(i);
                relations.add("a" + node.index + " -> a" + child.index
                        + "[label=\"" + node.relations.get(i) + "\"]" + ";\n");
            }
            // Recurse into the children.
            for (int i = 0; i < node.children.size(); i++)
                iter(node.children.get(i), relations, nodes);
        }

        public static String toVizString(GraphVizNode node) {
            StringBuilder stringBuilder = new StringBuilder();
            List<String> relations = new ArrayList<String>();
            Set<GraphVizNode> nodes = new HashSet<GraphVizNode>();
            iter(node, relations, nodes);
            stringBuilder.append("digraph G{\n");
            for (String str : relations) {
                stringBuilder.append(str);
            }
            // HashSet iteration order is unspecified, so node statements appear in arbitrary order.
            for (GraphVizNode n : nodes) {
                stringBuilder.append("a" + n.index + "[label=\"" + n.value + "\"] ;\n");
            }
            stringBuilder.append("}");
            return stringBuilder.toString();
        }
    }
2. Converter
This step converts the Tree produced by the Stanford CoreNLP Parser into a GraphVizNode tree:
    // These methods go inside a converter class and need:
    // import edu.stanford.nlp.trees.Tree;

    public GraphVizNode treeToNode(Tree tree) {
        GraphVizNode root = new GraphVizNode(tree.label().toString());
        iter(root, tree);
        return root;
    }

    // Creates one GraphVizNode per child in pre-order; edges carry an empty label.
    public void iter(GraphVizNode node, Tree tree) {
        for (Tree child : tree.children()) {
            GraphVizNode childNode = new GraphVizNode(child.label().toString());
            node.addChildren(childNode, "");
            iter(childNode, child);
        }
    }
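To experiment with the same recursion without CoreNLP on the classpath, one option is to read the printed bracketed string back into a small tree. The sketch below is a stand-in, not part of the original code: BracketTree and its methods are hypothetical names, and its labels are the bare strings from the printed form rather than whatever tree.label().toString() returns in CoreNLP. Its preorder method visits nodes in the same parent-before-children, left-to-right order in which the converter above creates GraphVizNode objects.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a parse tree, built from the printed bracketed form. */
final class BracketTree {
    final String label;
    final List<BracketTree> children = new ArrayList<>();

    BracketTree(String label) {
        this.label = label;
    }

    /** Parses a bracketed string such as "(NP (DT the) (NN cat))". */
    static BracketTree parse(String text) {
        int[] pos = {0}; // single-element array used as a mutable cursor
        BracketTree tree = parseNode(text, pos);
        skipSpaces(text, pos);
        if (pos[0] != text.length()) {
            throw new IllegalArgumentException("trailing input at offset " + pos[0]);
        }
        return tree;
    }

    private static BracketTree parseNode(String s, int[] pos) {
        skipSpaces(s, pos);
        if (pos[0] >= s.length()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        if (s.charAt(pos[0]) != '(') {
            return new BracketTree(readToken(s, pos)); // a leaf, i.e. a word
        }
        pos[0]++; // consume '('
        skipSpaces(s, pos);
        BracketTree node = new BracketTree(readToken(s, pos));
        while (true) {
            skipSpaces(s, pos);
            if (pos[0] >= s.length()) {
                throw new IllegalArgumentException("missing ')'");
            }
            if (s.charAt(pos[0]) == ')') {
                pos[0]++; // consume ')'
                return node;
            }
            node.children.add(parseNode(s, pos));
        }
    }

    private static String readToken(String s, int[] pos) {
        int start = pos[0];
        while (pos[0] < s.length()) {
            char c = s.charAt(pos[0]);
            if (c == '(' || c == ')' || Character.isWhitespace(c)) break;
            pos[0]++;
        }
        if (start == pos[0]) {
            throw new IllegalArgumentException("expected a label at offset " + start);
        }
        return s.substring(start, pos[0]);
    }

    private static void skipSpaces(String s, int[] pos) {
        while (pos[0] < s.length() && Character.isWhitespace(s.charAt(pos[0]))) pos[0]++;
    }

    /** Collects labels parent first, then children left to right. */
    static void preorder(BracketTree node, List<String> out) {
        out.add(node.label);
        for (BracketTree child : node.children) preorder(child, out);
    }

    public static void main(String[] args) {
        BracketTree tree = parse("(ROOT (IP (NP (NN 陶喆)) (VP (VV 愛吃) (NP (NN 蘋果)))))");
        List<String> labels = new ArrayList<>();
        preorder(tree, labels);
        System.out.println(labels.size() + " nodes: " + labels);
    }
}
```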
3. Example
The sentence to analyze is (roughly, "In autumn, Tao Zhe loves eating apples"):
在秋天的時候,陶喆很愛吃蘋果
The Stanford CoreNLP Parser prints the following result:
(ROOT (IP (PP (P 在) (NP (DNP (NP (NN 秋天)) (DEC 的)) (NP (NN 時候)))) (PU ,) (NP (NN 陶喆)) (VP (ADVP (AD 很)) (VP (VV 愛吃) (NP (NN 蘋果))))))
The generated .dot file is:
    digraph G{
    a1 -> a2[label=""];
    a2 -> a3[label=""];
    a2 -> a16[label=""];
    a2 -> a18[label=""];
    a2 -> a21[label=""];
    a3 -> a4[label=""];
    a3 -> a6[label=""];
    a4 -> a5[label=""];
    a6 -> a7[label=""];
    a6 -> a13[label=""];
    a7 -> a8[label=""];
    a7 -> a11[label=""];
    a8 -> a9[label=""];
    a9 -> a10[label=""];
    a11 -> a12[label=""];
    a13 -> a14[label=""];
    a14 -> a15[label=""];
    a16 -> a17[label=""];
    a18 -> a19[label=""];
    a19 -> a20[label=""];
    a21 -> a22[label=""];
    a21 -> a25[label=""];
    a22 -> a23[label=""];
    a23 -> a24[label=""];
    a25 -> a26[label=""];
    a25 -> a28[label=""];
    a26 -> a27[label=""];
    a28 -> a29[label=""];
    a29 -> a30[label=""];
    a16[label="PU-5"] ;
    a9[label="NN-2"] ;
    a29[label="NN-9"] ;
    a8[label="NP"] ;
    a19[label="NN-6"] ;
    a1[label="ROOT"] ;
    a11[label="DEC-3"] ;
    a12[label="的-3"] ;
    a13[label="NP"] ;
    a26[label="VV-8"] ;
    a2[label="IP"] ;
    a20[label="陶喆-6"] ;
    a21[label="VP"] ;
    a14[label="NN-4"] ;
    a15[label="時候-4"] ;
    a24[label="很-7"] ;
    a22[label="ADVP"] ;
    a5[label="在-1"] ;
    a27[label="愛吃-8"] ;
    a3[label="PP"] ;
    a7[label="DNP"] ;
    a18[label="NP"] ;
    a10[label="秋天-2"] ;
    a25[label="VP"] ;
    a4[label="P-1"] ;
    a6[label="NP"] ;
    a17[label=",-5"] ;
    a23[label="AD-7"] ;
    a30[label="蘋果-9"] ;
    a28[label="NP"] ;
    }
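Every label in this file is written verbatim between double quotes. That works for this sentence, but a token that is itself a double quote or a backslash would likely end the quoted string early or be misread by dot. A minimal sketch of an escaping helper follows; DotLabels, escape and nodeLine are hypothetical names that do not appear in the original code, and the sketch only covers these two characters.

```java
/** Hypothetical helper (not from the original post) for writing safe DOT labels. */
final class DotLabels {

    /** Escapes backslashes first, then double quotes, for use inside a quoted DOT string. */
    static String escape(String raw) {
        return raw.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Formats one node statement in the same "a<index>" naming scheme used above. */
    static String nodeLine(int index, String label) {
        return "a" + index + "[label=\"" + escape(label) + "\"] ;\n";
    }

    public static void main(String[] args) {
        System.out.print(nodeLine(1, "ROOT"));
        System.out.print(nodeLine(2, "\"")); // a quotation-mark token
    }
}
```

In toVizString, the same escape call could wrap both n.value and the relation strings before they are concatenated into the output.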
The generated image is:
[image]
Here is the result for a longer, more complex sentence:
[image]