What SQL statements does Spark SQL actually support?
Where is a reference for the SQL syntax Spark supports? I can't find one anywhere. The official site brushes past it with a single SELECT example, yet there are clearly more complex features such as CASE. Where is that information?
The SQL parser in current versions of Spark SQL is written in ANTLR v4, based on Presto's parser. Its grammar file is at sql/catalyst/parser/SqlBase.g4, and the contents are very clear ^_^
Incidentally, the Databricks Runtime version of Spark has some interesting new features, e.g. "Working with Nested Data Using Higher Order Functions in SQL on Databricks" on the Databricks Blog.
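The higher-order functions described in that post let you manipulate array columns with lambda expressions directly in SQL (the same functions later landed in open-source Spark 2.4+). A minimal sketch, assuming a hypothetical table `t` with an `ARRAY<INT>` column `xs`:

```sql
-- Assumed table: t(xs ARRAY<INT>); transform/filter take a lambda
-- and apply it to each array element without exploding the array.
SELECT transform(xs, x -> x + 1) AS incremented,
       filter(xs, x -> x % 2 = 0) AS evens
FROM t;
```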
Try looking at sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SQLParser.scala. The Scala is not the easiest read, but it contains the methods that parse SQL, so you can see which statements are supported; at minimum the keywords are explicit:
```scala
protected val ALL = Keyword("ALL")
protected val AND = Keyword("AND")
protected val APPROXIMATE = Keyword("APPROXIMATE")
protected val AS = Keyword("AS")
protected val ASC = Keyword("ASC")
protected val BETWEEN = Keyword("BETWEEN")
protected val BY = Keyword("BY")
protected val CASE = Keyword("CASE")
protected val CAST = Keyword("CAST")
protected val DESC = Keyword("DESC")
protected val DISTINCT = Keyword("DISTINCT")
protected val ELSE = Keyword("ELSE")
protected val END = Keyword("END")
protected val EXCEPT = Keyword("EXCEPT")
protected val FALSE = Keyword("FALSE")
protected val FROM = Keyword("FROM")
protected val FULL = Keyword("FULL")
protected val GROUP = Keyword("GROUP")
protected val HAVING = Keyword("HAVING")
protected val IN = Keyword("IN")
protected val INNER = Keyword("INNER")
protected val INSERT = Keyword("INSERT")
protected val INTERSECT = Keyword("INTERSECT")
protected val INTO = Keyword("INTO")
protected val IS = Keyword("IS")
protected val JOIN = Keyword("JOIN")
protected val LEFT = Keyword("LEFT")
protected val LIKE = Keyword("LIKE")
protected val LIMIT = Keyword("LIMIT")
protected val NOT = Keyword("NOT")
protected val NULL = Keyword("NULL")
protected val ON = Keyword("ON")
protected val OR = Keyword("OR")
protected val ORDER = Keyword("ORDER")
protected val SORT = Keyword("SORT")
protected val OUTER = Keyword("OUTER")
protected val OVERWRITE = Keyword("OVERWRITE")
protected val REGEXP = Keyword("REGEXP")
protected val RIGHT = Keyword("RIGHT")
protected val RLIKE = Keyword("RLIKE")
protected val SELECT = Keyword("SELECT")
protected val SEMI = Keyword("SEMI")
protected val TABLE = Keyword("TABLE")
protected val THEN = Keyword("THEN")
protected val TRUE = Keyword("TRUE")
protected val UNION = Keyword("UNION")
protected val WHEN = Keyword("WHEN")
protected val WHERE = Keyword("WHERE")
protected val WITH = Keyword("WITH")
```
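For illustration, here is a query that exercises several of the keywords above (CASE/WHEN/THEN/ELSE/END, JOIN, GROUP BY, HAVING, ORDER BY, LIMIT); the tables `orders` and `users` are hypothetical:

```sql
-- Hypothetical tables: orders(user_id, amount), users(id, country)
SELECT u.country,
       SUM(CASE WHEN o.amount > 100 THEN 1 ELSE 0 END) AS big_orders
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.amount IS NOT NULL
GROUP BY u.country
HAVING COUNT(*) > 10
ORDER BY big_orders DESC
LIMIT 20;
```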
Spark SQL is compatible with Hive: whatever Hive supports, it supports too. Below is what Spark officially lists as supported ("Supported Hive Features" in the Spark SQL and DataFrames guide):
Spark SQL supports the vast majority of Hive features, such as:
- Hive query statements, including:
- SELECT
- GROUP BY
- ORDER BY
- CLUSTER BY
- SORT BY
- All Hive operators, including:
- Relational operators (=, ⇔, ==, <>, <, >, >=, <=, etc)
- Arithmetic operators (+, -, *, /, %, etc)
- Logical operators (AND, &&, OR, ||, etc)
- Complex type constructors
- Mathematical functions (sign, ln, cos, etc)
- String functions (instr, length, printf, etc)
- User defined functions (UDF)
- User defined aggregation functions (UDAF)
- User defined serialization formats (SerDes)
- Window functions
- Joins
- JOIN
- {LEFT|RIGHT|FULL} OUTER JOIN
- LEFT SEMI JOIN
- CROSS JOIN
- Unions
- Sub-queries
- SELECT col FROM ( SELECT a + b AS col from t1) t2
- Sampling
- Explain
- Partitioned tables including dynamic partition insertion
- View
- All Hive DDL Functions, including:
- CREATE TABLE
- CREATE TABLE AS SELECT
- ALTER TABLE
- Most Hive Data types, including:
- TINYINT
- SMALLINT
- INT
- BIGINT
- BOOLEAN
- FLOAT
- DOUBLE
- STRING
- BINARY
- TIMESTAMP
- DATE
- ARRAY<>
- MAP<>
- STRUCT<>
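Two items from the list above in concrete form, namely window functions and dynamic partition insertion; the tables `emp` and `emp_by_dept` are hypothetical:

```sql
-- Window function: rank employees within each department by salary
SELECT name, dept, salary,
       RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rk
FROM emp;

-- Dynamic partition insertion: the partition value (dept) comes from
-- the data itself rather than a literal in the PARTITION clause
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT OVERWRITE TABLE emp_by_dept PARTITION (dept)
SELECT name, salary, dept FROM emp;
```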
Spark 1.4 supports most commonly used Hive SQL; you can simply write it as Hive SQL and check whether Spark accepts it.
See the Hive LanguageManual and the Spark SQL and DataFrames guide, or take a look at SQLParser.scala.