What SQL statements does Spark SQL actually support?

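Where can I find a description of the SQL syntax that Spark supports? I really can't find one. The official site just glosses over it with a single SELECT, even though there are clearly much more complex features such as CASE. Where is that information?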


In the current version, the Spark SQL parser is written with ANTLR v4, building on Presto's parser. Its grammar file is here: sql/catalyst/parser/SqlBase.g4, and it reads very clearly ^_^
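If you have a Spark 2.x or later build at hand, a quick way to find out whether a statement is accepted by this grammar is to feed it to the parser directly. The sketch below assumes spark-catalyst is on the classpath and uses CatalystSqlParser.parsePlan, which throws ParseException on a syntax error; the object name SqlSyntaxCheck and the helper parses are made up for illustration. Two caveats: parsing only checks syntax (whether the tables, columns and functions exist is decided later, during analysis), and spark.sql itself goes through the session's parser (spark.sessionState.sqlParser), which shares the grammar but in some versions handles certain commands (mainly DDL) that CatalystSqlParser rejects, so a negative answer for DDL is not conclusive.

```scala
// Assumes spark-catalyst (Spark 2.x or later) is on the classpath.
import org.apache.spark.sql.catalyst.parser.{CatalystSqlParser, ParseException}

object SqlSyntaxCheck {
  // Returns true if the text is accepted by the SqlBase.g4 grammar.
  // Only syntax is checked: table, column and function names are not resolved.
  def parses(sql: String): Boolean =
    try {
      CatalystSqlParser.parsePlan(sql) // builds an unresolved logical plan
      true
    } catch {
      case _: ParseException => false
    }

  def main(args: Array[String]): Unit = {
    val queries = Seq(
      "SELECT CASE WHEN x > 0 THEN 'pos' ELSE 'non-pos' END AS sign FROM t",
      "SELEC x FROM t" // deliberately misspelled keyword
    )
    // Print the verdict next to each query.
    queries.foreach(q => println(s"${parses(q)}\t$q"))
  }
}
```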

Oh, and by the way, the Databricks Runtime build of Spark has some interesting new features, for example: Working with Nested Data Using Higher Order Functions in SQL on Databricks - The Databricks Blog


Try looking at spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SQLParser.scala

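Scala is not the easiest language to read, but this file contains the methods that parse SQL, so you can work out which SQL statements are supported. At the very least, the keywords are spelled out clearly: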

protected val ALL = Keyword("ALL")
protected val AND = Keyword("AND")
protected val APPROXIMATE = Keyword("APPROXIMATE")
protected val AS = Keyword("AS")
protected val ASC = Keyword("ASC")
protected val BETWEEN = Keyword("BETWEEN")
protected val BY = Keyword("BY")
protected val CASE = Keyword("CASE")
protected val CAST = Keyword("CAST")
protected val DESC = Keyword("DESC")
protected val DISTINCT = Keyword("DISTINCT")
protected val ELSE = Keyword("ELSE")
protected val END = Keyword("END")
protected val EXCEPT = Keyword("EXCEPT")
protected val FALSE = Keyword("FALSE")
protected val FROM = Keyword("FROM")
protected val FULL = Keyword("FULL")
protected val GROUP = Keyword("GROUP")
protected val HAVING = Keyword("HAVING")
protected val IN = Keyword("IN")
protected val INNER = Keyword("INNER")
protected val INSERT = Keyword("INSERT")
protected val INTERSECT = Keyword("INTERSECT")
protected val INTO = Keyword("INTO")
protected val IS = Keyword("IS")
protected val JOIN = Keyword("JOIN")
protected val LEFT = Keyword("LEFT")
protected val LIKE = Keyword("LIKE")
protected val LIMIT = Keyword("LIMIT")
protected val NOT = Keyword("NOT")
protected val NULL = Keyword("NULL")
protected val ON = Keyword("ON")
protected val OR = Keyword("OR")
protected val ORDER = Keyword("ORDER")
protected val SORT = Keyword("SORT")
protected val OUTER = Keyword("OUTER")
protected val OVERWRITE = Keyword("OVERWRITE")
protected val REGEXP = Keyword("REGEXP")
protected val RIGHT = Keyword("RIGHT")
protected val RLIKE = Keyword("RLIKE")
protected val SELECT = Keyword("SELECT")
protected val SEMI = Keyword("SEMI")
protected val TABLE = Keyword("TABLE")
protected val THEN = Keyword("THEN")
protected val TRUE = Keyword("TRUE")
protected val UNION = Keyword("UNION")
protected val WHEN = Keyword("WHEN")
protected val WHERE = Keyword("WHERE")
protected val WITH = Keyword("WITH")
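To compare a query against a keyword list like the one above, a rough approach is to split it into words, upper-case them (SQL keywords are matched case-insensitively) and look them up in a set. This is only an illustrative sketch, not how SQLParser.scala works: KeywordScan and keywordsIn are hypothetical names, the set is a subset of the keywords listed above, and the naive split does not understand string literals or quoted identifiers, so it is no substitute for the real parser.

```scala
object KeywordScan {
  // Illustrative subset of the keyword names declared in the old SQLParser.scala.
  val keywords: Set[String] = Set(
    "SELECT", "FROM", "WHERE", "GROUP", "BY", "HAVING", "ORDER", "SORT",
    "CASE", "WHEN", "THEN", "ELSE", "END", "JOIN", "LEFT", "SEMI", "ON",
    "UNION", "INTERSECT", "EXCEPT", "LIMIT", "AS", "AND", "OR", "NOT"
  )

  // Splits on runs of characters that are not letters, digits or underscores,
  // upper-cases each token, and keeps the tokens found in the keyword set.
  // Naive: words inside string literals or quoted identifiers are scanned too.
  def keywordsIn(sql: String): List[String] =
    sql.split("[^A-Za-z0-9_]+").toList
      .filter(_.nonEmpty)
      .map(_.toUpperCase)
      .filter(keywords.contains)
}
```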


Spark SQL is compatible with Hive and supports nearly everything Hive supports.

Here is what Spark officially lists as supported:

Supported Hive Features (from the Spark SQL and DataFrames guide)

Spark SQL supports the vast majority of Hive features, such as:

  • Hive query statements, including:
    • SELECT
    • GROUP BY
    • ORDER BY
    • CLUSTER BY
    • SORT BY
  • All Hive operators, including:
    • Relational operators (=, <=>, ==, <>, <, >, >=, <=, etc.)
    • Arithmetic operators (+, -, *, /, %, etc)
    • Logical operators (AND, &&, OR, ||, etc.)
    • Complex type constructors
    • Mathematical functions (sign, ln, cos, etc)
    • String functions (instr, length, printf, etc)
  • User defined functions (UDF)
  • User defined aggregation functions (UDAF)
  • User defined serialization formats (SerDes)
  • Window functions
  • Joins
    • JOIN
    • {LEFT|RIGHT|FULL} OUTER JOIN
    • LEFT SEMI JOIN
    • CROSS JOIN
  • Unions
  • Sub-queries
    • SELECT col FROM ( SELECT a + b AS col from t1) t2
  • Sampling
  • Explain
  • Partitioned tables including dynamic partition insertion
  • View
  • All Hive DDL Functions, including:
    • CREATE TABLE
    • CREATE TABLE AS SELECT
    • ALTER TABLE
  • Most Hive Data types, including:
    • TINYINT
    • SMALLINT
    • INT
    • BIGINT
    • BOOLEAN
    • FLOAT
    • DOUBLE
    • STRING
    • BINARY
    • TIMESTAMP
    • DATE
    • ARRAY<>
    • MAP<>
    • STRUCT<>
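Most items in this list behave as they do in Hive. One that often surprises people coming from other databases is LEFT SEMI JOIN: it keeps the left-side rows that have at least one match on the right, returns only left-side columns, and never duplicates a left row. The plain-Scala sketch below mimics those semantics on in-memory collections, next to an inner join for contrast; the case classes, sample data and function names are made up for illustration. In Spark SQL the corresponding query would look like SELECT c.* FROM customers c LEFT SEMI JOIN orders o ON c.id = o.customer_id.

```scala
object SemiJoinSketch {
  final case class Customer(id: Int, name: String)
  final case class Order(id: Int, customerId: Int)

  // Hypothetical sample data: customer 1 has two orders, customer 2 has none.
  val customers = Seq(Customer(1, "alice"), Customer(2, "bob"), Customer(3, "carol"))
  val orders = Seq(Order(10, 1), Order(11, 1), Order(12, 3))

  // LEFT SEMI JOIN semantics: keep each left row for which at least one right
  // row satisfies the join condition. Right-side columns are never returned,
  // and a left row with several matches still appears only once.
  def leftSemiJoin[L, R](left: Seq[L], right: Seq[R])(on: (L, R) => Boolean): Seq[L] =
    left.filter(l => right.exists(r => on(l, r)))

  // For contrast, an inner join emits one pair per matching (left, right) combination.
  def innerJoin[L, R](left: Seq[L], right: Seq[R])(on: (L, R) => Boolean): Seq[(L, R)] =
    for { l <- left; r <- right if on(l, r) } yield (l, r)
}
```

With the sample data, the semi join returns alice and carol once each, while the inner join returns alice twice because she has two orders.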


Spark 1.4 supports most commonly used Hive SQL, so you can simply write the query as you would in Hive and check whether Spark accepts it.

Hive: LanguageManual

Spark: Spark SQL and DataFrames


You can take a look at SQLParser.scala.



Tags: Spark