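While reading the Spark big data analysis book, I typed in the book's example code but got an error saying an argument is missing. What should I do?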

01-21

Hmm, so the book the asker is reading is Holden's "Learning Spark: Lightning-Fast Big Data Analysis", and the example they want to run is Example 2-1, Python line count:

>>> lines = sc.textFile("README.md") # Create an RDD called lines
>>> lines.count() # Count the number of items in this RDD
127
>>> lines.first() # First item in this RDD, i.e. first line of README.md
u'# Apache Spark'

What the asker should be seeing is something like this (I just ran it on my own machine):

$ bin/pyspark
Python 2.7.13 (default, Apr 4 2017, 08:47:57)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/07/03 18:31:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/07/03 18:31:25 WARN MetricsSystem: Using default name SparkStatusTracker for source because neither spark.metrics.namespace nor spark.app.id is set.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
      /_/

Using Python version 2.7.13 (default, Apr 4 2017 08:47:57)
SparkSession available as 'spark'.
>>> sc.textFile("README.md")
README.md MapPartitionsRDD[1] at textFile at NativeMethodAccessorImpl.java:0
>>> lines = sc.textFile("README.md")
>>> lines.count()
103
>>> lines.first()
u'# Apache Spark'
>>> quit()

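However, judging from the screenshot in the asker's own answer, the error already occurs while the Spark Shell is starting up. How exactly did you launch the Spark Shell? >_<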

Spark is more fun to play with in native Scala!

Don't import something and rename it to sc!

The import on your first line is wrong. You need to create a SparkContext object, not import something and rename it to sc.
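The asker's original script isn't reproduced in this thread, so the following is only a rough sketch of what "create a SparkContext object" could look like in a standalone Python script. Inside bin/pyspark the shell already creates sc for you, so this is only needed outside the shell. The function name line_count, the app name "LineCount", and the master "local[*]" are assumptions made for illustration, not taken from the book or the question.

```python
def line_count(path="README.md", app_name="LineCount", master="local[*]"):
    """Count the lines of a text file with an explicitly created SparkContext.

    The default path, app name, and master are illustrative assumptions.
    """
    # Imported inside the function so that loading this file does not
    # itself require a Spark installation.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName(app_name).setMaster(master)
    # Create the context object; do not alias an imported name as sc.
    sc = SparkContext(conf=conf)
    try:
        lines = sc.textFile(path)  # RDD with one element per line
        return lines.count()
    finally:
        sc.stop()  # release the context so another one can be created later
```

With pyspark installed and a README.md in the working directory, calling line_count() should return the number of lines in that file, the same number lines.count() prints in the shell session above.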

What environment are you running Spark in?

I'd suggest going straight to the examples on the official Spark website. It's also not clear how your environment was installed.