How does the Scala library achieve parallelism?

01-23

For example: val x = List(1, 2, 3, 4, ...)
val y = x filter (_ % 2 == 0)
val z = x map (_ * 2)

Supposedly these are all computed in parallel, but how is that done?

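The "supposedly" is wrong.

Scala's collections come in two kinds. Taking Seq as an example, one kind is the ordinary sequential Seq, which is simply named Seq;

the other is a Seq that computes in parallel, named ParSeq. Internally it uses Java's Fork/Join framework to do the parallel work.

Both kinds of Seq inherit from GenSeq, so most of their methods are the same.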
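Because both kinds share GenSeq, a function written against GenSeq should accept either one. The following is a minimal sketch, assuming Scala 2.10 to 2.12 (where parallel collections ship with the standard library; in 2.13 and later they live in a separate module and the Gen* types no longer play this role). The names GenSeqDemo and doubledEvens are made up for illustration.

```scala
import scala.collection.GenSeq

object GenSeqDemo {
  // Hypothetical helper: works on a sequential Seq or a ParSeq alike,
  // because filter and map are both declared on GenSeq.
  def doubledEvens(xs: GenSeq[Int]): GenSeq[Int] =
    xs.filter(_ % 2 == 0).map(_ * 2)

  def main(args: Array[String]): Unit = {
    val v = Vector(1, 2, 3, 4, 5, 6)
    println(doubledEvens(v).toList)      // sequential: List(4, 8, 12)
    println(doubledEvens(v.par).toList)  // parallel, same result: List(4, 8, 12)
  }
}
```

Whether the work runs in parallel depends only on which kind of collection the caller passes in, not on the function itself.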

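The more detailed inheritance hierarchy is as follows:

[Figure: inheritance hierarchy of the collection types]

For example, List, Vector and Range all inherit from Seq, so their methods are not parallel. However, each of these Seqs has a par method that converts it into the corresponding ParSeq, and most methods on ParSeq are parallel.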

def par: ParSeq[A]

Returns a parallel implementation of this collection.
For most collection types, this method creates a new parallel collection by copying all the elements. For these collection, par takes linear time. Mutable collections in this category do not produce a mutable parallel collection that has the same underlying dataset, so changes in one collection will not be reflected in the other one.
Specific collections (e.g. ParArray or mutable.ParHashMap) override this default behaviour by creating a parallel collection which shares the same underlying dataset. For these collections, par takes constant or sublinear time.

All parallel collections return a reference to themselves.
Returns: a parallel implementation of this collection
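The Scaladoc above can be exercised with a short sketch. It assumes Scala 2.10 to 2.12 (on 2.13 and later the separate scala-parallel-collections module and its converters import would be needed instead). ParDemo and its two methods are hypothetical names used only here.

```scala
object ParDemo {
  // Convert to a parallel collection, map in parallel, then collect the
  // result back into a List. ParSeq.map keeps the element order.
  def doubledInParallel(xs: List[Int]): List[Int] =
    xs.par.map(_ * 2).toList

  // "All parallel collections return a reference to themselves":
  // calling par on something that is already parallel gives the same object.
  def parOfParIsSame(xs: List[Int]): Boolean = {
    val p = xs.par
    p.par eq p
  }

  def main(args: Array[String]): Unit = {
    println(doubledInParallel(List(1, 2, 3, 4))) // List(2, 4, 6, 8)
    println(parOfParIsSame(List(1, 2, 3, 4)))    // true
  }
}
```

The result of map is in the same order as the input, but side effects inside the function (for example a println) may run in any order on a parallel collection.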

An example as proof:

$ scala
Welcome to Scala version 2.10.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_45).
Type in expressions to have them evaluated.
Type :help for more information.

scala> def time(cnt: Int)(call: => Unit): Long = {
     |   val start = System.currentTimeMillis
     |   (1 to cnt) foreach (_ => call)
     |   System.currentTimeMillis - start
     | }
time: (cnt: Int)(call: => Unit)Long

scala> def timeConsumedMap(x: Int): Int = {
     |   Thread.sleep(1)
     |   x*2
     | }
timeConsumedMap: (x: Int)Int

scala> val xs = List.range(0, 1000)
xs: List[Int] = List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...

scala> time(10){ xs map timeConsumedMap }
res0: Long = 10764

scala> time(10){ xs.par map timeConsumedMap }
res1: Long = 791
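To see where the speedup comes from, one can record which threads actually run the mapping function. This is only a sketch under the same Scala 2.10 to 2.12 assumption. Recorder and ThreadDemo are hypothetical names, and the set of worker threads observed will vary with the machine.

```scala
import scala.collection.mutable

// Hypothetical helper that remembers the name of every thread calling `double`.
final class Recorder {
  private val names = mutable.Set.empty[String]

  def double(x: Int): Int = {
    names.synchronized { names += Thread.currentThread.getName }
    x * 2
  }

  def threads: Set[String] = names.synchronized { names.toSet }
}

object ThreadDemo {
  // Plain List.map: every call happens on the calling thread.
  def sequentialThreads(n: Int): Set[String] = {
    val rec = new Recorder
    List.range(0, n).map(rec.double)
    rec.threads
  }

  // ParSeq.map: calls may be spread over the Fork/Join worker threads.
  def parallelThreads(n: Int): Set[String] = {
    val rec = new Recorder
    List.range(0, n).par.map(rec.double)
    rec.threads
  }

  def main(args: Array[String]): Unit = {
    println(sequentialThreads(1000))
    println(parallelThreads(1000))
  }
}
```

The first set always holds exactly one name, the calling thread. On a multi-core machine the second set typically holds several worker thread names, which matches the timing difference in the transcript above.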

Reference: http://docs.scala-lang.org/overviews/parallel-collections/overview.html