Oct 5, 2015

Setting up Spark single node with local disk

OS: Ubuntu 14.04 in GENI

Please use your SSH key to log in to the Ubuntu 14.04 node in GENI.

If you are a /bin/bash user, you may want to change your login shell:
$ sudo chsh -s /bin/bash YourUserName

Use curl to download the auto-install shell script, saving it as Install.sh:

$ curl https://dl.dropboxusercontent.com/u/12787647/iCAIR/Ubuntu1404SparkInstall.sh > Install.sh

Run the script with sh. It will automatically install Spark and the other required packages.
$ sh Install.sh
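
The script is not strictly required. If the download link is unavailable, a minimal manual install looks roughly like this (a sketch, assuming Spark 1.4.1 prebuilt for Hadoop 2.6 from the Apache archive; adjust the version to what you need):

$ sudo apt-get update
$ sudo apt-get install -y openjdk-7-jdk
$ curl -O http://archive.apache.org/dist/spark/spark-1.4.1/spark-1.4.1-bin-hadoop2.6.tgz
$ tar xzf spark-1.4.1-bin-hadoop2.6.tgz
$ mv spark-1.4.1-bin-hadoop2.6 ~/spark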

After Spark is installed, you can start the interactive Spark shell with the following command:
$ ~/spark/bin/spark-shell
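
Inside the shell, a SparkContext is already bound to the variable sc. As a quick smoke test (a minimal example, not part of the install script), sum the integers 1 to 100; the result should be 5050:

scala> sc.parallelize(1 to 100).reduce(_ + _)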


You can check the Spark UI at http://localhost:4040, as shown below:

[Screenshot: the Spark UI at http://localhost:4040]
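
Note that the GENI node is remote, so localhost:4040 is only reachable from the node itself. To view the UI from your own machine, forward the port over SSH first (the user and host names below are placeholders):

$ ssh -L 4040:localhost:4040 YourUserName@your-geni-node

then browse to http://localhost:4040 locally.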



Basic command practice
scala> val sakanaFile = sc.textFile("README.md")
sakanaFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:21
scala> sakanaFile.count()
res0: Long = 98
scala> val linesWithSpark = sakanaFile.filter(line => line.contains("Spark"))
linesWithSpark: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at filter at <console>:23
scala> linesWithSpark.count()
res1: Long = 19
scala> linesWithSpark.collect()   // returns all matching lines as an Array[String]

scala> linesWithSpark.collect.foreach(println)   // prints each matching line
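
As a further exercise, the classic word count works on the same file. This is a sketch in the same shell session; the exact pairs printed depend on the contents of your README.md:

scala> val wordCounts = sakanaFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> wordCounts.take(5).foreach(println)

reduceByKey shuffles records by key, so the per-word counts are computed in parallel across partitions before take(5) pulls a small sample back to the driver.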
