Apache Spark

Okay, let's download a copy of Spark to your local PC. You can download it from the official Apache Spark downloads page (http://spark.apache.org/downloads.html).
Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, and Python, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.
I extracted the downloaded file and ran the commands below. Not good:
user@localhost:~/Desktop/spark-1.4.1$ ./bin/pyspark
ls: cannot access /home/user/Desktop/spark-1.4.1/assembly/target/scala-2.10: No such file or directory
Failed to find Spark assembly in /home/user/Desktop/spark-1.4.1/assembly/target/scala-2.10.
You need to build Spark before running this program.
user@localhost:~/Desktop/spark-1.4.1$ ./bin/spark-shell
ls: cannot access /home/user/Desktop/spark-1.4.1/assembly/target/scala-2.10: No such file or directory
Failed to find Spark assembly in /home/user/Desktop/spark-1.4.1/assembly/target/scala-2.10.
You need to build Spark before running this program.
Well, the default download option is the source package, so you will have to compile the source first. (The downloads page also offers pre-built packages for common Hadoop versions, which skip this step entirely.)
user@localhost:~/Desktop/spark-1.4.1$ mvn -DskipTests clean package
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] Spark Project Parent POM
[INFO] Spark Launcher Project
[INFO] Spark Project Networking
[INFO] Spark Project Shuffle Streaming Service
[INFO] Spark Project Unsafe
...
...
...
constituent[20]: file:/usr/share/maven/lib/wagon-http-shaded.jar
constituent[21]: file:/usr/share/maven/lib/maven-settings-builder-3.x.jar
constituent[22]: file:/usr/share/maven/lib/maven-aether-provider-3.x.jar
constituent[23]: file:/usr/share/maven/lib/maven-core-3.x.jar
constituent[24]: file:/usr/share/maven/lib/plexus-cipher.jar
constituent[25]: file:/usr/share/maven/lib/aether-util.jar
constituent[26]: file:/usr/share/maven/lib/commons-httpclient.jar
constituent[27]: file:/usr/share/maven/lib/commons-cli.jar
constituent[28]: file:/usr/share/maven/lib/aether-api.jar
constituent[29]: file:/usr/share/maven/lib/maven-model-3.x.jar
constituent[30]: file:/usr/share/maven/lib/guava.jar
constituent[31]: file:/usr/share/maven/lib/wagon-file.jar
---------------------------------------------------
Exception in thread "main" java.lang.OutOfMemoryError: PermGen space
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at org.codehaus.plexus.classworlds.realm.ClassRealm.loadClassFromSelf(ClassRealm.java:401)
at org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy.loadClass(SelfFirstStrategy.java:42)
at org.codehaus.plexus.classworlds.realm.ClassRealm.unsynchronizedLoadClass(ClassRealm.java:271)
at org.codehaus.plexus.classworlds.realm.ClassRealm.loadClass(ClassRealm.java:247)
at org.codehaus.plexus.classworlds.realm.ClassRealm.loadClass(ClassRealm.java:239)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:545)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
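That java.lang.OutOfMemoryError: PermGen space means Maven's JVM ran out of permanent-generation memory (a Java 7 limitation; Java 8 removed PermGen entirely). The cure is to give Maven more heap and PermGen space via MAVEN_OPTS before building. If I recall the Spark 1.x build documentation correctly, the recommended setting was along these lines:

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

For my first retry I only raised MaxPermSize, which turned out to be enough to get past the error.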
Okay, let's beef up the build settings a little. Even with the extra memory the build was taking a very long time, so eventually I switched to the Maven wrapper bundled in the build directory. See below.
user@localhost:~/Desktop/spark-1.4.1$ export MAVEN_OPTS="-XX:MaxPermSize=1024M"
user@localhost:~/Desktop/spark-1.4.1$ mvn -DskipTests clean package
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] Spark Project Parent POM
[INFO] Spark Launcher Project
[INFO] Spark Project Networking
[INFO] Spark Project Shuffle Streaming Service
[INFO] Spark Project Unsafe
[INFO] Spark Project Core
user@localhost:~/Desktop/spark-1.4.1$ build/mvn -DskipTests clean package
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] Spark Project Parent POM
[INFO] Spark Launcher Project
[INFO] Spark Project Networking
[INFO] Spark Project Shuffle Streaming Service
[INFO] Spark Project Unsafe
[INFO] Spark Project Core
..
...
...
...
get/spark-streaming-kafka-assembly_2.10-1.4.1-shaded.jar
[INFO]
[INFO] --- maven-source-plugin:2.4:jar-no-fork (create-source-jar) @ spark-streaming-kafka-assembly_2.10 ---
[INFO] Building jar: /home/user/Desktop/spark-1.4.1/external/kafka-assembly/target/spark-streaming-kafka-assembly_2.10-1.4.1-sources.jar
[INFO]
[INFO] --- maven-source-plugin:2.4:test-jar-no-fork (create-source-jar) @ spark-streaming-kafka-assembly_2.10 ---
[INFO] Building jar: /home/user/Desktop/spark-1.4.1/external/kafka-assembly/target/spark-streaming-kafka-assembly_2.10-1.4.1-test-sources.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM .......................... SUCCESS [26.138s]
[INFO] Spark Launcher Project ............................ SUCCESS [1:15.976s]
[INFO] Spark Project Networking .......................... SUCCESS [26.347s]
[INFO] Spark Project Shuffle Streaming Service ........... SUCCESS [14.123s]
[INFO] Spark Project Unsafe .............................. SUCCESS [12.643s]
[INFO] Spark Project Core ................................ SUCCESS [9:49.622s]
[INFO] Spark Project Bagel ............................... SUCCESS [17.426s]
[INFO] Spark Project GraphX .............................. SUCCESS [53.601s]
[INFO] Spark Project Streaming ........................... SUCCESS [1:34.290s]
[INFO] Spark Project Catalyst ............................ SUCCESS [2:04.020s]
[INFO] Spark Project SQL ................................. SUCCESS [2:11.032s]
[INFO] Spark Project ML Library .......................... SUCCESS [2:57.880s]
[INFO] Spark Project Tools ............................... SUCCESS [6.920s]
[INFO] Spark Project Hive ................................ SUCCESS [2:58.649s]
[INFO] Spark Project REPL ................................ SUCCESS [36.564s]
[INFO] Spark Project Assembly ............................ SUCCESS [3:13.152s]
[INFO] Spark Project External Twitter .................... SUCCESS [1:09.316s]
[INFO] Spark Project External Flume Sink ................. SUCCESS [42.294s]
[INFO] Spark Project External Flume ...................... SUCCESS [37.907s]
[INFO] Spark Project External MQTT ....................... SUCCESS [1:20.999s]
[INFO] Spark Project External ZeroMQ ..................... SUCCESS [29.090s]
[INFO] Spark Project External Kafka ...................... SUCCESS [54.212s]
[INFO] Spark Project Examples ............................ SUCCESS [5:54.508s]
[INFO] Spark Project External Kafka Assembly ............. SUCCESS [1:24.962s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 41:53.884s
[INFO] Finished at: Tue Aug 04 08:56:02 MYT 2015
[INFO] Final Memory: 71M/684M
[INFO] ------------------------------------------------------------------------
Yes, the build finally succeeded. Even so, as you can see above, it took almost 42 minutes on my PC just to compile. Okay, now that all the libraries are built, let's repeat the commands we typed earlier.
$ ./bin/spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/08/04 20:21:16 INFO SecurityManager: Changing view acls to: user
15/08/04 20:21:16 INFO SecurityManager: Changing modify acls to: user
15/08/04 20:21:16 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user); users with modify permissions: Set(user)
15/08/04 20:21:16 INFO HttpServer: Starting HTTP Server
15/08/04 20:21:17 INFO Utils: Successfully started service 'HTTP class server' on port 56379.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.4.1
/_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55)
Type in expressions to have them evaluated.
Type :help for more information.
15/08/04 20:21:24 WARN Utils: Your hostname, localhost resolves to a loopback address: 127.0.1.1; using 192.168.133.28 instead (on interface eth0)
15/08/04 20:21:24 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/08/04 20:21:24 INFO SparkContext: Running Spark version 1.4.1
15/08/04 20:21:24 INFO SecurityManager: Changing view acls to: user
15/08/04 20:21:24 INFO SecurityManager: Changing modify acls to: user
15/08/04 20:21:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user); users with modify permissions: Set(user)
15/08/04 20:21:25 INFO Slf4jLogger: Slf4jLogger started
15/08/04 20:21:26 INFO Remoting: Starting remoting
15/08/04 20:21:26 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.133.28:47888]
15/08/04 20:21:26 INFO Utils: Successfully started service 'sparkDriver' on port 47888.
15/08/04 20:21:27 INFO SparkEnv: Registering MapOutputTracker
15/08/04 20:21:27 INFO SparkEnv: Registering BlockManagerMaster
15/08/04 20:21:27 INFO DiskBlockManager: Created local directory at /tmp/spark-660b5f39-26be-4ea2-8593-c0c05a093a23/blockmgr-c3225f03-5ecf-4fed-bbe4-df2331ac7742
15/08/04 20:21:27 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/08/04 20:21:27 INFO HttpFileServer: HTTP File server directory is /tmp/spark-660b5f39-26be-4ea2-8593-c0c05a093a23/httpd-3ab40971-a6d0-42a7-b39e-4d1ce4290642
15/08/04 20:21:27 INFO HttpServer: Starting HTTP Server
15/08/04 20:21:27 INFO Utils: Successfully started service 'HTTP file server' on port 50089.
15/08/04 20:21:27 INFO SparkEnv: Registering OutputCommitCoordinator
15/08/04 20:21:28 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/08/04 20:21:28 INFO SparkUI: Started SparkUI at http://192.168.133.28:4040
15/08/04 20:21:28 INFO Executor: Starting executor ID driver on host localhost
15/08/04 20:21:28 INFO Executor: Using REPL class URI: http://192.168.133.28:56379
15/08/04 20:21:28 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36428.
15/08/04 20:21:28 INFO NettyBlockTransferService: Server created on 36428
15/08/04 20:21:28 INFO BlockManagerMaster: Trying to register BlockManager
15/08/04 20:21:28 INFO BlockManagerMasterEndpoint: Registering block manager localhost:36428 with 265.4 MB RAM, BlockManagerId(driver, localhost, 36428)
15/08/04 20:21:28 INFO BlockManagerMaster: Registered BlockManager
15/08/04 20:21:29 INFO SparkILoop: Created spark context..
Spark context available as sc.
15/08/04 20:21:30 INFO SparkILoop: Created sql context..
SQL context available as sqlContext.
scala>
Okay, everything looks good; the earlier error is gone. Let's explore further.
scala> sc.parallelize(1 to 1000).count()
15/08/04 20:30:05 INFO SparkContext: Starting job: count at <console>:22
15/08/04 20:30:05 INFO DAGScheduler: Got job 0 (count at <console>:22) with 4 output partitions (allowLocal=false)
15/08/04 20:30:05 INFO DAGScheduler: Final stage: ResultStage 0(count at <console>:22)
15/08/04 20:30:05 INFO DAGScheduler: Parents of final stage: List()
15/08/04 20:30:05 INFO DAGScheduler: Missing parents: List()
15/08/04 20:30:05 INFO DAGScheduler: Submitting ResultStage 0 (ParallelCollectionRDD[0] at parallelize at <console>:22), which has no missing parents
15/08/04 20:30:05 INFO MemoryStore: ensureFreeSpace(1096) called with curMem=0, maxMem=278302556
15/08/04 20:30:05 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1096.0 B, free 265.4 MB)
15/08/04 20:30:05 INFO MemoryStore: ensureFreeSpace(804) called with curMem=1096, maxMem=278302556
15/08/04 20:30:05 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 804.0 B, free 265.4 MB)
15/08/04 20:30:05 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:36428 (size: 804.0 B, free: 265.4 MB)
15/08/04 20:30:05 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:874
15/08/04 20:30:05 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 0 (ParallelCollectionRDD[0] at parallelize at <console>:22)
15/08/04 20:30:05 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks
15/08/04 20:30:05 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1369 bytes)
15/08/04 20:30:05 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1369 bytes)
15/08/04 20:30:05 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, PROCESS_LOCAL, 1369 bytes)
15/08/04 20:30:05 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, PROCESS_LOCAL, 1426 bytes)
15/08/04 20:30:05 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
15/08/04 20:30:05 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/08/04 20:30:05 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
15/08/04 20:30:05 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
15/08/04 20:30:06 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 658 bytes result sent to driver
15/08/04 20:30:06 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 658 bytes result sent to driver
15/08/04 20:30:06 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 658 bytes result sent to driver
15/08/04 20:30:06 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 658 bytes result sent to driver
15/08/04 20:30:06 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 477 ms on localhost (1/4)
15/08/04 20:30:06 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 478 ms on localhost (2/4)
15/08/04 20:30:06 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 508 ms on localhost (3/4)
15/08/04 20:30:06 INFO DAGScheduler: ResultStage 0 (count at <console>:22) finished in 0.520 s
15/08/04 20:30:06 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 478 ms on localhost (4/4)
15/08/04 20:30:06 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/08/04 20:30:06 INFO DAGScheduler: Job 0 finished: count at <console>:22, took 1.079304 s
res0: Long = 1000
That's pretty nice for a small demo of how Spark works. Before moving on, it's worth trying a few more RDD operations in the same shell, as sketched below.
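This is a minimal sketch using only core RDD operations (map, filter, count, reduce) that were already present in Spark 1.4; the numbers and the 500000 threshold are made up for illustration:

scala> val nums = sc.parallelize(1 to 1000)    // distribute a local range as an RDD
scala> val squares = nums.map(n => n * n)      // transformation: square each element (lazy)
scala> val big = squares.filter(_ > 500000)    // transformation: keep only large squares (lazy)
scala> big.count()                             // action: triggers the actual distributed computation
scala> squares.reduce(_ + _)                   // action: sums the squares in parallel

Transformations like map and filter are lazy: nothing actually runs until an action such as count or reduce is called, which is why the scheduler log above only appears once count() is invoked. Now let's move on to the next example; open another terminal.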
user@localhost:~/Desktop/spark-1.4.1$ ./bin/pyspark
Python 2.7.10 (default, Jul 1 2015, 10:54:53)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/08/04 20:37:42 INFO SparkContext: Running Spark version 1.4.1
15/08/04 20:37:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/08/04 20:37:44 WARN Utils: Your hostname, localhost resolves to a loopback address: 127.0.1.1; using 182.168.133.28 instead (on interface eth0)
15/08/04 20:37:44 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/08/04 20:37:44 INFO SecurityManager: Changing view acls to: user
15/08/04 20:37:44 INFO SecurityManager: Changing modify acls to: user
15/08/04 20:37:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user); users with modify permissions: Set(user)
15/08/04 20:37:46 INFO Slf4jLogger: Slf4jLogger started
15/08/04 20:37:46 INFO Remoting: Starting remoting
15/08/04 20:37:46 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@182.168.133.28:35904]
15/08/04 20:37:46 INFO Utils: Successfully started service 'sparkDriver' on port 35904.
15/08/04 20:37:46 INFO SparkEnv: Registering MapOutputTracker
15/08/04 20:37:46 INFO SparkEnv: Registering BlockManagerMaster
15/08/04 20:37:47 INFO DiskBlockManager: Created local directory at /tmp/spark-2b46e9e7-1779-45d1-b9cf-46000baf7d9b/blockmgr-e2f47b34-47a8-4b72-a0d6-25d0a7daa02e
15/08/04 20:37:47 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/08/04 20:37:47 INFO HttpFileServer: HTTP File server directory is /tmp/spark-2b46e9e7-1779-45d1-b9cf-46000baf7d9b/httpd-2ec128c2-bad0-4dd9-a826-eab2ee0779cb
15/08/04 20:37:47 INFO HttpServer: Starting HTTP Server
15/08/04 20:37:47 INFO Utils: Successfully started service 'HTTP file server' on port 45429.
15/08/04 20:37:47 INFO SparkEnv: Registering OutputCommitCoordinator
15/08/04 20:37:49 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
15/08/04 20:37:50 INFO Utils: Successfully started service 'SparkUI' on port 4041.
15/08/04 20:37:50 INFO SparkUI: Started SparkUI at http://182.168.133.28:4041
15/08/04 20:37:50 INFO Executor: Starting executor ID driver on host localhost
15/08/04 20:37:51 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 47045.
15/08/04 20:37:51 INFO NettyBlockTransferService: Server created on 47045
15/08/04 20:37:51 INFO BlockManagerMaster: Trying to register BlockManager
15/08/04 20:37:51 INFO BlockManagerMasterEndpoint: Registering block manager localhost:47045 with 265.4 MB RAM, BlockManagerId(driver, localhost, 47045)
15/08/04 20:37:51 INFO BlockManagerMaster: Registered BlockManager
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 1.4.1
/_/
Using Python version 2.7.10 (default, Jul 1 2015 10:54:53)
SparkContext available as sc, SQLContext available as sqlContext.
>>> sc.parallelize(range(1000)).count()
15/08/04 20:37:55 INFO SparkContext: Starting job: count at <stdin>:1
15/08/04 20:37:55 INFO DAGScheduler: Got job 0 (count at <stdin>:1) with 4 output partitions (allowLocal=false)
15/08/04 20:37:55 INFO DAGScheduler: Final stage: ResultStage 0(count at <stdin>:1)
15/08/04 20:37:55 INFO DAGScheduler: Parents of final stage: List()
15/08/04 20:37:55 INFO DAGScheduler: Missing parents: List()
15/08/04 20:37:55 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] at count at <stdin>:1), which has no missing parents
15/08/04 20:37:55 INFO MemoryStore: ensureFreeSpace(4416) called with curMem=0, maxMem=278302556
15/08/04 20:37:55 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.3 KB, free 265.4 MB)
15/08/04 20:37:55 INFO MemoryStore: ensureFreeSpace(2722) called with curMem=4416, maxMem=278302556
15/08/04 20:37:55 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.7 KB, free 265.4 MB)
15/08/04 20:37:55 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:47045 (size: 2.7 KB, free: 265.4 MB)
15/08/04 20:37:55 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:874
15/08/04 20:37:55 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 0 (PythonRDD[1] at count at <stdin>:1)
15/08/04 20:37:55 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks
15/08/04 20:37:55 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1873 bytes)
15/08/04 20:37:55 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 2117 bytes)
15/08/04 20:37:55 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, PROCESS_LOCAL, 2123 bytes)
15/08/04 20:37:55 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, PROCESS_LOCAL, 2123 bytes)
15/08/04 20:37:55 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
15/08/04 20:37:55 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
15/08/04 20:37:55 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/08/04 20:37:55 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
15/08/04 20:37:56 INFO PythonRDD: Times: total = 421, boot = 376, init = 44, finish = 1
15/08/04 20:37:56 INFO PythonRDD: Times: total = 418, boot = 354, init = 64, finish = 0
15/08/04 20:37:56 INFO PythonRDD: Times: total = 423, boot = 372, init = 51, finish = 0
15/08/04 20:37:56 INFO PythonRDD: Times: total = 421, boot = 381, init = 40, finish = 0
15/08/04 20:37:56 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 698 bytes result sent to driver
15/08/04 20:37:56 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 698 bytes result sent to driver
15/08/04 20:37:56 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 698 bytes result sent to driver
15/08/04 20:37:56 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 698 bytes result sent to driver
15/08/04 20:37:56 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 552 ms on localhost (1/4)
15/08/04 20:37:56 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 560 ms on localhost (2/4)
15/08/04 20:37:56 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 562 ms on localhost (3/4)
15/08/04 20:37:56 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 626 ms on localhost (4/4)
15/08/04 20:37:56 INFO DAGScheduler: ResultStage 0 (count at <stdin>:1) finished in 0.641 s
15/08/04 20:37:56 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/08/04 20:37:56 INFO DAGScheduler: Job 0 finished: count at <stdin>:1, took 1.137405 s
1000
>>>
Looks good. The next example calculates pi using Spark.
user@localhost:~/Desktop/spark-1.4.1$ ./bin/run-example SparkPi
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/08/04 20:44:50 INFO SparkContext: Running Spark version 1.4.1
15/08/04 20:44:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/08/04 20:44:51 WARN Utils: Your hostname, localhost resolves to a loopback address: 127.0.1.1; using 182.168.133.28 instead (on interface eth0)
15/08/04 20:44:51 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/08/04 20:44:51 INFO SecurityManager: Changing view acls to: user
15/08/04 20:44:51 INFO SecurityManager: Changing modify acls to: user
15/08/04 20:44:51 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user); users with modify permissions: Set(user)
15/08/04 20:44:52 INFO Slf4jLogger: Slf4jLogger started
15/08/04 20:44:52 INFO Remoting: Starting remoting
15/08/04 20:44:53 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@182.168.133.28:45817]
15/08/04 20:44:53 INFO Utils: Successfully started service 'sparkDriver' on port 45817.
15/08/04 20:44:53 INFO SparkEnv: Registering MapOutputTracker
15/08/04 20:44:53 INFO SparkEnv: Registering BlockManagerMaster
15/08/04 20:44:53 INFO DiskBlockManager: Created local directory at /tmp/spark-da217260-adb6-474e-9908-9dcdd39371e9/blockmgr-5ed813af-a26f-413c-bdfc-1e08001f9cb2
15/08/04 20:44:53 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/08/04 20:44:53 INFO HttpFileServer: HTTP File server directory is /tmp/spark-da217260-adb6-474e-9908-9dcdd39371e9/httpd-f07ff755-e34d-4149-b4ac-399e6897221a
15/08/04 20:44:53 INFO HttpServer: Starting HTTP Server
15/08/04 20:44:53 INFO Utils: Successfully started service 'HTTP file server' on port 50955.
15/08/04 20:44:53 INFO SparkEnv: Registering OutputCommitCoordinator
15/08/04 20:44:54 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/08/04 20:44:54 INFO SparkUI: Started SparkUI at http://182.168.133.28:4040
15/08/04 20:44:58 INFO SparkContext: Added JAR file:/home/user/Desktop/spark-1.4.1/examples/target/scala-2.10/spark-examples-1.4.1-hadoop2.2.0.jar at http://182.168.133.28:50955/jars/spark-examples-1.4.1-hadoop2.2.0.jar with timestamp 1438692298221
15/08/04 20:44:58 INFO Executor: Starting executor ID driver on host localhost
15/08/04 20:44:58 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45731.
15/08/04 20:44:58 INFO NettyBlockTransferService: Server created on 45731
15/08/04 20:44:58 INFO BlockManagerMaster: Trying to register BlockManager
15/08/04 20:44:58 INFO BlockManagerMasterEndpoint: Registering block manager localhost:45731 with 265.4 MB RAM, BlockManagerId(driver, localhost, 45731)
15/08/04 20:44:58 INFO BlockManagerMaster: Registered BlockManager
15/08/04 20:44:59 INFO SparkContext: Starting job: reduce at SparkPi.scala:35
15/08/04 20:44:59 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:35) with 2 output partitions (allowLocal=false)
15/08/04 20:44:59 INFO DAGScheduler: Final stage: ResultStage 0(reduce at SparkPi.scala:35)
15/08/04 20:44:59 INFO DAGScheduler: Parents of final stage: List()
15/08/04 20:44:59 INFO DAGScheduler: Missing parents: List()
15/08/04 20:44:59 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:31), which has no missing parents
15/08/04 20:44:59 INFO MemoryStore: ensureFreeSpace(1888) called with curMem=0, maxMem=278302556
15/08/04 20:44:59 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1888.0 B, free 265.4 MB)
15/08/04 20:44:59 INFO MemoryStore: ensureFreeSpace(1202) called with curMem=1888, maxMem=278302556
15/08/04 20:44:59 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1202.0 B, free 265.4 MB)
15/08/04 20:44:59 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:45731 (size: 1202.0 B, free: 265.4 MB)
15/08/04 20:44:59 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:874
15/08/04 20:44:59 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:31)
15/08/04 20:44:59 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/08/04 20:44:59 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1446 bytes)
15/08/04 20:44:59 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1446 bytes)
15/08/04 20:44:59 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
15/08/04 20:44:59 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/08/04 20:44:59 INFO Executor: Fetching http://182.168.133.28:50955/jars/spark-examples-1.4.1-hadoop2.2.0.jar with timestamp 1438692298221
15/08/04 20:45:00 INFO Utils: Fetching http://182.168.133.28:50955/jars/spark-examples-1.4.1-hadoop2.2.0.jar to /tmp/spark-da217260-adb6-474e-9908-9dcdd39371e9/userFiles-f3a72f24-78e5-4d5d-82eb-dcc8c6b899cb/fetchFileTemp5981400277552657211.tmp
15/08/04 20:45:03 INFO Executor: Adding file:/tmp/spark-da217260-adb6-474e-9908-9dcdd39371e9/userFiles-f3a72f24-78e5-4d5d-82eb-dcc8c6b899cb/spark-examples-1.4.1-hadoop2.2.0.jar to class loader
15/08/04 20:45:03 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 736 bytes result sent to driver
15/08/04 20:45:03 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 736 bytes result sent to driver
15/08/04 20:45:03 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3722 ms on localhost (1/2)
15/08/04 20:45:03 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 3685 ms on localhost (2/2)
15/08/04 20:45:03 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/08/04 20:45:03 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:35) finished in 3.750 s
15/08/04 20:45:03 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:35, took 4.032610 s
Pi is roughly 3.14038
15/08/04 20:45:03 INFO SparkUI: Stopped Spark web UI at http://182.168.133.28:4040
15/08/04 20:45:03 INFO DAGScheduler: Stopping DAGScheduler
15/08/04 20:45:03 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/08/04 20:45:03 INFO Utils: path = /tmp/spark-da217260-adb6-474e-9908-9dcdd39371e9/blockmgr-5ed813af-a26f-413c-bdfc-1e08001f9cb2, already present as root for deletion.
15/08/04 20:45:03 INFO MemoryStore: MemoryStore cleared
15/08/04 20:45:03 INFO BlockManager: BlockManager stopped
15/08/04 20:45:03 INFO BlockManagerMaster: BlockManagerMaster stopped
15/08/04 20:45:03 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/08/04 20:45:03 INFO SparkContext: Successfully stopped SparkContext
15/08/04 20:45:03 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/08/04 20:45:03 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/08/04 20:45:03 INFO Utils: Shutdown hook called
15/08/04 20:45:03 INFO Utils: Deleting directory /tmp/spark-da217260-adb6-474e-9908-9dcdd39371e9
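For the record, SparkPi estimates pi with a simple Monte Carlo method: it throws random points into the square [-1, 1] x [-1, 1] and counts how many land inside the unit circle; the ratio of the two areas is pi/4. Below is my own condensed sketch of that idea, which you can paste into spark-shell; it approximates what the bundled example does, and is not its exact source:

scala> import scala.math.random
scala> val n = 100000                              // number of random samples
scala> val inside = sc.parallelize(1 to n).map { _ =>
     |   val x = random * 2 - 1                    // random point in [-1, 1] x [-1, 1]
     |   val y = random * 2 - 1
     |   if (x * x + y * y < 1) 1 else 0           // 1 if the point falls inside the unit circle
     | }.reduce(_ + _)
scala> println("Pi is roughly " + 4.0 * inside / n)

Because the samples are spread across partitions, the sampling itself runs in parallel, which is exactly what the two tasks in the log above correspond to.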
This introduction should give you an idea of what Spark is all about: you can basically use Spark to do distributed processing. It is definitely worthwhile to look into the examples directory to see what Spark can really do for you. Before I end, these two links are very helpful for getting further:
http://spark.apache.org/docs/latest/quick-start.html
http://spark.apache.org/docs/latest/#launching-on-a-cluster
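As an aside, everything under the examples directory can be launched the same way as SparkPi via ./bin/run-example; for instance, SparkPi accepts an optional slices argument (./bin/run-example SparkPi 100) to spread the sampling over more partitions than the default 2 shown in the log above.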