
[mac] Apache Spark Study - 1 (Installing and Verifying Spark)

 

Development Environment

- macOS Mojave 10.14.6

- openjdk version "1.8.0_242"

- Python 3.7.3

 

1. Installing Apache Spark

case 1) Installation via Homebrew

 Homebrew, the package manager familiar to most Mac users, can be used to install Spark.

 

- Search for the spark package with brew search

  # brew search spark 

➜  ~ brew search spark
==> Formulae
apache-spark                                                     spark                                                            sparkey
==> Casks
spark                                                            sparkle                                                          sparkleshare

 - Install Spark with the brew install command (install the apache-spark formula, not spark)

 # brew install apache-spark

➜  ~ brew install apache-spark

 

  - Verify the installation

  # brew list | grep spark

➜  ~ brew list  | grep spark
apache-spark

 # brew info apache-spark 

➜  ~ brew info apache-spark
apache-spark: stable 2.4.5, HEAD
Engine for large-scale data processing
https://spark.apache.org/
/usr/local/Cellar/apache-spark/2.4.5 (1,059 files, 250.9MB) *
  Built from source on 2020-04-13 at 01:57:28
From: https://github.com/Homebrew/homebrew-core/blob/master/Formula/apache-spark.rb
==> Requirements
Required: java = 1.8 ✔
==> Options
--HEAD
	Install HEAD version
==> Analytics
install: 5,214 (30 days), 18,502 (90 days), 65,388 (365 days)
install-on-request: 5,117 (30 days), 18,019 (90 days), 63,439 (365 days)
build-error: 0 (30 days)

  As of 2020-04-13, this installs version 2.4.5.

 

  - Set the environment variables ( SPARK_HOME / PATH )

export SPARK_HOME=/usr/local/Cellar/apache-spark/2.4.5/libexec

 - Add the environment variables wherever suits your setup. ( /etc/profile was used here )

 - Packages installed via Homebrew are placed under /usr/local/Cellar by default.
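As a sketch, the two lines below could be appended to /etc/profile (or your shell's rc file) so the variables persist across sessions; the path matches the Homebrew install above, so adjust the version number if yours differs.

```shell
# Append to /etc/profile (or ~/.zshrc, etc.).
# Path matches the Homebrew 2.4.5 install shown above.
export SPARK_HOME=/usr/local/Cellar/apache-spark/2.4.5/libexec
export PATH="$SPARK_HOME/bin:$PATH"
```

After re-sourcing the file, spark-submit, spark-shell, and pyspark should all be on the PATH.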

 

 case 2) Installation from the official website (not covered in detail in this post)

 

 Official website link: http://spark.apache.org/downloads

 

(Official website download screen)

Download the tar file from the official Spark website, extract it, and Spark is ready to use.
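Although not covered in detail here, the manual route looks roughly like the following; the release version and archive URL are illustrative, so pick the actual link from the downloads page.

```shell
# Download and extract a pre-built release.
# Version and mirror URL are assumptions; choose them on the downloads page.
curl -O https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
tar -xzf spark-2.4.5-bin-hadoop2.7.tgz
export SPARK_HOME="$PWD/spark-2.4.5-bin-hadoop2.7"
export PATH="$SPARK_HOME/bin:$PATH"
```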

 

2. Testing

 - Spark provides APIs in several languages.

 - spark-shell lets you use Spark from Scala.

 - pyspark lets you use Spark from Python.

 

 - Run an example bundled with the Spark installation

➜  ~ $SPARK_HOME/bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master local $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.5.jar 10

 # Running this example computes an approximation of pi ( Scala based )
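For the Python API, the distribution ships a Python version of the same example; as a sketch, it could be run with spark-submit as below (the examples path assumes the standard Spark layout).

```shell
# Python counterpart of the SparkPi example bundled with the distribution.
# The final argument (10) is the number of partitions, as in the Scala run.
spark-submit --master local "$SPARK_HOME/examples/src/main/python/pi.py" 10
```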


➜  ~ spark-submit --class org.apache.spark.examples.SparkPi --master local $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.5.jar 10
20/04/13 02:40:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/04/13 02:40:45 INFO SparkContext: Running Spark version 2.4.5
20/04/13 02:40:45 INFO SparkContext: Submitted application: Spark Pi
20/04/13 02:40:45 INFO SecurityManager: Changing view acls to: jinsu
20/04/13 02:40:45 INFO SecurityManager: Changing modify acls to: jinsu
20/04/13 02:40:45 INFO SecurityManager: Changing view acls groups to:
20/04/13 02:40:45 INFO SecurityManager: Changing modify acls groups to:
20/04/13 02:40:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(jinsu); groups with view permissions: Set(); users  with modify permissions: Set(jinsu); groups with modify permissions: Set()
20/04/13 02:40:46 INFO Utils: Successfully started service 'sparkDriver' on port 62232.
20/04/13 02:40:46 INFO SparkEnv: Registering MapOutputTracker
20/04/13 02:40:46 INFO SparkEnv: Registering BlockManagerMaster
20/04/13 02:40:46 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/04/13 02:40:46 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/04/13 02:40:46 INFO DiskBlockManager: Created local directory at /private/var/folders/qg/gqwnnhh16z34kx_xh9hqc6r80000gn/T/blockmgr-b39461ee-692d-4c1b-aaca-3bea295cea69
20/04/13 02:40:46 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
20/04/13 02:40:46 INFO SparkEnv: Registering OutputCommitCoordinator
20/04/13 02:40:46 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/04/13 02:40:46 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.0.3:4040
20/04/13 02:40:46 INFO SparkContext: Added JAR file:/usr/local/Cellar/apache-spark/2.4.5/libexec/examples/jars/spark-examples_2.11-2.4.5.jar at spark://192.168.0.3:62232/jars/spark-examples_2.11-2.4.5.jar with timestamp 1586713246270
20/04/13 02:40:46 INFO Executor: Starting executor ID driver on host localhost
20/04/13 02:40:46 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 62233.
20/04/13 02:40:46 INFO NettyBlockTransferService: Server created on 192.168.0.3:62233
20/04/13 02:40:46 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/04/13 02:40:46 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.0.3, 62233, None)
20/04/13 02:40:46 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.3:62233 with 366.3 MB RAM, BlockManagerId(driver, 192.168.0.3, 62233, None)
20/04/13 02:40:46 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.0.3, 62233, None)
20/04/13 02:40:46 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.0.3, 62233, None)
20/04/13 02:40:46 INFO SparkContext: Starting job: reduce at SparkPi.scala:38
20/04/13 02:40:46 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions
20/04/13 02:40:46 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
20/04/13 02:40:46 INFO DAGScheduler: Parents of final stage: List()
20/04/13 02:40:46 INFO DAGScheduler: Missing parents: List()
20/04/13 02:40:46 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
20/04/13 02:40:46 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.0 KB, free 366.3 MB)
20/04/13 02:40:46 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1381.0 B, free 366.3 MB)
20/04/13 02:40:46 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.0.3:62233 (size: 1381.0 B, free: 366.3 MB)
20/04/13 02:40:46 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1163
20/04/13 02:40:46 INFO DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
20/04/13 02:40:46 INFO TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
20/04/13 02:40:46 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7866 bytes)
20/04/13 02:40:46 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
20/04/13 02:40:46 INFO Executor: Fetching spark://192.168.0.3:62232/jars/spark-examples_2.11-2.4.5.jar with timestamp 1586713246270
20/04/13 02:40:46 INFO TransportClientFactory: Successfully created connection to /192.168.0.3:62232 after 25 ms (0 ms spent in bootstraps)
20/04/13 02:40:46 INFO Utils: Fetching spark://192.168.0.3:62232/jars/spark-examples_2.11-2.4.5.jar to /private/var/folders/qg/gqwnnhh16z34kx_xh9hqc6r80000gn/T/spark-4c10897a-981a-4769-9fc1-0cf8fc4ddb18/userFiles-826e9154-6f3c-4471-a336-ceb0e437db7a/fetchFileTemp2394255115880163077.tmp
20/04/13 02:40:46 INFO Executor: Adding file:/private/var/folders/qg/gqwnnhh16z34kx_xh9hqc6r80000gn/T/spark-4c10897a-981a-4769-9fc1-0cf8fc4ddb18/userFiles-826e9154-6f3c-4471-a336-ceb0e437db7a/spark-examples_2.11-2.4.5.jar to class loader
20/04/13 02:40:47 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 867 bytes result sent to driver
20/04/13 02:40:47 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7866 bytes)
20/04/13 02:40:47 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
20/04/13 02:40:47 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 153 ms on localhost (executor driver) (1/10)
20/04/13 02:40:47 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 824 bytes result sent to driver
20/04/13 02:40:47 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 7866 bytes)
20/04/13 02:40:47 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
20/04/13 02:40:47 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 11 ms on localhost (executor driver) (2/10)
20/04/13 02:40:47 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 867 bytes result sent to driver
20/04/13 02:40:47 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, executor driver, partition 3, PROCESS_LOCAL, 7866 bytes)
20/04/13 02:40:47 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
20/04/13 02:40:47 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 11 ms on localhost (executor driver) (3/10)
20/04/13 02:40:47 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 824 bytes result sent to driver
20/04/13 02:40:47 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, localhost, executor driver, partition 4, PROCESS_LOCAL, 7866 bytes)
20/04/13 02:40:47 INFO Executor: Running task 4.0 in stage 0.0 (TID 4)
20/04/13 02:40:47 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 12 ms on localhost (executor driver) (4/10)
20/04/13 02:40:47 INFO Executor: Finished task 4.0 in stage 0.0 (TID 4). 824 bytes result sent to driver
20/04/13 02:40:47 INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, localhost, executor driver, partition 5, PROCESS_LOCAL, 7866 bytes)
20/04/13 02:40:47 INFO Executor: Running task 5.0 in stage 0.0 (TID 5)
20/04/13 02:40:47 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 12 ms on localhost (executor driver) (5/10)
20/04/13 02:40:47 INFO Executor: Finished task 5.0 in stage 0.0 (TID 5). 824 bytes result sent to driver
20/04/13 02:40:47 INFO TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, localhost, executor driver, partition 6, PROCESS_LOCAL, 7866 bytes)
20/04/13 02:40:47 INFO Executor: Running task 6.0 in stage 0.0 (TID 6)
20/04/13 02:40:47 INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 12 ms on localhost (executor driver) (6/10)
20/04/13 02:40:47 INFO Executor: Finished task 6.0 in stage 0.0 (TID 6). 824 bytes result sent to driver
20/04/13 02:40:47 INFO TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, localhost, executor driver, partition 7, PROCESS_LOCAL, 7866 bytes)
20/04/13 02:40:47 INFO Executor: Running task 7.0 in stage 0.0 (TID 7)
20/04/13 02:40:47 INFO TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 11 ms on localhost (executor driver) (7/10)
20/04/13 02:40:47 INFO Executor: Finished task 7.0 in stage 0.0 (TID 7). 867 bytes result sent to driver
20/04/13 02:40:47 INFO TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, localhost, executor driver, partition 8, PROCESS_LOCAL, 7866 bytes)
20/04/13 02:40:47 INFO Executor: Running task 8.0 in stage 0.0 (TID 8)
20/04/13 02:40:47 INFO TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 11 ms on localhost (executor driver) (8/10)
20/04/13 02:40:47 INFO Executor: Finished task 8.0 in stage 0.0 (TID 8). 824 bytes result sent to driver
20/04/13 02:40:47 INFO TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, localhost, executor driver, partition 9, PROCESS_LOCAL, 7866 bytes)
20/04/13 02:40:47 INFO Executor: Running task 9.0 in stage 0.0 (TID 9)
20/04/13 02:40:47 INFO TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 10 ms on localhost (executor driver) (9/10)
20/04/13 02:40:47 INFO Executor: Finished task 9.0 in stage 0.0 (TID 9). 824 bytes result sent to driver
20/04/13 02:40:47 INFO TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 12 ms on localhost (executor driver) (10/10)
20/04/13 02:40:47 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
20/04/13 02:40:47 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 0.382 s
20/04/13 02:40:47 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 0.420306 s
Pi is roughly 3.1425111425111427
20/04/13 02:40:47 INFO SparkUI: Stopped Spark web UI at http://192.168.0.3:4040
20/04/13 02:40:47 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/04/13 02:40:47 INFO MemoryStore: MemoryStore cleared
20/04/13 02:40:47 INFO BlockManager: BlockManager stopped
20/04/13 02:40:47 INFO BlockManagerMaster: BlockManagerMaster stopped
20/04/13 02:40:47 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/04/13 02:40:47 INFO SparkContext: Successfully stopped SparkContext
20/04/13 02:40:47 INFO ShutdownHookManager: Shutdown hook called
20/04/13 02:40:47 INFO ShutdownHookManager: Deleting directory /private/var/folders/qg/gqwnnhh16z34kx_xh9hqc6r80000gn/T/spark-0280f542-60f1-443e-b89d-f2d9caf231e7
20/04/13 02:40:47 INFO ShutdownHookManager: Deleting directory /private/var/folders/qg/gqwnnhh16z34kx_xh9hqc6r80000gn/T/spark-4c10897a-981a-4769-9fc1-0cf8fc4ddb18

 

3. Checking the Spark UI

 - While spark-shell or pyspark is running, the Spark UI is accessible.

  # http://localhost:4040  (the default Spark UI port is 4040)
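With a session running, the UI can be checked quickly from a second terminal, for example:

```shell
# Start spark-shell or pyspark in one terminal, then from another terminal
# confirm the default UI port responds (prints the HTTP status code).
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:4040
```

If nothing is listening, curl exits with an error instead; the UI only exists while a Spark session is alive.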
