See slides from our talk at Rockville Raspberry Pi Jam 2017: Cluster Computing with Raspberry Pi
Figure out or set the IP Addresses of all nodes on your network.
[Cool trick for seeing Raspberry Pis on your local network]
sudo apt-get install nmap
sudo nmap -sP 192.168.1.0/24 | awk '/^Nmap/{ip=$NF}/B8:27:EB/{print ip}'

Getting your current IP:

hostname -I
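If your router hands out addresses dynamically, the hostnames you configure later can break when leases change. One way to pin each Pi to a fixed address on Raspbian is a static entry in /etc/dhcpcd.conf; this is only a sketch, assuming a 192.168.1.x network with the router at 192.168.1.1, so adjust the addresses to match your own network:

# /etc/dhcpcd.conf on the master (use .3, .4, ... on the slaves)
interface eth0
static ip_address=192.168.1.2/24
static routers=192.168.1.1
static domain_name_servers=192.168.1.1

Reboot (or restart the dhcpcd service) for the change to take effect.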
[Get SSH set up into all Devices]
sudo raspi-config
->Advanced Options
->SSH
Enable -> Yes
On the master I assume you are using the user pi; if not, make sure all Pis have the same user with the same password.
ssh-keygen -t rsa -P ""
-> Enter file in which to save the key (/home/pi/.ssh/id_rsa): [Enter]
[Generate a Host Addition File]
sudo apt-get install vim
vim hosts_addition
Inside vim
press i to enter insert mode
192.168.1.2 master
192.168.1.3 slave01
192.168.1.4 slave02
press ESC to exit insert mode
press :wq
to save
Make a copy of your hosts file:
cp /etc/hosts ~/hosts.bak
sudo apt-get update
sudo apt-get upgrade
[Install Master Dependencies]
install the prerequisites:
sudo apt-get install scala
sudo apt-get install oracle-java8-jdk
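To confirm both installed correctly, you can check the versions (the exact output depends on the packaged versions):

java -version
scala -version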
[Install Master Spark]
Note: past versions of Spark needed the oracle-java7-jdk, however with the new version this will throw an error.
Download the latest version of Spark:
http://spark.apache.org/downloads.html
wget https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0-bin-hadoop2.7.tgz
Untar the tarball:
tar xzf spark-2.2.0-bin-hadoop2.7.tgz
Edit .bashrc (in your user's home directory)
Add this to the end:
export JAVA_HOME=<path-of-Java-installation>    # eg: /usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt/
export SPARK_HOME=<path-to-the-root-of-your-spark-installation>    # eg: /home/pi/spark-2.2.0-bin-hadoop2.7/
export PATH=$PATH:/home/pi/spark-2.2.0-bin-hadoop2.7/bin
Reload .bashrc
source ~/.bashrc
Check that the variables are set by echoing them on the command line:

echo $JAVA_HOME
echo $SPARK_HOME
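If the variables look right, you can also run a quick local smoke test with one of the examples bundled with Spark; this runs only on the master and does not need the cluster yet (it may take a minute or two on a Pi):

cd ~/spark-2.2.0-bin-hadoop2.7
./bin/run-example SparkPi 10

It should finish with a line like "Pi is roughly 3.14...".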
[Configure Master Spark]
cd ~/spark-2.2.0-bin-hadoop2.7/conf
sudo cp spark-env.sh.template spark-env.sh
sudo vim spark-env.sh

press i and add the following lines:

export SPARK_WORKER_CORES="2"
export SPARK_WORKER_MEMORY="512m"
export SPARK_MASTER_HOST="192.168.1.2"
Press esc to exit insert/edit text mode and press :wq
Note that the variable in older versions was called SPARK_MASTER_IP
Suggested SPARK_WORKER_MEMORY values: 256m for a Raspberry Pi 1, 512m for a Raspberry Pi 2/3.
sudo vim slaves
press i
Enter your slaves, one per line:
slave01
slave02
[Create A Configured Version of Spark to Share]
cd ~
tar czf spark.tar.gz spark-2.2.0-bin-hadoop2.7
FOR ALL SLAVES: do each of the steps below for every slave.
[Copy Over Files Needed from Master to Slave]
On the master, copy the files to the slave, then SSH into it:

scp spark.tar.gz slave01:~
scp hosts_addition slave01:~
scp ~/.ssh/id_rsa.pub slave01:~
ssh slave01
[update all slaves]
sudo apt-get update
sudo apt-get upgrade
install dependencies
sudo apt-get install oracle-java8-jdk scala
sudo apt-get install vim
[Add authorized SSH key on all slaves]
on slave:
mkdir -p ~/.ssh
touch ~/.ssh/authorized_keys
Note that this does not damage the existing directory or files, if any.
Verify the status of the files:
ls -a /home/pi/.ssh
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
rm ~/id_rsa.pub
[Allow slave SSH into Master]
ssh-keygen -t rsa -P ""
scp ~/.ssh/id_rsa.pub master:~
ssh master
ls -a /home/pi/.ssh
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
rm ~/id_rsa.pub
logout
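At this point passwordless SSH should work in both directions. A quick check (each command should print the remote hostname without prompting for a password):

# from the slave:
ssh master hostname
# from the master:
ssh slave01 hostname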
[Add hosts data to all slaves]
add the relevant lines to your hosts file:
sudo vim /etc/hosts
press G
to get cursor to the last line of the file. This is important so you don’t corrupt the structure of existing data.
press :r /home/pi/hosts_addition
press :wq
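If you would rather not edit the file in vim, an equivalent non-interactive way to append the same lines (assuming hosts_addition is still in pi's home directory) is:

cat /home/pi/hosts_addition | sudo tee -a /etc/hosts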
[Uncompress the Configured Spark]
tar xzf spark.tar.gz
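Once every slave has the configured Spark unpacked, you can bring the cluster up from the master using Spark's standalone scripts; a sketch, assuming the layout above:

# run on the master
cd ~/spark-2.2.0-bin-hadoop2.7
sbin/start-all.sh    # starts the master plus the workers listed in conf/slaves

The master's web UI at http://192.168.1.2:8080 should list both workers; sbin/stop-all.sh shuts everything down.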
Additional Useful References:
http://bailiwick.io/2015/07/07/create-your-own-apache-spark-cluster-using-raspberry-pi-2/