Socialabel

Saturday, October 19, 2013

Installing Hadoop on a Computer Cluster


This Hadoop cluster installation was done on openSUSE 12.3 Linux. The supporting software required to install Hadoop is:
1. Java (JRE/JDK)
2. The Hadoop framework (Hadoop 1.2.0)
3. SSH

[Do this on all hosts]
[Master/Node1/Node2/Node3/Node4].

First, configure the network and make sure all hosts can reach each other; ping is enough as a tester. If the target host responds, this step is done. If it does not, check the physical and logical configuration as well as the firewall.

The network I use is assumed to be 192.168.10.0/24, with the following assignments:
 1. master : 192.168.10.2/24  
 2. node1  : 192.168.10.10/24  
 3. node2  : 192.168.10.11/24  
 4. node3  : 192.168.10.12/24  
 5. node4  : 192.168.10.13/24  
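
The connectivity test described above can be scripted instead of pinging each address by hand. This is only a minimal sketch: check_hosts and PING_CMD are names made up for this example, and the default assumes the Linux ping options -c (packet count) and -W (timeout in seconds).

```shell
# Hypothetical helper: ping every address once and report which ones answer.
# PING_CMD may be overridden; the default sends one packet with a 2-second timeout.
PING_CMD=${PING_CMD:-"ping -c 1 -W 2"}

check_hosts() {
    local failed=0
    local host
    for host in "$@"; do
        # PING_CMD is left unquoted on purpose so its options are split into words.
        if $PING_CMD "$host" >/dev/null 2>&1; then
            echo "$host reachable"
        else
            echo "$host UNREACHABLE"
            failed=1
        fi
    done
    # A non-zero status means at least one host did not answer.
    return "$failed"
}
```

After pasting the function into a shell, it could be called with the addresses listed above:
 hery@master:~> check_hosts 192.168.10.2 192.168.10.10 192.168.10.11 192.168.10.12 192.168.10.13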


Edit the hosts file (as root) on each host so that it contains the following entries:
 master:/home/Hery # vim /etc/hosts  
 ########### /etc/hosts #############  
 192.168.10.2  master  
 192.168.10.10 node1  
 192.168.10.11 node2  
 192.168.10.12 node3  
 192.168.10.13 node4  

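When that is done, reboot every machine from master through node4 so that the hosts file is reloaded. (Changes to /etc/hosts normally take effect right away, so the reboot is a precaution rather than a requirement.)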

Next, install the required Java runtime on every host that will be part of the cluster (Master/Node1/Node2/Node3/Node4), using a tarball, rpm, YaST, or zypper. If you use a different OS, adjust accordingly. The Java version I use here is OpenJDK 1.7.0_15.
 hery@master:~> java -version  
 java version "1.7.0_15"  
 OpenJDK Runtime Environment (IcedTea7 2.3.7) (suse-8.3.1-i386)  
 OpenJDK Server VM (build 23.7-b01, mixed mode)  

Next, find the directory where Java is installed. The whereis and ll (ls -al) commands are enough to trace it:
 hery@master:~> whereis java  
 java: /usr/bin/java /etc/java /usr/lib/java /usr/bin/X11/java /usr/share/java /usr/share/man/man1/java.1.gz  
 hery@master:~> ll /usr/bin/java  
 lrwxrwxrwx 1 root root 22 Oct 16 23:57 /usr/bin/java -> /etc/alternatives/java  
 hery@master:~> ll /etc/alternatives/java  
 lrwxrwxrwx 1 root root 39 Oct 16 23:57 /etc/alternatives/java -> /usr/lib/jvm/jre-1.7.0-openjdk/bin/java  
 hery@master:~> ll /usr/lib/jvm/jre-1.7.0-openjdk/bin/java  
 -rwxrwxr-x 1 root root 5676 Feb 24 2013 /usr/lib/jvm/jre-1.7.0-openjdk/bin/java  
 hery@master:~>   

This shows that the Java installation directory is
/usr/lib/jvm/jre-1.7.0-openjdk
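
The chain of ll calls above follows the symlinks one hop at a time. As a rough shortcut, readlink -f (GNU coreutils) resolves the whole chain in one step. The function name java_home_of is made up for this sketch, and it assumes the usual layout in which the binary sits at <java home>/bin/java.

```shell
# Hypothetical helper: resolve a java binary through all symlinks and
# print the directory two levels above it (<java home>/bin/java).
java_home_of() {
    local real
    real=$(readlink -f "$1") || return 1
    dirname "$(dirname "$real")"
}
```

It could be called like this:
 hery@master:~> java_home_of "$(command -v java)"

Because readlink -f also resolves symlinked directories, the printed path may be more specific than the one found by hand if an intermediate directory is itself a symlink.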


Before moving on to the Hadoop framework, create a new user on each host; this user will run the Hadoop engine later. The user used here is hduser.
 master:/home/Hery # useradd hduser  
 master:/home/Hery # passwd hduser  
 New password:******   
 BAD PASSWORD: it does not contain enough DIFFERENT characters  
 BAD PASSWORD: is a palindrome  
 Retype new password:******   
 passwd: password updated successfully  
 master:/home/Hery #   

Then download the Hadoop framework and extract it with tar.
 master:/home/Hery # tar -zxvf hadoop-1.2.0.tar.gz -C /opt/  
 master:/home/Hery # chown hduser: -R /opt/hadoop-1.2.0  
 master:/home/Hery # chmod 755 -R /opt/hadoop-1.2.0  

Once Java and Hadoop are in place, the next step is SSH. Use YaST to install it if it is not installed yet; by default almost every Linux distribution ships the sshd package. Start the daemon; on openSUSE:
 master:/home/Hery # service sshd start  
 master:/home/Hery # /sbin/rcSuSEfirewall2 stop      # not recommended


Hadoop uses several ports, so it is best to open them in the Linux firewall/iptables. If you would rather not bother, you can simply turn the firewall off (be careful if the system is publicly reachable). In this setup the ports used are 22, 54310, 54311, 50030, and 50070.
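
As an alternative to stopping the firewall, the ports can be listed in the SuSEfirewall2 configuration. The fragment below is a sketch, not a tested setup: it assumes the cluster interface is in the external zone and uses the FW_SERVICES_EXT_TCP variable from /etc/sysconfig/SuSEfirewall2. The netstat listing at the end of this post shows further listening ports, such as 50010 and 50060, which may need to be added as well.

```shell
# /etc/sysconfig/SuSEfirewall2 (fragment)
# Assumption: the cluster network interface belongs to the external zone.
# TCP ports taken from the list above: SSH, HDFS, JobTracker, and the two web UIs.
FW_SERVICES_EXT_TCP="22 54310 54311 50030 50070"
```

The firewall has to be restarted afterwards for the change to apply:
 master:/home/Hery # /sbin/rcSuSEfirewall2 restart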

With SSH and the ports taken care of, create an RSA key so that the user can log in to the cluster machines remotely without a password. Log in as hduser on one of the machines, for example the master:
 hduser@master:~> ssh-keygen -t rsa -P ""  
 Generating public/private rsa key pair.  
 Enter file in which to save the key (/home/hduser/.ssh/id_rsa):   
 Your identification has been saved in /home/hduser/.ssh/id_rsa.  
 Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.  
 The key fingerprint is:  
 e6:1c:c8:0d:b1:68:a0:e5:e5:d5:4b:60:30:29:12:5e hduser@master
 The key's randomart image is:  
 +--[ RSA 2048]----+  
 |..oE+o=o     |  
 |o+o+.= oo    |  
 |.o..+ o. .    |  
 |  . . +.    |  
 |   o S    |  
 |    + .    |  
 |    o    |  
 |         |  
 |         |  
 +-----------------+  
 hduser@master:~>   

Next, copy the SSH key to the master machine:
 hduser@master:~> ssh-copy-id -i .ssh/id_rsa.pub master  
 The authenticity of host 'master (192.168.10.2)' can't be established.  
 ECDSA key fingerprint is 3a:51:e5:ab:43:c0:65:ec:42:6c:8d:e2:c7:de:a7:ea.  
 Are you sure you want to continue connecting (yes/no)? yes  
 Warning: Permanently added 'master' (ECDSA) to the list of known hosts.  
 hduser@master's password:   
 Now try logging into the machine, with "ssh 'master'", and check in:  
  ~/.ssh/authorized_keys  
 to make sure we haven't added extra keys that you weren't expecting.  


Copy the SSH key to every machine from master through node4; do this on each machine [X321]:
 hduser@master:~> ssh-copy-id -i .ssh/id_rsa.pub master  
 hduser@master:~> ssh-copy-id -i .ssh/id_rsa.pub node1  
 hduser@master:~> ssh-copy-id -i .ssh/id_rsa.pub node2  
 hduser@master:~> ssh-copy-id -i .ssh/id_rsa.pub node3  
 hduser@master:~> ssh-copy-id -i .ssh/id_rsa.pub node4  
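
The five ssh-copy-id calls above can be collapsed into a loop. The sketch below is one way to do it; copy_key_to_hosts and the DRY_RUN switch are names made up for this example and are not part of OpenSSH. The user name hduser is the one created earlier.

```shell
# Hypothetical helper: copy a public key to hduser on every host given.
# Set DRY_RUN=1 to print the commands instead of running them.
copy_key_to_hosts() {
    local key="$1"
    local host
    shift
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh-copy-id -i $key hduser@$host"
        else
            # Stop at the first host that fails so the problem is easy to spot.
            ssh-copy-id -i "$key" "hduser@$host" || return 1
        fi
    done
}
```

Example call:
 hduser@master:~> copy_key_to_hosts ~/.ssh/id_rsa.pub master node1 node2 node3 node4

Each host still asks for the hduser password once, exactly as in the manual version.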


Next, from the master only, log in remotely to each node as hduser:
 hduser@master:~> ssh hduser@node1  

Repeat the commands above [X321] on Node1 through Node4.


Next comes the configuration of Hadoop itself, which lives in the following directory:
 master:/home/Hery # cd /opt/hadoop-1.2.0/conf/

In hadoop-env.sh, change the line
 # export JAVA_HOME=/usr/lib/j2sdk1.5-sun  

 ## to   
 ## Remove the leading "# " and set the path to   
 ## the Java directory found with whereis  
 export JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk    
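
Since the same edit has to be made on every host, it may be convenient to script it. This is a sketch only: set_java_home is a made-up name, and it relies on the -i (in-place) option of GNU sed. It rewrites the JAVA_HOME line whether or not it is still commented out.

```shell
# Hypothetical helper: set JAVA_HOME in a hadoop-env.sh file in place.
# $1 = path to hadoop-env.sh, $2 = Java installation directory.
set_java_home() {
    local env_file="$1"
    local java_home="$2"
    # Match "export JAVA_HOME=..." with or without a leading "# ".
    sed -i "s|^#* *export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file"
}
```

Example call:
 hduser@master:~> set_java_home /opt/hadoop-1.2.0/conf/hadoop-env.sh /usr/lib/jvm/jre-1.7.0-openjdk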

Next, define which machine is the master and which machines are slaves in the cluster by editing conf/masters and conf/slaves. (In Hadoop 1.x, conf/masters names the host that runs the SecondaryNameNode.)
 ########### conf/masters ############  
 master  
 ########## conf/slaves ##############  
 node1  
 node2  
 node3  
 node4  

Then, in the conf directory, edit the files whose names end in -site.xml as follows:

1. core-site.xml
 <configuration>  
     <property>  
         <name>fs.default.name</name>  
         <value>hdfs://master:54310</value>  
     </property>  
 </configuration>  

2. mapred-site.xml
 <configuration>  
     <property>  
         <name>mapred.job.tracker</name>  
         <value>master:54311</value>  
     </property>  
 </configuration>  

3. hdfs-site.xml
 <configuration>  
     <property>  
         <name>dfs.replication</name>  
         <value>4</value>  
     </property>  
 </configuration>  
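
The dfs.replication value above matches the four hosts in conf/slaves. If the replication factor is higher than the number of DataNodes, HDFS cannot place all replicas and blocks stay under-replicated, so a quick sanity check can be useful. The functions below are a hypothetical sketch; the replication value is passed in by hand rather than parsed from hdfs-site.xml.

```shell
# Hypothetical helper: count the hosts in a slaves file,
# ignoring blank lines and lines that are only a # comment.
count_slaves() {
    grep -c -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}

# Hypothetical check: compare a replication factor with the number of slaves.
# $1 = path to conf/slaves, $2 = value of dfs.replication.
check_replication() {
    local slaves_file="$1"
    local replication="$2"
    local n
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    n=$(count_slaves "$slaves_file") || true
    if [ "$replication" -gt "$n" ]; then
        echo "dfs.replication=$replication exceeds $n DataNode(s)"
        return 1
    fi
    echo "dfs.replication=$replication is fine for $n DataNode(s)"
}
```

Example call:
 hduser@master:~> check_replication /opt/hadoop-1.2.0/conf/slaves 4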


Once all of the configuration above is done, the Hadoop installation is complete. The last step is to run Hadoop.

1. Format the distributed filesystem (HDFS) on the cluster:
Do this on the master (one of the machines in the cluster)
 hduser@master:/> cd /opt/hadoop-1.2.0/  
 hduser@master:/opt/hadoop-1.2.0> bin/hadoop namenode -format  

2. Start HDFS on the master with:
 hduser@master:/opt/hadoop-1.2.0> bin/start-dfs.sh  

3. Start MapReduce on the master with:
 hduser@master:/opt/hadoop-1.2.0> bin/start-mapred.sh  

4. Open http://master:50030 and http://master:50070 in a browser to monitor MapReduce jobs and HDFS.

5. Testing
 hduser@master:/> cd /opt/hadoop-1.2.0/  
 hduser@master:/opt/hadoop-1.2.0> bin/hadoop jar hadoop-examples-1.2.0.jar pi 2 10  
 Number of Maps = 2  
 Samples per Map = 10  


Check all the listening ports that Hadoop is using.

 hduser@master:/opt/hadoop-1.2.0> netstat -plten | grep java  
 (Not all processes could be identified, non-owned process info  
  will not be shown, you would have to be root to see it all.)  
 tcp    0   0 :::50030               :::*          LISTEN   1000    13100   3936/java  
 tcp    0   0 :::50070               :::*          LISTEN   1000    12600   3593/java  
 tcp    0   0 :::46102               :::*          LISTEN   1000    12309   3593/java  
 tcp    0   0 127.0.0.1:33943        :::*          LISTEN   1000    13217   4065/java  
 tcp    0   0 :::50010               :::*          LISTEN   1000    12895   3715/java  
 tcp    0   0 :::50075               :::*          LISTEN   1000    13032   3715/java  
 tcp    0   0 :::51075               :::*          LISTEN   1000    12686   3843/java  
 tcp    0   0 :::50020               :::*          LISTEN   1000    13082   3715/java  
 tcp    0   0 192.168.10.2:54310     :::*          LISTEN   1000    12493   3593/java  
 tcp    0   0 192.168.10.2:54311     :::*          LISTEN   1000    13093   3936/java  
 tcp    0   0 :::34503               :::*          LISTEN   1000    12650   3715/java  
 tcp    0   0 :::37577               :::*          LISTEN   1000    13071   3936/java  
 tcp    0   0 :::50090               :::*          LISTEN   1000    12896   3843/java  
 tcp    0   0 :::50060               :::*          LISTEN   1000    13478   4065/java  



##############################

Info:
If a replication failure is reported while testing the program, the firewall is probably blocking HDFS traffic between the cluster machines on the network. Open the ports used by the MapReduce job in the firewall.

##############################
