Socialabel

Saturday, October 19, 2013

Installing Hadoop on a Computer Cluster


This Hadoop cluster installation is done on Linux openSUSE 12.3. The supporting software required for the installation is:
1. Java (JRE/JDK)
2. The Hadoop framework (Hadoop 1.2.0)
3. SSH

[Perform on all hosts]
[master/node1/node2/node3/node4].

First, configure the network and make sure all hosts can reach each other; use ping to test. If the target host responds, this first step is done; if not, check the physical and logical network configuration as well as the firewall.

The network assumed here is 192.168.10.0/24, with addresses assigned as follows:
 1. master : 192.168.10.2/24  
 2. node1  : 192.168.10.10/24  
 3. node2  : 192.168.10.11/24  
 4. node3  : 192.168.10.12/24  
 5. node4  : 192.168.10.13/24  
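Reachability of all five hosts can be checked in one pass with a small loop (a sketch; adjust the IP list to your own network):

```shell
# Ping each cluster member once; flag anything that does not answer
for ip in 192.168.10.2 192.168.10.10 192.168.10.11 192.168.10.12 192.168.10.13; do
    if ping -c 1 -W 2 "$ip" > /dev/null 2>&1; then
        echo "$ip reachable"
    else
        echo "$ip UNREACHABLE - check cabling, addressing, and firewall"
    fi
done
```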


Edit the hosts file (at root level) on each host as follows (note that master must be 192.168.10.2, matching the address plan above):
 master:/home/Hery # vim /etc/hosts  
 ########### /etc/hosts #############  
 192.168.10.2  master  
 192.168.10.10 node1  
 192.168.10.11 node2  
 192.168.10.12 node3  
 192.168.10.13 node4  

When done, reboot every computer from master through node4 so the updated hosts file is picked up everywhere.
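Name resolution can be verified on each host with getent, which reads /etc/hosts the same way the resolver does (a quick sketch):

```shell
# Each name should print the address assigned in /etc/hosts
for h in master node1 node2 node3 node4; do
    getent hosts "$h" || echo "$h does not resolve"
done
```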

Next, install the required Java runtime, via tarball/rpm/YaST/zypper, on each host that will be part of the cluster (master/node1/node2/node3/node4). If you use another OS, adjust accordingly. The Java version used here is OpenJDK 1.7.0_15:
 hery@master:~> java -version  
 java version "1.7.0_15"  
 OpenJDK Runtime Environment (IcedTea7 2.3.7) (suse-8.3.1-i386)  
 OpenJDK Server VM (build 23.7-b01, mixed mode)  

Next, find the directory where Java is installed. The binary can be traced with whereis and ll (ls -al):
 hery@master:~> whereis java  
 java: /usr/bin/java /etc/java /usr/lib/java /usr/bin/X11/java /usr/share/java /usr/share/man/man1/java.1.gz  
 hery@master:~> ll /usr/bin/java  
 lrwxrwxrwx 1 root root 22 Oct 16 23:57 /usr/bin/java -> /etc/alternatives/java  
 hery@master:~> ll /etc/alternatives/java  
 lrwxrwxrwx 1 root root 39 Oct 16 23:57 /etc/alternatives/java -> /usr/lib/jvm/jre-1.7.0-openjdk/bin/java  
 hery@master:~> ll /usr/lib/jvm/jre-1.7.0-openjdk/bin/java  
 -rwxrwxr-x 1 root root 5676 Feb 24 2013 /usr/lib/jvm/jre-1.7.0-openjdk/bin/java  
 hery@master:~>   

This shows that the Java installation directory is:
/usr/lib/jvm/jre-1.7.0-openjdk
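The symlink chain above can also be resolved in one step with readlink -f; the JAVA_HOME value is then the resolved path minus the trailing /bin/java (a sketch):

```shell
# Follow /usr/bin/java through /etc/alternatives to the real binary
JAVA_BIN=$(readlink -f "$(command -v java)")
# Strip the trailing /bin/java to obtain the installation directory
JAVA_HOME=${JAVA_BIN%/bin/java}
echo "$JAVA_HOME"
```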


Before moving on to the Hadoop framework, create a new user on each host computer to run the Hadoop engine later; hduser is the user used here.
 master:/home/Hery # useradd hduser  
 master:/home/Hery # passwd hduser  
 New password:******   
 BAD PASSWORD: it does not contain enough DIFFERENT characters  
 BAD PASSWORD: is a palindrome  
 Retype new password:******   
 passwd: password updated successfully  
 master:/home/Hery #   

Then download the Hadoop framework and extract it with tar:
 master:/home/Hery # tar -zxvf hadoop-1.2.0.tar.gz -C /opt/  
 master:/home/Hery # chown hduser: -R /opt/hadoop-1.2.0  
 master:/home/Hery # chmod 755 -R /opt/hadoop-1.2.0  
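Optionally (this is not part of the original steps), it is convenient to put JAVA_HOME and the Hadoop bin directory on hduser's PATH so the hadoop command works from any directory:

```shell
# Append environment variables to hduser's shell profile (paths as used in this walkthrough)
cat >> /home/hduser/.bashrc <<'EOF'
export JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk
export HADOOP_HOME=/opt/hadoop-1.2.0
export PATH=$PATH:$HADOOP_HOME/bin
EOF
```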

With Java and Hadoop in place, the next step is SSH. Install it with YaST if it is missing, though by default almost every Linux distribution ships the sshd package. Start the daemon; on openSUSE:
 master:/home/Hery # service sshd start  
 master:/home/Hery # /sbin/rcSuSEfirewall2 stop      # not recommended


Because Hadoop will use several ports, it is best to open those ports in the firewall/iptables on Linux. If you do not want the hassle, you can simply disable the firewall (be careful if the system is publicly reachable). The ports used in this setup are 22, 54310, 54311, 50030, and 50070.
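Instead of disabling the firewall entirely, the listed ports can be opened explicitly. A sketch using plain iptables (run as root; on openSUSE you would normally add the ports to the SuSEfirewall2 configuration instead):

```shell
# Accept inbound TCP on the ports Hadoop uses (SSH, HDFS, JobTracker, web UIs)
for port in 22 54310 54311 50030 50070; do
    iptables -A INPUT -p tcp --dport "$port" -j ACCEPT
done
```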

Once SSH is running and the ports are open, generate an RSA key so hduser can log in to the cluster computers without a password. Log in as hduser on one of the computers, for example the master:
 hduser@master:~> ssh-keygen -t rsa -P ""  
 Generating public/private rsa key pair.  
 Enter file in which to save the key (/home/hduser/.ssh/id_rsa):   
 Your identification has been saved in /home/hduser/.ssh/id_rsa.  
 Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.  
 The key fingerprint is:  
 e6:1c:c8:0d:b1:68:a0:e5:e5:d5:4b:60:30:29:12:5e hduser@master
 The key's randomart image is:  
 +--[ RSA 2048]----+  
 |..oE+o=o     |  
 |o+o+.= oo    |  
 |.o..+ o. .    |  
 |  . . +.    |  
 |   o S    |  
 |    + .    |  
 |    o    |  
 |         |  
 |         |  
 +-----------------+  
 hduser@master:~>   

Next, copy the SSH key to the master itself:
 hduser@master:~> ssh-copy-id -i .ssh/id_rsa.pub master  
 The authenticity of host 'master (192.168.10.2)' can't be established.  
 ECDSA key fingerprint is 3a:51:e5:ab:43:c0:65:ec:42:6c:8d:e2:c7:de:a7:ea.  
 Are you sure you want to continue connecting (yes/no)? yes  
 Warning: Permanently added 'master' (ECDSA) to the list of known hosts.  
 hduser@master's password:   
 Now try logging into the machine, with "ssh 'master'", and check in:  
  ~/.ssh/authorized_keys  
 to make sure we haven't added extra keys that you weren't expecting.  


Copy the SSH key to every computer from master through node4; do this on each computer [X321]:
 hduser@master:~> ssh-copy-id -i .ssh/id_rsa.pub master  
 hduser@master:~> ssh-copy-id -i .ssh/id_rsa.pub node1  
 hduser@master:~> ssh-copy-id -i .ssh/id_rsa.pub node2  
 hduser@master:~> ssh-copy-id -i .ssh/id_rsa.pub node3  
 hduser@master:~> ssh-copy-id -i .ssh/id_rsa.pub node4  
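The five copies above can be written as one loop (run as hduser on each machine that should have passwordless access to the others):

```shell
# Push this machine's public key to every cluster member
for h in master node1 node2 node3 node4; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub "$h"
done
```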


Next, from the master alone, log in to each node as hduser to confirm passwordless access:
 hduser@master:~> ssh hduser@node1  

Repeat the command above [X321] for node1 through node4!


Next comes the configuration of the Hadoop system, which lives in the following folder:
master:/home/Hery # cd /opt/hadoop-1.2.0/conf/

In hadoop-env.sh, change the line
 # export JAVA_HOME=/usr/lib/j2sdk1.5-sun  

 ## to:   
 ## remove the leading "# " and set the path to the   
 ## Java directory found earlier with whereis  
 export JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk    

Next, define which computer is the master and which computers are the slaves in the cluster by editing conf/masters and conf/slaves:
 ########### conf/masters ############  
 master  
 ########## conf/slaves ##############  
 node1  
 node2  
 node3  
 node4  

Then, still in the conf folder, edit the files ending in *-site.xml as follows:

1. core-site.xml
 <configuration>  
     <property>  
         <name>fs.default.name</name>  
         <value>hdfs://master:54310</value>  
     </property>  
 </configuration>  

2. mapred-site.xml
 <configuration>  
     <property>  
         <name>mapred.job.tracker</name>  
         <value>master:54311</value>  
     </property>  
 </configuration>  

3. hdfs-site.xml
 <configuration>  
     <property>  
         <name>dfs.replication</name>  
         <value>4</value>  
     </property>  
 </configuration>  
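This walkthrough assumes Hadoop is extracted at the same path on every host, and the conf/ directory must match everywhere. One way (a sketch) to push the master's configuration out after editing it:

```shell
# Copy the edited configuration from master to every slave (run as hduser)
for h in node1 node2 node3 node4; do
    scp -r /opt/hadoop-1.2.0/conf "$h":/opt/hadoop-1.2.0/
done
```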


Once all of the configuration above is done, the Hadoop installation is complete. The final stage is to start Hadoop.

1. Format the distributed filesystem (HDFS) on the cluster.
Do this on the master (one of the computers in the cluster):
 hduser@master:/> cd /opt/hadoop-1.2.0/  
 hduser@master:/opt/hadoop-1.2.0> bin/hadoop namenode -format  

2. Start HDFS on the master with:
 hduser@master:/opt/hadoop-1.2.0> bin/start-dfs.sh  

3. Start MapReduce on the master with:
 hduser@master:/opt/hadoop-1.2.0> bin/start-mapred.sh  
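To confirm the daemons actually started, jps (shipped with the JDK) lists the running Java processes. In Hadoop 1.x you should see NameNode, SecondaryNameNode, and JobTracker on the master, and DataNode plus TaskTracker on each slave; a sketch that checks every host from the master:

```shell
# Check the running Hadoop daemons on the master and on every slave
for h in master node1 node2 node3 node4; do
    echo "== $h =="
    ssh "$h" jps
done
```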

4. Open http://master:50030 and http://master:50070 in a browser to monitor MapReduce jobs and HDFS.

5. Testing
 hduser@master:/> cd /opt/hadoop-1.2.0/  
 hduser@master:/opt/hadoop-1.2.0> bin/hadoop jar hadoop-examples-1.2.0.jar pi 2 10  
 Number of Maps = 2  
 Samples per Map = 10  


Check all the listening ports used by Hadoop:

 hduser@master:/opt/hadoop-1.2.0> netstat -plten | grep java  
 (Not all processes could be identified, non-owned process info  
  will not be shown, you would have to be root to see it all.)  
 tcp    0   0 :::50030               :::*          LISTEN   1000    13100   3936/java  
 tcp    0   0 :::50070               :::*          LISTEN   1000    12600   3593/java  
 tcp    0   0 :::46102               :::*          LISTEN   1000    12309   3593/java  
 tcp    0   0 127.0.0.1:33943        :::*          LISTEN   1000    13217   4065/java  
 tcp    0   0 :::50010               :::*          LISTEN   1000    12895   3715/java  
 tcp    0   0 :::50075               :::*          LISTEN   1000    13032   3715/java  
 tcp    0   0 :::51075               :::*          LISTEN   1000    12686   3843/java  
 tcp    0   0 :::50020               :::*          LISTEN   1000    13082   3715/java  
 tcp    0   0 192.168.10.2:54310     :::*          LISTEN   1000    12493   3593/java  
 tcp    0   0 192.168.10.2:54311     :::*          LISTEN   1000    13093   3936/java  
 tcp    0   0 :::34503               :::*          LISTEN   1000    12650   3715/java  
 tcp    0   0 :::37577               :::*          LISTEN   1000    13071   3936/java  
 tcp    0   0 :::50090               :::*          LISTEN   1000    12896   3843/java  
 tcp    0   0 :::50060               :::*          LISTEN   1000    13478   4065/java  



##############################

Info:
If the test program fails with a replication error, the firewall is probably blocking HDFS traffic between the cluster nodes; open the ports used by the MapReduce job in the firewall.

##############################
