Socialabel

Saturday, October 19, 2013

Installing Hadoop on a Computer Cluster


This Hadoop cluster installation is done on Linux openSUSE 12.3. The supporting software required to install Hadoop is:
1. Java (JRE/JDK)
2. Hadoop framework (Hadoop 1.2.0)
3. SSH

[Do this on all hosts]
[Master/Node1/Node2/Node3/Node4]

The first step is to configure the network and make sure every host can reach every other host; use ping to test this. If the destination host replies, this step is done. If not, check the physical and logical configuration as well as the firewall.

The network used here is 192.168.10.0/24, with the following addresses:
 1. master : 192.168.10.2/24  
 2. node1  : 192.168.10.10/24  
 3. node2  : 192.168.10.11/24  
 4. node3  : 192.168.10.12/24  
 5. node4  : 192.168.10.13/24  
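A minimal sketch of the ping check described above, assuming the five addresses in this list and the iputils `ping` shipped with openSUSE (`-c` sets the packet count, `-W` the reply timeout in seconds). The function name `check_reachable` is only illustrative and is not part of any package:

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): ping each address once and
# report whether it answered. Run it on every host in the cluster.
check_reachable() {
  local host
  for host in "$@"; do
    if ping -c 1 -W 1 "$host" >/dev/null 2>&1; then
      echo "$host OK"
    else
      echo "$host UNREACHABLE"
    fi
  done
}

# Example, using the addressing plan above (IPs, since /etc/hosts is not set yet):
# check_reachable 192.168.10.2 192.168.10.10 192.168.10.11 192.168.10.12 192.168.10.13
```

Any UNREACHABLE line points to cabling, IP settings or the firewall on that host.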


Edit the hosts file (as root) on every host as follows:
 master:/home/Hery # vim /etc/hosts
 ########### /etc/hosts #############
 192.168.10.2  master
 192.168.10.10 node1
 192.168.10.11 node2
 192.168.10.12 node3
 192.168.10.13 node4
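If you prefer not to type these lines by hand on five machines, they can be generated. This is a sketch with a hypothetical helper name, `hosts_entries`; check /etc/hosts afterwards so you do not end up with duplicate entries:

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): turn "name=ip" pairs into
# /etc/hosts lines of the form "ip<TAB>name".
hosts_entries() {
  local pair
  for pair in "$@"; do
    printf '%s\t%s\n' "${pair#*=}" "${pair%%=*}"
  done
}

# Example (as root): append the cluster entries on each host.
# hosts_entries master=192.168.10.2 node1=192.168.10.10 node2=192.168.10.11 \
#   node3=192.168.10.12 node4=192.168.10.13 >> /etc/hosts
```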

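Once that is done, reboot all the computers, master through node4, so the new hosts file is picked up everywhere (strictly speaking /etc/hosts is read at lookup time, but a reboot rules out any cached entries).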

Next, install the required packages on every host that will be part of the cluster (Master/Node1/Node2/Node3/Node4), starting with the Java runtime, via a tarball, RPM, YaST or zypper. On another OS, adjust accordingly. The Java version used here is OpenJDK 1.7.0_15:
 hery@master:~> java -version  
 java version "1.7.0_15"  
 OpenJDK Runtime Environment (IcedTea7 2.3.7) (suse-8.3.1-i386)  
 OpenJDK Server VM (build 23.7-b01, mixed mode)  

Next, find out which folder Java is installed in. The basic tools for tracing this are whereis and ll (ls -al):
 hery@master:~> whereis java  
 java: /usr/bin/java /etc/java /usr/lib/java /usr/bin/X11/java /usr/share/java /usr/share/man/man1/java.1.gz  
 hery@master:~> ll /usr/bin/java  
 lrwxrwxrwx 1 root root 22 Oct 16 23:57 /usr/bin/java -> /etc/alternatives/java  
 hery@master:~> ll /etc/alternatives/java  
 lrwxrwxrwx 1 root root 39 Oct 16 23:57 /etc/alternatives/java -> /usr/lib/jvm/jre-1.7.0-openjdk/bin/java  
 hery@master:~> ll /usr/lib/jvm/jre-1.7.0-openjdk/bin/java  
 -rwxrwxr-x 1 root root 5676 Feb 24 2013 /usr/lib/jvm/jre-1.7.0-openjdk/bin/java  
 hery@master:~>   

This shows that the Java installation folder is
/usr/lib/jvm/jre-1.7.0-openjdk
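The chain of ll commands above can be collapsed into one step with GNU `readlink -f`, which follows every symlink to the final file. A sketch; the helper name `java_home_from` is hypothetical:

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): resolve the symlink chain of a
# java binary and strip the trailing /bin/java to get the install folder.
java_home_from() {
  local bin
  bin=$(readlink -f "$1") || return 1   # GNU readlink resolves all links
  dirname "$(dirname "$bin")"
}

# Example: java_home_from "$(command -v java)"
# With the links shown above
# (/usr/bin/java -> /etc/alternatives/java -> /usr/lib/jvm/jre-1.7.0-openjdk/bin/java)
# this prints /usr/lib/jvm/jre-1.7.0-openjdk
```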


Before moving on to the Hadoop framework, create a new user on every host to run the Hadoop engine; here that user is hduser.
 master:/home/Hery # useradd hduser  
 master:/home/Hery # passwd hduser  
 New password:******   
 BAD PASSWORD: it does not contain enough DIFFERENT characters  
 BAD PASSWORD: is a palindrome  
 Retype new password:******   
 passwd: password updated successfully  
 master:/home/Hery #   

Then download the Hadoop framework and extract it with tar:
 master:/home/Hery # tar -zxvf hadoop-1.2.0.tar.gz -C /opt/  
 master:/home/Hery # chown hduser: -R /opt/hadoop-1.2.0  
 master:/home/Hery # chmod 755 -R /opt/hadoop-1.2.0  

With Java and Hadoop in place, the next step is SSH. Use YaST to install it if it is not installed yet, although almost every Linux distribution installs the sshd package by default. Then start the daemon; on openSUSE:
 master:/home/Hery # service sshd start
 master:/home/Hery # /sbin/rcSuSEfirewall2 stop      # not recommended


Because Hadoop uses several ports, you should open them in the Linux firewall (iptables). If you would rather not deal with this, simply turn the firewall off (be careful if the system is reachable from a public network). In this setup the ports used are 22, 54310, 54311, 50030 and 50070.
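Instead of disabling the firewall, one option is an iptables ACCEPT rule per port, limited to the cluster subnet. The sketch below only prints the commands so you can review them first. Assumption: besides the five ports listed above, the netstat output at the end of this post also shows 50010, 50020, 50075, 50060 and 50090, which are the Hadoop 1.x default DataNode, TaskTracker and SecondaryNameNode ports, so they are included too. On openSUSE, SuSEfirewall2 manages iptables itself and may replace hand-added rules when it restarts.

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): print, not run, one iptables
# rule per TCP port, accepting traffic only from the given subnet.
# Review the output, then pipe it to `sh` as root.
hadoop_fw_rules() {
  local subnet=$1 port
  shift
  for port in "$@"; do
    # -I inserts at the top of INPUT, ahead of any existing REJECT rule
    echo "iptables -I INPUT -p tcp -s $subnet --dport $port -j ACCEPT"
  done
}

# Ports from this post plus those seen in the netstat output (assumption).
HADOOP_PORTS="22 54310 54311 50030 50070 50010 50020 50075 50060 50090"
# hadoop_fw_rules 192.168.10.0/24 $HADOOP_PORTS
```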

Once SSH and the ports are sorted out, create an RSA key so that hduser can log in to the cluster computers remotely without a password. Log in as hduser on one of the computers, for example master:
 hduser@master:~> ssh-keygen -t rsa -P ""  
 Generating public/private rsa key pair.  
 Enter file in which to save the key (/home/hduser/.ssh/id_rsa):   
 Your identification has been saved in /home/hduser/.ssh/id_rsa.  
 Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.  
 The key fingerprint is:  
 e6:1c:c8:0d:b1:68:a0:e5:e5:d5:4b:60:30:29:12:5e hduser@master
 The key's randomart image is:  
 +--[ RSA 2048]----+  
 |..oE+o=o     |  
 |o+o+.= oo    |  
 |.o..+ o. .    |  
 |  . . +.    |  
 |   o S    |  
 |    + .    |  
 |    o    |  
 |         |  
 |         |  
 +-----------------+  
 hduser@master:~>   

Next, copy the SSH key to the master computer:
 hduser@master:~> ssh-copy-id -i .ssh/id_rsa.pub master  
 The authenticity of host 'master (192.168.10.2)' can't be established.  
 ECDSA key fingerprint is 3a:51:e5:ab:43:c0:65:ec:42:6c:8d:e2:c7:de:a7:ea.  
 Are you sure you want to continue connecting (yes/no)? yes  
 Warning: Permanently added 'master' (ECDSA) to the list of known hosts.  
 hduser@master's password:   
 Now try logging into the machine, with "ssh 'master'", and check in:  
  ~/.ssh/authorized_keys  
 to make sure we haven't added extra keys that you weren't expecting.  


Copy the SSH key to every computer, master through node4. Do this on each computer [X321]:
 hduser@master:~> ssh-copy-id -i .ssh/id_rsa.pub master  
 hduser@master:~> ssh-copy-id -i .ssh/id_rsa.pub node1  
 hduser@master:~> ssh-copy-id -i .ssh/id_rsa.pub node2  
 hduser@master:~> ssh-copy-id -i .ssh/id_rsa.pub node3  
 hduser@master:~> ssh-copy-id -i .ssh/id_rsa.pub node4  


Then, from master, log in remotely to each node as hduser:
 hduser@master:~> ssh hduser@node1  

Repeat the [X321] commands above on Node1 through Node4!
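To confirm that passwordless login really works, a sketch like the following can be run as hduser. It relies on the OpenSSH option `BatchMode=yes`, which makes ssh fail instead of prompting for a password; the helper name `check_ssh` is hypothetical:

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): run `true` on each host over
# SSH with password prompts disabled, and report which hosts accept the key.
check_ssh() {
  local host
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null >/dev/null 2>&1; then
      echo "$host OK"
    else
      echo "$host FAILED"
    fi
  done
}

# Example, as hduser on master:
# check_ssh master node1 node2 node3 node4
```

A host you have never connected to will also report FAILED, because BatchMode cannot answer the host-key confirmation; connect to it once interactively first.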


Next comes the Hadoop configuration, which lives in the folder:
 master:/home/Hery # cd /opt/hadoop-1.2.0/conf/

In hadoop-env.sh, change the line
 # export JAVA_HOME=/usr/lib/j2sdk1.5-sun

 ## to
 ## Remove the leading "# " and set the path to
 ## the Java directory found earlier with whereis/ll
 export JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk
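The same edit can be made non-interactively with GNU sed, which is handy when repeating it on every host. A sketch, assuming the stock hadoop-env.sh has a single, possibly commented, `export JAVA_HOME=` line as shown above; the helper name is hypothetical:

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): uncomment if needed and set the
# JAVA_HOME line of a hadoop-env.sh file in place. Needs GNU sed for -i.
set_java_home() {
  local file=$1 java_home=$2
  # "^#* *" matches both "# export ..." and an already-uncommented line,
  # so running it twice is harmless.
  sed -i "s|^#* *export JAVA_HOME=.*|export JAVA_HOME=${java_home}|" "$file"
}

# Example:
# set_java_home /opt/hadoop-1.2.0/conf/hadoop-env.sh /usr/lib/jvm/jre-1.7.0-openjdk
```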

Next, define which computer is the master and which are the slaves of the cluster by editing conf/masters and conf/slaves:
 ########### conf/masters ############  
 master  
 ########## conf/slaves ##############  
 node1  
 node2  
 node3  
 node4  
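Both files are plain lists with one hostname per line, so they can be written in one go. A sketch; the function name and its arguments are assumptions. One detail worth knowing: in Hadoop 1.x, conf/masters names the host on which start-dfs.sh launches the SecondaryNameNode, while the NameNode and JobTracker run on the machine where you invoke the start scripts (here, master).

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): write conf/masters (second
# argument) and conf/slaves (remaining arguments) into a conf directory.
write_host_lists() {
  local conf_dir=$1 master=$2
  shift 2
  printf '%s\n' "$master" > "$conf_dir/masters"
  printf '%s\n' "$@" > "$conf_dir/slaves"
}

# Example, as hduser:
# write_host_lists /opt/hadoop-1.2.0/conf master node1 node2 node3 node4
```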

Then, in the conf folder, edit the files ending in -site.xml as follows:

1. core-site.xml
 <configuration>  
     <property>  
         <name>fs.default.name</name>  
         <value>hdfs://master:54310</value>  
     </property>  
 </configuration>  

2. mapred-site.xml
 <configuration>  
     <property>  
         <name>mapred.job.tracker</name>  
         <value>master:54311</value>  
     </property>  
 </configuration>  

3. hdfs-site.xml
 <configuration>  
     <property>  
         <name>dfs.replication</name>  
         <value>4</value>  
     </property>  
 </configuration>  
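Since these files must be the same on every host, it can be easier to generate them once on master and copy them out. A sketch, assuming the hostnames and ports used in this post; `write_site_xml` is a hypothetical helper, and the scp loop assumes the passwordless SSH set up earlier. A replication factor of 4 matches the four slaves; a value larger than the number of DataNodes would leave blocks under-replicated.

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): write the three *-site.xml files
# from this post. Arguments: conf dir, master hostname, replication factor.
write_site_xml() {
  local conf=$1 master=$2 replication=$3
  cat > "$conf/core-site.xml" <<EOF
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://${master}:54310</value>
    </property>
</configuration>
EOF
  cat > "$conf/mapred-site.xml" <<EOF
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>${master}:54311</value>
    </property>
</configuration>
EOF
  cat > "$conf/hdfs-site.xml" <<EOF
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>${replication}</value>
    </property>
</configuration>
EOF
}

# Example, as hduser on master, then copy to every slave:
# write_site_xml /opt/hadoop-1.2.0/conf master 4
# for h in node1 node2 node3 node4; do
#   scp /opt/hadoop-1.2.0/conf/*-site.xml "$h":/opt/hadoop-1.2.0/conf/
# done
```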


Once all of the configuration above is done, the Hadoop installation is complete. The final stage is to run Hadoop.

1. Format the distributed filesystem (HDFS) for the cluster.
Do this on the master computer (the cluster machine acting as master):
 hduser@master:/> cd /opt/hadoop-1.2.0/  
 hduser@master:/opt/hadoop-1.2.0> bin/hadoop namenode -format  

2. Start HDFS on the master computer:
 hduser@master:/opt/hadoop-1.2.0> bin/start-dfs.sh  

3. Start MapReduce on the master computer:
 hduser@master:/opt/hadoop-1.2.0> bin/start-mapred.sh  

4. Open http://master:50030 and http://master:50070 in a browser to monitor MapReduce jobs and HDFS, respectively.

5. Testing
 hduser@master:/> cd /opt/hadoop-1.2.0/  
 hduser@master:/opt/hadoop-1.2.0> bin/hadoop jar hadoop-examples-1.2.0.jar pi 2 10  
 Number of Maps = 2  
 Samples per Map = 10  
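For reference, the pi example estimates π by generating points in a unit square (Hadoop's example uses a quasi-Monte Carlo sequence rather than plain random numbers) and counting how many fall inside the inscribed circle, which covers π/4 of the square's area. Assuming that standard estimator, the arguments `2 10` work out as:

```latex
N = \text{maps} \times \text{samples per map} = 2 \times 10 = 20,
\qquad
\hat{\pi} = 4 \cdot \frac{N_{\text{inside}}}{N}
```

With only 20 points the estimate is coarse; larger arguments give a closer value at the cost of a longer job.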


Check all the ports Hadoop is listening on:

 hduser@master:/opt/hadoop-1.2.0> netstat -plten | grep java  
 (Not all processes could be identified, non-owned process info  
  will not be shown, you would have to be root to see it all.)  
 tcp    0   0 :::50030               :::*          LISTEN   1000    13100   3936/java  
 tcp    0   0 :::50070               :::*          LISTEN   1000    12600   3593/java  
 tcp    0   0 :::46102               :::*          LISTEN   1000    12309   3593/java  
 tcp    0   0 127.0.0.1:33943        :::*          LISTEN   1000    13217   4065/java  
 tcp    0   0 :::50010               :::*          LISTEN   1000    12895   3715/java  
 tcp    0   0 :::50075               :::*          LISTEN   1000    13032   3715/java  
 tcp    0   0 :::51075               :::*          LISTEN   1000    12686   3843/java  
 tcp    0   0 :::50020               :::*          LISTEN   1000    13082   3715/java  
 tcp    0   0 192.168.10.2:54310     :::*          LISTEN   1000    12493   3593/java  
 tcp    0   0 192.168.10.2:54311     :::*          LISTEN   1000    13093   3936/java  
 tcp    0   0 :::34503               :::*          LISTEN   1000    12650   3715/java  
 tcp    0   0 :::37577               :::*          LISTEN   1000    13071   3936/java  
 tcp    0   0 :::50090               :::*          LISTEN   1000    12896   3843/java  
 tcp    0   0 :::50060               :::*          LISTEN   1000    13478   4065/java  
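To check this automatically rather than by eye, the netstat output can be filtered for the ports that must be listening. A sketch: the helper name is hypothetical, and it assumes the column layout shown above (local address in the 4th field, state in the 6th).

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): read `netstat -plten` output on
# stdin and report every expected port that is not in LISTEN state.
# Returns 1 if any port is missing, 0 otherwise.
check_hadoop_ports() {
  local listening missing=0 port
  # Keep only the port after the last ":" of the local address,
  # e.g. ":::50070" -> 50070, "192.168.10.2:54310" -> 54310.
  listening=$(awk '$1 ~ /^tcp/ && $6 == "LISTEN" { n = split($4, a, ":"); print a[n] }')
  for port in "$@"; do
    if ! grep -qx "$port" <<<"$listening"; then
      echo "missing: $port"
      missing=1
    fi
  done
  return "$missing"
}

# Example, on master:
# netstat -plten 2>/dev/null | check_hadoop_ports 54310 54311 50030 50070
```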



##############################

Info:
If a replication failure is reported while running the test program, the firewall is probably blocking HDFS traffic between the cluster machines. Open the ports used by HDFS and the MapReduce jobs in the firewall.

##############################
