I am trying to set up a big data cluster by myself. The plan is to set up an HDFS and Hadoop cluster (2 machines), play with it a little bit, and then move on to more software like Spark. The whole setup is from scratch!

  1. Hadoop and HDFS setup [Mar-07]
    • Install the JDK and download the Hadoop tarball from Apache
    • Create user hduser and group hadoop
    • Untar the Hadoop tarball under /usr/local and create a symlink at /usr/local/hadoop
    • Set up SSH and copy your public key to the authorized_keys file under the .ssh directory (see the sketch after this list)
  2. Cluster configuration [Mar-08]
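
A minimal sketch of step 1, assuming an Ubuntu-like distro; the Hadoop version (2.7.3) and the JDK package name are placeholders, not necessarily what I will use:

```bash
# Assumed: Ubuntu-like distro; hadoop-2.7.3 is an example version.
sudo apt-get install -y openjdk-8-jdk            # install the JDK

sudo addgroup hadoop                             # dedicated group...
sudo adduser --ingroup hadoop hduser             # ...and user for the Hadoop daemons

sudo tar -xzf hadoop-2.7.3.tar.gz -C /usr/local  # untar under /usr/local
sudo ln -s /usr/local/hadoop-2.7.3 /usr/local/hadoop   # version-independent path
sudo chown -R hduser:hadoop /usr/local/hadoop-2.7.3

# As hduser: passwordless SSH so the start scripts can reach each node.
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost    # should log in without asking for a password
```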

Right now this configuration looks easy to me, but I just don't have the energy to actually try it. I will try sometime this week. I also downloaded the Hadoop project's source code.
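
Since I haven't tried it yet, here is only a sketch of what the 2-node configuration should involve, assuming the NameNode's hostname is master (that name is my placeholder):

```bash
# Assumed: hostname "master" for the NameNode; Hadoop 2.x config layout.
# Point HDFS at the NameNode.
cat > /usr/local/hadoop/etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
EOF

# With 2 machines, keep a copy of each block on both DataNodes.
cat > /usr/local/hadoop/etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF
```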

  • Protobuf is fundamental to remote procedure calls and message communication in Hadoop. Need to master that (a toy sketch follows).
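
A toy sketch of the protobuf workflow, assuming protoc is installed; the heartbeat.proto file and Heartbeat message are made up for illustration, not taken from Hadoop's actual .proto files:

```bash
# A made-up schema, loosely in the spirit of a DataNode heartbeat.
cat > heartbeat.proto <<'EOF'
syntax = "proto2";

message Heartbeat {
  required string datanode_id = 1;   // which node is reporting
  optional uint64 free_bytes = 2;    // remaining capacity
}
EOF

# Generate Java classes from the schema; an RPC layer then serializes
# these messages into a compact wire format on both ends of the call.
protoc --java_out=. heartbeat.proto
```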