About Me

I am a Software Architect and Developer based in Bangalore, India. I have experience building scalable applications using Java, JSP, JSF, JBoss Drools, Spring Framework, Hibernate, Ajax, JavaScript, MySQL, and NoSQL (HBase, Project Voldemort). I am also a fan of Ruby on Rails and have done some experimental work with it.

Thursday, September 27, 2012

Error while logging into EC2

If you are SSHing into a newly created EC2 instance and, like me, you are a newbie, you may get the following error:

srikanth@ubuntu:~/AWS$ ssh -i Home-Ubuntu.pem ec2-user@ec2-174-129-93-173.compute-1.amazonaws.com
Permission denied (publickey).
To resolve it, you need to use the correct userid@server-name in the SSH command.

The only tricky part is that the user ID depends on the AMI you are using. For example, with the Amazon Linux AMI the user ID is "ec2-user", while with the Ubuntu AMI it is "ubuntu".

One way to find out is to try "root@server-name"; you will then get an error like the one below, which makes it clear which user ID you should use.

srikanth@ubuntu:~/AWS$ ssh -i Home-Ubuntu.pem root@ec2-174-129-93-173.compute-1.amazonaws.com
Please login as the user "ubuntu" rather than the user "root".

Saturday, August 06, 2011

Configuring LZO Compression for CDH3 HBase

This post explains the steps for configuring LZO compression for CDH3 HBase, and it applies to CDH3 HBase 0.90.1 and above. The main source for this post is http://wiki.apache.org/hadoop/UsingLzoCompression; I have added details of the steps needed for CDH3 (step 3), as some instructions on the wiki page did not seem applicable to CDH3 HBase.

I tried the steps below on Ubuntu. You can also give Todd Lipcon's https://github.com/toddlipcon/hadoop-lzo-packager a try.

Also, adding LZO compression may require additional configuration changes in your HBase setup. Refer to http://search-hadoop.com/m/WUnLM6ojHm1/Long+client+pauses+with+compression&subj=Long+client+pauses+with+compression
  1. Download and build http://www.oberhumer.com/opensource/lzo/. These steps are based on http://www.linuxfromscratch.org/blfs/view/cvs/general/LZO.html.
    • wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.05.tar.gz
    • gunzip lzo-2.05.tar.gz
    • tar -xvf lzo-2.05.tar
    • cd lzo-2.05/
    • ./configure --prefix=/usr --enable-shared
    • make
    • make check
      • All checks should pass.
    • make test 
      • All tests should pass.
    • sudo make install
  2. Download and build http://code.google.com/p/hadoop-gpl-compression/
    • Copy the Hadoop GPL Compression JAR to the HBase lib directory
      • cp build/hadoop-gpl-compression-0.2.0-dev.jar /path/to/hbase/lib/
    • Copy the native files to /usr/local/lib/
      • sudo cp build/native/Linux-amd64-64/lib/* /usr/local/lib/
    • Edit /path/to/hbase/conf/hbase-env.sh and add /usr/local/lib to the HBASE_LIBRARY_PATH environment variable. This is needed because hbase/lib no longer seems to have a "native" folder (refer: http://cdh3u0.cloudera.com/cdh/3/hbase-0.90.1+15.18.releasenotes.html and https://issues.apache.org/jira/browse/HBASE-3533)
      • vi /path/to/hbase/conf/hbase-env.sh
      • Add following line at the end:
        • export HBASE_LIBRARY_PATH=/usr/local/lib/
    • Start HBase. You should see in the logs that the native LZO library was loaded successfully. I was trying this in standalone mode; I have not confirmed whether the log lines below appear in the master or region server logs.
      • 2011-08-06 14:54:08,776 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library
      • 2011-08-06 14:54:08,782 INFO com.hadoop.compression.lzo.LzoCodec: Successfully loaded & initialized native-lzo library

    Using your own log4j.properties with GridGain

    By default, GridGain comes with its own copy of the log4j configuration in the form of "default-log4j.xml". GridGain does not allow the use of log4j.properties, as GridLog4jLogger accepts only an XML file as its configuration source.

    However, if, like me, you are adding GridGain to an application whose logging is already configured using log4j.properties, you can use the workaround below.

    In the example below, we set the gridLogger property of the grid configuration to an instance of GridLogger constructed from a regular log4j Logger object.

    Then, a log4jInitialization bean initializes log4j from log4j.properties with the help of org.springframework.util.Log4jConfigurer.
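    The original Spring snippet is not reproduced here, so below is a hedged sketch of what such wiring might look like. The bean layout follows the description above, but treat the GridGain class names (GridConfigurationAdapter, org.gridgain.grid.logger.log4j.GridLog4jLogger) as assumptions based on the GridGain 3.x API; check them against your version.

```xml
<!-- Initialize log4j from log4j.properties before the grid starts -->
<bean id="log4jInitialization"
      class="org.springframework.beans.factory.config.MethodInvokingFactoryBean">
    <property name="targetClass" value="org.springframework.util.Log4jConfigurer"/>
    <property name="targetMethod" value="initLogging"/>
    <property name="arguments">
        <list>
            <value>classpath:log4j.properties</value>
        </list>
    </property>
</bean>

<!-- Grid configuration whose gridLogger wraps a regular log4j Logger -->
<bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter"
      depends-on="log4jInitialization">
    <property name="gridLogger">
        <bean class="org.gridgain.grid.logger.log4j.GridLog4jLogger">
            <constructor-arg>
                <bean class="org.apache.log4j.Logger" factory-method="getLogger">
                    <constructor-arg value="gridgain"/>
                </bean>
            </constructor-arg>
        </bean>
    </property>
</bean>
```

    The depends-on attribute ensures log4j is initialized from log4j.properties before the grid configuration (and hence the logger wrapper) is created.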

    Sunday, July 17, 2011

    Power of Ruby and HBase shell

    Well, it's a well-known fact that the HBase shell is a Ruby IRB shell with lots of HBase convenience methods loaded into it.

    I had a requirement in my project to check whether a row, identified by a given row key, exists in a table. Though "get" kind of works, it does not return true or false; it just dumps its output to standard out.

    However, I wanted to read a file consisting of a set of row keys and find out whether those row keys were present in the table. Since "get" does not return a boolean, I could not build any logic around it.

    So, I started looking into the Ruby code of the HBase shell and came up with the following. It looks like a neat little hack, but it also demonstrates the opportunity on offer to write really nice shell utilities for your project's needs.



    Sunday, June 26, 2011

    HBase Benchmarking for multi-threaded environment

    This weekend I attempted to figure out how HBase writes perform in multi-threaded environments. To do so, I wrote a Java program that writes records into an HBase table and can be configured to write N records using M threads.

    I used three variants to write records into HBase:
    1. One HTable per write, created using a singleton HBaseConfiguration instance
    2. One HTable per write, created using a new HBaseConfiguration instance (Why? I wanted to check how the behavior changes when connections to the servers are not shared. Reference: the HTable and HConnectionManager documentation)
    3. HTable obtained from an HTablePool
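    The skeleton of such a harness might look like the sketch below. This is not the actual benchmarker code (that is in the GitHub repository linked at the end of the post); the HBase put is replaced by a pluggable Runnable, since a real write needs a running cluster.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the harness: perform numRecords writes across numThreads
// threads and collect per-write latencies in milliseconds. In the real
// program, the Runnable would obtain an HTable (one of the three variants
// above) and call table.put(...); here it is a stand-in.
public class WriteBenchmark {

    public static List<Long> run(int numThreads, int numRecords, final Runnable write)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        final List<Long> latencies = new CopyOnWriteArrayList<Long>();
        final CountDownLatch done = new CountDownLatch(numRecords);
        for (int i = 0; i < numRecords; i++) {
            pool.submit(new Runnable() {
                public void run() {
                    long start = System.currentTimeMillis();
                    write.run();  // stand-in for the actual HTable.put
                    latencies.add(System.currentTimeMillis() - start);
                    done.countDown();
                }
            });
        }
        done.await();   // wait until all writes have completed
        pool.shutdown();
        return latencies;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Long> lat = run(10, 1000, new Runnable() {
            public void run() { /* a real HBase put would go here */ }
        });
        System.out.println("writes recorded: " + lat.size());
    }
}
```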

    Here are the various results:

    HBase 0.20.6 Standalone mode  (all figures in milliseconds)
    Scenario | Num Threads | Puts | Mean | Std Dev | Max | Min | 80th Pctl | 90th Pctl
    ---------------------------------------------------------------------------------
    HTable from HTablePool (pool size: 100) | 1 | 100 | 4.86 | 1.49 | 12.00 | 3.00 | 5.00 | 6.90
    HTable with same configuration instance | 1 | 100 | 4.52 | 1.12 | 9.00 | 3.00 | 5.00 | 6.90
    HTable with new HBase configuration instance | 1 | 100 | 4.15 | 0.88 | 8.00 | 3.00 | 4.00 | 5.00
    HTable from HTablePool (pool size: 100) | 1 | 1000 | 4.19 | 3.60 | 108.00 | 3.00 | 4.00 | 5.00
    HTable with same configuration instance | 1 | 1000 | 4.02 | 1.79 | 47.00 | 3.00 | 4.00 | 5.00
    HTable with new HBase configuration instance | 1 | 1000 | 3.84 | 3.57 | 108.00 | 3.00 | 4.00 | 4.00
    HTable from HTablePool (pool size: 100) | 1 | 10000 | 3.95 | 4.13 | 211.00 | 2.00 | 4.00 | 5.00
    HTable with same configuration instance | 1 | 10000 | 4.28 | 10.02 | 571.00 | 2.00 | 4.00 | 5.00
    HTable with new HBase configuration instance | 1 | 10000 | 3.80 | 5.46 | 244.00 | 2.00 | 4.00 | 4.00
    HTable from HTablePool (pool size: 100) | 10 | 100 | 24.59 | 22.73 | 204.00 | 4.00 | 38.60 | 44.00
    HTable with same configuration instance | 10 | 100 | 20.48 | 28.05 | 233.00 | 5.00 | 22.00 | 36.80
    HTable with new HBase configuration instance | 10 | 100 | 21.44 | 28.05 | 241.00 | 5.00 | 26.80 | 50.90
    HTable from HTablePool (pool size: 100) | 10 | 1000 | 26.10 | 21.94 | 246.00 | 5.00 | 40.00 | 49.00
    HTable with same configuration instance | 10 | 1000 | 23.80 | 25.31 | 238.00 | 3.00 | 35.80 | 44.00
    HTable with new HBase configuration instance | 10 | 1000 | 22.55 | 21.96 | 209.00 | 4.00 | 35.00 | 44.00
    HTable from HTablePool (pool size: 100) | 10 | 10000 | 22.06 | 18.81 | 279.00 | 3.00 | 36.00 | 43.00
    HTable with same configuration instance | 10 | 10000 | 22.67 | 18.44 | 287.00 | 3.00 | 37.00 | 45.00
    HTable with new HBase configuration instance | 10 | 10000 | 24.43 | 26.67 | 507.00 | 3.00 | 38.00 | 46.00
    HTable from HTablePool (pool size: 100) | 100 | 100 | 106.45 | 68.53 | 422.00 | 5.00 | 164.60 | 199.40
    HTable with same configuration instance | 100 | 100 | 102.81 | 68.45 | 421.00 | 4.00 | 167.80 | 194.60
    HTable with new HBase configuration instance | 100 | 100 | 25.73 | 20.58 | 209.00 | 4.00 | 29.80 | 35.90
    HTable from HTablePool (pool size: 100) | 100 | 1000 | 195.38 | 189.46 | 696.00 | 3.00 | 375.80 | 488.00
    HTable with same configuration instance | 100 | 1000 | 189.86 | 193.69 | 688.00 | 3.00 | 384.00 | 492.00
    HTable with new HBase configuration instance | 100 | 1000 | 79.80 | 110.55 | 525.00 | 4.00 | 151.00 | 252.80
    HTable from HTablePool (pool size: 100) | 100 | 10000 | 219.65 | 214.10 | 1277.00 | 3.00 | 433.00 | 543.00
    HTable with same configuration instance | 100 | 10000 | 277.60 | 473.36 | 4129.00 | 3.00 | 453.80 | 609.00
    HTable with new HBase configuration instance | 100 | 10000 | 214.80 | 230.87 | 1386.00 | 3.00 | 435.00 | 566.00


    HBase 0.20.6 Distributed Mode with 3 Region Servers (Hadoop 0.20) (all figures in milliseconds)
    Scenario | Num Threads | Puts | Mean | Std Dev | Max | Min | 80th Pctl | 90th Pctl
    ---------------------------------------------------------------------------------
    HTable from HTablePool (pool size: 100) | 1 | 100 | 3.25 | 1.96 | 18.00 | 2.00 | 3.00 | 3.90
    HTable with same configuration instance | 1 | 100 | 2.98 | 1.18 | 9.00 | 2.00 | 3.00 | 4.00
    HTable with new HBase configuration instance | 1 | 100 | 2.91 | 1.43 | 10.00 | 1.00 | 3.00 | 4.90
    HTable from HTablePool (pool size: 100) | 1 | 1000 | 4.56 | 3.77 | 40.00 | 1.00 | 6.00 | 11.00
    HTable with same configuration instance | 1 | 1000 | 4.98 | 4.04 | 20.00 | 1.00 | 11.00 | 11.00
    HTable with new HBase configuration instance | 1 | 1000 | 2.28 | 1.20 | 9.00 | 1.00 | 3.00 | 3.00
    HTable from HTablePool (pool size: 100) | 1 | 10000 | 2.48 | 1.76 | 41.00 | 1.00 | 3.00 | 3.00
    HTable with same configuration instance | 1 | 10000 | 2.43 | 1.78 | 49.00 | 1.00 | 3.00 | 3.00
    HTable with new HBase configuration instance | 1 | 10000 | 2.84 | 2.79 | 52.00 | 1.00 | 3.00 | 5.00
    HTable from HTablePool (pool size: 100) | 10 | 100 | 15.88 | 21.62 | 204.00 | 2.00 | 23.80 | 32.90
    HTable with same configuration instance | 10 | 100 | 13.09 | 20.50 | 204.00 | 2.00 | 18.00 | 20.00
    HTable with new HBase configuration instance | 10 | 100 | 8.78 | 19.64 | 197.00 | 2.00 | 8.80 | 13.00
    HTable from HTablePool (pool size: 100) | 10 | 1000 | 12.36 | 10.76 | 203.00 | 2.00 | 19.00 | 22.00
    HTable with same configuration instance | 10 | 1000 | 12.39 | 11.49 | 202.00 | 2.00 | 20.00 | 26.00
    HTable with new HBase configuration instance | 10 | 1000 | 9.13 | 10.79 | 204.00 | 1.00 | 14.00 | 17.90
    HTable from HTablePool (pool size: 100) | 10 | 10000 | 32.41 | 59.62 | 1785.00 | 1.00 | 50.00 | 80.00
    HTable with same configuration instance | 10 | 10000 | 37.87 | 143.47 | 4911.00 | 2.00 | 51.00 | 84.00
    HTable with new HBase configuration instance | 10 | 10000 | 12.93 | 11.26 | 212.00 | 2.00 | 20.00 | 24.00
    HTable from HTablePool (pool size: 100) | 100 | 100 | 296.71 | 132.66 | 710.00 | 47.00 | 404.80 | 467.50
    HTable with same configuration instance | 100 | 100 | 263.60 | 135.97 | 738.00 | 16.00 | 389.40 | 411.80
    HTable with new HBase configuration instance | 100 | 100 | 7.11 | 8.33 | 48.00 | 2.00 | 10.00 | 12.00
    HTable from HTablePool (pool size: 100) | 100 | 1000 | 108.58 | 105.80 | 419.00 | 2.00 | 215.00 | 268.00
    HTable with same configuration instance | 100 | 1000 | 107.31 | 105.14 | 362.00 | 2.00 | 218.00 | 267.00
    HTable with new HBase configuration instance | 100 | 1000 | 10.24 | 13.45 | 203.00 | 1.00 | 11.00 | 20.00
    HTable from HTablePool (pool size: 100) | 100 | 10000 | 149.96 | 197.28 | 1527.00 | 1.00 | 267.00 | 333.00
    HTable with same configuration instance | 100 | 10000 | 134.47 | 144.55 | 646.00 | 2.00 | 276.00 | 358.00
    HTable with new HBase configuration instance | 100 | 10000 | 247.84 | 692.83 | 11043.00 | 1.00 | 370.00 | 668.00


    HBase 0.90.1 CDH3 Standalone installation (all figures in milliseconds)
    Scenario | Num Threads | Puts | Mean | Std Dev | Max | Min | 80th Pctl | 90th Pctl
    ---------------------------------------------------------------------------------
    HTable from HTablePool (pool size: 100) | 1 | 100 | 6.54 | 4.24 | 38.00 | 4.00 | 7.00 | 8.00
    HTable with same configuration instance | 1 | 100 | 5.77 | 1.34 | 10.00 | 4.00 | 7.00 | 8.00
    HTable with new HBase configuration instance | 1 | 100 | 5.23 | 1.33 | 12.00 | 4.00 | 5.00 | 6.90
    HTable from HTablePool (pool size: 100) | 1 | 1000 | 5.50 | 1.62 | 26.00 | 4.00 | 6.00 | 7.00
    HTable with same configuration instance | 1 | 1000 | 4.95 | 1.35 | 22.00 | 3.00 | 6.00 | 6.00
    HTable with new HBase configuration instance | 1 | 1000 | 4.94 | 1.31 | 16.00 | 3.00 | 6.00 | 6.00
    HTable from HTablePool (pool size: 100) | 1 | 10000 | 4.84 | 2.57 | 92.00 | 3.00 | 5.00 | 6.00
    HTable with same configuration instance | 1 | 10000 | 4.59 | 4.88 | 229.00 | 3.00 | 5.00 | 6.00
    HTable with new HBase configuration instance | 1 | 10000 | 4.30 | 4.72 | 231.00 | 3.00 | 4.00 | 5.00
    HTable from HTablePool (pool size: 100) | 10 | 100 | 26.44 | 27.85 | 205.00 | 4.00 | 34.80 | 39.00
    HTable with same configuration instance | 10 | 100 | 27.50 | 28.68 | 214.00 | 7.00 | 35.00 | 40.00
    HTable with new HBase configuration instance | 10 | 100 | 31.55 | 31.22 | 205.00 | 4.00 | 37.00 | 70.40
    HTable from HTablePool (pool size: 100) | 10 | 1000 | 25.80 | 14.84 | 204.00 | 3.00 | 36.00 | 41.00
    HTable with same configuration instance | 10 | 1000 | 26.15 | 15.21 | 212.00 | 3.00 | 37.00 | 41.90
    HTable with new HBase configuration instance | 10 | 1000 | 25.72 | 14.31 | 199.00 | 3.00 | 37.00 | 43.00
    HTable from HTablePool (pool size: 100) | 10 | 10000 | 26.12 | 16.09 | 280.00 | 3.00 | 36.00 | 41.00
    HTable with same configuration instance | 10 | 10000 | 27.45 | 16.93 | 274.00 | 3.00 | 38.00 | 43.00
    HTable with new HBase configuration instance | 10 | 10000 | 25.57 | 13.90 | 263.00 | 3.00 | 36.00 | 40.00
    HTable from HTablePool (pool size: 100) | 100 | 100 | 22.55 | 17.28 | 50.00 | 5.00 | 44.80 | 46.90
    HTable with same configuration instance | 100 | 100 | 132.36 | 77.87 | 481.00 | 18.00 | 197.80 | 241.30
    HTable with new HBase configuration instance | 100 | 100 | 29.23 | 22.76 | 97.00 | 4.00 | 52.60 | 59.90
    HTable from HTablePool (pool size: 100) | 100 | 1000 | 175.42 | 272.73 | 2987.00 | 3.00 | 307.60 | 356.00
    HTable with same configuration instance | 100 | 1000 | 19.54 | 47.49 | 248.00 | 3.00 | 9.00 | 11.00
    HTable with new HBase configuration instance | 100 | 1000 | 52.58 | 69.17 | 383.00 | 4.00 | 79.80 | 150.90
    HTable from HTablePool (pool size: 100) | 100 | 10000 | 173.08 | 1294.93 | 24606.00 | 4.00 | 117.00 | 148.00
    HTable with same configuration instance | 100 | 10000 | 57.95 | 43.75 | 414.00 | 4.00 | 93.00 | 113.00
    HTable with new HBase configuration instance | 100 | 10000 | 180.96 | 163.70 | 937.00 | 4.00 | 334.00 | 424.00

    I will be adding more results here: https://github.com/srikanthps/hbase-benchmarker/blob/master/Reports.TXT
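    For reference, the summary statistics in the tables above (mean, standard deviation, and percentiles over per-put latencies) can be computed along these lines. This is a sketch of the calculation, not the exact code from the benchmarker; it uses the population standard deviation and a nearest-rank percentile.

```java
import java.util.Arrays;
import java.util.Locale;

// Computes the summary statistics reported in the tables above from an
// array of per-put latencies (milliseconds).
public class LatencyStats {

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // population standard deviation (divide by n)
    static double stddev(double[] xs) {
        double m = mean(xs), sq = 0;
        for (double x : xs) sq += (x - m) * (x - m);
        return Math.sqrt(sq / xs.length);
    }

    // nearest-rank percentile on a sorted copy; p in (0, 100]
    static double percentile(double[] xs, double p) {
        double[] s = xs.clone();
        Arrays.sort(s);
        int idx = (int) Math.ceil(p / 100.0 * s.length) - 1;
        return s[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        double[] lat = {3, 4, 4, 5, 5, 6, 7, 8, 9, 12};
        System.out.printf(Locale.ROOT, "mean=%.2f stddev=%.2f p90=%.2f%n",
                mean(lat), stddev(lat), percentile(lat, 90));
    }
}
```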


    Conclusions:
    So, what conclusions can be drawn from the above figures?

    1. Write performance is good when fewer threads are used for writing.
    2. All three variants seem to perform more or less similarly. In our application we use the HTablePool-based approach, and it is nice to see that it performs equally well. Another thing I observed during my tests is that the pool size may not play much of a role: with a pool size of 10, too, the results are more or less similar.
    3. As the number of threads increases, write performance deteriorates. A potential reason for this behavior is that HBase opens only one connection per region server from a single JVM. So, would increasing the number of JVMs help scale the writes horizontally?

    Under heavy multi-threading, the average time increases, as most of the threads seem to be blocked here:
    "pool-1-thread-99" prio=6 tid=0x045a4c00 nid=0x22e4 waiting for monitor entry [0x06c6f000]
       java.lang.Thread.State: BLOCKED (on object monitor)
     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:471) - waiting to lock <0x291f0088> (a java.io.DataOutputStream)
     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:713)
     at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
     at $Proxy0.put(Unknown Source)
     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3$1.call(HConnectionManager.java:1242)
     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3$1.call(HConnectionManager.java:1240)
     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1035)
     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3.doCall(HConnectionManager.java:1239)
     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1161)
     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1247)
     at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:609)
     at org.apache.hadoop.hbase.client.HTable.put(HTable.java:474)



    The source code for the benchmarking application is available in my GitHub repository:
    https://github.com/srikanthps/hbase-benchmarker

    HBase Issue - HMaster not starting with error "Cannot assign requested address"

    Today I ran into an issue with my working HBase standalone node after my Ubuntu PC got a new IP address following a restart.

    HBase master logs showed:
    2011-06-26 17:38:13,125 INFO org.apache.hadoop.hbase.master.HMaster: My address is ubuntu.ubuntu-domain:60000
    2011-06-26 17:38:13,250 ERROR org.apache.hadoop.hbase.master.HMaster: Can not start master
    java.net.BindException: Problem binding to /192.168.1.2:60000 : Cannot assign requested address
        at org.apache.hadoop.hbase.ipc.HBaseServer.bind(HBaseServer.java:179)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.<init>(HBaseServer.java:242)


    I noticed that my current IP address was 192.168.1.3, not 192.168.1.2, which HBase was trying to bind to.

    Pinging my host name showed the IP address HBase was trying to use, while "ifconfig" showed the new IP address:

    srikanth@ubuntu:~/hadoop/hbase-0.20.6$ ping ubuntu.ubuntu-domain
    PING ubuntu.ubuntu-domain (192.168.1.2) 56(84) bytes of data.
    64 bytes from ubuntu.ubuntu-domain (192.168.1.2): icmp_seq=1 ttl=64 time=12.2 ms
    64 bytes from ubuntu.ubuntu-domain (192.168.1.2): icmp_seq=2 ttl=64 time=1.17 ms

    srikanth@ubuntu:~$ ifconfig
    eth0      Link encap:Ethernet  HWaddr 00:1c:c0:4d:74:86 
              inet addr:192.168.1.3  Bcast:192.168.1.255  Mask:255.255.255.0
              inet6 addr: fe80::21c:c0ff:fe4d:7486/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:6518 errors:0 dropped:0 overruns:0 frame:0
              TX packets:6501 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:100
              RX bytes:6421057 (6.4 MB)  TX bytes:865876 (865.8 KB)
              Memory:e0400000-e0420000

    lo        Link encap:Local Loopback 
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:421 errors:0 dropped:0 overruns:0 frame:0
              TX packets:421 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:52862 (52.8 KB)  TX bytes:52862 (52.8 KB)


    I tried the following after some googling, but had no success:
    1. Disabled IPv6.
    2. Tried to clear the DNS cache. In the process, I also learned that Ubuntu does not do any DNS caching.
    3. Tried to assign a static IP address to keep HBase happy. This worked for HBase, but my machine dropped off the network (no internet, and my laptop could not SSH into it).
    4. Called up a friend who is a Linux expert.


    Later, I decided to update /etc/hosts to map my host name to my new IP address. Only then did I realize that it already had an entry pointing my host name to my previous IP address, which was causing all the mess. I updated it to the current IP address, and things started working fine again. Phew!!
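    For anyone hitting the same issue: since HBase (a JVM process) derives its bind address by resolving the local hostname, a quick sanity check is to do that same resolution from Java and compare the result with what ifconfig reports. A minimal sketch:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Prints the local hostname and the address it resolves to. If /etc/hosts
// maps the hostname to a stale IP, the stale address shows up here, and
// that is the address HBase will try to bind to.
public class ResolveCheck {
    public static void main(String[] args) {
        try {
            InetAddress local = InetAddress.getLocalHost();
            System.out.println(local.getHostName() + " resolves to " + local.getHostAddress());
        } catch (UnknownHostException e) {
            System.out.println("hostname does not resolve: " + e.getMessage());
        }
    }
}
```

    If the printed address differs from the one ifconfig shows, /etc/hosts is the first place to look.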







    Saturday, June 18, 2011

    Ruby Application to tweet messages using twitter

    Using the twitter_oauth gem, I wrote an application that can be used to integrate Twitter into non-web-based applications. Using the concepts demonstrated in this sample application, one can generate a tweet whenever an interesting application event occurs. For example, a shopping site can tweet when a new category of products becomes available.

    I will try to package it as a gem soon, and maybe a Rails plugin too: something like "ActionTweeter", similar to "ActionMailer".

    Here is the GIST URL: https://gist.github.com/1033288

    Monday, November 15, 2010

    98th Thing every programmer should know

    If you provide a code review comment, follow it up with the other developer to confirm whether it was implemented. Many a time, a comment gets lost in an email chain and never gets implemented.

    It is better if you can use a code review tool that integrates well with the IDE.

    99th Thing every programmer should know

    Avoid unnecessary acronym-ization of words.

    E.g., CHART_DTL_MAIN is the name of a constant someone used in some code.

    Here, "DTL" is an acronym the programmer invented for the word "DETAIL". By saving three characters, the programmer has made the constant harder to interpret.

    It's better to avoid unnecessary acronyms and use words as descriptive as possible, since there are typically no length restrictions in the language.

    Wednesday, September 29, 2010

    Ruby frameworks and gems that I wish to learn more about...

    Paperclip
    Activemerchant
    Attachment_fu
    cancan
    factory girl
    arel
    radiant cms
    devise
    authlogic
    mongomapper