Thursday, January 5, 2012

Errors that occurred in Hadoop and its sub-projects

1. Oozie job failed:

Error message: ERROR is considered as FAILED for SLA

Cause 1: The job is not able to find the Hadoop namenode (master) or the jobtracker machine.
Suppose Oozie, the Hadoop namenode (master), and the jobtracker are running on one machine, while the datanode and tasktracker are running on another machine.

Your job.properties file contains the following lines:
        nameNode=hdfs://localhost:9000
        jobTracker=localhost:9001
   
In the above case, an FS action will work fine because no map-reduce operation is performed for an FS action. But if you run a map-reduce action, the tasks running on the tasktracker machine will look for the namenode on their own localhost, because job.properties points to localhost:9000.

Solution: Use the IP (or a resolvable hostname) of the namenode and jobtracker machines in the job.properties file instead of localhost.
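For example, if the namenode and jobtracker run on 192.168.1.10 (a hypothetical address; substitute your own), job.properties would become:
        nameNode=hdfs://192.168.1.10:9000
        jobTracker=192.168.1.10:9001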
   
Cause 2: Oozie is not able to find the MySQL server.
Suppose you are using MySQL as the metastore for Hive.
The Hive hive-default.xml file contains the following lines:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
       
Solution: Use the IP of the MySQL machine instead of localhost in the ConnectionURL.
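For example, if MySQL runs on 192.168.1.20 (again a hypothetical address), the property would become:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.1.20:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>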


2. ZooKeeper server not running:
Error message: Could not find my address: zk-serevr1 in list of ZooKeeper quorum servers
   
Causes :
HBase tries to start a ZK server on some machine but that machine isn't able to find itself in the hbase.zookeeper.quorum configuration. This is a name lookup problem. 

Solution:   
Use the hostname presented in the error message instead of the value you used (zk-server1). If you have a DNS server, you can set hbase.zookeeper.dns.interface and hbase.zookeeper.dns.nameserver in hbase-site.xml to make sure it resolves to the correct FQDN.
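A minimal hbase-site.xml sketch for these two settings (the interface name eth0 and the nameserver address are assumptions; use the values for your own network):
<property>
<name>hbase.zookeeper.dns.interface</name>
<value>eth0</value>
</property>
<property>
<name>hbase.zookeeper.dns.nameserver</name>
<value>192.168.1.1</value>
</property>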

3. Hadoop-datanode job failed or datanode not running: java.io.IOException: File ../mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
   
Cause 1: No datanode is running. Make sure at least one datanode is up; you can check as shown below.
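A quick way to check (run as the HDFS user; both commands are part of the standard Hadoop/JDK tooling of this era):
        # Lists live and dead datanodes as seen by the namenode
        hadoop dfsadmin -report
        # On the datanode machine, verify a DataNode process is listed
        jps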

Cause 2: The namespaceIDs of the master and slave machines are not the same.
If you see the error java.io.IOException: Incompatible namespaceIDs in the logs of a datanode, chances are you are affected by bug HADOOP-1212 (well, I've been affected by it at least).
           
Solution:
If the namespaceIDs of the master and slave machines are not the same, replace the namespaceID on the slave machine with the master's namespaceID (a sketch follows the list below):
- dfs/name/current/VERSION on the master contains the namenode's namespaceID
- dfs/data/current/VERSION on the slave contains the datanode's namespaceID
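A minimal sketch of the fix (the dfs/name and dfs/data locations depend on dfs.name.dir and dfs.data.dir in your configuration; the /app/hadoop/tmp prefix below is an assumption):
        # On the master, note the namespaceID
        grep namespaceID /app/hadoop/tmp/dfs/name/current/VERSION
        # On the slave: stop the datanode, edit this VERSION file so its
        # namespaceID matches the master's value, then restart the datanode
        grep namespaceID /app/hadoop/tmp/dfs/data/current/VERSION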
        
Cause 3: A datanode instance is running out of disk space.
Solution: Free up some space.

Cause 4: You may also get this message due to permissions. Perhaps the JobTracker cannot create jobtracker.info on startup.

4. Sqoop export command failed:
Error message:
attempt_201101151840_1006_m_000001_0, Status : FAILED
java.util.NoSuchElementException
at java.util.AbstractList$Itr.next(AbstractList.java:350)
at impressions_by_zip.__loadFromFields(impressions_by_zip.java:159)
at impressions_by_zip.parse(impressions_by_zip.java:108)

   
Cause: The given field separator is not valid.
Solution: Specify the correct field delimiter in the sqoop export command.
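A sketch of the relevant flag (the connection URL, credentials, table, and export directory are hypothetical; the delimiter must match how the data files were actually written):
        sqoop export --connect jdbc:mysql://192.168.1.20:3306/reports \
          --username dbuser \
          --table impressions_by_zip \
          --export-dir /user/hdfs/impressions_by_zip \
          --input-fields-terminated-by ','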

5. HBase regionserver not running:

Error message: 2012-01-02 13:48:49,973 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Master rejected startup because clock is out of sync
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server hadoop-datanode2,60020,1325492317440 has been rejected; Reported time is too far out of sync with master.  Time difference of 206141ms > max allowed of 30000ms

Solution: The clocks of the regionservers are not in sync with the master machine. Synchronize the clocks of the HBase master and regionserver machines, for example with NTP.
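One quick way to sync, assuming ntpdate is installed and the public pool.ntp.org servers are reachable (use your own NTP server if you have one):
        # Run on the master and on every regionserver
        sudo ntpdate pool.ntp.org
For a lasting fix, run the ntpd service on every node so the clocks stay synchronized.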

5 comments:

Anonymous said...

/usr/local/hadoop/hadoop-0.20.203.0/bin/start-dfs.sh
starting namenode, logging to /usr/local/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-hduser-namenode-hadoop-ThinkCentre-A51.out
localhost: starting datanode, logging to /usr/local/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-hduser-datanode-hadoop-ThinkCentre-A51.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-hduser-secondarynamenode-hadoop-ThinkCentre-A51.out
hduser@hadoop-ThinkCentre-A51:~$ jps
12799 SecondaryNameNode
12837 Jps



Please help us to correct this error.

Ankit Jain said...

Please share the namenode and datanode logs.

Anonymous said...

ACTION[0000017-120306175650364-oozie-oozi-W@mr-node] checking action, external ID [job_201203062014_0014] status [RUNNING]
2012-03-07 22:06:23,420 INFO CallbackServlet:525 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000017-120306175650364-oozie-oozi-W] ACTION[0000017-120306175650364-oozie-oozi-W@mr-node] callback for action [0000017-120306175650364-oozie-oozi-W@mr-node]
2012-03-07 22:06:23,577 INFO MapReduceActionExecutor:525 - USER[hdfs] GROUP[users] TOKEN[] APP[map-reduce-wf] JOB[0000017-120306175650364-oozie-oozi-W] ACTION[0000017-120306175650364-oozie-oozi-W@mr-node] action completed, external ID [job_201203062014_0014]
2012-03-07 22:06:23,625 WARN MapReduceActionExecutor:528 - USER[hdfs] GROUP[users] TOKEN[] APP[map-reduce-wf] JOB[0000017-120306175650364-oozie-oozi-W] ACTION[0000017-120306175650364-oozie-oozi-W@mr-node] LauncherMapper died, check Hadoop log for job [192.168.1.123:8021:job_201203062014_0014]
2012-03-07 22:06:23,803 INFO ActionEndCommand:525 - USER[hdfs] GROUP[users] TOKEN[] APP[map-reduce-wf] JOB[0000017-120306175650364-oozie-oozi-W] ACTION[0000017-120306175650364-oozie-oozi-W@mr-node] ERROR is considered as FAILED for SLA

Hi, I am Avinash. I installed Oozie using the tarball and ran an Oozie job as the hdfs user, and I got the error:
ERROR is considered as FAILED for SLA

Can you help me?

Ankit Jain said...

Hi Avinash,

Which Oozie action did you run? Look into the Hadoop jobtracker log; you may get some clue there.

Renata Ghisloti Duarte de Souza said...

Nice post. Thanks!
