1. OOZIE job failed:
Error message : ERROR is considered as FAILED for SLA
Cause 1 : The tasktracker is not able to find the Hadoop namenode (master) or jobtracker machine.
Suppose you are running Oozie, the Hadoop master and the jobtracker on one machine, while the datanode and tasktracker run on another machine.
Your job.properties file contains the following lines:
nameNode=hdfs://localhost:9000
jobTracker=localhost:9001
In the above case, an FS action works fine because no map-reduce operation is performed in an FS action. But if you run a map-reduce action, the tasktracker looks for the Hadoop master on its own localhost, because we used localhost:9000 in the job.properties file.
Solution : Use the IP of the Hadoop namenode and jobtracker machines in the job.properties file instead of localhost.
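As a quick way to spot this before submitting a workflow, here is a minimal Python sketch that scans the text of a job.properties file and lists every entry whose value still points at the local machine. The function name find_localhost_entries and the queueName line in the sample are only illustrative assumptions, not part of Oozie.

```python
def find_localhost_entries(properties_text):
    """Return (key, value) pairs whose value points at the local machine."""
    flagged = []
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments (Java properties use '#' or '!').
        if not line or line.startswith(("#", "!")):
            continue
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if "localhost" in value or "127.0.0.1" in value:
            flagged.append((key, value))
    return flagged


# Sample contents; the queueName entry is a made-up extra line.
sample = """\
nameNode=hdfs://localhost:9000
jobTracker=localhost:9001
queueName=default
"""

for key, value in find_localhost_entries(sample):
    print(f"{key} points at the local machine: {value}")
```

Any entry it reports is a candidate for replacement with the real IP or hostname of the namenode or jobtracker machine.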
Cause 2 : Oozie is not able to find the MySQL server.
Suppose you are using MySQL as the metastore for Hive.
The Hive hive-default.xml file has the following lines:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
Solution : Use the IP of the MySQL machine instead of localhost.
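The same kind of check can be done on the Hive configuration. The sketch below is only an illustration: it assumes the property sits inside a <configuration> root element, and the helper name metastore_host is made up. It pulls the host out of javax.jdo.option.ConnectionURL so you can see whether it still says localhost.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def metastore_host(config_xml):
    """Return the host named in javax.jdo.option.ConnectionURL, or None."""
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "javax.jdo.option.ConnectionURL":
            jdbc_url = (prop.findtext("value") or "").strip()
            # Drop the leading "jdbc:" so urlparse sees "mysql://host:port/db".
            if jdbc_url.startswith("jdbc:"):
                jdbc_url = jdbc_url[len("jdbc:"):]
            return urlparse(jdbc_url).hostname
    return None


# The property from above, wrapped in an assumed <configuration> root.
sample = """<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
</configuration>"""

host = metastore_host(sample)
if host in ("localhost", "127.0.0.1"):
    print("Metastore URL points at the local machine; use the MySQL machine's IP.")
```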
2. Zookeeper server not running:
Error message: Could not find my address: zk-serevr1 in list of ZooKeeper quorum servers
Causes :
HBase tries to start a ZK server on some machine but that machine isn't able to find itself in the hbase.zookeeper.quorum configuration. This is a name lookup problem.
Solution:
Use the hostname presented in the error message instead of the value you used (zk-server1). If you have a DNS server, you can set hbase.zookeeper.dns.interface and hbase.zookeeper.dns.nameserver in hbase-site.xml to make sure it resolves to the correct FQDN.
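To see the mismatch concretely, here is a small Python sketch of the membership test. It is not HBase's own lookup code: the hostnames are hypothetical, and socket.gethostname() is only an approximation of the name HBase resolves for the machine.

```python
import socket


def in_quorum(hostname, quorum):
    """True if hostname appears in a comma-separated hbase.zookeeper.quorum value."""
    members = [m.strip().lower() for m in quorum.split(",") if m.strip()]
    return hostname.strip().lower() in members


# Hypothetical quorum value as it might appear in hbase-site.xml.
quorum = "zk1.example.com,zk2.example.com,zk3.example.com"

print(in_quorum("zk1.example.com", quorum))  # the FQDN is listed
print(in_quorum("zk1", quorum))              # the short name is not

# Rough check for the machine this script runs on.
local_name = socket.gethostname()
print(f"{local_name} in quorum: {in_quorum(local_name, quorum)}")
```

The second call shows the typical failure: the machine reports one form of its name while the quorum list contains another.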
3. Hadoop-datanode job failed or datanode not running:
Error message: java.io.IOException: File ../mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
Cause 1: No datanode is running. Make sure at least one datanode is running.
Cause 2: The namespaceID of the master and slave machines is not the same.
If you see the error java.io.IOException: Incompatible namespaceIDs in the logs of a datanode, chances are you are affected by bug HADOOP-1212 (well, I’ve been affected by it at least).
Solution :
If the namespaceID of the master and slave machines is not the same, replace the namespaceID of the slave machine with the master's namespaceID.
- dfs/name/current/VERSION file contains the namespaceID of the master machine
- dfs/data/current/VERSION file contains the namespaceID of the slave (datanode) machine
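The comparison of the two VERSION files can be sketched in Python as below. The file contents are made-up examples in the key=value form these files use, and the function names are only illustrative. In practice you would read the real files from the master and the slave.

```python
def read_namespace_id(version_text):
    """Return the namespaceID value from the text of a VERSION file, or None."""
    for line in version_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "namespaceID":
            return value.strip()
    return None


def namespace_ids_match(name_version_text, data_version_text):
    """True only if both files carry the same, non-missing namespaceID."""
    master_id = read_namespace_id(name_version_text)
    slave_id = read_namespace_id(data_version_text)
    return master_id is not None and master_id == slave_id


# Made-up contents of dfs/name/current/VERSION (master).
name_version = """\
namespaceID=123456789
cTime=0
storageType=NAME_NODE
"""

# Made-up contents of dfs/data/current/VERSION (slave).
data_version = """\
namespaceID=987654321
cTime=0
storageType=DATA_NODE
"""

if not namespace_ids_match(name_version, data_version):
    print("namespaceID mismatch: master has", read_namespace_id(name_version),
          "but slave has", read_namespace_id(data_version))
```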
Cause 3: The datanode instance is running out of disk space.
Solution : Free up some space on the datanode machine.
Cause 4 : You may also get this message due to permissions. The JobTracker may not be able to create jobtracker.info on startup.
4. Sqoop export command failed:
Error message:
attempt_201101151840_1006_m_000001_0, Status : FAILED
java.util.NoSuchElementException
at java.util.AbstractList$Itr.next(AbstractList.java:350)
at impressions_by_zip.__loadFromFields(impressions_by_zip.java:159)
at impressions_by_zip.parse(impressions_by_zip.java:108)
Cause : The given field separator is not valid.
Solution : Specify the correct field delimiter in the sqoop export command.
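One way to find the right delimiter is to take a line from the export directory and see which separator splits it into the number of columns the target table expects. The Python sketch below does that. The sample row, the candidate list and the function names are assumptions for illustration. The delimiter you find is the one to pass to the sqoop export option --input-fields-terminated-by.

```python
def count_fields(line, delimiter):
    """Number of fields obtained by splitting one data line on delimiter."""
    return len(line.rstrip("\n").split(delimiter))


def matching_delimiters(line, expected_columns,
                        candidates=(",", "\t", "\x01", "|")):
    """Return the candidate delimiters that yield the expected column count."""
    return [d for d in candidates if count_fields(line, d) == expected_columns]


# Hypothetical row from the directory being exported; the table has 3 columns.
sample_line = "94107\t2011-01-15\t1523\n"

print(matching_delimiters(sample_line, 3))  # ['\t']
```

If the list comes back empty, none of the candidates fits and the data should be inspected by hand. The "\x01" entry is there because Hive tables commonly use that character as the default field separator.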
5. HBase regionserver not running :
Error message: 2012-01-02 13:48:49,973 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Master rejected startup because clock is out of sync
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server hadoop-datanode2,60020,1325492317440 has been rejected; Reported time is too far out of sync with master. Time difference of 206141ms > max allowed of 30000ms
Cause : The clocks of the regionservers are not in sync with the master machine.
Solution : Synchronize the clocks of the HBase master and regionserver machines.
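To judge how far off a regionserver is, the two numbers in the exception text can be pulled out and compared. This is a minimal Python sketch that assumes the message keeps the wording shown above. The pattern and function name are illustrative only.

```python
import re

# Matches the tail of the ClockOutOfSyncException message shown above.
SKEW_PATTERN = re.compile(r"Time difference of (\d+)ms > max allowed of (\d+)ms")


def parse_skew(message):
    """Return (reported_skew_ms, max_allowed_ms), or None if not found."""
    match = SKEW_PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


log_line = (
    "Server hadoop-datanode2,60020,1325492317440 has been rejected; "
    "Reported time is too far out of sync with master. "
    "Time difference of 206141ms > max allowed of 30000ms"
)

skew_ms, max_ms = parse_skew(log_line)
print(f"skew {skew_ms / 1000:.1f}s, allowed {max_ms / 1000:.1f}s, "
      f"excess {(skew_ms - max_ms) / 1000:.1f}s")
```

Here the regionserver is more than three minutes off, far beyond the 30 second limit, so a one-off clock sync is needed before it can rejoin.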
5 comments:
/usr/local/hadoop/hadoop-0.20.203.0/bin/start-dfs.sh
starting namenode, logging to /usr/local/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-hduser-namenode-hadoop-ThinkCentre-A51.out
localhost: starting datanode, logging to /usr/local/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-hduser-datanode-hadoop-ThinkCentre-A51.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-hduser-secondarynamenode-hadoop-ThinkCentre-A51.out
hduser@hadoop-ThinkCentre-A51:~$ jps
12799 SecondaryNameNode
12837 Jps
Please help us correct this error.
Please share the namenode and datanode logs.
ACTION[0000017-120306175650364-oozie-oozi-W@mr-node] checking action, external ID [job_201203062014_0014] status [RUNNING]
2012-03-07 22:06:23,420 INFO CallbackServlet:525 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000017-120306175650364-oozie-oozi-W] ACTION[0000017-120306175650364-oozie-oozi-W@mr-node] callback for action [0000017-120306175650364-oozie-oozi-W@mr-node]
2012-03-07 22:06:23,577 INFO MapReduceActionExecutor:525 - USER[hdfs] GROUP[users] TOKEN[] APP[map-reduce-wf] JOB[0000017-120306175650364-oozie-oozi-W] ACTION[0000017-120306175650364-oozie-oozi-W@mr-node] action completed, external ID [job_201203062014_0014]
2012-03-07 22:06:23,625 WARN MapReduceActionExecutor:528 - USER[hdfs] GROUP[users] TOKEN[] APP[map-reduce-wf] JOB[0000017-120306175650364-oozie-oozi-W] ACTION[0000017-120306175650364-oozie-oozi-W@mr-node] LauncherMapper died, check Hadoop log for job [192.168.1.123:8021:job_201203062014_0014]
2012-03-07 22:06:23,803 INFO ActionEndCommand:525 - USER[hdfs] GROUP[users] TOKEN[] APP[map-reduce-wf] JOB[0000017-120306175650364-oozie-oozi-W] ACTION[0000017-120306175650364-oozie-oozi-W@mr-node] ERROR is considered as FAILED for SLA
Hi, I am Avinash. I installed Oozie from the tarball and ran an Oozie job as the hdfs user, and I got this error:
ERROR is considered as FAILED for SLA
Can you help me?
Hi Avinash,
Which Oozie action did you run? Look in the Hadoop jobtracker log; you may find some clue there.
Nice post. Thanks!