Hadoop Common Errors with Possible Solutions

Here I'm writing up some of the Hadoop issues I have faced, along with their solutions. I hope you all benefit from them.

1. After formatting the namenode (bin/hadoop namenode -format) and restarting the cluster, the datanodes come up with a namespace error:
ERROR: Incompatible namespaceIDs in ...: namenode namespaceID = ...,
datanode namespaceID = ...
The error occurs because formatting the namenode re-creates a new namespaceID, which no longer matches the namespaceID stored on the datanodes.
Solution:
        1. Delete the data files under the datanode's dfs.data.dir directory (the default is tmp/dfs/data)
        2. Or edit the namespaceID in the dfs.data.dir/current/VERSION file so that it matches the namenode's (the error message in the log shows both IDs), as in the sketch below
        3. Or reassign a new dfs.data.dir directory
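
A minimal sketch of option 2, assuming dfs.data.dir is /tmp/hadoop/dfs/data and the namenode log reported namespaceID 123456789 (both are placeholders; take the real values from your configuration and the datanode log):
# stop the cluster before editing the file
bin/stop-all.sh
# see the namespaceID the datanode currently holds
cat /tmp/hadoop/dfs/data/current/VERSION
# overwrite it with the namespaceID from the namenode error message
sed -i 's/namespaceID=.*/namespaceID=123456789/' /tmp/hadoop/dfs/data/current/VERSION
bin/start-all.sh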

2. When the Hadoop cluster is started with start-all.sh, the datanode on a slave always fails to start, with the error:
ERROR: Could only be replicated to 0 nodes, instead of 1
This may happen when a node identifier is duplicated (personally, I think that is the wrong explanation). There may also be other causes; try the solutions below.
Solution:
      1. If it is a port access problem, make sure the port is open, for example hdfs://machine1:9000/ and the web ports 50030 and 50070. Execute # iptables -I INPUT -p tcp --dport 9000 -j ACCEPT. If you get the error:
          hdfs.DFSClient: Exception in createBlockOutputStream
          java.net.ConnectException: Connection refused
          then the datanode port cannot be reached; on the datanode, modify iptables: # iptables -I INPUT -s machine1 -p tcp -j ACCEPT
      2. Firewall restrictions may prevent the cluster nodes from communicating with each other. Try turning off the firewall: /etc/init.d/iptables stop
      3. Finally, there may not be enough disk space; check with df -al (a consolidated sketch of these checks follows)
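
A consolidated sketch of these checks, assuming the namenode host is machine1 and the HDFS port is 9000 (adjust both to your own setup; run the iptables lines as root):
# open the HDFS port in the firewall on each node
iptables -I INPUT -p tcp --dport 9000 -j ACCEPT
# or simply stop the firewall while debugging
/etc/init.d/iptables stop
# verify the datanode can reach the namenode port at all
telnet machine1 9000
# and confirm no disk is full
df -al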

3. The program execution fails with:
Error: java.lang.NullPointerException
A null pointer exception: make sure your Java program is correct. Instantiate variables before using them, and avoid things like array-out-of-bounds access. Inspect the program.
When executing a program and getting (various) errors, make sure of the following:
        1. Your program compiles correctly
        2. In cluster mode, the data to be processed has been written to the HDFS path, and the path is correct
        3. Specify the entry class name when executing the jar package (I do not know why it sometimes runs even when you do not specify it)
            The correct form is similar to this:
            $ hadoop jar myCount.jar myCount input output

4. When Hadoop starts the datanode, you get:
ERROR: Unrecognized option: -jvm Could not create the Java virtual machine.
The script bin/hadoop under the Hadoop installation directory contains the following piece of shell:
CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
if [[ $EUID -eq 0 ]]; then
  HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
else
  HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
fi
$EUID is the effective user ID; for root it is 0, which triggers the -jvm branch, so try not to operate Hadoop as the root user.

5. Terminal error message:
ERROR hdfs.DFSClient: Exception closing file /user/hadoop/musicdata.txt: java.io.IOException: All datanodes 10.210.70.82:50010 are bad. Aborting ...
with this error information in the jobtracker logs:
Error register getProtocolVersion
java.lang.IllegalArgumentException: Duplicate metricsName: getProtocolVersion
and possibly these warnings:
WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Broken pipe
WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_3136320110992216802_1063 java.io.IOException: Connection reset by peer
WARN hdfs.DFSClient: Error Recovery for block blk_3136320110992216802_1063 bad datanode[0] 10.210.70.82:50010
put: All datanodes 10.210.70.82:50010 are bad. Aborting ...
The solution:
           1. Check whether the disk under the dfs.data.dir property is full; if it is, free some space and try hadoop fs -put again.
           2. If the disk is not full, it needs to be checked for bad sectors, as in the sketch below.
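
A minimal sketch of these two checks, assuming the dfs.data.dir disk is mounted at /data on device /dev/sdb1 (both placeholders for your actual layout):
# is the data disk full?
df -h /data
# read-only scan of the device for bad sectors (run as root; can take a long time)
badblocks -sv /dev/sdb1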

6. A Hadoop jar program gets the error message:
java.io.IOException: Type mismatch in key from map: expected
org.apache.hadoop.io.NullWritable, recieved
org.apache.hadoop.io.LongWritable
or something like this:
Status: FAILED java.lang.ClassCastException:
org.apache.hadoop.io.LongWritable cannot be cast to
org.apache.hadoop.io.Text
Then you need to learn the basics of Hadoop and the MapReduce model: see the chapter "Hadoop I/O" and Chapter 7, "MapReduce Types and Formats", in the book "Hadoop: The Definitive Guide". If you are eager to solve this problem, I can also give you a quick fix, but it is bound to affect your later development:
Ensure the types are consistent:
... extends Mapper<K1, V1, K2, V2> ...
public void map(K1 key, V1 value, Context context) ...
...
... extends Reducer<K2, V2, K3, V3> ...
public void reduce(K2 key, Iterable<V2> values, Context context) ...
...
job.setMapOutputKeyClass(K2.class);
job.setMapOutputValueClass(V2.class);
job.setOutputKeyClass(K3.class);
job.setOutputValueClass(V3.class);
...
Note that the K* and V* types must correspond throughout. I still recommend the two chapters I just mentioned, so you know the details of the underlying principles.

7. If you hit a datanode error as follows:
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
java.io.IOException: Cannot lock storage /data1/hadoop_data. The directory is already locked.
According to the error message, the directory is locked and cannot be read. This means a related process is still running, or the Hadoop process on the slave machine is still running. Use these Linux commands to check:
netstat -nap
ps aux | grep <related PID>
If a Hadoop-related process is still running, kill it with the kill command,
and then re-run start-all.sh.
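
A short sketch of that cleanup; the grep pattern and the PID are placeholders for whatever the listing shows on your machine:
# list leftover Hadoop java processes
ps aux | grep -i hadoop
# kill the stale process by the PID found above
kill -9 12345
# then restart the cluster
bin/start-all.sh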

8. If you encounter the following jobtracker error:
ERROR: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing out.
Solution: modify the /etc/hosts file on the datanode nodes.
The hosts file format, briefly: each line is divided into three parts. The first part is the network IP address, the second part is the host name or domain name, and the third part is the host alias. The detailed steps are as follows:
1. First check the host name:
$ echo -e "`hostname -i`\t`hostname`\t$stn"
where stn is the short name or alias of the host. It will print something like:
10.200.187.77 hadoop-datanode DN
If the IP address is shown like this, the hosts file has been modified successfully; if the host name still shows a problem, continue to modify the hosts file.
If the shuffle error still appears after this, try modifying the hdfs-site.xml file (a fix suggested by another user) and add the following:
<property>
<name>dfs.http.address</name>
<value>*.*.*:50070</value>
</property>
Do not change the port; replace the asterisks with the namenode's IP. Hadoop transfers this information over HTTP, and the port must stay the same.

9. If you encounter the following jobtracker error:
ERROR: java.lang.RuntimeException: PipeMapRed.waitOutputThreads():
subprocess failed with code *
This is an error code returned by the subprocess and rethrown by Java; the meaning of the error code indicates the details.
Please excuse my typos, and please leave a comment if you feel I left anything out.

10. If you encounter the following error:
FAILED java.lang.IllegalArgumentException: java.net.URISyntaxException:
Relative path in absolute URI: ***
The URI contains characters that are not allowed, such as a colon, i.e. characters the operating system does not allow in file names. Check the part the message points to (the asterisk part) and eliminate the illegal characters to solve this issue.

11. If the tasktracker does not start, with this error in the tasktracker log:
ERROR org.apache.hadoop.mapred.TaskTracker: Cannot start task tracker because java.net.BindException: Address already in use ***
The port is occupied by a daemon that was already started. First stop the cluster, then use ps aux | grep hadoop to look at the related Hadoop processes, and kill the leftover Hadoop daemons (a sketch follows).
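
A small sketch of freeing the port, assuming the default tasktracker HTTP port 50060 is the one reported in your error (run as root so netstat can show the owning process; the PID is a placeholder):
# which process holds the port?
netstat -nlp | grep 50060
# stop the cluster, then kill whatever daemon is left over
bin/stop-all.sh
ps aux | grep hadoop
kill -9 12345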

12. If the datanode does not start, with this error in the datanode log:
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
java.io.IOException: No locks available
"No locks available" can mean that you are trying to use Hadoop on a filesystem that does not support file-level locking.

13. Are you trying to run your namenode storage in NFS space?
Since file-level locking came up, use the
$ /etc/init.d/nfs status
command to view the network file system and see whether it is running. Another option is df -Th, or mount, to show the file system types; the result will confirm whether it is indeed an NFS file system. A network file system mounted this way cannot be used here: it may be read-only, and even when it is not read-only, as said above, it does not support file-level locking.
As a solution, you can try to add file-level locking to NFS. What I did here was modify dfs.data.dir so as not to use the NFS mount.
You can also try to format your Hadoop cluster (if it is a new one) and start all over again.

14. The datanode died and you cannot start the process; the log reports the following errors:
2012-06-04 10:31:34,915 INFO org.apache.hadoop.hdfs.server.common.Storage: Cannot access storage directory /data5/hadoop_data
2012-06-04 10:31:34,915 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /data5/hadoop_data does not exist.
2012-06-04 10:31:35,033 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid value for volsFailed: 2, Volumes tolerated: 0
I found the reason for this problem: the node's disk had switched to read-only mode. Searching online, I found quite a number of such cases: a Linux machine's hard disk is set to read-write mode, but occasionally it automatically becomes read-only. Checking some references, this can happen for various reasons:
· File system errors
· Kernel hardware driver bug
· Firmware (FW) problems
· Disk bad sectors
· Hard disk backplane fault
· Hard drive cable fault
· HBA card failure
· RAID card failure
· inode resource depletion
The solution (a short sketch follows):
· Restart the server (reboot command)
· Re-mount the hard disk
· Try to repair with fsck
· Replace the hard disk
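
A cautious sketch of the remount and fsck options, assuming the affected mount point is /data5 on device /dev/sdb1 (both placeholders); fsck must only be run on an unmounted filesystem:
# try flipping the mount back to read-write
mount -o remount,rw /data5
# if that fails, unmount and repair the filesystem
umount /data5
fsck -y /dev/sdb1
mount /data5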

15. When running a MapReduce task, the errors are as follows:
2012-06-21 10:50:43,290 WARN org.mortbay.log: /mapOutput: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/hadoop/jobcache/job_201206191809_0004/attempt_201206191809_0004_m_000006_0/output/file.out.index in any of the configured local directories
2012-06-21 10:50:45,592 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_201206191809_0004_m_000006_0, 0) failed: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/hadoop/jobcache/job_201206191809_0004/attempt_201206191809_0004_m_000006_0/output/file.out.index in any of the configured local directories
Although these are just two warnings, they affect operating efficiency, so still try to resolve them. The cause of the error is that an intermediate output file of the job cannot be found. Make the following checks (see the sketch after this list):
           1. The configuration of the mapred.local.dir property.
           2. df -h to see whether there is enough space under the cache path.
           3. free to see whether there is enough memory.
           4. Ensure the cache path has write permissions.
           5. Check for disk corruption.
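
A compact sketch of checks 2 through 4, assuming mapred.local.dir points at /tmp/hadoop/mapred/local and the daemons run as user hadoop (both placeholders; read the real value from your mapred-site.xml):
# enough disk space under the local dir?
df -h /tmp/hadoop/mapred/local
# enough free memory?
free -m
# is the path writable by the hadoop user?
sudo -u hadoop touch /tmp/hadoop/mapred/local/.write_test && echo writable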

The namenode logs the following error in a cycle:
2012-08-21 09:20:24,486 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit log, edits.new files already exists in all healthy directories:
/data/work/hdfs/name/current/edits.new
/backup/current/edits.new
2012-08-21 09:20:25,357 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop cause:java.net.ConnectException: Connection refused
2012-08-21 09:20:25,359 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.net.ConnectException: Connection refused
with related errors on the secondarynamenode.
Searching turned up the cause:
With 1.0.2, only one checkpoint process is executed at a time. When the namenode gets an overlapping checkpointing request, it checks for edits.new in its storage directories. If the namenode has this file, it concludes that the previous checkpoint process is not done yet and prints the warning message you've seen. So check whether a residual, useless edits.new file is left over from before the error; if this is the case, it can be deleted (see the sketch below).
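
A cautious sketch of that cleanup, using the name directories from the log above and assuming you have confirmed no checkpoint is actually in progress:
# stop the cluster before touching the name directories
bin/stop-all.sh
# move the stale files aside rather than deleting them outright
mv /data/work/hdfs/name/current/edits.new /tmp/edits.new.bak1
mv /backup/current/edits.new /tmp/edits.new.bak2
bin/start-all.sh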

Also make sure of the following configuration in the namenode's hdfs-site.xml:
<property>
<name>dfs.secondary.http.address</name>
<value>0.0.0.0:50090</value>
</property>
Change the 0.0.0.0 above to the host name of your deployed secondarynamenode.
And in the secondarynamenode's hdfs-site.xml:
<property>
<name>dfs.http.address</name>
<value>0.0.0.0:50070</value>
</property>
Change the 0.0.0.0 above to the host name of your deployed namenode.

1. hadoop-root-datanode-master.log shows the following error:
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException:
Incompatible namespaceIDs in
which causes the datanode not to start.
Cause: every namenode format re-creates a new namenodeId, and the directory configured by the dfs.data.dir parameter still contains the id created by the last format, inconsistent with the id under the dfs.name.dir directory. The format emptied the data under the namenode but not the data under the datanode, so startup fails. What you have to do is empty the directories configured by dfs.data.dir before each format.
Command to format HDFS:
Shell code
      1. hadoop namenode -format
2. If the datanode cannot connect to the namenode, the datanode will not start:
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException:
Call to ... failed on local exception: java.net.NoRouteToHostException: No route to host
Turn off the firewall:
Shell code
      1. service iptables stop
Note that when the machine is rebooted, the firewall will come back on (a sketch for making the change permanent follows).
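
A minimal sketch of making that permanent on a RHEL/CentOS-style system, assuming the chkconfig tool is available:
# stop the firewall now
service iptables stop
# and keep it from starting again at boot
chkconfig iptables off
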
3. When uploading files from local to the HDFS file system, the following errors appear:
            INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink
            INFO hdfs.DFSClient: Abandoning block blk_-1300529705803292651_37023
            WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
Solution:
Turn off the firewall:
Shell code
             1. service iptables stop
Disable SELinux (see the sketch below):
Edit the /etc/selinux/config file and set SELINUX=disabled
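
A short sketch of the SELinux change; setenforce affects the running system, while the sed edit (assuming the stock config location) makes it permanent:
# switch SELinux to permissive for the current session
setenforce 0
# persist the change across reboots
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
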
4. Errors caused by safe mode:
org.apache.hadoop.dfs.SafeModeException: Cannot delete ..., Name node is in safe mode
When the distributed file system starts, it begins in safe mode. While it is in safe mode, the contents of the file system may not be modified or deleted, until safe mode ends. Safe mode exists so that the system can check the validity of the data blocks on each DataNode at startup, and copy or delete data blocks as necessary according to policy. Safe mode can also be entered by command at runtime. In practice, if you hit this error right after the system starts, just wait a while and it resolves itself. You can also turn off safe mode explicitly:
Shell code
1. hadoop dfsadmin -safemode leave
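
As a usage sketch, the safemode subcommand also accepts get (and enter), so you can check the state before forcing it off:
$ hadoop dfsadmin -safemode get
Safe mode is ON
$ hadoop dfsadmin -safemode leave
Safe mode is OFF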
