Installing Hadoop
The first step is installing Hadoop. I found the following tutorials useful:
https://hadoop.apache.org/docs/r1.2.1/single_node_setup.html
https://hadoop.apache.org/docs/r1.2.1/cluster_setup.html
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
HDFS
Then we create an HDFS directory structure with two folders, input and output, and upload the files to process into the input folder.
someone@anynode:hadoop$ bin/hadoop dfs -ls
Found 2 items
drwxr-xr-x - username supergroup 0 2013-12-22 12:51 /user/username/input
drwxr-xr-x - username supergroup 0 2013-12-22 12:50 /user/username/output
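The folders above can be created and the input uploaded with commands like the following (this is a sketch using the old `dfs` syntax matching the r1.2.1 docs linked above; the local filename `traces.txt` is made up for illustration):

```shell
# create the two HDFS folders shown in the listing above
bin/hadoop dfs -mkdir /user/username/input
bin/hadoop dfs -mkdir /user/username/output

# upload a preprocessed trace file from the local disk into input
bin/hadoop dfs -put traces.txt /user/username/input

# verify the upload
bin/hadoop dfs -ls /user/username/input
```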
More info: http://developer.yahoo.com/hadoop/tutorial/module2.html
The input to our program, after preprocessing, is a file of lines like:
132.156.180.68 22 146.25.6.199 33980 1070631061.367672000
We start with a simple program that counts the number of packets each server receives and sends.
We develop a custom key class to pass data from the map to the reduce phase. This class has to implement WritableComparable. (cf. [1, Example 4.7] or http://developer.yahoo.com/hadoop/tutorial/module5.html)
Code coming pretty soon.
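In the meantime, here is a minimal sketch of what such a key class could look like. The class name `HostKey` and its fields (an IP plus a sent/received flag) are my assumptions, not the actual code. To keep the sketch self-contained it uses only java.io, but the write/readFields/compareTo methods follow the WritableComparable contract; in real Hadoop code the class would declare `implements org.apache.hadoop.io.WritableComparable<HostKey>` instead of plain `Comparable`.

```java
import java.io.*;

// Hypothetical composite key: a host IP plus a direction flag.
// In Hadoop this would implement WritableComparable<HostKey>;
// the method signatures below already match that contract.
public class HostKey implements Comparable<HostKey> {
    private String ip = "";
    private boolean sent;   // true = packet sent by ip, false = received

    public HostKey() {}     // Hadoop requires a no-arg constructor
    public HostKey(String ip, boolean sent) { this.ip = ip; this.sent = sent; }

    // Serialize the key (Writable contract).
    public void write(DataOutput out) throws IOException {
        out.writeUTF(ip);
        out.writeBoolean(sent);
    }

    // Deserialize, overwriting the current fields (Writable contract).
    public void readFields(DataInput in) throws IOException {
        ip = in.readUTF();
        sent = in.readBoolean();
    }

    // Sort by IP first, then by direction.
    @Override
    public int compareTo(HostKey other) {
        int c = ip.compareTo(other.ip);
        return (c != 0) ? c : Boolean.compare(sent, other.sent);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof HostKey && compareTo((HostKey) o) == 0;
    }

    // Must be consistent with equals so the default HashPartitioner
    // sends equal keys to the same reducer.
    @Override
    public int hashCode() { return ip.hashCode() * 31 + (sent ? 1 : 0); }

    @Override
    public String toString() { return ip + (sent ? " sent" : " received"); }

    // Quick round-trip check of the serialization.
    public static void main(String[] args) throws IOException {
        HostKey k = new HostKey("132.156.180.68", true);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        k.write(new DataOutputStream(buf));
        HostKey back = new HostKey();
        back.readFields(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(back);   // prints: 132.156.180.68 sent
    }
}
```

With a key like this, the mapper can emit (HostKey, 1) twice per input line (once for the sender, once for the receiver) and the reducer simply sums the counts per key.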
Reference:
[1] Tom White, "Hadoop: The Definitive Guide", O'Reilly