For almost 10 years I have been experimenting with every new technology that has anything to do with data. From structured and semi-structured to unstructured data, my real frustration was always when the data grew big and became almost unexploitable. There is nothing more embarrassing to a data scientist than having to admit that you do have the data but it will take years to get information from it.
Thanks to a conversation I had with a prominent IT manager in Australia, the word Hadoop rang in my ear. I couldn’t believe how enthusiastic the manager was when he talked about Hadoop and how it was helping them to manage, distribute and analyse big data. I remember him saying that Hadoop changed everything we have ever known about file systems. As a person always eager to get to the next level of data management, I decided to learn Hadoop and the ecosystem of tools surrounding it. What I found was just unbelievable. Hadoop is a must for all software engineers who are lovers of data.
Hadoop is an inexpensive open source solution for big data. Its famous HDFS (Hadoop Distributed File System) brings a new concept of how data is stored: instead of living on a single disk, files are split into large blocks that are spread and replicated across the disks of many machines, which in turn changes how disk I/O behaves. With the MapReduce programming model, Hadoop provides an effective way of dividing data into many small chunks that are distributed to the nodes of a cluster (potentially thousands; let’s say each node corresponds to a virtual machine), where mappers process them in parallel for rapid analysis. The mappers’ results are then passed to the reducers, which combine the output of all those mappers into a single response written either to HDFS or to a local file system.
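To make the map and reduce steps a bit more concrete, here is a minimal sketch in plain Python. It does not use Hadoop at all: it only imitates the flow described above on a single machine, using the classic word count example. The function names (`split_into_chunks`, `mapper`, `shuffle`, `reducer`, `word_count`) and the `chunk_size` parameter are my own illustrative choices, not part of any Hadoop API, and a real cluster would run the mappers on different nodes rather than in a loop.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_into_chunks(lines: List[str], chunk_size: int) -> List[List[str]]:
    """Divide the input into small chunks (stand-in for HDFS blocks / input splits)."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def mapper(chunk: Iterable[str]) -> List[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in the chunk."""
    pairs = []
    for line in chunk:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group all values emitted by all mappers under their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: List[str], chunk_size: int = 2) -> Dict[str, int]:
    """Run the whole split -> map -> shuffle -> reduce flow sequentially."""
    chunks = split_into_chunks(lines, chunk_size)
    # On a real cluster each chunk would be handled by a mapper on a separate node.
    mapped = [mapper(chunk) for chunk in chunks]
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data", "big cluster", "data data"]
    print(word_count(sample))
```

The point of the sketch is that each mapper only ever sees its own chunk, so the chunks can be processed independently, and the reducers only need the grouped values for a key to produce the final answer.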
New tools like Spark, Hive (with its HiveQL query language), Pig, etc. have taken Hadoop to the next level. My guess is that Hadoop is moving from an offline data analytics tool to an online data analytics tool.
I have shared a link to a paper published by TDWI Research. The paper is entitled “Eight considerations for utilising big data analytics with Hadoop”. If you are a manager and interested in knowing what Hadoop can bring to your company, this paper is right for you.