Big Data Open Source Landscape: Processing Technologies

July 15th, 2014

Hadoop is a well-established software framework that analyses structured and unstructured big data and distributes applications across thousands of servers. Hadoop was created in 2005, and several projects have since appeared in the Hadoop space that try to complement it. Sometimes those technologies overlap with each other and sometimes they are partially complementary. I will try to sketch a brief map of them.

Programming Model

The Hadoop framework transparently provides applications with both reliability and data motion. Hadoop implements a computational paradigm named Map/Reduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. The Apache Hadoop project provides an open source MapReduce implementation.

Management layer

The scalability that is needed for big data processing is supported [...]
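To make the Map/Reduce paradigm described under Programming Model more concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop's API and it does no distribution or fault tolerance; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions chosen for illustration. The point is only that each input fragment is mapped independently, which is what allows a real cluster to execute or re-execute a fragment on any node.

```python
from collections import defaultdict
from itertools import chain


def map_phase(fragment):
    """Map: emit a (word, 1) pair for every word in one input fragment."""
    return [(word.lower(), 1) for word in fragment.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(fragments):
    """Run map, shuffle and reduce in sequence over a list of text fragments."""
    # Each fragment is mapped on its own; in a real cluster these calls
    # could run on different nodes and be retried independently.
    mapped = chain.from_iterable(map_phase(f) for f in fragments)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Hypothetical input: two small fragments of a larger data set.
    fragments = ["big data big clusters", "data moves to clusters"]
    print(word_count(fragments))
```

Because `map_phase` depends only on its own fragment and `reduce_phase` only on one key's values, neither step shares state with its siblings; that independence is the property the framework relies on when it schedules work across the cluster.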