Open Source Overview

I have worked with, written about and reviewed more than a hundred open source projects over the last five years, whether in systems I built, books I wrote, or blogs, papers and presentations. The question that always comes to mind is how to build systems from all of these varied products, and how to maintain an awareness of them wide enough to keep up with their constantly changing number, range and combinations of versions. If I want to build an open source big data system, I might stick to tried and trusted components such as Hadoop, HDFS, Spark, Oozie, Hive and Sqoop. But with the choice changing all the time, what else is available now, and what might be available in the near future? How can I determine which versions of these products will safely work together? I am well aware of integration projects like Apache Bigtop and of stack-based releases from MapR, Cloudera and what was Hortonworks. But I am trying to think beyond their offerings, into an ever-changing landscape.