Hadoop May Not Be Perfect, but MapR's Distribution Aims to Be
December 17, 2012
By Rory Lidstone, TMCnet Contributing Writer
Companies have changed the way they collect data. Where they once gathered only the data needed to answer a specific question, they now cast a wide net in an effort to better understand their existing customers and to attract new ones. These massive stores of data are known as big data, and they have given rise to an industry dedicated to making sense of them.
Hadoop is a software framework that sprang up to address the need to handle big data. An open-source Apache project derived from Google's MapReduce, Hadoop helps organizations store and process big data. It became the top choice for this work in part because it was designed to run on a cluster of computers, which allows it to use commodity hardware and distribute work across machines to achieve a high degree of scalability.
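To illustrate the MapReduce model that Hadoop is derived from, here is a minimal single-process sketch in Python. It is not Hadoop's API. The function names (map_phase, shuffle, reduce_phase, run_job) and the sample input are hypothetical, chosen only for illustration, and the "splits" stand in for chunks of data that a real cluster would hand to different machines.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(splits):
    """Run map, shuffle, and reduce over every split.

    Assumption: each split would be processed on a separate machine in a
    real cluster; here they are simply processed one after another.
    """
    mapped = []
    for split in splits:
        for record in split:
            mapped.extend(map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: two splits, each holding lines of text.
splits = [["big data big"], ["data stores"]]
counts = run_job(splits)
print(counts)
```

Because each split is mapped independently and each key is reduced independently, both steps can be spread across many machines, which is the property that lets a framework of this kind scale on commodity hardware.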
Of course, Hadoop does have its weaknesses, particularly when it comes to adopting it within a company's existing infrastructure. Furthermore, most implementations of the Hadoop programming model don't provide a distributed resource management system fast or scalable enough to take full advantage of Hadoop's theoretical speed, and support for only a single distributed file system, most commonly HDFS, slows the processing of big data even further.
Fortunately, companies have stepped up to provide their own Hadoop distributions, largely with the aim of addressing the framework's shortcomings. MapR Technologies is one such company.
MapR's Hadoop distribution boasts the ability to run faster on half the hardware compared to other distributions, with random I/O that is five to 100 times higher, automatic and transparent compression, and the ability to scale linearly with the number of cores and nodes.
Aside from speed, the MapR distribution also offers greater ease of use, graphical provisioning and planning tools, 100 percent API compatibility with Hadoop, usage tracking and quotas, and dependability. JobTracker HA prevents lost jobs, while NameNode HA eliminates downtime and data loss and supports an unlimited number of files.
The MapR Distribution for Hadoop has found success with companies such as Amazon, which added the distribution as an option within its Elastic MapReduce service earlier this year.
Edited by Rachel Ramsey