The Advantages of MapR Technologies' Hadoop Distribution
December 10, 2012
By Mae Kowalke, TMCnet Contributor
Hadoop was recently described as a “three-headed open core” run by Cloudera, Hortonworks and MapR Technologies. That raises the question: Which head should you choose when planning to use Hadoop?
“The only decision customers have to make is what are their highest priorities in approaching a big data project,” MapR wrote in a blog post.
Whatever those priorities turn out to be, MapR will most likely be a strong candidate.
MapR Technologies has taken Hadoop and focused on optimizing it for the enterprise. This includes making development, administration, high availability and data protection better than what comes out of the box.
“MapR has chosen to bring what we believe is the most enterprise-ready version of Hadoop to market. This has taken significant work and investment in adding value to Hadoop where it matters most,” MapR wrote on the blog.
Let’s start with ease of development. The MapR distribution of Hadoop comes with a unique Storage Services layer that includes Direct Access NFS. Typically, a Hadoop user has to write log files to an intermediate location, load the data into Hadoop with a batch program, then close the file before running batch analysis. With MapR, log files can be streamed directly to the cluster and analyzed at any point without first closing the file.
Direct Access NFS also reduces costs by making it easy to get data in and out of the cluster: operations can be performed on the same disk, and users can drag and drop files.
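To make the streaming workflow concrete, here is a minimal sketch of what writing and reading a log over an NFS mount could look like from an application's point of view. It uses only ordinary Python file I/O. The mount location, the `CLUSTER_NFS_MOUNT` environment variable and the helper names `append_log_line` and `count_matching` are hypothetical and not taken from MapR's documentation; a temporary directory stands in for the mount so the sketch runs anywhere.

```python
import os
import tempfile
from pathlib import Path


def append_log_line(log_path: Path, line: str) -> None:
    """Append one line to a log file using ordinary file I/O."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(line.rstrip("\n") + "\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the bytes to storage


def count_matching(log_path: Path, needle: str) -> int:
    """Count lines containing `needle`; the writer does not need to be finished."""
    with open(log_path, "r", encoding="utf-8") as f:
        return sum(1 for line in f if needle in line)


if __name__ == "__main__":
    # Hypothetical: point CLUSTER_NFS_MOUNT at the NFS-mounted cluster path.
    # Without it, a temporary directory is used as a stand-in.
    mount = Path(os.environ.get("CLUSTER_NFS_MOUNT", tempfile.mkdtemp()))
    log = mount / "logs" / "app.log"
    append_log_line(log, "INFO service started")
    append_log_line(log, "ERROR disk nearly full")
    print(count_matching(log, "ERROR"))
```

The point of the sketch is that nothing in it is Hadoop-specific: if the cluster is exposed as a mounted file system, the application appends to a file and an analysis step reads the same file, with no intermediate copy or separate batch load.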
MapR makes administration easier, too. The distribution has many built-in functions that automate data analysis so that it does not depend on manual data feeding, and it provides alerts, alarms and insights through MapR’s Heatmap, which shows cluster health and performance.
MapR Volumes also help with administration by simplifying data security, retention, placement and quota management. The MapR Control System provides visual insight into node health, service status and resource utilization, organized by cluster topology, among other administrative features.
Besides improving Hadoop development and administration, MapR also strengthens Hadoop’s high availability. The MapR distribution comes with advanced HA features such as its “No NameNode” architecture, which provides automated stateful failover that protects against data loss and downtime, even in the face of multiple disk or node failures, without any manual intervention.
Finally, the MapR distribution comes with both mirroring and snapshots, which ensure that Hadoop is protected against not only hardware failure but also user and application errors that could otherwise be replicated across clusters. Moreover, there is no performance loss on writes to the original data while a snapshot is taken.
So when choosing the right Hadoop distribution, MapR should definitely be a consideration.
Edited by Rachel Ramsey