Hadoop Featured Articles
Twitter has become a platform for massive amounts of real-time updates for events, news, trends and stories. More than 400 million tweets are sent per day, and that number grows when there is breaking news or events - the recent Boston bombings are a testament to that.
Hadoop is a software framework that can be used to manage big data. It can also run in the cloud to keep data flowing for companies, and it can produce comprehensive reports that incorporate data stored across countless records.
As enterprises today utilize cloud computing, social media, the Internet of Things, location-based services and mobile devices, they are struggling with how to manage, analyze and process this little thing called big data. To harness big data, enterprises are turning to Hadoop, an open-source framework that allows for distributed processing of large data sets across clusters of computers using simple programming models.
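For readers who haven't seen one of those "simple programming models" up close, here is a minimal sketch of the canonical MapReduce word-count job written against the standard Hadoop Java API (the Hadoop 2-era org.apache.hadoop.mapreduce classes); the input and output paths are hypothetical command-line arguments, not tied to any product mentioned here.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emit (word, 1) for every token in every input line.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context ctx)
        throws IOException, InterruptedException {
      StringTokenizer it = new StringTokenizer(value.toString());
      while (it.hasMoreTokens()) {
        word.set(it.nextToken());
        ctx.write(word, ONE);
      }
    }
  }

  // Reducer: sum the 1s to get each word's total count.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> vals, Context ctx)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : vals) sum += v.get();
      ctx.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // hypothetical input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // hypothetical output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The mapper and reducer above are the entirety of what a developer supplies; the framework handles splitting the input across the cluster, shuffling intermediate pairs and retrying failed tasks.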
Hadoop may be good at big data analytics, but right now it has a more pedestrian use for the majority of users: storage and ETL (extract, transform, load).
A new Research and Markets report, "Global Hadoop Market 2012-2016," forecasts that the global Hadoop market will grow at a CAGR of 55.63 percent from 2012 to 2016 as demand for big data analytics increases.
Internet-enabled devices, social media networks, digital photos and video sharing, location-based services and e-commerce are all contributing factors to a growing trend today: big data. In order to harness, manage and analyze this data, there is Hadoop.
To enable IT professionals to build world-class BI applications, Actuate Corporation has integrated its ActuateOne and BIRT onDemand suite of applications with Amazon Redshift, AWS's new petabyte-scale cloud data warehouse service. This is the latest addition to Actuate's fast-growing stable of ActuateOne connectors for big data-ready data sources.
MapR Technologies today announced a major new development for big data users: MapR M7 is now available for both NoSQL and Hadoop applications, bringing the power of big data into a dependable, easy-to-use platform that is said to eliminate many of the trade-offs involved in a NoSQL database solution.
MapR Technologies today began distribution of a new search and analytics tool called LucidWorks Search for use with the MapR Platform for Apache Hadoop. LucidWorks Search allows customers to perform predictive analytics and advanced database operations, while offering full search and discovery capabilities.
Hadoop, the open-source platform for distributed computing, is slowly but surely conquering the big data analytics market. Will the data center be the next market Hadoop dominates?
Not so fast, VMware.
MapR Technologies, a provider of Hadoop technology, has appointed Xavier Guerin to the position of vice president for its new office serving southern Europe and Benelux. MapR has opened its new office in Paris to support the partner and customer community in that region.
For those who love the flexibility and capability of Hadoop, Hapyrus has revealed a new service it is positioning as an emerging alternative to Hadoop and Hive. FlyData automatically uploads and migrates data to Amazon Redshift, the data-warehouse service. Users are starting to flock to the service, which can scale to petabyte size.
OpenStack is considered the largest and most active open-source cloud computing project and has attracted a lot of support from vendors like Rackspace, Dell, HP, Cisco and VMware.
WANdisco, a provider of high-availability software for global enterprises to meet the challenges of big data and distributed software development, released version 3.1.1 of WANdisco Distro (WDD), the first 100 percent open-source, commercially supported, fully tested version of Apache Hadoop 2. The new version includes an enhanced, more intuitive user interface that simplifies Hadoop cluster deployment.
Getting MapR up and running in a vSphere environment has sometimes proved difficult for businesses. VMware and MapR Technologies have worked together in the past to help businesses make an easier go of it--almost like having one of those "Easy" buttons Staples popularized--and have recently brought out Serengeti M4 to offer not only an "Easy" button of sorts, but a "Free" button as well.
Hadoop is considered nothing short of a phenomenon, but could it get even bigger? Canonical and MapR Technologies, a top three Hadoop provider, seem to think so. The two companies have partnered to further drive the success of this platform.
Apache Hadoop is an open-source software framework that offers support to distributed applications. With the help of this framework, developers can run applications on huge clusters of commodity hardware.
Platfora's newly released native in-memory business intelligence (BI) platform for Hadoop is now generally available. With the new platform, the company makes good on its promise to deliver business value on Hadoop.
It's more than a funny name. Hadoop, the open-source software framework for data-intensive distributed applications, is proving useful over at the professional social networking site LinkedIn.
Apache Hadoop, a community-driven open-source software framework that supports data-intensive distributed applications, will be expanding to Windows thanks to the launch of the beta version of Hortonworks Data Platform (HDP).
AMD's SeaMicro SM15000 Server Now Certified for Cloudera's Distribution Including Apache Hadoop Version 4
Advanced Micro Devices (AMD), a semiconductor design innovator, recently announced that its SeaMicro SM15000 server is now certified for CDH4, Cloudera's Distribution Including Apache Hadoop Version 4.
Today, the MapR Big Data Platform is being used in production deployments across financial services, government, healthcare, manufacturing, retail and Web 2.0 companies to drive significant business results. These deployments include the analysis of hundreds of billions of objects a day, of 90 percent of the Internet population monthly and of more than a trillion dollars of retail transactions annually.
MapR and Twitter recently demonstrated real-time Hadoop analytics at the Strata Conference at the end of February. By harnessing the power of the Twitter API, the two companies streamed the #strataconf hashtag directly into a cluster.
For a lot of people who deal with Hadoop, one of the most complex and frustrating jobs can be running MapReduce on ingested data. The task is usually done in batches: data must be transferred to local files that the Hadoop cluster can get at, and then the whole thing must be copied into HDFS, either with the hadoop fs shell or with Flume. Only after all that transferring can MapReduce be run at all. But now, thanks to MapR's new Direct Access NFS feature, the whole process is about to get a lot simpler, requiring a lot less transferring.
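To make that extra hop concrete, here is a minimal sketch of the traditional copy-into-HDFS step using the standard Hadoop FileSystem Java API; the namenode URI and file paths are hypothetical. This is the transfer that an NFS-mountable cluster lets applications skip by writing into cluster storage directly.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsIngest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // hypothetical namenode address

        FileSystem fs = FileSystem.get(conf);
        // The batch ingestion hop described above: local file -> HDFS,
        // so the data becomes visible to MapReduce jobs on the cluster.
        fs.copyFromLocalFile(new Path("/tmp/ingested-batch.log"),
                             new Path("/data/incoming/ingested-batch.log"));
        fs.close();
    }
}
```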
MapR Technologies, a provider of Hadoop distribution, will be presenting its insights related to big data and Hadoop at three forthcoming conferences. The organization will also exhibit its prize-winning MapR Distribution at these conferences. MapR Technologies will be attending Gartner Business Intelligence & Analytics Summit, GigaOm Structure:Data and SVForum: Big Data and Analytics Conference Between Now and 2020, all scheduled in March.
En Pointe Becomes First VAR to Sell Intel Distribution for Apache Hadoop within Public, Private Cloud
En Pointe Technologies recently announced that through its partnership with Intel, it has become the first value-added reseller (VAR) to offer hardware, software and services support to enterprise customers, enabling them to optimize their Apache Hadoop deployment. En Pointe has been continuously investing in cloud, collaboration and new technologies, which have helped it consolidate its foundation in the big data space.
Let's not kid ourselves; Hadoop is doing well. Some would say very well; a recent eWeek article declared that 2013 would be the year that Hadoop beat out the big data analytics competition.
In just three years, the HP Converged Infrastructure portfolio has outpaced the competition with substantial benefits delivered to business and government customers around the world.
One terabyte is 1,000 gigabytes - the equivalent of 6.5 netbooks' worth of storage, the drive in one standard Dell Inspiron desktop computer, 250,000 images of 4 megabytes each, 128 8-gigabyte movie DVDs or 20 Blu-ray discs. MapR Technologies sorted more than that in less than one minute.
The last three years have been significant for the Java-based programming framework, Hadoop, and not because it has an unusual name. When it comes to big data, IT professionals agree: Hadoop is set up to dominate with more capabilities to come in 2013.
Supermicro, a provider of application-optimized server, workstation, blade, storage and GPU systems, has launched big data solutions with support for the new Intel Distribution for Apache Hadoop Software. To meet the demands of these workloads, Supermicro's Hadoop-optimized server and storage systems have undergone rigorous testing and validation.
When we tweak photos, we "Photoshop" them. Soon, when we crunch big data we might consider impressing our friends by "Hadooping" the data. At least that's the direction advanced analytics products are going, with Gartner predicting that two-thirds of such products will have Apache Hadoop embedded within them by 2015.
MapR Technologies, a specialist in Hadoop, plans to highlight the impact of Hadoop on a company's competitive advantage in a series of sessions. Hadoop was created to analyze different types of big data covering data files that are structured or unstructured, logs, pictures, audio files, communications records and e-mail, which are transferred over networks every day.
Today, we are dealing with all different kinds of data files: structured, unstructured, logs, pictures, audio files, communications records, e-mail and more are all handled and transferred over networks every day. Hadoop was created to analyze these different types of big data; however, it is complex to deploy, configure, manage and monitor. Before deploying a Hadoop cluster, considerations include operating systems, computation, memory, storage, networking, switches and data movement in and out of the cluster.
HStreaming, a provider of solutions for continuous real-time analytics on big data, recently raised an undisclosed amount of venture funding from Atlas Venture, an early stage venture capital firm.
Big data specialist WANdisco has announced the availability of WANdisco Distro (WDD), the world's first production-ready Apache Hadoop 2 distribution for big data, for free download.
The Apache Drill project launched back in August 2012, and six months on, some are already taking a look back to see if it's proving its worth in the field. With advances being made on several fronts, use cases for Apache Drill are beginning to come to light, and the total value of the system is starting to make itself known.
There's no denying that technology, especially these days, is moving fast. New developments emerge on a regular basis and developments that were formerly new find their metaphorical chrome peeling away to reveal either flashes in the pan or the solid underpinnings of stable technologies. When the Gartner Hype Cycle--a process by which many technologies can be tracked from new and shiny to mature and stable and even to obsolete if one goes out far enough--was applied to big data, the unexpected came back: big data is further along than many predicted.
Enterprises today collect and generate more data than ever before. Hadoop is an open-source framework for running applications on large clusters of commodity hardware, designed to solve the scalable, reliable storage and analysis of both structured and complex data. Ted Dunning, architect at MapR Technologies, recently presented "The Power of Hadoop to Transform Business," talking about the future of Hadoop. Integration of Hadoop with traditional IT makes a big difference in the way companies can use scalable computing.
Hadoop technologies specialist MapR recently released a list of its "customer wins" for 2012, which provides further insight into the spike in demand the company experienced for its analytics software over the last year, particularly from large consumer-facing organizations.
Dataguise has partnered with MapR Technologies to create a way to manage complex data analysis workloads and prevent unauthorized access to data on the Hadoop platform. Under this partnership, DG for Hadoop is now certified for use with the MapR Distribution for Apache Hadoop. This certification also assures data privacy protection and delivers risk assessment intelligence for enterprises using the MapR Hadoop distribution, a win-win deal for both companies.
MapR Technologies, a specialist in Hadoop technologies, has achieved significant customer wins across a broad cross section of industries.
The Cloudera Connect Partner Program from Cloudera, a company specializing in Hadoop and big data, focuses on accelerating the innovative use of Apache Hadoop for a range of business applications. The program is gaining wider support from companies focusing on big data applications.
The growing volume of data continues to drive the need for capable and cost-effective analytical tools. As a result, driven by technology advances, Hadoop gained significant traction in the marketplace last year. Suitable for a broader range of organizations and use cases, Hadoop is predicted to establish its dominance in big data analytics with the addition of even more capabilities.
Statisticians who are familiar with the R programming language are now better able to use Hadoop to run MapReduce jobs or access HBase tables. Revolution Analytics has created RHadoop, a collection of three R packages that let users run MapReduce jobs entirely from within R as well as giving them access to their Hadoop files and HBase tables, according to a recent MapR Technologies blog post.
MapR Technologies, provider of an open, enterprise-grade distribution for Hadoop, recently launched its European operation to meet the needs of its growing community of customers and partners across the region. The new European headquarters in London, England, will provide MapR with a base for technical and sales resources to accelerate the adoption of its high-performance, enterprise-grade distribution for Hadoop.
Create Virtual Hadoop Cluster Environments in Less Than 10 Minutes with Skytap Cloudera Hadoop for Enterprise Hybrid Clouds
"Big data" is a collection of information that is so large you can't use normal means to process it. With six billion mobile subscriptions worldwide, more than one billion Facebook users and 400 million tweets per day, the volume of digital content is expected to reach the equivalent of 18 Libraries of Congress by 2015. So, a solution called Hadoop was born. Hadoop is an open-source way of storing and processing data. It can handle all types of data from disparate systems, such as structured, unstructured, log files, pictures, audio files, communications records and e-mail.
Guident Technologies has come under the umbrella of CRGT, a provider of full life-cycle IT services and emerging technology solutions for the Federal Government and a Veritas Capital portfolio company. This acquisition marks CRGT's expansion into key technology growth markets, bringing high-end capabilities in big data analytics and business intelligence solutions to CRGT's existing portfolio of IT service offerings.
Is Hadoop fundamentally changing the data warehousing equation?
Big data has quickly turned into the biggest thing to hit information technology since the virtualization craze of the last decade. According to research firm Wikibon, the big data market is on the verge of a rapid growth spurt that will hit $50 billion worldwide within the next five years. The rate at which the importance and popularity of big data has grown can be directly attributed to open source. Most of the new big data frameworks and databases have their roots in the open source world, where developers routinely create new approaches to problems that haven't yet hit mainstream.
With today being the last day of the year and all, it's not hard to look ahead to the upcoming year and wonder just what will happen. In turn, it's no surprise that the various big data concerns out there also took a look and gave some of their predictions about what is likely to happen in this space.
Dealing with big data is critical for large enterprises that depend on their networks for business continuity. Big data analytics applications like Hadoop enable organizations to investigate, troubleshoot and diagnose network, security and application related problems. A new project called the H(app)athon Project is helping organizations deal with the challenges associated with handling big data.
When running a business, it's key to ensure you have the latest innovations that help you get your product to market, meet the needs of your customer base and still turn a profit. Such innovations vary according to the company and the industry, but the needs exist all the same. But, what if you could get exactly what you need and it didn't cost you any money - what would that mean for your bottom line?
Documents, e-mails, contracts, media and graphics - these are just a sample of the electronic content that businesses around the world generate each day. And while they're not filling file cabinets and taking up valuable square feet of real estate, they are consuming valuable storage resources.
Oracle has unveiled its new Big Data Appliance X3-2 - which may prove attractive to many businesses and other organizations looking to upgrade their technology in the big data age. The Oracle Big Data Appliance X3-2 includes hardware and software which features Intel's new processors, Apache Hadoop (CDH), Cloudera Manager and the new Oracle Enterprise Manager plug-in for Big Data Appliance.
Hadoop is an open-source Apache software project. It aims to address big data challenges by facilitating the storage and processing of big data.
By deploying Infochimps Enterprise Cloud, one can easily reduce the risk, time to value and complexity involved in enterprise big data projects.
A shift has occurred in the way companies collect data, compared to the way it has traditionally been done. While before, only the data needed to answer a specific question was gathered, companies now cast a wide net in an effort to gain a better understanding of their customers while striving to attract more of them. These massive stores of data are called big data and have led to an industry dedicated to making sense of it.
The term big data has come into use recently to refer to the increasing amount of information that organizations are storing, processing and analyzing. To help companies effectively use these vast amounts of data, ExtraHop Networks, a provider of network-based application performance management (APM) solutions, has launched its SAP Sybase IQ Module, designed to give IT organizations operational intelligence into big data analytics and data warehousing environments.
With the proliferation of mobile and Internet-connected devices, the amount and size of data in industries today is growing, fast. Hadoop is an open-source framework designed to process and store big data. It makes data mining, analytics and processing of big data cheap and fast.
Luminar, a data analytics and modeling provider focused specifically on connecting marketers with U.S. Latino consumers, is using the Hortonworks Data Platform (HDP) to deploy a fully integrated big data architecture. The agreement was announced by Hortonworks, a contributor to Apache Hadoop.
Hadoop was recently described as a "three-headed open core" run by Cloudera, Hortonworks and MapR Technologies. Which raises the question: which head should you choose when planning to leverage Hadoop?
According to research firm IDC, the total amount of digital data, or big data, will reach 2.7 zettabytes by the end of this year. Approximately 90 percent of this data will be unstructured. By transforming this unstructured data into business insights, the businesses can gain important competitive advantages. LucidWorks, a developer of search, discovery and analytics software based on Apache Lucene and Apache Solr technology, and MapR Technologies, a developer of Apache Hadoop-derived software, have teamed up to jointly host a webinar that will highlight the ways big data can be tapped and leveraged for creating business value.
Big data is playing a growing and critical role in day-to-day business operations, helping companies compete more effectively and become more efficient. However, difficulties in capturing this data and delivering it to front-line business systems have slowed down what companies can do with the data at their fingertips. Apache Hadoop was born out of necessity as data from the Web exploded, and grew far beyond the ability of traditional systems to handle it.
Around 80 percent of big data is unstructured. With this massive quantity of unstructured data, businesses need faster, more reliable and deeper data insights. Therefore, big data solutions based on Hadoop and other analytics software are becoming more and more relevant. One of Hadoop's strengths is that it can process and analyze huge amounts of unstructured data - video, audio, social media postings, images, etc. - in ways that were previously impossible.
Mortar Data is a startup that publicly launched just last November. Its product, Mortar, is an open-source development framework for Hadoop built specifically for collaboration, allowing for easy sharing, repeating and maintaining of code. Mortar was initially offered only as a hosted Hadoop service, but the company has now released the open-source Mortar framework for Hadoop applications.
Today, the Web is exploding with a tsunami of data, inundating organizations like never before, and traditional systems are struggling to store, let alone analyze, the thousands of petabytes involved. The birth of Apache Hadoop has provided organizations with a fundamentally new way to handle exponential data growth efficiently.
London-based big data consultancy firm, Big Data Partnership, is one of the few organizations that Microsoft has handpicked to participate in its Big Data Partner Incubation Program. As part of this program, Big Data Partnership will collaborate with Microsoft to offer Microsoft HDInsight, an Apache Hadoop-based solution for the Windows Azure and Windows Server platforms that makes managing big data easier for enterprises.
Big data means big opportunity for increasingly large numbers of firms. But big data doesn't create big opportunity in isolation; it requires care and vigilance to turn a flood of information into a powerful source of opportunity. There are five critical points to making big data pay off big time, and each one is as important as the others.
Internet-connected devices, mobility trends, in-memory data processing and the rapid evolution of software tools are all contributors to this trend we call big data. IDC, a research firm, predicts that the market for big data technology and services will reach $16.9 billion by 2015, up from $3.2 billion in 2010. A LogLogic survey found that 49 percent of organizations are somewhat or very concerned about managing big data, but 38 percent don't understand what big data is, and a further 27 percent say they have a partial understanding. So while big data is among the top buzzwords in the tech industry, it still has a long way to go.
Managing unstructured data is becoming a tough challenge for enterprises. Apache Hadoop is an open-source software project designed to address this challenge. Hadoop enables the distributed processing of large data sets across clusters of commodity servers.
Colfax recently unveiled the joint demonstration of a high-performance Hadoop appliance over Mellanox's end-to-end FDR 56Gb/s InfiniBand solution. Now, the Colfax Hadoop appliance is ready to serve the big data market.
Is Hadoop like Facebook or Friendster?
Archimedes is more than an ancient Greek mathematician or Merlin's owl - it's also the name of a healthcare company in San Francisco. And Archimedes Inc. needed to do some big data crunching, so for that, it turned to Univa. With help from Univa, Archimedes can now predict the flow of disease through a person's body, determining how it can affect a population and helping to prevent it from spreading.
Continuuity, a start-up which aims to build something along the lines of a Platform as a Service for Hadoop, recently raised $10 million in a Series A funding round, led by Battery Ventures and Ignition Partners. Other participants in the round included Andreessen Horowitz, Data Collective and Amplify Partners.
Plenty of people think of dress slacks or hardware when they think of Sears, but they probably don't think of data warehousing. Behind the scenes, though, Sears may well be onto something, specifically in the way it's using Hadoop in its systems.
Hadoop technology company MapR Technologies recently partnered with Hadapt, the only data analytics platform integrating SQL with Apache Hadoop. The pairing of these two companies allows customers to leverage MapR Distribution for Hadoop alongside Hadapt's Interactive Query capabilities, which allow for analysis of all types of data - structured, semi-structured and unstructured - in a single platform.
At the recent Hadoop workshop in New York City, MapR Technologies announced that it had set a new 1TB TeraSort benchmark record of 54 seconds using the Google Compute Engine - sorting a full terabyte in 54 seconds works out to roughly 18.5 gigabytes per second of aggregate throughput. This not only crushed the previous record of 62 seconds held by Yahoo, but did so with less physical hardware.
The amount of data in our world has been exploding, and analyzing large data sets and big data will become a key basis of competition, underpinning new waves of productivity growth, innovation and consumer surplus, according to research by MGI and McKinsey's Business Technology Office. Making the most of big data means quickly analyzing a high volume of data generated in many different formats.
Few companies have the raw data that Facebook gains from its dominance in the social networking space. Every 24 hours, more than half a petabyte of new data shows up on the Facebook servers, according to a recent Facebook blog post.
Platfora, a U.S.-based software company which develops business intelligence and analytics platforms based on Apache's open source Hadoop framework, recently raised $20 million in a Series B funding round led by Battery Ventures, along with Andreessen Horowitz and Sutter Hill Ventures. This brings Platfora's total funding to $25.7 million in just over a year.
Data becomes big data when your current systems cannot process, store and cope with it efficiently. Apache Hadoop is an open-source framework and is among the best tools available today for processing and storing herculean amounts of big data. It makes data mining, analytics and processing of big data cheap and fast.
There are those out there who say that massively parallel processing (MPP) appliances aren't really related to the big data concept. Others, meanwhile, say that MPP appliances are indeed true big data appliances. Those who say that MPP and big data go hand in hand, however, got a little extra proof for their side of the issue thanks to Microsoft's recent announcement of PolyBase at the PASS Summit last week.
Smartphones, tablets, MP3 players, SMS messaging, YouTube, Facebook, online banking and Wi-Fi are all examples of day-to-day technologies that people use frequently and that help create data. The big data phenomenon came about thanks to the amount of data generated by these technologies: mobile devices, social networks, Internet searches, e-commerce, video archives and other advancements. Big data refers to the collection of data sets so large and complex that it's impossible to process them with the usual databases and tools.
Featured White Papers
What will make Hadoop an enterprise data center-grade analytics platform?
High Availability in the Hadoop Ecosystem: MapR provides high availability with no single points of failure across the entire stack.
The MapR Distribution for Apache™ Hadoop® provides high availability with no single points of failure across the entire stack. In the storage layer, MapR's Distributed NameNode HA™ architecture provides high availability with self-healing and support for multiple, simultaneous failures, with no additional hardware whatsoever.
MapR M7 Edition is a complete distribution for Apache Hadoop and HBase™ that includes Pig, Hive, Mahout, Cascading, Sqoop, Flume and more. The M7 Edition makes HBase™ easy, dependable and fast. M7 not only delivers enterprise-grade features such as Instant Recovery, Snapshots and Mirroring, but also provides consistent performance while eliminating architectural complexity.
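As a rough illustration of the HBase programming model M7 targets, here is a minimal sketch of a put-and-get round trip using the standard (circa-2013) HBase Java client API; the table name, column family and values are hypothetical, and nothing in the snippet is MapR-specific.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseRoundTrip {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
        HTable table = new HTable(conf, "metrics");       // hypothetical, pre-existing table

        // Write one cell: row key "sensor-1", column family "d", qualifier "temp".
        Put put = new Put(Bytes.toBytes("sensor-1"));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("temp"), Bytes.toBytes("21.5"));
        table.put(put);

        // Read the same cell back and print it.
        Result result = table.get(new Get(Bytes.toBytes("sensor-1")));
        System.out.println(Bytes.toString(
            result.getValue(Bytes.toBytes("d"), Bytes.toBytes("temp"))));

        table.close();
    }
}
```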
The M5 Edition is a subscription software offering that includes features such as mirroring, snapshots, NFS HA, data placement control and many more. It also offers full support, on-demand patches and online incident submission.