Innovating in open source

Some vendors consume the open source community’s activity; others help drive it. Cloudera leads in influencing Hadoop platform evolution by creating, contributing, and supporting new capabilities that meet customer requirements for security, scale, and usability.

Curation of open standards

Cloudera has a long and proven track record of identifying, curating, and supporting open standards (including Apache HBase, Apache Spark, and Apache Kafka) that provide the mainstream, long-term architecture upon which new customer use cases are built.

Highest enterprise requirements

To ensure the best customer experience, Cloudera invests significant resources in multi-dimensional testing on real workloads before each release, and in the supportability of the entire platform through extensive involvement in the open source community.

Open source big data: An ecosystem of projects

Apache Hadoop is an open source software platform for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. Hadoop services provide data storage, data processing, data access, data governance, security, and operations.

Data processing

Apache Accumulo

A sorted, distributed key-value store with cell-based access control.

Apache Ambari 

A completely open source management platform for provisioning, managing, monitoring and securing Apache Hadoop clusters. 

Apache Crunch

A Java library that provides a framework for writing, testing, and running MapReduce pipelines

Apache Flume

A service for streaming logs into Hadoop

Apache NiFi

A real-time integrated data logistics and simple event processing platform

Security, governance, and metadata

Apache Atlas

Metadata management and data governance for Hadoop, supporting agile enterprise compliance

Apache Knox Gateway 

Secure entry point for Hadoop clusters

Apache Ranger

Comprehensive security for Enterprise Hadoop

Apache Sentry

Provides fine-grained authorization and role-based access control

Data warehouse

Apache HBase

A non-relational (NoSQL) database that runs on top of HDFS

Apache Hive

The de facto standard for SQL queries in Hadoop

Apache Kudu

Storage for fast analytics on fast data

Apache Phoenix

An open source, massively parallel, relational database engine supporting OLTP for Hadoop using Apache HBase 

Apache Sqoop

Efficiently transfers bulk data between Apache Hadoop and structured datastores

Docker 

Securely build, share and run any application, anywhere

Data engineering

Apache Druid

An open-source analytics data store designed for business intelligence (OLAP) queries on event data.

Apache Kafka 

A fast, scalable, fault-tolerant messaging system

Apache Oozie

A workflow scheduler for managing Apache Hadoop jobs

Apache Pig

A scripting platform for processing and analyzing large data sets

Apache Slider

A framework for YARN-based, long-running applications in Hadoop

Apache Solr

Rapid indexing & search on Hadoop

Apache Spark

Adds in-memory compute for ETL, machine learning, and data science workloads to Hadoop

Apache Storm 

A system for processing streaming data in real time

Apache Tez 

A framework for YARN-based data processing applications in Hadoop

Apache Hadoop YARN

The Architectural Center of Enterprise Hadoop

Apache ZooKeeper 

An open source server that reliably coordinates distributed processes

Operations

Apache Arrow

A cross-language development platform for in-memory data

Apache Impala

The open source, analytic MPP database for Apache Hadoop that provides the fastest time-to-insight.

Apache Mahout

A library for creating scalable, performant machine learning applications

Apache Whirr

A set of libraries for running cloud services

Apache Zeppelin

A completely open web-based notebook that enables interactive data analytics

Hue

An open source SQL workbench for data warehouses

TensorFlow

An end-to-end open source machine learning platform

Cloudera's open source credentials

  • Cloudera is the first and original source of a supported, 100% open source Hadoop distribution (CDH)—which has been downloaded more than all others combined.
  • Cloudera has contributed more code and features to the Hadoop ecosystem, not just the core, and shipped more of them, than any competitor.
  • Cloudera employs the most contributors and committers across open standards in the ecosystem, not just the core.
  • Cloudera employees have founded more (20+) successful Hadoop ecosystem projects than any competitor, including Apache Hadoop itself.
  • For components that are supported by multiple vendors (i.e., standards), more than half of all Apache JIRAs that are assigned to platform vendor employees are closed/resolved by Cloudera employees.
  • Cloudera engineers currently occupy more than 100 Apache committer seats (and more than 80 project management committee seats) across all projects that we support.

The best support for customer success

Cloudera’s global support team of project committers across the Hadoop ecosystem represents the largest, most experienced engineering resource dedicated full-time to customer success.
