Open source and open standards

When Cloudera’s chief architect Doug Cutting founded the Apache Hadoop project, it was with an open source vision firmly in mind. Throughout its history, Cloudera has been strongly committed to a community-driven, Hadoop-based platform built on open standards that meets the highest enterprise expectations for stability and reliability.

Learn about Apache Hadoop

Innovating in open source

Some vendors consume the open source community’s activity; others help drive it. Cloudera leads in influencing Hadoop platform evolution by creating, contributing, and supporting new capabilities that meet customer requirements for security, scale, and usability.

Learn about Apache Kudu

Curation of open standards

Cloudera has a long and proven track record of identifying, curating, and supporting open standards (including Apache HBase, Apache Spark, and Apache Kafka) that provide the mainstream, long-term architecture upon which new customer use cases are built.

Read about our commitment to open source

Highest enterprise requirements

To ensure the best customer experience, Cloudera invests significant resources in multi-dimensional testing on real workloads before releases, as well as in supportability of the entire platform via extensive involvement in the open source community.

Open source big data: An ecosystem of projects

Apache Hadoop is an open source software platform for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. Hadoop services provide for data storage, data processing, data access, data governance, security, and operations.

Data processing

Apache Accumulo

A sorted, distributed key-value store with cell-based access control.

Apache Ambari

A completely open source management platform for provisioning, managing, monitoring and securing Apache Hadoop clusters.

Apache Crunch

A Java library that provides a framework for writing, testing, and running MapReduce pipelines

Apache Flume

A service for streaming logs into Hadoop

Apache NiFi

A real-time integrated data logistics and simple event processing platform

Security, governance, and metadata

Apache Atlas

Agile enterprise compliance through metadata

Apache Knox Gateway

Secure entry point for Hadoop clusters

Apache Ranger

Comprehensive security for Enterprise Hadoop

Apache Sentry

Provides fine-grained authorization and role-based access control

Data warehouse

Apache HBase

A non-relational (NoSQL) database that runs on top of HDFS

Apache Hive

The de facto standard for SQL queries in Hadoop

Apache Kudu

Storage for fast analytics on fast data

Apache Phoenix

An open source, massively parallel, relational database engine supporting OLTP for Hadoop using Apache HBase

Apache Sqoop

Efficiently transfers bulk data between Apache Hadoop and structured datastores

Docker

Securely build, share and run any application, anywhere

Data engineering

Apache Druid

An open-source analytics data store designed for business intelligence (OLAP) queries on event data.

Apache Flume

A service for streaming logs into Hadoop

Apache Kafka

A fast, scalable, fault-tolerant messaging system

Apache Oozie

A workflow scheduler system for managing Apache Hadoop jobs

Apache Pig

A scripting platform for processing and analyzing large data sets

Apache Slider

A framework for YARN-based, long-running applications in Hadoop

Apache Solr

Rapid indexing & search on Hadoop

Apache Spark

Adds in-memory compute for ETL, machine learning, and data science workloads to Hadoop

Apache Storm

A system for processing streaming data in real time

Apache Tez

A framework for YARN-based data processing applications in Hadoop

Apache Hadoop YARN

The Architectural Center of Enterprise Hadoop

Apache ZooKeeper

An open source server that reliably coordinates distributed processes

Operations

Apache Arrow

A cross-language development platform for in-memory data

Apache Impala

The open source, analytic MPP database for Apache Hadoop that provides the fastest time-to-insight.

Apache Mahout

For creating scalable, performant machine learning applications

Apache Whirr

A set of libraries for running cloud services

Apache Zeppelin

A completely open web-based notebook that enables interactive data analytics

Hue

An open source SQL Workbench for Data Warehouses

TensorFlow

An end-to-end open source machine learning platform

Cloudera's open source credentials

Cloudera is the first and original source of a supported, 100% open source Hadoop distribution (CDH)—which has been downloaded more than all others combined.
Cloudera has contributed more code and features to the Hadoop ecosystem, not just the core, and shipped more of them, than any competitor.
Cloudera employs the most contributors and committers across open standards in the ecosystem, not just the core.
Cloudera employees have founded more (20+) successful Hadoop ecosystem projects than any competitor, including Apache Hadoop itself.
For components that are supported by multiple vendors (i.e., standards), more than half of all Apache JIRAs that are assigned to platform vendor employees are closed/resolved by Cloudera employees.
Cloudera engineers currently occupy more than 100 Apache committer seats (and more than 80 project management committee seats) across all projects that we support.

Read the Data Sheet

The best support for customer success

Cloudera’s global support team of project committers across the Hadoop ecosystem represents the largest, most experienced engineering resource dedicated full-time to customer success.

Get the support you need, when you need it
