Hadoop and CDP Migration Paths: Decision Guide

Bifrost supports three migration paths. Each path has a distinct source, target, strategy, and timeline. Paths can be executed independently for a specific outcome, or chained (a Classic replatform followed by a Modernize engagement) for a progressive transition.

The Three Paths

Path	CLI namespace	Source	Target	Strategy	Typical timeline
Classic	`bifrost classic`	Cloudera CDP 7.x	Open-source Apache Hadoop	In-place package and configuration swap	4 to 6 months for a multi-cluster estate
Modernize	`bifrost modernize`	Open-source Hadoop (any distribution)	Ilum lakehouse on Kubernetes	Phased component replacement with dual-read bridge	2 to 5 months
Direct	`bifrost direct`	Cloudera CDP 7.x	Ilum lakehouse on Kubernetes	Combined extraction and modernization in one engagement	2 to 5 months

All three paths share the same decision engine, validation framework, rollback model, and notification integrations. They differ in the source they read, the target they produce, and the set of migration phases they execute.

Choosing a Path

The right path depends on the commercial, regulatory, and architectural constraints of the engagement.

When to choose Classic

Classic is the right choice when:

The priority is eliminating Cloudera licensing cost, not changing the architecture.
Regulatory or organizational constraints prevent moving to Kubernetes in the near term.
A proven, low-risk, reversible migration with weekend cutover windows is required.
The Hadoop estate is heavily integrated with bare-metal infrastructure, such as local-disk HDFS, hardware-level rack awareness, and custom kernel tuning.

Classic preserves the Hadoop architecture and operational model. Only the distribution changes.

When to choose Modernize

Modernize is the right choice when:

The estate already runs open-source Hadoop and is ready to move to Kubernetes.
A prior Classic replatform has been completed and the next modernization step is planned.
An existing Kubernetes footprint is in place and data workloads can be consolidated onto it.

Modernize replaces each Hadoop component with a Kubernetes-native equivalent using Apache Iceberg as the bridge format, the strangler-facade pattern for zero downtime, and table-by-table migration with instantaneous revert.

When to choose Direct

Direct is the right choice when:

The goal is to exit Cloudera CDP and arrive at Ilum on Kubernetes in a single program, without an intermediate Hadoop step.
The Hadoop estate is relatively modern (Spark-heavy, minimal MapReduce or Pig), so an intermediate open-source Hadoop step would add little value.
Running Cloudera during a two-step migration would be cost-prohibitive.

Direct combines the configuration extraction and discovery of Classic with the modernization pipeline of Modernize in one workflow.
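
The criteria in the three lists above can be condensed into a rough decision helper. The following is a minimal sketch and not part of Bifrost: the `Estate` fields and the `suggest_path` function are hypothetical names introduced for illustration, and the order in which the checks run is an assumption. Commercial factors, such as the cost of running Cloudera during a two-step migration, are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Estate:
    """Hypothetical summary of an estate; field names are illustrative only."""
    runs_cloudera_cdp: bool   # True for Cloudera CDP 7.x, False for open-source Hadoop
    kubernetes_allowed: bool  # False if regulatory or organizational constraints block Kubernetes
    bare_metal_coupled: bool  # local-disk HDFS, rack awareness, custom kernel tuning
    spark_heavy: bool         # Spark-heavy, minimal MapReduce or Pig


def suggest_path(estate: Estate) -> Optional[str]:
    """Return 'classic', 'modernize', 'direct', or None if no path fits."""
    if not estate.runs_cloudera_cdp:
        # Modernize is the only path whose source is open-source Hadoop,
        # and it requires a move to Kubernetes.
        return "modernize" if estate.kubernetes_allowed else None
    if not estate.kubernetes_allowed or estate.bare_metal_coupled:
        # Kubernetes is not an option for now, so only the distribution changes.
        return "classic"
    if estate.spark_heavy:
        # A modern estate gains little from an intermediate Hadoop step.
        return "direct"
    # Assumed default: take the lower-risk step first; Modernize can follow later.
    return "classic"
```

For example, `suggest_path(Estate(True, True, False, True))` returns `"direct"` under these assumptions. A real engagement would weigh the criteria rather than apply them as strict rules.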

Component Replacement Matrix

The following matrix applies to Modernize and Direct paths. Classic replaces Cloudera CDP with open-source equivalents of the same components (no architectural change).

Legacy component	Modern replacement	Automation ceiling
HDFS	S3-compatible object storage, accessed through S3A (Ceph via RadosGW)	85–97 %
YARN	Kubernetes + Spark Operator + queue scheduler (YuniKorn)	95 %
Hive Metastore	Iceberg REST catalog (Apache Polaris or Gravitino)	70–90 %
Hive and Impala	Trino (dual-tier deployment behind Trino Gateway)	75–85 %
Oozie	Apache Airflow 3 with dbt and Cosmos	40–55 %
Apache Atlas	OpenMetadata	60–75 %
HBase	Apache Cassandra, ScyllaDB, or Cloud Bigtable	40–55 %
Apache Ranger	Open Policy Agent (OPA)	60–70 %
Livy	Ilum Livy-compatible proxy	99 %
Spark on YARN	Spark on Kubernetes via Ilum	85–99 %
ZooKeeper	etcd (Kubernetes-native) or ZooKeeper on Kubernetes	100 %
Solr (for Ranger / Atlas)	OpenSearch (for OpenMetadata)	80 %
Cloudera Manager monitoring	Prometheus and Grafana	85 %

The automation ceiling column represents the realistic upper bound of mechanical automation. The remaining percentage reflects work that requires human judgment: unmapped configuration properties, custom application code, application-specific access patterns, and bespoke integrations. Bifrost surfaces these items explicitly during discovery so they can be resourced ahead of execution.
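
Because the ceiling is an upper bound on automation, 100 minus the ceiling gives a lower bound on the share of work that needs human judgment. The sketch below works through that arithmetic for a few rows copied from the matrix. The `manual_share` and `by_manual_effort` helpers are illustrative names, not Bifrost functions.

```python
# Automation ceilings in percent (low, high), copied from four rows of the matrix.
CEILINGS = {
    "HDFS": (85, 97),
    "Oozie": (40, 55),
    "Livy": (99, 99),
    "ZooKeeper": (100, 100),
}


def manual_share(ceiling):
    """Return the (best case, worst case) manual share implied by a ceiling range."""
    low, high = ceiling
    return (100 - high, 100 - low)


def by_manual_effort(ceilings):
    """Component names sorted by worst-case manual share, largest first."""
    return sorted(
        ceilings,
        key=lambda name: manual_share(ceilings[name])[1],
        reverse=True,
    )
```

With these rows, Oozie sorts first (at least 45 % and up to 60 % manual work), while ZooKeeper sorts last with no residual manual share. Sorting this way is one possible method for deciding which components to resource first.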

For per-component migration details, see Per-component migration reference.

Paths Can Be Chained

Classic followed by Modernize is a supported sequence. Customers often execute Classic first to eliminate Cloudera licensing quickly, then run Modernize on the resulting open-source Hadoop estate at a later point. The end state is equivalent to a Direct engagement.

The sequencing choice is commercial, not technical. Classic followed by Modernize spreads the change and spend across two phases. Direct compresses both into one engagement at the cost of a longer single program.
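
The equivalence between the chained sequence and Direct can be checked by modeling each path as a source and a target. This is a sketch for illustration only: the `Path` type and `chain` function are hypothetical, and the source and target labels are simplified from the table in The Three Paths.

```python
from typing import NamedTuple


class Path(NamedTuple):
    name: str
    source: str
    target: str


CLASSIC = Path("classic", "Cloudera CDP 7.x", "Open-source Hadoop")
MODERNIZE = Path("modernize", "Open-source Hadoop", "Ilum lakehouse on Kubernetes")
DIRECT = Path("direct", "Cloudera CDP 7.x", "Ilum lakehouse on Kubernetes")


def chain(first: Path, second: Path) -> Path:
    """Compose two paths; the first path's target must be the second's source."""
    if first.target != second.source:
        raise ValueError(f"{first.name} cannot be followed by {second.name}")
    return Path(f"{first.name}+{second.name}", first.source, second.target)
```

Here `chain(CLASSIC, MODERNIZE)` has the same source and target as `DIRECT`, while `chain(MODERNIZE, CLASSIC)` raises `ValueError` because the endpoints do not line up.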

Next Steps

Review the target platform architecture that Modernize and Direct produce.
Work through Getting started to install Bifrost and run a first discovery against a non-production cluster.
Consult the path-specific walkthroughs: Classic, Modernize, Direct.
