Hadoop and CDP Migration Paths: Decision Guide
Bifrost supports three migration paths. Each path has a distinct source, target, strategy, and timeline. Paths can be executed independently for a specific outcome, or chained (a Classic replatform followed by a Modernize engagement) for a progressive transition.
The Three Paths
| Path | CLI namespace | Source | Target | Strategy | Typical timeline |
|---|---|---|---|---|---|
| Classic | bifrost classic | Cloudera CDP 7.x | Open-source Apache Hadoop | In-place package and configuration swap | 4 to 6 months for a multi-cluster estate |
| Modernize | bifrost modernize | Open-source Hadoop (any distribution) | Ilum lakehouse on Kubernetes | Phased component replacement with dual-read bridge | 2 to 5 months |
| Direct | bifrost direct | Cloudera CDP 7.x | Ilum lakehouse on Kubernetes | Combined extraction and modernization in one engagement | 2 to 5 months |
All three paths share the same decision engine, validation framework, rollback model, and notification integrations. They differ in the source they read, the target they produce, and the set of migration phases they execute.
Choosing a Path
The right path depends on the commercial, regulatory, and architectural constraints of the engagement.
When to choose Classic
Classic is the right choice when:
- The priority is eliminating Cloudera licensing cost, not changing the architecture.
- Regulatory or organizational constraints prevent moving to Kubernetes in the near term.
- A proven, low-risk, reversible migration with weekend cutover windows is required.
- The Hadoop estate is heavily integrated with bare-metal infrastructure, for example local-disk HDFS, hardware-level rack awareness, or custom kernel tuning.
Classic preserves the Hadoop architecture and operational model. Only the distribution changes.
When to choose Modernize
Modernize is the right choice when:
- The estate already runs open-source Hadoop and is ready to move to Kubernetes.
- A prior Classic replatform has been completed and the next modernization step is planned.
- An existing Kubernetes footprint is in place and data workloads can be consolidated onto it.
Modernize replaces each Hadoop component with a Kubernetes-native equivalent. It relies on three mechanisms: Apache Iceberg as the bridge format, the strangler-facade pattern for zero downtime, and table-by-table migration with instantaneous revert.
When to choose Direct
Direct is the right choice when:
- The goal is to exit Cloudera CDP and arrive at Ilum on Kubernetes in a single program, without an intermediate Hadoop step.
- The Hadoop estate is relatively modern (Spark-heavy, minimal MapReduce or Pig) and does not benefit from an intermediate open-source Hadoop step.
- Running Cloudera during a two-step migration would be cost-prohibitive.
Direct combines the configuration extraction and discovery of Classic with the modernization pipeline of Modernize in one workflow.
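The criteria above can be condensed into a small decision helper. The following is a minimal sketch, not part of Bifrost: the `Estate` fields and the `recommend` function are hypothetical names, the boolean inputs are a deliberate simplification of the commercial and regulatory judgment involved, and the final fallback to a chained sequence is an assumption inferred from the Direct criteria.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Estate:
    """Hypothetical summary of an estate; field names are illustrative only."""
    on_cdp: bool                        # currently runs Cloudera CDP 7.x
    kubernetes_ready: bool              # no constraint blocks a near-term move to Kubernetes
    bare_metal_coupled: bool = False    # local-disk HDFS, rack awareness, kernel tuning
    spark_heavy: bool = False           # minimal MapReduce or Pig
    two_step_cost_prohibitive: bool = False  # cannot afford Cloudera during two steps


def recommend(estate: Estate) -> List[str]:
    """Return the suggested path sequence, or an empty list if none applies."""
    if not estate.on_cdp:
        # Already on open-source Hadoop: Modernize is the only applicable path.
        return ["modernize"] if estate.kubernetes_ready else []
    if not estate.kubernetes_ready or estate.bare_metal_coupled:
        # Keep the Hadoop architecture and change only the distribution.
        return ["classic"]
    if estate.spark_heavy or estate.two_step_cost_prohibitive:
        # An intermediate open-source Hadoop step adds little or costs too much.
        return ["direct"]
    # Assumption: otherwise the estate benefits from the intermediate step.
    return ["classic", "modernize"]


if __name__ == "__main__":
    print(recommend(Estate(on_cdp=True, kubernetes_ready=True, spark_heavy=True)))
```

The helper returns a list rather than a single name so that a chained sequence can be expressed with the same return type as a single path.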
Component Replacement Matrix
The following matrix applies to the Modernize and Direct paths. Classic replaces the Cloudera CDP components with their open-source equivalents, with no architectural change.
| Legacy component | Modern replacement | Automation ceiling |
|---|---|---|
| HDFS | S3A-compatible object storage (Ceph via RadosGW) | 85–97 % |
| YARN | Kubernetes + Spark Operator + queue scheduler (YuniKorn) | 95 % |
| Hive Metastore | Iceberg REST catalog (Apache Polaris or Gravitino) | 70–90 % |
| Hive and Impala | Trino (dual-tier deployment behind Trino Gateway) | 75–85 % |
| Oozie | Apache Airflow 3 with dbt and Cosmos | 40–55 % |
| Apache Atlas | OpenMetadata | 60–75 % |
| HBase | Apache Cassandra, ScyllaDB, or Cloud Bigtable | 40–55 % |
| Apache Ranger | Open Policy Agent (OPA) | 60–70 % |
| Livy | Ilum Livy-compatible proxy | 99 % |
| Spark on YARN | Spark on Kubernetes via Ilum | 85–99 % |
| ZooKeeper | etcd (Kubernetes-native) or ZooKeeper on Kubernetes | 100 % |
| Solr (for Ranger / Atlas) | OpenSearch (for OpenMetadata) | 80 % |
| Cloudera Manager monitoring | Prometheus and Grafana | 85 % |
The automation ceiling column gives the realistic upper bound on the share of each component's migration that can be automated mechanically. The remaining percentage reflects work that requires human judgment: unmapped configuration properties, custom application code, application-specific access patterns, and bespoke integrations. Bifrost surfaces these items explicitly during discovery so they can be resourced ahead of execution.
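To see where manual effort concentrates, the ceilings in the matrix can be turned into the share left for human judgment. The sketch below is illustrative arithmetic only: the `CEILINGS` dictionary copies the matrix values, while `manual_share` and `ranked` are hypothetical names. Because each ceiling is an upper bound, the resulting figures are the minimum manual share, not a forecast.

```python
# Automation ceilings from the matrix, as (low, high) percentages.
# A single value in the matrix is stored as an identical pair.
CEILINGS = {
    "HDFS": (85, 97),
    "YARN": (95, 95),
    "Hive Metastore": (70, 90),
    "Hive and Impala": (75, 85),
    "Oozie": (40, 55),
    "Apache Atlas": (60, 75),
    "HBase": (40, 55),
    "Apache Ranger": (60, 70),
    "Livy": (99, 99),
    "Spark on YARN": (85, 99),
    "ZooKeeper": (100, 100),
    "Solr (for Ranger / Atlas)": (80, 80),
    "Cloudera Manager monitoring": (85, 85),
}


def manual_share(low: int, high: int) -> tuple:
    """Return the (min, max) percentage left for human judgment."""
    return (100 - high, 100 - low)


# Rank components by their largest possible manual share, highest first.
ranked = sorted(
    CEILINGS.items(),
    key=lambda item: manual_share(*item[1])[1],
    reverse=True,
)

if __name__ == "__main__":
    for component, (low, high) in ranked:
        lo_manual, hi_manual = manual_share(low, high)
        print(f"{component:30s} manual share: {lo_manual}-{hi_manual} %")
```

On the matrix values, Oozie and HBase rank first with a manual share of 45 to 60 %, which is where discovery findings are most likely to need dedicated staffing.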
For per-component migration details, see Per-component migration reference.
Paths Can Be Chained
Classic followed by Modernize is a supported sequence. Customers often execute Classic first to eliminate Cloudera licensing quickly, then run Modernize on the resulting open-source Hadoop estate at a later point. The end state is equivalent to a Direct engagement.
The sequencing choice is commercial, not technical. Classic followed by Modernize spreads the change and spend across two phases. Direct compresses both into one engagement at the cost of a longer single program.
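The equivalence between the chained sequence and Direct follows from the Source and Target columns of the path table. A minimal sketch of that reasoning, assuming simplified platform labels (the `Path` class, `chain`, and `same_endpoints` are hypothetical and not part of the Bifrost CLI):

```python
from dataclasses import dataclass

# Platform labels condensed from the Source and Target columns of the path table.
CDP = "Cloudera CDP 7.x"
OSS_HADOOP = "Open-source Hadoop"
ILUM = "Ilum lakehouse on Kubernetes"


@dataclass(frozen=True)
class Path:
    name: str
    source: str
    target: str


CLASSIC = Path("Classic", CDP, OSS_HADOOP)
MODERNIZE = Path("Modernize", OSS_HADOOP, ILUM)
DIRECT = Path("Direct", CDP, ILUM)


def chain(first: Path, second: Path) -> Path:
    """Compose two paths; valid only when the first ends where the second starts."""
    if first.target != second.source:
        raise ValueError(
            f"{first.name} ends at {first.target!r}, "
            f"but {second.name} starts at {second.source!r}"
        )
    return Path(f"{first.name} + {second.name}", first.source, second.target)


def same_endpoints(a: Path, b: Path) -> bool:
    """Two paths are equivalent in end state if source and target both match."""
    return (a.source, a.target) == (b.source, b.target)


if __name__ == "__main__":
    chained = chain(CLASSIC, MODERNIZE)
    print(chained.name, "matches Direct:", same_endpoints(chained, DIRECT))
```

The reverse order is rejected by `chain`, since Modernize ends on Ilum and Classic expects a Cloudera CDP source. The sketch compares endpoints only; it says nothing about timeline or spend, which is where the two options differ.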
Next Steps
- Review the target platform architecture that Modernize and Direct produce.
- Work through Getting started to install Bifrost and run a first discovery against a non-production cluster.
- Consult the path-specific walkthroughs: Classic, Modernize, Direct.