Cloudera CDP to Kubernetes Migration Procedure (Direct)
Direct is the one-hop path from Cloudera CDP to the Ilum lakehouse on Kubernetes. It combines Classic's configuration extraction and discovery capabilities with the Modernize pipeline, so customers exit Cloudera and arrive at Ilum in a single program — without an intermediate open-source Hadoop step.
When to Use Direct
Direct is appropriate when:
- The goal is to exit Cloudera CDP and arrive at Ilum on Kubernetes in a single engagement.
- Running Cloudera through a two-step (Classic then Modernize) migration is not acceptable, either financially or operationally.
- The Hadoop estate is relatively modern — Spark-heavy, minimal MapReduce or Pig workloads — so an intermediate open-source Hadoop step adds no technical value.
- The program timeline allows 2 to 5 months end to end.
Direct is the combination of Classic and Modernize, not a compromise between them. The outcome is the same as running Classic followed by Modernize, but delivered as one program.
How Direct Combines Classic and Modernize
Direct reuses Classic's Cloudera Manager discovery and extraction machinery, but instead of producing Hadoop XML templates aimed at an open-source Hadoop distribution, it maps Cloudera Manager properties directly to Kubernetes-native equivalents.
- Cloudera Manager Spark service configurations become
spark-defaults.confentries for Spark on Kubernetes:spark.master=k8s://, resource requests and limits, S3A settings. - Cloudera Manager Hive and Impala service configurations become Trino connector properties on the target Trino deployment.
- Cloudera Manager JDBC endpoints, queue configurations, and credential references become Airflow connection URIs.
- Cloudera Manager policy exports become OPA policy files.
- Cloudera Manager HDFS data directory layouts inform the object storage pool and bucket structure on the target.
The modernization pipeline — wave planning, dual-read bridge, table-by-table migration, validation, decommission — is identical to Modernize. Only the source-side adapter is different.
CDP-Specific Considerations
Migrating directly from a Cloudera CDP estate introduces several considerations that Modernize does not encounter.
Kerberos-aware data migration
DistCp must authenticate to the Kerberos-secured source HDFS. Bifrost captures keytabs during discovery and manages kinit automatically before any data migration task.
bifrost direct migrate-storage \
--source hdfs:///data/warehouse \
--target s3a://lakehouse/warehouse \
--method distcp \
--kerberos-principal [email protected] \
--kerberos-keytab /var/lib/bifrost/keytabs/bifrost.keytab
Cloudera parcel paths
Cloudera CDP uses /opt/cloudera/parcels/CDH/lib/ paths that differ from standard Hadoop layouts. The Direct configuration translator maps these to the appropriate object storage paths on the target platform.
Encryption zone key handover
CDP's Key Trustee Server (or KMS) manages HDFS encryption zone keys. KeyShell (invoked through hadoop key) does not expose an export verb — raw key material cannot be extracted through the CLI. Key handover therefore proceeds in two tracks:
- Inventory. Bifrost runs
hadoop key listduring discovery to enumerate every encryption zone and its key name, and captures the KMS/KTS backing-store metadata (key names, versions, ACLs). - Re-key. Data is decrypted transparently through the HDFS client during DistCp (which uses a delegation token), then re-encrypted at the target using object-storage server-side encryption (SSE-S3 or SSE-KMS). The target key material is provisioned separately — either by exporting the backing KeyStore file (JCEKS / HSM-backed) or by re-wrapping EDEKs through a customer-managed KMS.
Re-encryption is a collaborative workstream with the customer's security team; Bifrost does not move raw key material across systems.
Cloudera-specific integrations
Direct discovery includes a CDP dependency analysis step that identifies Cloudera-specific integrations and flags them:
- Navigator — audit and lineage trail; lineage rebuilds via OpenLineage on the target (see Component migrations — governance).
- Workload XM — workload analytics; no direct equivalent. Replaced by Grafana dashboards + Trino query log analysis.
- Cloudera Data Visualization — BI; migrated to Superset.
- Ozone on CDP — object storage; migrates to the target object storage via DistCp with Ozone's
ofs://scheme.
The dependency analysis report lists every flagged integration with a recommended path. Items without a direct mapping are surfaced as manual-review work items.
Cloudera Manager shutdown
As the final step of the program, after all clusters are decommissioned and the silence periods have elapsed, Direct can shut down the Cloudera Manager server itself:
bifrost direct decommission \
--service cloudera-manager \
--cm-host cm.example.internal \
--confirm-irreversible
This step is irreversible and ends the Cloudera subscription requirement.
Phase Pipeline
Direct executes nine phases sequentially.
| Phase | Command | Description |
|---|---|---|
| 0 | bifrost direct discover | Cloudera Manager API extraction and estate scoring in one pass. |
| 1 | bifrost direct plan | Migration plan, including CDP dependency analysis. |
| 2 | bifrost direct land | Deploy the Kubernetes target platform. |
| 2.5 | bifrost direct validate-pre | Platform pre-flight — Kerberos-aware variant of the Modernize check. Safe to re-run. |
| 3 | bifrost direct bridge | Dual-read bridge from CDP HDFS to target object storage (Kerberos-aware). |
| 4 | bifrost direct migrate-wave | Table-by-table migration waves. |
| 5 | bifrost direct convert-workflow | Oozie to Airflow with dbt conversion. |
| 6 | bifrost direct hue-import | HUE to Superset migration. |
| 7 | bifrost direct validate | Data-diff and query-parity validation. |
| 8 | bifrost direct decommission | Legacy service shutdown, including Cloudera Manager. |
Each command is described in CLI reference — Direct commands. Phases 4 through 7 run iteratively across waves; phases 0 through 3 run once.
Program Shape
Direct engagements typically follow this timeline:
| Month | Activity |
|---|---|
| 1 | Discovery, planning, CDP dependency analysis, target platform landing. |
| 1 to 2 | Dual-read bridge, first waves of quick-win tables migrated. |
| 2 to 4 | Wave-based table migration, workload conversion, HUE import. |
| 2 to 4 | HBase migration track (runs in parallel with table migration). |
| 4 to 5 | Silence period enforcement, legacy service decommission, Cloudera Manager shutdown. |
The exact timeline depends on estate size, workload complexity, and the customer's change-management cadence. Bifrost's discovery output includes a TCO report that projects costs for both the legacy and the target platform across the program.
Interplay with Ilum Enterprise
Direct migrations land on a full Ilum Enterprise deployment. After the program ends, customers operate the platform using standard Ilum workflows.
- Ilum architecture — the target platform.
- Ilum production — production deployment guidance.
- Ilum security — authentication, authorization, and data access control.
- Ilum features — day-to-day platform features.
Bifrost's role ends when the final legacy service is decommissioned and Cloudera Manager is shut down.
Next Steps
- Review CLI reference — Direct commands.
- Understand validation and rollback for the combined pipeline.
- Work through per-component migration reference for each legacy system in scope.
- Prepare the production readiness checklist.