Skip to main content

Cloudera CDP to Kubernetes Migration Procedure (Direct)

Direct is the one-hop path from Cloudera CDP to the Ilum lakehouse on Kubernetes. It combines Classic's configuration extraction and discovery capabilities with the Modernize pipeline, so customers exit Cloudera and arrive at Ilum in a single program — without an intermediate open-source Hadoop step.

When to Use Direct

Direct is appropriate when:

  • The goal is to exit Cloudera CDP and arrive at Ilum on Kubernetes in a single engagement.
  • Running Cloudera through a two-step (Classic then Modernize) migration is not acceptable, either financially or operationally.
  • The Hadoop estate is relatively modern — Spark-heavy, minimal MapReduce or Pig workloads — so an intermediate open-source Hadoop step adds no technical value.
  • The program timeline allows 2 to 5 months end to end.

Direct is the combination of Classic and Modernize, not a compromise between them. The outcome is the same as running Classic followed by Modernize, but delivered as one program.

How Direct Combines Classic and Modernize

Direct reuses Classic's Cloudera Manager discovery and extraction machinery, but instead of producing Hadoop XML templates aimed at an open-source Hadoop distribution, it maps Cloudera Manager properties directly to Kubernetes-native equivalents.

  • Cloudera Manager Spark service configurations become spark-defaults.conf entries for Spark on Kubernetes: spark.master=k8s://, resource requests and limits, S3A settings.
  • Cloudera Manager Hive and Impala service configurations become Trino connector properties on the target Trino deployment.
  • Cloudera Manager JDBC endpoints, queue configurations, and credential references become Airflow connection URIs.
  • Cloudera Manager policy exports become OPA policy files.
  • Cloudera Manager HDFS data directory layouts inform the object storage pool and bucket structure on the target.

The modernization pipeline — wave planning, dual-read bridge, table-by-table migration, validation, decommission — is identical to Modernize. Only the source-side adapter is different.

CDP-Specific Considerations

Migrating directly from a Cloudera CDP estate introduces several considerations that Modernize does not encounter.

Kerberos-aware data migration

DistCp must authenticate to the Kerberos-secured source HDFS. Bifrost captures keytabs during discovery and manages kinit automatically before any data migration task.

bifrost direct migrate-storage \
--source hdfs:///data/warehouse \
--target s3a://lakehouse/warehouse \
--method distcp \
--kerberos-principal [email protected] \
--kerberos-keytab /var/lib/bifrost/keytabs/bifrost.keytab

Cloudera parcel paths

Cloudera CDP uses /opt/cloudera/parcels/CDH/lib/ paths that differ from standard Hadoop layouts. The Direct configuration translator maps these to the appropriate object storage paths on the target platform.

Encryption zone key handover

CDP's Key Trustee Server (or KMS) manages HDFS encryption zone keys. KeyShell (invoked through hadoop key) does not expose an export verb — raw key material cannot be extracted through the CLI. Key handover therefore proceeds in two tracks:

  • Inventory. Bifrost runs hadoop key list during discovery to enumerate every encryption zone and its key name, and captures the KMS/KTS backing-store metadata (key names, versions, ACLs).
  • Re-key. Data is decrypted transparently through the HDFS client during DistCp (which uses a delegation token), then re-encrypted at the target using object-storage server-side encryption (SSE-S3 or SSE-KMS). The target key material is provisioned separately — either by exporting the backing KeyStore file (JCEKS / HSM-backed) or by re-wrapping EDEKs through a customer-managed KMS.

Re-encryption is a collaborative workstream with the customer's security team; Bifrost does not move raw key material across systems.

Cloudera-specific integrations

Direct discovery includes a CDP dependency analysis step that identifies Cloudera-specific integrations and flags them:

  • Navigator — audit and lineage trail; lineage rebuilds via OpenLineage on the target (see Component migrations — governance).
  • Workload XM — workload analytics; no direct equivalent. Replaced by Grafana dashboards + Trino query log analysis.
  • Cloudera Data Visualization — BI; migrated to Superset.
  • Ozone on CDP — object storage; migrates to the target object storage via DistCp with Ozone's ofs:// scheme.

The dependency analysis report lists every flagged integration with a recommended path. Items without a direct mapping are surfaced as manual-review work items.

Cloudera Manager shutdown

As the final step of the program, after all clusters are decommissioned and the silence periods have elapsed, Direct can shut down the Cloudera Manager server itself:

bifrost direct decommission \
--service cloudera-manager \
--cm-host cm.example.internal \
--confirm-irreversible

This step is irreversible and ends the Cloudera subscription requirement.

Phase Pipeline

Direct executes nine phases sequentially.

PhaseCommandDescription
0bifrost direct discoverCloudera Manager API extraction and estate scoring in one pass.
1bifrost direct planMigration plan, including CDP dependency analysis.
2bifrost direct landDeploy the Kubernetes target platform.
2.5bifrost direct validate-prePlatform pre-flight — Kerberos-aware variant of the Modernize check. Safe to re-run.
3bifrost direct bridgeDual-read bridge from CDP HDFS to target object storage (Kerberos-aware).
4bifrost direct migrate-waveTable-by-table migration waves.
5bifrost direct convert-workflowOozie to Airflow with dbt conversion.
6bifrost direct hue-importHUE to Superset migration.
7bifrost direct validateData-diff and query-parity validation.
8bifrost direct decommissionLegacy service shutdown, including Cloudera Manager.

Each command is described in CLI reference — Direct commands. Phases 4 through 7 run iteratively across waves; phases 0 through 3 run once.

Program Shape

Direct engagements typically follow this timeline:

MonthActivity
1Discovery, planning, CDP dependency analysis, target platform landing.
1 to 2Dual-read bridge, first waves of quick-win tables migrated.
2 to 4Wave-based table migration, workload conversion, HUE import.
2 to 4HBase migration track (runs in parallel with table migration).
4 to 5Silence period enforcement, legacy service decommission, Cloudera Manager shutdown.

The exact timeline depends on estate size, workload complexity, and the customer's change-management cadence. Bifrost's discovery output includes a TCO report that projects costs for both the legacy and the target platform across the program.

Interplay with Ilum Enterprise

Direct migrations land on a full Ilum Enterprise deployment. After the program ends, customers operate the platform using standard Ilum workflows.

Bifrost's role ends when the final legacy service is decommissioned and Cloudera Manager is shut down.

Next Steps