Hadoop and Cloudera CDP Migration Documentation (Bifrost Toolkit)

Ilum Enterprise

Bifrost is a commercial component of Ilum Enterprise. To request access or discuss a migration engagement, contact [email protected] or reach out to your Ilum account representative.

For a business overview of Ilum's Hadoop and Cloudera migration offering — market drivers, cost models, case studies, and ROI — see the Ilum Hadoop & Cloudera Migration Platform page. This documentation set is the implementation reference for teams already executing a migration with Bifrost.

Bifrost is a purpose-built migration automation tool that bridges the gap between legacy Hadoop ecosystems and the Ilum lakehouse platform. It is designed to take enterprise customers through the full lifecycle of Hadoop modernization: from first discovery, through phased execution and validation, to decommissioning of legacy clusters.

Bifrost is not a long-running service or a web application. It is a migration framework driven through a command-line interface. Every operation is auditable, reproducible, and version-controlled. Bifrost automates the mechanical work and surfaces clear go-or-no-go signals at each gate. It does not attempt to replace human judgment at critical decision points, and it is explicit about what can and cannot be automated.

What Bifrost Does

Bifrost supports three migration paths that can be used independently or chained together:

  • CDP to open-source Hadoop — replatform Cloudera CDP clusters onto open-source Apache Hadoop in place, eliminating Cloudera licensing without changing the architecture.
  • Hadoop to Ilum on Kubernetes — modernize an existing open-source Hadoop estate onto the Ilum Kubernetes-native lakehouse, replacing legacy services component by component.
  • CDP directly to Ilum on Kubernetes — migrate Cloudera CDP straight to the Ilum platform in a single engagement, skipping the intermediate open-source Hadoop step.

Across all paths, Bifrost provides:

  • Automated discovery and inventory of the source estate.
  • Scored migration plans with wave assignments, complexity, and TCO projections.
  • Configuration translation from source cluster settings to target platform equivalents.
  • Data movement with bandwidth controls and resumable operations.
  • Structured validation (row-count parity, value-level data diffs, query parity, schema comparison).
  • Instantaneous revert at the table level during modernization, or full cluster rollback during in-place replatform.
  • Day-2 operations for patching, node lifecycle management, key rotation, and certificate renewal.
  • Integrated notifications for Slack and IT service management platforms.

Who Bifrost Is For

Bifrost is built for platform, data, and infrastructure teams who need to:

  • Exit Cloudera CDP licensing while preserving business continuity.
  • Modernize an open-source or commercial Hadoop deployment onto a Kubernetes-native lakehouse.
  • Consolidate data workloads from disparate on-premises clusters onto a unified platform.
  • Run a large migration as a structured, phased program rather than a single cutover event.

Customers typically have a mix of HDFS, YARN, Hive, Oozie, and Ranger today, and they want to land on the Ilum platform with Apache Iceberg tables, Trino, Apache Airflow, and cloud-native object storage.

For regulated customers in financial services, healthcare, and critical infrastructure, Bifrost is designed to align with EU DORA and NIS2, the US HIPAA Security Rule, NIST Cybersecurity Framework 2.0, SOC 2 Type II, and ISO/IEC 27001:2022. It also addresses UK and APAC operational-resilience regimes (FCA SYSC 15A, APRA CPS 230), Middle East frameworks (SAMA and NCA in Saudi Arabia; CBUAE, TDRA IAS, DFSA, and FSRA in the UAE; CBO in Oman; QCB and NIA in Qatar; CBB in Bahrain; CBK CORF in Kuwait), Indonesia's OJK POJK 11/2022 and UU PDP, and further jurisdictions covered in the runners-up block. See Operations — Regulatory Compliance for the full control-to-obligation mapping.

How Bifrost Fits with Ilum

Bifrost is the migration front door. Ilum is the destination platform. Bifrost runs on a controller host outside the source cluster and orchestrates migration steps against the source environment and the target Kubernetes cluster where Ilum is deployed. After migration completes, Bifrost steps aside and the estate is operated using standard Ilum workflows.

+----------------------+        +---------------------+        +-------------------------+
|  Legacy environment  |        |       Bifrost       |        |     Ilum Enterprise     |
|    (Hadoop / CDP)    | -----> |  migration toolkit  | -----> |    lakehouse on K8s     |
|  HDFS, YARN, Hive,   |        |  (CLI + playbooks)  |        | Iceberg, Trino, Spark,  |
|  Oozie, Ranger, HUE  |        |                     |        | Airflow, Superset, ...  |
+----------------------+        +---------------------+        +-------------------------+

The target platform itself is described in the existing Ilum architecture documentation. Bifrost provisions and configures that platform as part of its migration workflow.

Design Principles

The following principles guide every Bifrost operation. They are the contract between the migration tool and the customer.

Auditable and reproducible

Every operation is driven by version-controlled inventories, playbooks, and scripts. Any run can be repeated with the same inputs and will produce the same results. Execution logs, decision verdicts, and rollback assets are persisted so that any step can be reviewed after the fact.

Configuration-as-data

Cluster-specific settings live in YAML inventories. The migration logic is generic. A single Bifrost installation drives migrations for any number of clusters; the only thing that changes between clusters is the inventory file. Changes to topology, credentials, or gates are edits to data, not code.
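As a sketch of what configuration-as-data can look like, the fragment below shows a hypothetical per-cluster inventory. Every key and value here is illustrative; it is not Bifrost's actual inventory schema.

```yaml
# Hypothetical cluster inventory -- field names are illustrative only.
cluster:
  name: finance-prod
  source: cdp-7.1
  path: classic                 # classic | modernize | direct
nodes:
  namenodes: [nn1.corp.example, nn2.corp.example]
  datanodes: 48
gates:
  require_human_approval: [production-cutover]
  abort_on: [rowcount-mismatch, checksum-mismatch]
```

Swapping this file for another cluster's inventory is the only change needed to point the same migration logic at a different estate.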

Phased with explicit go-or-no-go gates

Migration executes in discrete phases. At the end of each phase, a decision engine evaluates critical checks and returns one of three verdicts: PROCEED, WARN, or ABORT. Critical failures trigger automatic rollback without human intervention. Warnings are logged but do not block progression. Proceed verdicts allow the next phase to begin, typically with explicit human approval at production gates.
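The verdict logic described above can be sketched in a few lines. The check names and data model here are hypothetical, not Bifrost's internal API; the point is how critical and non-critical failures map to the three verdicts.

```python
# Minimal sketch of a phase-gate decision engine. Check names and the
# Check model are illustrative assumptions, not Bifrost internals.
from dataclasses import dataclass

@dataclass
class Check:
    name: str
    critical: bool   # a failed critical check forces ABORT
    passed: bool

def gate_verdict(checks: list[Check]) -> str:
    """Return PROCEED, WARN, or ABORT for one migration phase."""
    if any(c.critical and not c.passed for c in checks):
        return "ABORT"    # triggers automatic rollback
    if any(not c.passed for c in checks):
        return "WARN"     # logged, but does not block progression
    return "PROCEED"      # next phase may begin (human approval at prod gates)

checks = [
    Check("hdfs-fsck", critical=True, passed=True),
    Check("yarn-queue-drain", critical=False, passed=False),
]
print(gate_verdict(checks))  # -> WARN
```

A failed non-critical check downgrades the verdict to WARN without stopping the run, while any failed critical check short-circuits to ABORT regardless of how many other checks passed.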

Honest automation ceilings

Bifrost tells customers up front what can be automated and what cannot. Every migration component has a documented automation ceiling, and Bifrost surfaces exactly which items require manual follow-up. There are no hidden gaps.

Zero data loss

Data directories on source nodes are never touched during in-place replatform. Object storage writes during modernization are idempotent and rely on the S3A magic committer. Row-count parity and value-level data diffs must pass before a migrated table is considered authoritative. Any detected mismatch aborts the migration.
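The parity rule above can be illustrated with a toy check. This is a conceptual sketch, assuming both sides of a table can be loaded as keyed rows; it is not how Bifrost implements validation at scale.

```python
# Toy illustration of row-count parity plus a value-level diff.
# Real validators stream and sample; this assumes small in-memory tables.
def parity_check(source_rows: dict, target_rows: dict) -> bool:
    """True only if counts match and every row matches value-for-value."""
    if len(source_rows) != len(target_rows):
        return False                        # row-count mismatch -> abort
    return all(target_rows.get(key) == row  # value-level diff
               for key, row in source_rows.items())

src = {1: ("alice", 10.0), 2: ("bob", 20.0)}
tgt = {1: ("alice", 10.0), 2: ("bob", 20.5)}
print(parity_check(src, src))  # True: counts and values agree
print(parity_check(src, tgt))  # False: value drift on row 2
```

Counts alone would pass the second comparison; only the value-level diff catches the drifted row, which is why both checks gate the authoritative cutover.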

Reversible until finalize

Every migration is reversible up to an explicit, irreversible final step. For in-place replatform, a full cluster rollback is available at any point before bifrost classic finalize. For modernization, the legacy path remains active until services are explicitly decommissioned after a silence period. Table-level revert during modernization is an atomic metadata operation and completes in microseconds.
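Table-level revert is fast because it flips a catalog pointer rather than copying data back. The sketch below illustrates that idea with a plain dict standing in for a metastore; the structure and names are assumptions for illustration, not Bifrost's catalog model.

```python
# Illustrative metadata-only revert: repoint a table at its legacy
# location. The catalog dict is a stand-in for a real metastore.
catalog = {
    "sales.orders": {
        "active": "iceberg",
        "paths": {
            "legacy": "hdfs://nn/warehouse/orders",
            "iceberg": "s3a://lake/orders",
        },
    },
}

def revert(table: str) -> str:
    """Point the table back at its legacy path; no data is moved."""
    entry = catalog[table]
    entry["active"] = "legacy"
    return entry["paths"]["legacy"]

print(revert("sales.orders"))  # hdfs://nn/warehouse/orders
```

Because only one small metadata record changes, the operation is atomic from the reader's point of view and its cost is independent of table size.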

Automation Ceilings

Bifrost is explicit about what it can automate. The "automation ceiling" for each component is the realistic upper bound of mechanical work Bifrost performs. Everything below the ceiling requires human judgment, application-specific knowledge, or manual effort.

  • 85–95 % band: storage, compute, monitoring. Bifrost performs full lifecycle automation; the customer reviews plans and approves gates.
  • 70–85 % band: catalog, query engine, security. Bifrost performs bulk automation with a review queue; the customer reviews edge cases, unmapped properties, and custom configurations.
  • 40–55 % band: Oozie, HBase, HUE. Bifrost provides scaffolding, templates, and partial automation; the customer handles coordinator conversion, schema translation, and dashboard rebuilds.
  • 0–30 % band: custom UDFs, HBase coprocessors, Pig scripts. Bifrost provides discovery and annotated stubs; the actual rewrite is manual.

Bifrost reports these bands in the pre-migration plan so that customers can resource the manual workstreams appropriately.

Scope Boundaries

Bifrost automates mechanical migration work. It does not attempt the following — these are separate workstreams the customer owns or the Ilum professional-services team addresses under a different engagement:

  • Application-level code changes. Bifrost generates reports of what needs changing; application teams do the work.
  • Custom UDF migration. UDFs are flagged during discovery; porting to Trino plugins or Spark is manual.
  • Data quality remediation. Surfaced in discovery; owned by data stewards as a separate workstream.
  • Multi-cloud orchestration and federation. Bifrost targets a single Kubernetes cluster; federating across clouds is out of scope.
  • Real-time streaming application logic. Kafka broker replacement is in scope; the streaming application code itself is not.
  • Vendor negotiations. License termination, hardware procurement, and network-provider changes are the customer's responsibility.
  • Impala migration under Classic. Impala is not part of the open-source Hadoop target; evaluate Trino as a separate track.
  • Atlas migration under Classic. Atlas continues on HBase during Classic; OpenMetadata adoption is a Modernize/Direct track.
  • Oozie-to-Airflow conversion under Classic. Classic preserves Oozie; workflow modernization happens in Modernize or Direct.

Further Reading