
Hadoop Migration CLI Reference (Bifrost)

The Bifrost CLI (bifrost) is the single entry point for every migration operation. Every command translates to a consistent, auditable sequence of automation steps under the hood; the user-facing surface is a command tree organized by migration path.

Global Options

Usage: bifrost <path> <command> [options]

Paths:
classic Path 1: CDP to Open-Source Hadoop
modernize Path 2: Hadoop to Ilum on Kubernetes
direct Path 3: CDP directly to Ilum on Kubernetes

Global Options:
--cluster <name> Source cluster name (Classic paths, and the
decommission command under any path; matches
inventory directory)
--source-cluster <name> Source cluster name (Modernize and Direct paths,
except decommission)
--strict Promote WARN verdicts to non-zero exit (useful in CI)
--env <name> Target Kubernetes environment (dev / staging / production)
--dry-run Show planned actions without executing
--verbose Enable debug output
--log-file <path> Override log file location
--no-notify Suppress Slack and IT service management notifications

Every command accepts --help for inline usage, and every command respects --dry-run to preview actions without side effects.

Classic (Path 1) Commands

bifrost classic discover

Non-destructive discovery of a Cloudera CDP cluster. Connects to the Cloudera Manager API, inventories every host, service, configuration entry, and security asset, and produces YAML inventory and variable files.

bifrost classic discover \
--cluster PROD01 \
--cm-host cm.example.internal \
--cm-port 7183 \
--cm-user admin \
--cm-pass "$CM_PASSWORD" \
--output inventories/prod01/

Key flags:

Flag                                          Purpose
--cluster                                     Inventory directory name.
--cm-host, --cm-port, --cm-user, --cm-pass    Cloudera Manager API credentials.
--no-tls                                      For lab environments without a Cloudera Manager TLS certificate.
--cm-timeout                                  Increase API timeout for clusters with >100 services (default 30 seconds).
--output                                      Destination directory for discovery output.

Output tree: see Getting started — first discovery run.

bifrost classic extract

Transforms the discovered Cloudera Manager configuration into standalone Hadoop XML templates. Filters Cloudera-internal properties, expands safety valves, and produces a manual-review report for unmapped entries.

bifrost classic extract \
--cluster PROD01 \
--output inventories/prod01/

Extraction runs in three stages — raw extraction, property translation, and template generation — and always closes with a canonicalization diff against rendered source XML. Any remaining differences are captured for manual approval.

bifrost classic validate-pre

Runs the full pre-flight checklist. Returns PROCEED, WARN, or ABORT. Non-destructive.

bifrost classic validate-pre --cluster PROD01

Checks include fsimage parseability against the target Hadoop version, edit log parseability, NameNode and DataNode layout versions, HBase HFile integrity, Hive Metastore schema validity, keytab validity, TLS certificate expiry, encryption zone documentation, reachable service ports, free disk space, presence of rollback assets, and a fsimage compatibility test on isolated test infrastructure. A detailed breakdown is in Classic migration — phase pipeline.

bifrost classic migrate

Executes the full migration pipeline. Two strategies are available; pick based on cluster size.

# Stop-and-swap (for clusters with fewer than 30 nodes)
bifrost classic migrate \
--cluster PROD01 \
--strategy stop-and-swap \
--dry-run

bifrost classic migrate \
--cluster PROD01 \
--strategy stop-and-swap \
--execute

# Shrink-and-grow (for larger production clusters)
bifrost classic migrate \
--cluster PROD02 \
--strategy shrink-and-grow \
--batch-size 10 \
--execute

Flag details:

Flag          Purpose
--strategy    stop-and-swap or shrink-and-grow.
--batch-size  Number of DataNodes processed per wave in shrink-and-grow.
--execute     Required to run the destructive phases (without it, the command only plans).

bifrost classic validate-post

Comprehensive post-migration validation against the pre-migration baseline.

bifrost classic validate-post \
--cluster PROD01 \
--baseline /var/lib/bifrost/baselines/prod01/

Validates HDFS write-read cycle, HDFS fsck cleanliness, block count parity, HBase meta scan, Hive table count parity, Kafka consumer offsets, Kerberos authentication, policy enforcement, encryption zone accessibility, and performance benchmarks (TeraSort, TestDFSIO). Warns on performance regressions greater than 20%.
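The 20% threshold is a plain relative-slowdown check. As a worked illustration (the 60 s and 73 s timings are assumed figures, not from a real run), a query that took 60 s before migration and 73 s after regresses by about 21.7% and would trigger a WARN:

```shell
# Relative regression: (new - old) / old * 100; values above 20 produce a WARN.
# The 60 s / 73 s figures are illustrative only.
awk 'BEGIN { old = 60; new = 73; printf "%.1f%% regression\n", (new - old) / old * 100 }'
```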

bifrost classic rollback

Available at any point before bifrost classic finalize.

# Rollback to a specific phase
bifrost classic rollback --cluster PROD01 --to-phase backup

# Rollback a specific node (shrink-and-grow scenarios)
bifrost classic rollback --cluster PROD01 --node dn-042.example.internal

See Validation and rollback for the complete rollback model.

bifrost classic finalize

Irreversible. Removes the distribution package cache, snapshots, and all rollback assets.

bifrost classic finalize --cluster PROD01 --confirm-irreversible

--confirm-irreversible is required. Bifrost recommends a 5-business-day soak of clean operation before finalize.

Classic day-2 operations

Bifrost continues to manage the cluster after migration:

# Rolling patch across the cluster
bifrost classic patch --cluster PROD01 --rolling --batch-size 10

# Decommission a node
bifrost classic decommission --cluster PROD01 --host worker-042.example.internal

# Add a node to the cluster
bifrost classic add-node --cluster PROD01 --host worker-120.example.internal --role datanode

# Rotate Kerberos keytabs
bifrost classic rotate-keytabs --cluster PROD01

# Renew TLS certificates
bifrost classic renew-certificates --cluster PROD01

The Modernize and Direct paths also expose a source-side Kerberos keytab refresh for long-running migrations against a Kerberized source HDFS:

bifrost modernize rotate-keytabs --source-cluster PROD01
bifrost direct rotate-keytabs --source-cluster PROD01

This refreshes the controller-side keytab cache and re-authenticates in-flight DistCp runs. It does not modify the source cluster.

Modernize (Path 2) Commands

bifrost modernize discover

Inventories the source Hadoop estate and produces a scored migration plan.

bifrost modernize discover --source-cluster PROD01

Output tree: see Getting started — first discovery run.

bifrost modernize plan

Generates or updates the phased migration plan.

# Generate an initial plan
bifrost modernize plan \
--source-cluster PROD01 \
--target-env production \
--waves 8 \
--max-parallel 8

# View the current plan
bifrost modernize plan --show

# Move a specific table between waves
bifrost modernize plan \
--move-table production_db.customers \
--to-wave 1

bifrost modernize land

Deploys the complete Kubernetes target platform via Helm.

# Deploy with the production profile
bifrost modernize land \
--env production \
--profile production \
--values modernize/helm/values-production.yaml

# Deploy a minimal dev environment
bifrost modernize land --env dev --profile dev

# Check platform health after deployment
bifrost modernize land --status

The platform components provisioned by this command are listed in Target platform architecture.

bifrost modernize validate-pre

Non-destructive pre-flight check for the target platform. Runs before any wave execution and verifies that the platform is healthy enough to accept migration traffic.

bifrost modernize validate-pre --env production

Checks performed, grouped by the phase they gate:

Gates Phase 2 (bridge) and Phase 3 (table migration) — always blocking:

  • Object storage cluster in HEALTH_OK and below the configured utilization threshold.
  • Catalog service reachable and accepting OAuth2 client_credentials.
  • Trino interactive and ETL clusters responding to health probes.
  • OIDC flow to Keycloak succeeds (tested via authorization-code server flow with a pre-seeded test user, and via client_credentials for service accounts; no interactive browser flow is exercised).
  • OPA sidecar reachable; a Bifrost-supplied test-policy bundle is loaded and returns allow=true for the synthetic test principal.
  • Connectivity from Kubernetes pods to legacy HDFS (for DistCp).
  • Container registry reachable for Spark, Trino, Airflow images.

Informational; gates Phase 4 (workflow conversion) only:

  • Airflow scheduler heartbeat current; a test DAG can be parsed.

Returns PROCEED, WARN, or ABORT with the same semantics as the Classic pre-flight. See Validation and rollback — Decision Engine Gates.

bifrost modernize bridge

Configures the dual-read strangler facade.

# Enable the bridge
bifrost modernize bridge \
--source-cluster PROD01 \
--target-env production \
--sync-interval 1h

# Check bridge status
bifrost modernize bridge --status

# Disable the bridge (rolls back redirection rules via Helm upgrade + Trino reload)
bifrost modernize bridge --disable

bridge configures Trino with both legacy and modern catalogs, enables table redirection for migrated tables, starts a periodic DistCp warm-sync, and prepares Spark (via Ilum) to read from both the legacy and modern storage paths.

Key flags:

Flag              Purpose
--source-cluster  Source cluster inventory name.
--target-env      Target Kubernetes environment (dev / staging / production).
--sync-interval   Interval for the DistCp warm-sync loop. Accepts Go-duration strings (30m, 1h, 6h). Default 1h.
--status          Report the bridge's current state and warm-sync lag without changing anything.
--disable         Remove redirection rules via a Helm upgrade of the Trino catalog config, followed by a coordinator reload. Does not delete migrated tables.
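Bifrost parses --sync-interval itself; if you need the same value in seconds for scripting around the bridge (for example, to sanity-check warm-sync lag), a minimal helper for single-unit duration strings might look like this. The function name is an assumption, not part of the CLI, and compound values such as 1h30m are not handled:

```shell
# dur_to_secs: convert a single-unit Go-duration string (30m, 1h, 6h) to seconds.
# Hypothetical scripting helper; not part of the Bifrost CLI.
dur_to_secs() {
  case "$1" in
    *h) echo $(( ${1%h} * 3600 )) ;;
    *m) echo $(( ${1%m} * 60 ))   ;;
    *s) echo "${1%s}"             ;;
    *)  echo "unsupported duration: $1" >&2; return 1 ;;
  esac
}

dur_to_secs 1h   # prints 3600
```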

bifrost modernize migrate-table

Migrates a single table from Hive on HDFS to Iceberg on object storage.

# Safe, reversible, no data copy
bifrost modernize migrate-table \
--table production_db.customers \
--strategy snapshot

# Destructive in-place conversion, faster for large stable tables
bifrost modernize migrate-table \
--table production_db.transactions \
--strategy migrate

# Additive import with re-partitioning
bifrost modernize migrate-table \
--table production_db.events \
--strategy add_files \
--partition-spec "year(event_date)"

Strategy guidance: see Modernize migration — table migration strategies.

bifrost modernize migrate-catalog

Bulk catalog migration without copying data. Moves table references between catalogs.

bifrost modernize migrate-catalog \
--source-type HIVE \
--target-type REST \
--target-uri https://polaris.example.internal/api/catalog \
--identifier-file tables_wave3.txt \
--dry-run

bifrost modernize migrate-wave

Migrates every table in a wave concurrently.

# Migrate wave 3 with up to 8 tables in parallel
bifrost modernize migrate-wave --wave 3 --parallel 8

# Check wave status
bifrost modernize migrate-wave --wave 3 --status

bifrost modernize migrate-storage

Bulk data movement from HDFS to object storage.

# DistCp (default, for historical data)
bifrost modernize migrate-storage \
--source hdfs:///data/warehouse \
--target s3a://lakehouse/warehouse \
--method distcp \
--mappers 20 \
--bandwidth 500

# Continuous replication
bifrost modernize migrate-storage \
--source hdfs:///data/warehouse \
--target s3a://lakehouse/warehouse \
--method continuous

# Incremental sync
bifrost modernize migrate-storage \
--source hdfs:///data/warehouse \
--target s3a://lakehouse/warehouse \
--method distcp \
--update \
--delete

Accepted --method values:

Value       Mechanism                                                                  Use case
distcp      Hadoop DistCp with the magic committer                                     Bulk historical data
continuous  Paxos-coordinated live-data replication (partner integration)              Continuous replication of live, changing data
juicefs     JuiceFS bridge exposing object storage through the HDFS SDK                Applications that require native HDFS SDK compatibility
ozone       Two-step migration via Apache Ozone (ofs:// scheme, then object-storage)   Legacy estates already running Ozone

--mappers, --bandwidth, --update, --delete apply to the distcp method. --kerberos-principal and --kerberos-keytab apply to any method when the source HDFS is Kerberized.

The Bifrost CLI uses double-dash flags (--mappers, --bandwidth, --update, --delete) and translates them to the native DistCp form internally (-m, -bandwidth, -update, -delete). When tuning DistCp directly outside Bifrost, use the single-dash native form.

--bandwidth expresses the per-mapper throughput cap in MB/s and accepts a floating-point value (for example, --bandwidth 0.5). The cap is approximate. Fewer mappers with a higher per-mapper bandwidth usually outperform many mappers against the same object storage endpoint.
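Worked arithmetic for the two bandwidth values mentioned above (20 mappers is the figure from the earlier migrate-storage example): the aggregate cap is simply mappers times per-mapper bandwidth, and the cap remains approximate either way.

```shell
# Aggregate DistCp cap = mappers x per-mapper bandwidth (MB/s).
awk 'BEGIN { printf "20 mappers x 500 MB/s = %g MB/s aggregate\n", 20 * 500 }'
awk 'BEGIN { printf "20 mappers x 0.5 MB/s = %g MB/s aggregate\n", 20 * 0.5 }'
```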

bifrost modernize convert-workflow

Converts Oozie workflows to Airflow 3 DAGs and extracts Hive SQL into dbt models.

# Convert a single workflow
bifrost modernize convert-workflow \
--oozie-path /user/oozie/workflows/etl-daily-revenue \
--output-dir airflow/dags/

# Convert every workflow in a directory tree
bifrost modernize convert-workflow \
--oozie-path /user/oozie/workflows/ \
--output-dir airflow/dags/ \
--recursive

# Extract SQL into dbt models during conversion
bifrost modernize convert-workflow \
--oozie-path /user/oozie/workflows/etl-daily-revenue \
--output-dir airflow/dags/ \
--extract-dbt \
--dbt-project-dir dbt/lakehouse/

Action-type coverage: see Component migrations — workflows.

bifrost modernize hue-import

Migrates HUE saved queries and dashboards to Superset.

bifrost modernize hue-import \
--hue-db-host hue.example.internal \
--hue-db-name hue \
--hue-db-user hue \
--hue-db-pass "$HUE_DB_PASSWORD" \
--target-superset https://superset.example.internal \
--superset-admin-user admin \
--superset-admin-pass "$SUPERSET_PASSWORD" \
--user-mapping-file user_mapping.csv

Items that must be rebuilt manually are listed in Component migrations — user interface.

bifrost modernize validate

Runs validation for migrated tables.

# Validate a single table
bifrost modernize validate \
--table production_db.customers \
--type data-diff

# Validate every table in a wave
bifrost modernize validate \
--wave 3 \
--type data-diff \
--sample-rate 0.01

# Query parity test (compare legacy vs. modern results)
bifrost modernize validate \
--type query-parity \
--query-file benchmark_queries.sql \
--source-engine hive \
--target-engine trino
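--sample-rate trades confidence for runtime: at the 0.01 rate shown above, roughly one row in a hundred is compared. Illustrative arithmetic for a 100-million-row table (the row count is an assumption, not a documented default):

```shell
# Rows compared at a given sample rate; the 100M row count is illustrative.
awk 'BEGIN { rows = 100000000; rate = 0.01; printf "%d rows compared\n", rows * rate }'
```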

bifrost modernize rollback

Reverts one or more migrated tables to their pre-migration state. Invoked automatically by Bifrost when validation fails; the command is also available for operator-initiated reverts.

# Revert a single table
bifrost modernize rollback --table production_db.customers

# Revert every table in a wave (only after the wave is COMPLETE or FAILED)
bifrost modernize rollback --wave 3

--table and --wave are mutually exclusive. The revert mechanism depends on the migration strategy used for the table:

  • migrate strategy — the Iceberg table is dropped and the original Hive table is renamed back from <name>_BACKUP_ to <name>. This is a metadata-only operation that completes in microseconds.
  • snapshot strategy — the Iceberg table is dropped and table redirection is disabled. The source Hive table was never modified, so nothing is renamed back.
  • add_files strategy — the imported files are removed from the Iceberg table's metadata; source files remain untouched.

Use --wave only after migrate-wave has finished (either COMPLETE or FAILED). While a wave is actively running, use --table per-table to avoid racing against in-flight migrations. Pause the wave with bifrost modernize migrate-wave --wave <n> --pause first if a bulk revert is required mid-wave.
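When a bulk revert is needed but --wave is not yet safe, the per-table form can be scripted. A sketch assuming a plain-text file with one table identifier per line (the file name is an assumption):

```shell
# Revert each table listed in tables_to_revert.txt, one at a time.
# Per-table rollback avoids racing an in-flight wave; the file name is hypothetical.
while IFS= read -r table; do
  bifrost modernize rollback --table "$table"
done < tables_to_revert.txt
```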

bifrost modernize decommission

Decommissions a legacy service after a confirmed silence period.

# Decommission HDFS after 30 days of silence
bifrost modernize decommission \
--service hdfs \
--cluster PROD01 \
--after-silence 30d

# Dry-run decommission readiness check
bifrost modernize decommission \
--service yarn \
--cluster PROD01 \
--after-silence 30d \
--dry-run

Required flags by --service value:

Service value                                                     Required flags                      Optional flags
hdfs, yarn, hive, hbase, kafka, oozie, ranger, atlas, hue, solr   --cluster, --after-silence          --dry-run
cloudera-manager (Direct only)                                    --cm-host, --confirm-irreversible   --dry-run

--after-silence does not apply to the Cloudera Manager shutdown: once every cluster under Cloudera Manager has been decommissioned, there is no legacy data access left to monitor.

Note on cluster flags. The decommission command is service-scoped rather than path-scoped; it accepts --cluster under every path (Classic, Modernize, Direct), which is the single exception to the Global Options rule that Modernize and Direct use --source-cluster.

bifrost modernize status

Shows migration progress across storage, tables, workloads, and services.

bifrost modernize status

Example output:

Bifrost Modernize Status: PROD01 -> production
==================================================

Storage Migration: ################.... 78% (179 TB / 230 TB)
Table Migration: ##############...... 68% (2,847 / 4,187 tables)
Workload Conversion: ########............ 42% (156 / 371 workflows)
Service Replacement: ##################.. 88% (7 / 8 services)

Current Wave: 5 of 8
Tables in Progress: 23
Tables Blocked: 2 (manual review required)

Active Issues:
WARN production_db.risk_scores: 22% slower on Trino (within tolerance)
TODO etl-risk-model coordinator: manual Airflow Dataset definition needed
BLOCK production_db.hbase_audit: HBase schema review pending (coprocessor detected)

Dashboard: https://superset.example.internal/bifrost-migration
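The status report is stable plain text, so it can be scraped in automation. A sketch that pulls the blocked-table count from a saved copy of the output above (the status.txt file name is an assumption):

```shell
# Extract the "Tables Blocked" count from a saved status report.
# status.txt is assumed to hold the output of `bifrost modernize status`.
blocked=$(awk -F': ' '/^Tables Blocked/ { print $2 + 0 }' status.txt)
echo "blocked tables: $blocked"
```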

Direct (Path 3) Commands

Direct combines Classic discovery and extraction with Modernize execution. The CLI namespace is bifrost direct, and every Modernize command is also aliased under Direct.

bifrost direct discover

Combines Cloudera Manager extraction with modernization estate scoring in a single pass.

bifrost direct discover \
--cm-host cm.example.internal \
--cm-port 7183 \
--cm-user admin \
--cm-pass "$CM_PASSWORD" \
--source-cluster PROD01 \
--output plans/prod01/

Key flags:

Flag                                          Purpose
--source-cluster                              Name of the source CDP cluster; becomes the plan directory.
--cm-host, --cm-port, --cm-user, --cm-pass    Cloudera Manager API credentials. Same semantics as classic discover.
--no-tls                                      For lab environments where Cloudera Manager is not TLS-protected.
--cm-timeout                                  Increase API timeout for clusters with more than 100 services (default 30 seconds).
--output                                      Destination directory for discovery output.

bifrost direct plan

Produces a migration plan that goes directly from CDP to the Kubernetes lakehouse. Includes a CDP dependency analysis step that identifies Cloudera-specific integrations and maps them to open alternatives or flags them for manual handling.

bifrost direct plan \
--source-cluster PROD01 \
--target-env production \
--waves 8 \
--max-parallel 8

bifrost direct land

Identical to bifrost modernize land. Deploys the Kubernetes target platform.

bifrost direct land --env production --profile production

bifrost direct migrate-wave

Executes the combined migration for every table in a wave. For each wave, Bifrost extracts the relevant table or service configuration from the source, translates it directly into Kubernetes-native equivalents (skipping an intermediate Hadoop XML representation), migrates data from source HDFS to object storage, converts Hive tables to Iceberg, and validates via data-diff.

# Migrate a specific wave
bifrost direct migrate-wave --wave 3 --parallel 8

# Migrate storage directly from CDP HDFS to object storage
bifrost direct migrate-storage \
--source hdfs:///data/warehouse \
--target s3a://lakehouse/warehouse \
--method distcp \
--kerberos-principal [email protected] \
--kerberos-keytab /var/lib/bifrost/keytabs/bifrost.keytab

Kerberos flags are required when migrating directly from a Kerberized CDP HDFS. Bifrost captures the keytab during discovery and manages kinit automatically. The same Kerberos flags are accepted by bifrost modernize migrate-storage when the legacy HDFS source is Kerberized.

Other Direct commands

Every Modernize command is available under Direct with identical syntax. For each, consult the Modernize documentation for flags and examples — only the command prefix changes from bifrost modernize to bifrost direct:

bifrost direct bridge --source-cluster PROD01 --target-env production --sync-interval 1h
bifrost direct migrate-table --table production_db.customers --strategy snapshot
bifrost direct convert-workflow --oozie-path /user/oozie/workflows/ --output-dir airflow/dags/ --recursive
bifrost direct hue-import --hue-db-host hue.example.internal --target-superset https://superset.example.internal
bifrost direct validate --wave 3 --type data-diff
bifrost direct decommission --service hdfs --cluster PROD01 --after-silence 30d
bifrost direct status

The only behavioral difference is that bifrost direct decommission can additionally shut down Cloudera Manager as the final step of the program:

bifrost direct decommission \
--service cloudera-manager \
--cm-host cm.example.internal \
--confirm-irreversible

Exit Codes and Notifications

Bifrost returns 0 on success, non-zero on any failure, and 2 specifically for ABORT verdicts from the decision engine.

Notifications are configured in a single YAML file that is read by every command. Bifrost supports three channels:

channels:
  slack:
    webhook: "{{ vault_slack_webhook }}"
    channel: "#hadoop-migration"
    events:
      - phase_start
      - phase_complete
      - phase_failed
      - gate_decision
      - rollback_triggered
      - table_migrated    # Modernize / Direct only
      - wave_complete     # Modernize / Direct only

  itsm:
    type: bmc_helix       # bmc_helix | servicenow | jira
    api_url: "https://helix.example.internal/api/arsys/v1"
    auth: jwt
    events:
      - migration_start
      - migration_complete
      - rollback_triggered
      - incident_created

  email:
    smtp: "smtp.example.internal"
    recipients: ["[email protected]"]
    events:
      - gate_decision
      - rollback_triggered

--no-notify suppresses all channels for a single invocation, which is useful for dry runs and local testing. For Modernize and Direct paths, Bifrost also exposes Prometheus metrics and ships a Superset dashboard that visualizes migration KPIs in real time. See Operations — monitoring and alerting.