Hadoop Migration CLI Reference (Bifrost)
The Bifrost CLI (bifrost) is the single entry point for every migration operation. Every command translates to a consistent, auditable sequence of automation steps under the hood; the user-facing surface is a command tree organized by migration path.
Global Options
Usage: bifrost <path> <command> [options]
Paths:
classic Path 1: CDP to Open-Source Hadoop
modernize Path 2: Hadoop to Ilum on Kubernetes
direct Path 3: CDP directly to Ilum on Kubernetes
Global Options:
--cluster <name> Source cluster name (Classic paths, and the
decommission command under any path; matches
inventory directory)
--source-cluster <name> Source cluster name (Modernize and Direct paths,
except decommission)
--strict Promote WARN verdicts to non-zero exit (useful in CI)
--env <name> Target Kubernetes environment (dev / staging / production)
--dry-run Show planned actions without executing
--verbose Enable debug output
--log-file <path> Override log file location
--no-notify Suppress Slack and IT service management notifications
Every command accepts --help for inline usage, and every command respects --dry-run to preview actions without side effects.
Classic (Path 1) Commands
bifrost classic discover
Non-destructive discovery of a Cloudera CDP cluster. Connects to the Cloudera Manager API, inventories every host, service, configuration entry, and security asset, and produces YAML inventory and variable files.
bifrost classic discover \
--cluster PROD01 \
--cm-host cm.example.internal \
--cm-port 7183 \
--cm-user admin \
--cm-pass "$CM_PASSWORD" \
--output inventories/prod01/
Key flags:
| Flag | Purpose |
|---|---|
| --cluster | Inventory directory name. |
| --cm-host, --cm-port, --cm-user, --cm-pass | Cloudera Manager API credentials. |
| --no-tls | For lab environments without a Cloudera Manager TLS certificate. |
| --cm-timeout | Increase API timeout for clusters with >100 services (default 30 seconds). |
| --output | Destination directory for discovery output. |
Output tree: see Getting started — first discovery run.
bifrost classic extract
Transforms the discovered Cloudera Manager configuration into standalone Hadoop XML templates. Filters Cloudera-internal properties, expands safety valves, and produces a manual-review report for unmapped entries.
bifrost classic extract \
--cluster PROD01 \
--output inventories/prod01/
Extraction runs in three stages — raw extraction, property translation, and template generation — and always closes with a canonicalization diff against rendered source XML. Any remaining differences are captured for manual approval.
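The closing diff can be pictured as a comparison of canonicalized property maps. A minimal sketch, assuming properties are compared as name/value pairs; `canonicalize` and `diff_properties` are illustrative names, not part of Bifrost:

```python
def canonicalize(props):
    """Normalize a Hadoop-style property map: strip whitespace and
    drop empty values so cosmetic differences don't surface as diffs."""
    return {k.strip(): v.strip() for k, v in props.items() if v and v.strip()}

def diff_properties(source, rendered):
    """Return properties that still differ after canonicalization;
    these are the entries queued for manual approval."""
    src, out = canonicalize(source), canonicalize(rendered)
    return {
        key: (src.get(key), out.get(key))
        for key in set(src) | set(out)
        if src.get(key) != out.get(key)
    }

# Example: whitespace noise is ignored, a real value change is reported
src = {"dfs.replication": "3", "dfs.blocksize": " 134217728 "}
out = {"dfs.replication": "3", "dfs.blocksize": "268435456"}
print(diff_properties(src, out))
# {'dfs.blocksize': ('134217728', '268435456')}
```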
bifrost classic validate-pre
Runs the full pre-flight checklist. Returns PROCEED, WARN, or ABORT. Non-destructive.
bifrost classic validate-pre --cluster PROD01
Checks include fsimage parseability against the target Hadoop version, edit log parseability, NameNode and DataNode layout versions, HBase HFile integrity, Hive Metastore schema validity, keytab validity, TLS certificate expiry, encryption zone documentation, reachable service ports, free disk space, presence of rollback assets, and a fsimage compatibility test on isolated test infrastructure. A detailed breakdown is in Classic migration — phase pipeline.
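The PROCEED/WARN/ABORT verdict is the worst result across all checks, and --strict promotes a WARN to a non-zero exit. A minimal sketch of one plausible aggregation, consistent with the exit-code contract described under Exit Codes and Notifications (illustrative, not Bifrost's actual implementation):

```python
SEVERITY = {"PROCEED": 0, "WARN": 1, "ABORT": 2}

def aggregate(check_results, strict=False):
    """Collapse individual check verdicts into one gate decision:
    the worst verdict wins; --strict makes WARN exit non-zero."""
    worst = max(check_results, key=SEVERITY.get)
    if strict and worst == "WARN":
        return "WARN", 1          # non-zero exit, useful in CI
    return worst, {"PROCEED": 0, "WARN": 0, "ABORT": 2}[worst]

print(aggregate(["PROCEED", "WARN", "PROCEED"]))               # ('WARN', 0)
print(aggregate(["PROCEED", "WARN", "PROCEED"], strict=True))  # ('WARN', 1)
print(aggregate(["PROCEED", "ABORT"]))                         # ('ABORT', 2)
```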
bifrost classic migrate
Executes the full migration pipeline. Two strategies are available; pick based on cluster size.
# Stop-and-swap (for clusters with fewer than 30 nodes)
bifrost classic migrate \
--cluster PROD01 \
--strategy stop-and-swap \
--dry-run
bifrost classic migrate \
--cluster PROD01 \
--strategy stop-and-swap \
--execute
# Shrink-and-grow (for larger production clusters)
bifrost classic migrate \
--cluster PROD02 \
--strategy shrink-and-grow \
--batch-size 10 \
--execute
Flag details:
| Flag | Purpose |
|---|---|
| --strategy | stop-and-swap or shrink-and-grow. |
| --batch-size | Number of DataNodes processed per wave in shrink-and-grow. |
| --execute | Required to run the destructive phases (without it, the command only plans). |
bifrost classic validate-post
Comprehensive post-migration validation against the pre-migration baseline.
bifrost classic validate-post \
--cluster PROD01 \
--baseline /var/lib/bifrost/baselines/prod01/
Validates HDFS write-read cycle, HDFS fsck cleanliness, block count parity, HBase meta scan, Hive table count parity, Kafka consumer offsets, Kerberos authentication, policy enforcement, encryption zone accessibility, and performance benchmarks (TeraSort, TestDFSIO). Warns on regressions greater than 20%.
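The 20% tolerance is a simple relative-slowdown calculation against the baseline. A sketch with illustrative benchmark numbers:

```python
def regression_pct(baseline, current):
    """Percent slowdown relative to the pre-migration baseline
    (positive means the migrated cluster is slower)."""
    return (current - baseline) / baseline * 100.0

def verdict(baseline, current, tolerance=20.0):
    """WARN when the regression exceeds the documented 20% threshold."""
    return "WARN" if regression_pct(baseline, current) > tolerance else "PROCEED"

# TeraSort wall-clock seconds, pre- vs. post-migration (illustrative)
print(verdict(baseline=600, current=690))  # 15% slower -> PROCEED
print(verdict(baseline=600, current=750))  # 25% slower -> WARN
```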
bifrost classic rollback
Available at any point before bifrost classic finalize.
# Rollback to a specific phase
bifrost classic rollback --cluster PROD01 --to-phase backup
# Rollback a specific node (shrink-and-grow scenarios)
bifrost classic rollback --cluster PROD01 --node dn-042.example.internal
See Validation and rollback for the complete rollback model.
bifrost classic finalize
Irreversible. Removes the distribution package cache, snapshots, and all rollback assets.
bifrost classic finalize --cluster PROD01 --confirm-irreversible
--confirm-irreversible is required. Bifrost recommends a 5-business-day soak of clean operation before running finalize.
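Counting the soak window means skipping weekends. A sketch of the date arithmetic (holidays ignored; `soak_end` is an illustrative helper, not a CLI command):

```python
from datetime import date, timedelta

def soak_end(start, business_days=5):
    """Earliest date finalize should run: `business_days` weekdays of
    clean operation after `start` (weekends excluded, holidays ignored)."""
    day, remaining = start, business_days
    while remaining > 0:
        day += timedelta(days=1)
        if day.weekday() < 5:  # Mon=0 .. Fri=4
            remaining -= 1
    return day

# Migration completed on a Monday -> finalize no earlier than next Monday
print(soak_end(date(2025, 1, 6)))  # 2025-01-13
```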
Classic day-2 operations
Bifrost continues to manage the cluster after migration:
# Rolling patch across the cluster
bifrost classic patch --cluster PROD01 --rolling --batch-size 10
# Decommission a node
bifrost classic decommission --cluster PROD01 --host worker-042.example.internal
# Add a node to the cluster
bifrost classic add-node --cluster PROD01 --host worker-120.example.internal --role datanode
# Rotate Kerberos keytabs
bifrost classic rotate-keytabs --cluster PROD01
# Renew TLS certificates
bifrost classic renew-certificates --cluster PROD01
The Modernize and Direct paths also expose a source-side Kerberos keytab refresh for long-running migrations against a Kerberized source HDFS:
bifrost modernize rotate-keytabs --source-cluster PROD01
bifrost direct rotate-keytabs --source-cluster PROD01
This refreshes the controller-side keytab cache and re-authenticates in-flight DistCp runs. It does not modify the source cluster.
Modernize (Path 2) Commands
bifrost modernize discover
Inventories the source Hadoop estate and produces a scored migration plan.
bifrost modernize discover --source-cluster PROD01
Output tree: see Getting started — first discovery run.
bifrost modernize plan
Generates or updates the phased migration plan.
# Generate an initial plan
bifrost modernize plan \
--source-cluster PROD01 \
--target-env production \
--waves 8 \
--max-parallel 8
# View the current plan
bifrost modernize plan --show
# Move a specific table between waves
bifrost modernize plan \
--move-table production_db.customers \
--to-wave 1
bifrost modernize land
Deploys the complete Kubernetes target platform via Helm.
# Deploy with the production profile
bifrost modernize land \
--env production \
--profile production \
--values modernize/helm/values-production.yaml
# Deploy a minimal dev environment
bifrost modernize land --env dev --profile dev
# Check platform health after deployment
bifrost modernize land --status
The provisioned platform components are listed in Target platform architecture.
bifrost modernize validate-pre
Non-destructive pre-flight check for the target platform. Runs before any wave execution and verifies that the platform is healthy enough to accept migration traffic.
bifrost modernize validate-pre --env production
Checks performed, grouped by the phase they gate:
Gates Phase 2 (bridge) and Phase 3 (table migration) — always blocking:
- Object storage cluster in HEALTH_OK and below the configured utilization threshold.
- Catalog service reachable and accepting OAuth2 client_credentials.
- Trino interactive and ETL clusters responding to health probes.
- OIDC flow to Keycloak succeeds (tested via authorization-code server flow with a pre-seeded test user, and via client_credentials for service accounts; no interactive browser flow is exercised).
- OPA sidecar reachable; a Bifrost-supplied test-policy bundle is loaded and returns allow=true for the synthetic test principal.
- Connectivity from Kubernetes pods to legacy HDFS (for DistCp).
- Container registry reachable for Spark, Trino, Airflow images.
Informational; gates Phase 4 (workflow conversion) only:
- Airflow scheduler heartbeat current; a test DAG can be parsed.
Returns PROCEED, WARN, or ABORT with the same semantics as the Classic pre-flight. See Validation and rollback — Decision Engine Gates.
bifrost modernize bridge
Configures the dual-read strangler facade.
# Enable the bridge
bifrost modernize bridge \
--source-cluster PROD01 \
--target-env production \
--sync-interval 1h
# Check bridge status
bifrost modernize bridge --status
# Disable the bridge (rolls back redirection rules via Helm upgrade + Trino reload)
bifrost modernize bridge --disable
bridge configures Trino with both legacy and modern catalogs, enables table redirection for migrated tables, starts a periodic DistCp warm-sync, and prepares Spark (via Ilum) to read from both the legacy and modern storage paths.
Key flags:
| Flag | Purpose |
|---|---|
| --source-cluster | Source cluster inventory name. |
| --target-env | Target Kubernetes environment (dev / staging / production). |
| --sync-interval | Interval for the DistCp warm-sync loop. Accepts Go-duration strings (30m, 1h, 6h). Default 1h. |
| --status | Report the bridge's current state and warm-sync lag without changing anything. |
| --disable | Remove redirection rules via a Helm upgrade of the Trino catalog config, followed by a coordinator reload. Does not delete migrated tables. |
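The Go-duration format accepted by --sync-interval composes digit runs with s/m/h units. An illustrative parser for the subset shown above (not Bifrost's implementation):

```python
import re

# Units accepted in the examples above; a deliberate subset of Go durations
_UNITS = {"s": 1, "m": 60, "h": 3600}

def parse_go_duration(value):
    """Convert strings like '30m', '1h', or '1h30m' to seconds."""
    parts = re.findall(r"(\d+)([smh])", value)
    # Reject inputs with leftover characters the pattern did not consume
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"not a valid duration: {value!r}")
    return sum(int(n) * _UNITS[u] for n, u in parts)

print(parse_go_duration("30m"))    # 1800
print(parse_go_duration("1h"))     # 3600
print(parse_go_duration("1h30m"))  # 5400
```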
bifrost modernize migrate-table
Migrates a single table from Hive on HDFS to Iceberg on object storage.
# Safe, reversible, no data copy
bifrost modernize migrate-table \
--table production_db.customers \
--strategy snapshot
# Destructive in-place conversion, faster for large stable tables
bifrost modernize migrate-table \
--table production_db.transactions \
--strategy migrate
# Additive import with re-partitioning
bifrost modernize migrate-table \
--table production_db.events \
--strategy add_files \
--partition-spec "year(event_date)"
Strategy guidance: see Modernize migration — table migration strategies.
bifrost modernize migrate-catalog
Bulk catalog migration without copying data. Moves table references between catalogs.
bifrost modernize migrate-catalog \
--source-type HIVE \
--target-type REST \
--target-uri https://polaris.example.internal/api/catalog \
--identifier-file tables_wave3.txt \
--dry-run
bifrost modernize migrate-wave
Migrates every table in a wave concurrently.
# Migrate wave 3 with up to 8 tables in parallel
bifrost modernize migrate-wave --wave 3 --parallel 8
# Check wave status
bifrost modernize migrate-wave --wave 3 --status
bifrost modernize migrate-storage
Bulk data movement from HDFS to object storage.
# DistCp (default, for historical data)
bifrost modernize migrate-storage \
--source hdfs:///data/warehouse \
--target s3a://lakehouse/warehouse \
--method distcp \
--mappers 20 \
--bandwidth 500
# Continuous replication
bifrost modernize migrate-storage \
--source hdfs:///data/warehouse \
--target s3a://lakehouse/warehouse \
--method continuous
# Incremental sync
bifrost modernize migrate-storage \
--source hdfs:///data/warehouse \
--target s3a://lakehouse/warehouse \
--method distcp \
--update \
--delete
Accepted --method values:
| Value | Mechanism | Use case |
|---|---|---|
| distcp | Hadoop DistCp with the magic committer | Bulk historical data |
| continuous | Paxos-coordinated live-data replication (partner integration) | Continuous replication of live, changing data |
| juicefs | JuiceFS bridge exposing object storage through the HDFS SDK | Applications that require native HDFS SDK compatibility |
| ozone | Two-step migration via Apache Ozone (ofs:// scheme, then object storage) | Legacy estates already running Ozone |
--mappers, --bandwidth, --update, --delete apply to the distcp method. --kerberos-principal and --kerberos-keytab apply to any method when the source HDFS is Kerberized.
The Bifrost CLI uses double-dash flags (--mappers, --bandwidth, --update, --delete) and translates them to the native DistCp form internally (-m, -bandwidth, -update, -delete). When tuning DistCp directly outside Bifrost, use the single-dash native form.
--bandwidth expresses the per-mapper throughput cap in MB/s and accepts a floating-point value (for example, --bandwidth 0.5). The cap is approximate. Fewer mappers at higher bandwidth per mapper usually outperforms many mappers against the same object storage endpoint.
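The interplay of --mappers and --bandwidth is simple arithmetic: each mapper is capped independently, so the aggregate ceiling is their product. A back-of-envelope sketch with illustrative numbers:

```python
def aggregate_cap_mb_s(mappers, bandwidth_mb_s):
    """Approximate aggregate DistCp throughput ceiling: each mapper is
    independently capped, so the fleet cap is mappers x per-mapper cap."""
    return mappers * bandwidth_mb_s

# Both tunings share the same ~10 GB/s ceiling, but the second opens far
# fewer concurrent connections against the object storage endpoint.
print(aggregate_cap_mb_s(mappers=100, bandwidth_mb_s=100))  # 10000
print(aggregate_cap_mb_s(mappers=20, bandwidth_mb_s=500))   # 10000
```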
bifrost modernize convert-workflow
Converts Oozie workflows to Airflow 3 DAGs and extracts Hive SQL into dbt models.
# Convert a single workflow
bifrost modernize convert-workflow \
--oozie-path /user/oozie/workflows/etl-daily-revenue \
--output-dir airflow/dags/
# Convert every workflow in a directory tree
bifrost modernize convert-workflow \
--oozie-path /user/oozie/workflows/ \
--output-dir airflow/dags/ \
--recursive
# Extract SQL into dbt models during conversion
bifrost modernize convert-workflow \
--oozie-path /user/oozie/workflows/etl-daily-revenue \
--output-dir airflow/dags/ \
--extract-dbt \
--dbt-project-dir dbt/lakehouse/
Action-type coverage: see Component migrations — workflows.
bifrost modernize hue-import
Migrates HUE saved queries and dashboards to Superset.
bifrost modernize hue-import \
--hue-db-host hue.example.internal \
--hue-db-name hue \
--hue-db-user hue \
--hue-db-pass "$HUE_DB_PASSWORD" \
--target-superset https://superset.example.internal \
--superset-admin-user admin \
--superset-admin-pass "$SUPERSET_PASSWORD" \
--user-mapping-file user_mapping.csv
Items that must be rebuilt manually are listed in Component migrations — user interface.
bifrost modernize validate
Runs validation for migrated tables.
# Validate a single table
bifrost modernize validate \
--table production_db.customers \
--type data-diff
# Validate every table in a wave
bifrost modernize validate \
--wave 3 \
--type data-diff \
--sample-rate 0.01
# Query parity test (compare legacy vs. modern results)
bifrost modernize validate \
--type query-parity \
--query-file benchmark_queries.sql \
--source-engine hive \
--target-engine trino
bifrost modernize rollback
Reverts one or more migrated tables to their pre-migration state. Invoked automatically by Bifrost when validation fails; the command is also available for operator-initiated reverts.
# Revert a single table
bifrost modernize rollback --table production_db.customers
# Revert every table in a wave (only after the wave is COMPLETE or FAILED)
bifrost modernize rollback --wave 3
--table and --wave are mutually exclusive. The revert mechanism depends on the migration strategy used for the table:
- migrate strategy — the Iceberg table is dropped and the original Hive table is renamed back from <name>_BACKUP_ to <name>. Metadata operation; microseconds.
- snapshot strategy — the Iceberg table is dropped and table redirection is disabled. The source Hive table was never modified, so nothing is renamed back.
- add_files strategy — the imported files are removed from the Iceberg table's metadata; source files remain untouched.
Use --wave only after migrate-wave has finished (either COMPLETE or FAILED). While a wave is actively running, use --table per-table to avoid racing against in-flight migrations. Pause the wave with bifrost modernize migrate-wave --wave <n> --pause first if a bulk revert is required mid-wave.
bifrost modernize decommission
Decommissions a legacy service after a confirmed silence period.
# Decommission HDFS after 30 days of silence
bifrost modernize decommission \
--service hdfs \
--cluster PROD01 \
--after-silence 30d
# Dry-run decommission readiness check
bifrost modernize decommission \
--service yarn \
--cluster PROD01 \
--after-silence 30d \
--dry-run
Required flags by --service value:
| Service value | Required flags | Optional flags |
|---|---|---|
| hdfs, yarn, hive, hbase, kafka, oozie, ranger, atlas, hue, solr | --cluster, --after-silence | --dry-run |
| cloudera-manager (Direct only) | --cm-host, --confirm-irreversible | --dry-run |
--after-silence does not apply to the Cloudera Manager shutdown: once every cluster under it has been decommissioned, there is no legacy data access left to monitor.
Note on cluster flags. The decommission command is service-scoped rather than path-scoped; it accepts --cluster under every path (Classic, Modernize, Direct), which is the single exception to the Global Options rule that Modernize and Direct use --source-cluster.
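The --after-silence gate reduces to comparing the last observed legacy access against the window. An illustrative reading (only Nd windows handled; not Bifrost's implementation):

```python
from datetime import datetime, timedelta

def silence_satisfied(last_access, window, now):
    """True when the service has seen no access for the whole window.
    Illustrative sketch; only 'Nd' (days) windows are handled here."""
    days = int(window.rstrip("d"))
    return now - last_access >= timedelta(days=days)

now = datetime(2025, 3, 1)
print(silence_satisfied(datetime(2025, 1, 25), "30d", now))  # True  (35 days quiet)
print(silence_satisfied(datetime(2025, 2, 15), "30d", now))  # False (14 days quiet)
```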
bifrost modernize status
Shows migration progress across storage, tables, workloads, and services.
bifrost modernize status
Example output:
Bifrost Modernize Status: PROD01 -> production
==================================================
Storage Migration: ################.... 78% (179 TB / 230 TB)
Table Migration: ##############...... 68% (2,847 / 4,187 tables)
Workload Conversion: ########............ 42% (156 / 371 workflows)
Service Replacement: ##################.. 88% (7 / 8 services)
Current Wave: 5 of 8
Tables in Progress: 23
Tables Blocked: 2 (manual review required)
Active Issues:
WARN production_db.risk_scores: 22% slower on Trino (within tolerance)
TODO etl-risk-model coordinator: manual Airflow Dataset definition needed
BLOCK production_db.hbase_audit: HBase schema review pending (coprocessor detected)
Dashboard: https://superset.example.internal/bifrost-migration
Direct (Path 3) Commands
Direct combines Classic discovery and extraction with Modernize execution. The CLI namespace is bifrost direct, and every Modernize command is also aliased under Direct.
bifrost direct discover
Combines Cloudera Manager extraction with modernization estate scoring in a single pass.
bifrost direct discover \
--cm-host cm.example.internal \
--cm-port 7183 \
--cm-user admin \
--cm-pass "$CM_PASSWORD" \
--source-cluster PROD01 \
--output plans/prod01/
Key flags:
| Flag | Purpose |
|---|---|
| --source-cluster | Name of the source CDP cluster; becomes the plan directory. |
| --cm-host, --cm-port, --cm-user, --cm-pass | Cloudera Manager API credentials. Same semantics as classic discover. |
| --no-tls | For lab environments where Cloudera Manager is not TLS-protected. |
| --cm-timeout | Increase API timeout for clusters with more than 100 services (default 30 seconds). |
| --output | Destination directory for discovery output. |
bifrost direct plan
Produces a migration plan that goes directly from CDP to the Kubernetes lakehouse. Includes a CDP dependency analysis step that identifies Cloudera-specific integrations and maps them to open alternatives or flags them for manual handling.
bifrost direct plan \
--source-cluster PROD01 \
--target-env production \
--waves 8 \
--max-parallel 8
bifrost direct land
Identical to bifrost modernize land. Deploys the Kubernetes target platform.
bifrost direct land --env production --profile production
bifrost direct migrate-wave
Executes the combined migration for every table in a wave. For each wave, Bifrost extracts the relevant table or service configuration from the source, translates it directly into Kubernetes-native equivalents (skipping an intermediate Hadoop XML representation), migrates data from source HDFS to object storage, converts Hive tables to Iceberg, and validates via data-diff.
# Migrate a specific wave
bifrost direct migrate-wave --wave 3 --parallel 8
# Migrate storage directly from CDP HDFS to object storage
bifrost direct migrate-storage \
--source hdfs:///data/warehouse \
--target s3a://lakehouse/warehouse \
--method distcp \
--kerberos-principal [email protected] \
--kerberos-keytab /var/lib/bifrost/keytabs/bifrost.keytab
Kerberos flags are required when migrating directly from a Kerberized CDP HDFS. Bifrost captures the keytab during discovery and manages kinit automatically. The same Kerberos flags are accepted by bifrost modernize migrate-storage when the legacy HDFS source is Kerberized.
Other Direct commands
Every Modernize command is available under Direct with identical syntax. For each, consult the Modernize documentation for flags and examples — only the command prefix changes from bifrost modernize to bifrost direct:
bifrost direct bridge --source-cluster PROD01 --target-env production --sync-interval 1h
bifrost direct migrate-table --table production_db.customers --strategy snapshot
bifrost direct convert-workflow --oozie-path /user/oozie/workflows/ --output-dir airflow/dags/ --recursive
bifrost direct hue-import --hue-db-host hue.example.internal --target-superset https://superset.example.internal
bifrost direct validate --wave 3 --type data-diff
bifrost direct decommission --service hdfs --cluster PROD01 --after-silence 30d
bifrost direct status
The only behavioural difference is that bifrost direct decommission can additionally shut down Cloudera Manager as the last step of the program:
bifrost direct decommission \
--service cloudera-manager \
--cm-host cm.example.internal \
--confirm-irreversible
Exit Codes and Notifications
Bifrost returns 0 on success, non-zero on any failure, and 2 specifically for ABORT verdicts from the decision engine.
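In CI, the exit-code contract can be consumed directly. A sketch of the mapping (illustrative helper, not part of the CLI):

```python
def interpret_exit(code):
    """Map a bifrost exit code to the decision-engine outcome:
    0 = success, 2 = ABORT verdict, anything else = generic failure."""
    if code == 0:
        return "success"
    if code == 2:
        return "abort"
    return "failure"

print(interpret_exit(0))  # success
print(interpret_exit(2))  # abort
print(interpret_exit(1))  # failure
```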
Notifications are configured in a single YAML file that is read by every command. Bifrost supports three channels:
channels:
slack:
webhook: "{{ vault_slack_webhook }}"
channel: "#hadoop-migration"
events:
- phase_start
- phase_complete
- phase_failed
- gate_decision
- rollback_triggered
- table_migrated # Modernize / Direct only
- wave_complete # Modernize / Direct only
itsm:
type: bmc_helix # bmc_helix | servicenow | jira
api_url: "https://helix.example.internal/api/arsys/v1"
auth: jwt
events:
- migration_start
- migration_complete
- rollback_triggered
- incident_created
email:
smtp: "smtp.example.internal"
recipients: ["[email protected]"]
events:
- gate_decision
- rollback_triggered
--no-notify suppresses all channels for a single invocation, which is useful for dry runs and local testing. For the Modernize and Direct paths, Bifrost also exposes Prometheus metrics and ships a Superset dashboard that visualizes migration KPIs in real time. See Operations — monitoring and alerting.