Migration
For the business overview (drivers, cost models, case studies, ROI), see the Ilum Hadoop & Cloudera Migration Platform page. For the implementation reference once a migration is in flight, see the migration toolkit documentation, which covers discovery, phased execution, data validation, and rollback.
The rest of this page describes a high-level, self-directed migration path and covers product-version migration notes for Ilum itself, including the MongoDB to PostgreSQL metadata-store transition and the Konvert library pilot for data-conversion workloads.
Migration from Apache Hadoop to Ilum involves several steps, typically starting with the setup of the new environment, followed by the migration of data and applications, and finally testing and optimization. Here's a general outline of the process:
- Preparation: Understand your current Hadoop deployment, including the data, applications, and dependencies it contains. Document all relevant details to ensure nothing is lost in the transition.
- Set Up the Kubernetes Environment: Install and configure your Kubernetes cluster according to your organizational needs. This will serve as the foundation for your Ilum-managed Spark clusters.
- Install Ilum: Deploy Ilum on your Kubernetes cluster using Helm, a package manager for Kubernetes. Ensure that Ilum is properly configured to manage your Spark clusters.
- Data Migration: Begin migrating data from your Hadoop cluster to the new environment. This could involve moving data to a distributed file system accessible from your Kubernetes cluster, or to an S3-compatible storage system if that is part of your new architecture.
- Application Migration: Migrate your Spark applications from the Hadoop environment to the new Kubernetes environment. This may require changes to your applications to adapt them to the differences between Hadoop YARN and Kubernetes.
- Update Dependencies: Update any dependencies your applications have, such as changing data sources from HDFS to the new storage location.
- Testing: Conduct thorough testing to ensure that your applications run correctly in the new environment. This should include functional testing as well as performance testing to confirm that your applications perform at least as well as they did in the Hadoop environment.
- Optimization: Based on your test results, tune your Kubernetes and Ilum configurations for the best performance.
- Monitoring: Once everything is migrated and optimized, continue monitoring your applications and infrastructure to ensure everything runs smoothly. Ilum provides a web interface that makes it easy to monitor your Spark clusters and jobs.
This is a high-level outline and the specifics will vary depending on your current Hadoop setup, your specific use cases, and the architecture of your new environment. It's also worth noting that migration can be a complex process, and it may be beneficial to work with experts or seek out detailed guides or resources to assist with the migration.
Migration Support
Transitioning from Apache Hadoop to a new environment managed by Ilum may seem challenging, but you are not alone in this process. We understand that migrating data and applications, setting up a new environment, and ensuring everything works as expected can be a complex task.
To assist you in this process, our team at Ilum is ready to provide comprehensive support. If you need help with setting up Ilum, migrating your Spark clusters, or any other aspect of the transition process, please feel free to reach out to us. We can provide a Helm chart for easy deployment of Ilum, and guide you through the steps needed to migrate your existing Hadoop cluster to the new environment.
We're committed to making the migration process as smooth as possible for you. Whether you have technical questions, need guidance on best practices, or encounter any issues during the migration, we're here to help.
Please contact us at [email protected] at any time for assistance with your migration to Ilum. Our dedicated support team is ready and eager to assist you in your journey towards efficient and manageable Apache Spark cluster management with Ilum.
Migration Notes
Migrating 5.*.* to 6.0.0
With the release of version 6.0.0, we introduced a new security implementation that requires attention during the migration process. Existing user accounts must be recreated if any changes have been made to the default admin account.
Follow the steps below to migrate to version 6.0.0. The example command creates two accounts: one for an admin and one for a regular user.
helm upgrade \
--set ilum-core.security.internal.users[0].username=admin \
--set ilum-core.security.internal.users[0].password=adminPassword \
--set ilum-core.security.internal.users[0].roles[0]=ADMIN \
--set ilum-core.security.internal.users[1].username=user \
--set ilum-core.security.internal.users[1].password=userPassword \
--set ilum-core.security.internal.users[1].roles[0]=USER \
--reuse-values ilum ilum/ilum
To review all supported authentication methods and their parameters, see the README.md files in the ilum-core charts.
Migrating 6.0.* to 6.1.0
With the release of version 6.1.0, we introduced a new Ilum Spark storage implementation that requires attention during the migration process. The existing bucket configuration must be reformatted to match the new schema.
Previously, the S3 bucket used by Ilum for storing Spark resources was configured through the ilum-core.kubernetes.s3.bucket Helm value. Since version 6.1.0 it has been replaced with two new parameters:
- ilum-core.kubernetes.s3.sparkBucket - plays the same role as the previous parameter
- ilum-core.kubernetes.s3.dataBucket - configures the bucket used for storing ilum-tables
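For example, an upgrade that sets both new bucket parameters might look like the following, using the same release and chart names as the earlier example (the bucket names here are illustrative):

```shell
helm upgrade \
  --set ilum-core.kubernetes.s3.sparkBucket=ilum-spark-resources \
  --set ilum-core.kubernetes.s3.dataBucket=ilum-tables \
  --reuse-values ilum ilum/ilum
```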
Metadata store: MongoDB to PostgreSQL
Recent Ilum releases promote PostgreSQL to the primary metadata store for ilum-core. Access is reactive (R2DBC) with jOOQ-generated SQL DSL. MongoDB remains supported for legacy deployments and continues to receive bug fixes, but new deployments should default to PostgreSQL.
Why the change
- Consistency with the rest of the stack: Marquez, Hive Metastore, Airflow, Superset, MLflow, Hydra, Gitea, n8n, and Kestra already share PostgreSQL. Consolidating ilum-core removes one stateful system from the deployment surface.
- Schema-first metadata: jOOQ codegen produces type-safe SQL, replacing the schemaless reads against MongoDB collections that grew brittle as the metadata model expanded.
- Operational tooling: Standard Postgres backup, replication, and observability tooling applies to Ilum metadata without bespoke MongoDB pipelines.
Default configuration
PostgreSQL is enabled out of the box in the umbrella Helm chart (postgresql.enabled: true). MongoDB is also enabled by default for backwards compatibility. Operators can disable MongoDB once they have migrated:
mongodb:
enabled: false
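Applied through the same upgrade pattern used elsewhere in this guide (assuming the `ilum` release and `ilum/ilum` chart names from the earlier example), that toggle would look like:

```shell
helm upgrade \
  --set mongodb.enabled=false \
  --reuse-values ilum ilum/ilum
```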
Migrating an existing MongoDB-backed deployment
A migration tooling chain (script set M001 through M009) ships with Ilum; it reads metadata from MongoDB and writes it to PostgreSQL in the new schema. The migration runs once during the upgrade window:
- Stop incoming traffic to ilum-core (drain or scale to zero).
- Verify that a PostgreSQL deployment is reachable from the ilum-core namespace and that the ilum database exists.
- Run the migration job through the umbrella chart's migration runner. Each step (M001 through M009) runs sequentially; failures roll back to the prior checkpoint.
- Update the ilum-core configuration to point its primary store at PostgreSQL.
- Restart ilum-core. Verify that clusters, jobs, schedules, and saved queries are present in the UI.
- After a soak period, scale MongoDB down or disable it via mongodb.enabled: false.
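The cutover steps above can be sketched with kubectl and helm. The namespace (`ilum`), the deployment name (`ilum-core`), and the PostgreSQL host and user are assumptions for illustration; consult the umbrella chart documentation for the exact migration-runner invocation and configuration values:

```shell
# 1. Drain ilum-core (deployment name assumed to be ilum-core)
kubectl scale deployment ilum-core --replicas=0 -n ilum

# 2. Check that PostgreSQL is reachable and the ilum database exists
kubectl run pg-check --rm -it --image=postgres:16 -n ilum -- \
  psql -h <postgres-host> -U <postgres-user> -l

# 3-5. Run the chart's migration runner (M001 through M009), repoint
#      ilum-core at PostgreSQL, and restart it; verify clusters, jobs,
#      schedules, and saved queries in the UI.

# 6. After a soak period, disable MongoDB (mongodb.enabled: false)
#    via helm upgrade.
```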
For deployment-specific migration assistance, contact [email protected].
Both backends in parallel
ilum-core supports running with either backend during the transition. The mongo.uri and PostgreSQL connection settings remain configurable independently, allowing operators to validate the PostgreSQL backend on a non-production cluster before promoting it.
Konvert library (pilot)
Ilum includes an integration with Konvert, a data-conversion library currently in pilot integration. Konvert is intended to streamline conversion of data and code between source formats and Ilum-native targets during migration projects (for example, transforming legacy ETL definitions into Ilum job specifications).
The pilot is opt-in and not yet covered by a stable public API. Teams interested in evaluating Konvert for a migration project should contact [email protected] for the current scope and an enablement walkthrough.
For large estate migrations from Hadoop or Cloudera CDP, the recommended starting point remains the Bifrost migration toolkit, which covers discovery, phased execution, data validation, and rollback.