What is Ilum?

Modular Data Lakehouse for a Cloud Native World.

Welcome to the official documentation for Ilum, a platform for managing and monitoring Apache Spark clusters. Ilum handles multiple Spark clusters from a single place, whether they are deployed in the cloud or on-premises, and whether they run on Apache Hadoop YARN or Kubernetes.

Ilum was initially developed to manage and integrate Apache Spark in a Kubernetes environment. It has since matured into a modular Data Lakehouse platform. Beyond its core features, Ilum bundles tools such as Jupyter, Apache Airflow, and MLflow, creating a richer platform ecosystem. It supports interactive work in both Python and Scala, and it can turn Spark jobs into microservices, making your Data Lakehouse architecture more agile and responsive.

Ilum offers interactive Spark sessions that can be managed through a REST API and a user-friendly web interface. This removes the need to run commands from the CLI or pore over extensive logs to locate errors.

Ilum also focuses on integrating Apache Spark with Kubernetes: you connect Ilum to your Kubernetes cluster, submit Spark jobs, and monitor them from one place. In addition, you can integrate Ilum with Apache Hadoop YARN, which makes the tool suitable for a variety of setups.
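For context, the sketch below shows roughly what a plain Spark-on-Kubernetes submission involves when it is assembled by hand, which is the kind of step a platform such as Ilum is meant to take off your plate. It only builds the argument list for `spark-submit` and does not run it. The helper name `build_k8s_submit_command`, the API server URL, the image name, and the application path are placeholders chosen for illustration, not values taken from Ilum.

```python
from typing import List, Optional


def build_k8s_submit_command(
    api_server: str,
    app_resource: str,
    image: str,
    name: str = "example-job",
    namespace: str = "default",
    executors: int = 2,
    main_class: Optional[str] = None,
) -> List[str]:
    """Assemble (but do not run) a spark-submit command for Kubernetes."""
    cmd = [
        "spark-submit",
        # Spark addresses a Kubernetes cluster with a k8s:// master URL.
        "--master", f"k8s://{api_server}",
        "--deploy-mode", "cluster",
        "--name", name,
    ]
    if main_class:
        # Only needed for JVM applications packaged as a jar.
        cmd += ["--class", main_class]
    conf = {
        "spark.kubernetes.namespace": namespace,
        "spark.kubernetes.container.image": image,
        "spark.executor.instances": str(executors),
    }
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    # The application file or jar comes last.
    cmd.append(app_resource)
    return cmd


if __name__ == "__main__":
    # All values below are placeholders.
    print(" ".join(build_k8s_submit_command(
        api_server="https://k8s.example.local:6443",
        app_resource="local:///opt/spark/examples/src/main/python/pi.py",
        image="example/spark:latest",
    )))
```

Raising `executors` is the manual counterpart of scaling a job out across more nodes.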

One of the significant advantages of Ilum is its REST API that lets you interact with Spark jobs. This feature opens up the possibility of building applications based on Apache Spark that can respond to user requests, adding a new layer of interactivity to your data operations. Ilum is also compatible with Jupyter, Zeppelin and Object Storage, expanding its scope of operations.
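To make the idea of a request-driven Spark application more concrete, here is a minimal sketch of how a client might prepare an HTTP call to a job-execution endpoint. The endpoint path, the payload shape, the host, and the function name `build_execute_request` are all hypothetical and are not taken from Ilum's API; check the Ilum API reference for the real routes and fields. The code only builds the request object and sends nothing.

```python
import json
import urllib.request
from typing import Any, Dict


def build_execute_request(
    base_url: str, job_id: str, params: Dict[str, Any]
) -> urllib.request.Request:
    """Build a POST request for a HYPOTHETICAL job-execution endpoint."""
    # Placeholder route: not a documented Ilum path.
    url = f"{base_url.rstrip('/')}/hypothetical/jobs/{job_id}/execute"
    # Placeholder payload shape: parameters wrapped in a "params" object.
    body = json.dumps({"params": params}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_execute_request(
        "http://ilum.example.local:8080", "demo-job", {"limit": 10}
    )
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req)
```

An application backend could wrap a call like this behind its own endpoint, so that each user request triggers a Spark computation and returns the result.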

In essence, Ilum combines the capabilities of Apache Hadoop YARN and Apache Livy, providing a holistic solution for cloud-native transformation.

Given its integration with Object Storage, Ilum presents itself as a modern alternative to Apache Hadoop, pairing more scalable and cost-effective data storage with the management and monitoring of Apache Spark clusters.


  • Simple Cluster Setup: Ilum allows you to create and manage your own Apache Spark cluster with just a few commands. It can be installed in a few minutes, and it comes with auto-configuration features based on your existing Kubernetes cluster.
  • Unified Multi-Cluster Management: Irrespective of whether your Spark clusters are deployed on the cloud or on-premise, Ilum provides a unified platform for managing them all.
  • Scalability: Ilum is horizontally scalable. This means you can start with one node and scale out to hundreds as your requirements grow.
  • Built-in S3-Compatible Kubernetes Storage: Ilum provides built-in S3-compatible storage that runs on Kubernetes, giving you a flexible place to keep your data.
  • Interactive Spark Sessions: Ilum allows interaction with Spark jobs via REST API, enabling you to build applications that can respond to user requests in seconds.
  • Apache Hadoop YARN Integration: Ilum can integrate with Apache Hadoop YARN, providing additional versatility.
  • Web Interface: Ilum comes with a user-friendly web interface for monitoring your Spark cluster and jobs.
  • Compatibility with other tools: Ilum integrates seamlessly with Jupyter, Zeppelin and Object Storage, allowing for a wider range of operations.
  • Error Monitoring: Ilum provides a streamlined process for locating and addressing errors, removing the need to sift through extensive logs.
  • Migration Support: Ilum can aid in the migration from Apache Hadoop to Kubernetes, providing an efficient pathway for transitioning to cloud-native operations.
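S3-compatible object storage, as mentioned in the list above, is typically reached from Spark through the Hadoop S3A connector. The sketch below collects the S3A properties usually needed to point Spark at a non-AWS, S3-compatible endpoint. The property names are standard S3A settings; the helper name `s3a_spark_conf`, the endpoint, and the credentials are placeholders, and an Ilum installation with auto-configuration may already set equivalent values for you.

```python
from typing import Dict


def s3a_spark_conf(
    endpoint: str,
    access_key: str,
    secret_key: str,
    use_ssl: bool = False,
) -> Dict[str, str]:
    """Return Spark properties for an S3-compatible store via S3A."""
    return {
        # Address of the S3-compatible service (placeholder in the demo).
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Self-hosted stores are commonly addressed as endpoint/bucket
        # rather than bucket.endpoint, hence path-style access.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_ssl).lower(),
    }


if __name__ == "__main__":
    # Placeholder endpoint and credentials, for illustration only.
    conf = s3a_spark_conf("http://storage.example.local:9000", "KEY", "SECRET")
    for key, value in conf.items():
        print(f"--conf {key}={value}")
```

With these properties in place, Spark jobs read and write data using `s3a://bucket/path` URIs in place of HDFS paths.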


  • Multi-Environment Deployment: Ilum can be deployed in a variety of environments, whether they are on-premises, in the cloud, or a hybrid of both. This makes it adaptable to the unique needs and infrastructure of different organizations.
  • Avoiding Vendor Lock-in: Ilum is a standalone engine, meaning it's not tied to a specific cloud provider or infrastructure. This offers flexibility and avoids the risk of vendor lock-in, a concern with solutions like Databricks, which is closely tied to specific cloud providers. Cloudera can also lead to lock-in because of the significant integration and customization it often requires.
  • Unified Multi-Cluster Management: Unlike Databricks and Cloudera, Ilum allows you to control several Spark clusters from one place, regardless of whether they're deployed in the cloud or on-premises. This makes managing multiple clusters simpler.
  • Kubernetes and Hadoop YARN Integration: Ilum is built with Kubernetes integration in mind, making it easy to run Spark on Kubernetes. While Databricks and Cloudera do support running Spark, they may not offer the same level of integration with Kubernetes. Ilum also integrates with Apache Hadoop YARN, providing flexibility based on your needs.
  • Migration Support: Ilum can aid in the migration from Apache Hadoop to Kubernetes, providing an efficient pathway for transitioning to cloud-native operations. This can be particularly advantageous if you're looking to move away from a tool like Cloudera.
  • Compatibility and Flexibility: Ilum's compatibility with both Kubernetes and Hadoop YARN allows for flexibility in the deployment and management of Apache Spark clusters. This is an advantage over tools that support only one of the two or favor one over the other.

