MLflow

What is MLflow?

MLflow is an open-source platform for managing the machine learning lifecycle: tracking experiments, versioning models, and deploying models to production. Ilum ships MLflow as an optional, preintegrated component for data scientists and engineers. This documentation covers MLflow's features, how to enable it, and how to use it within the Ilum ecosystem.

Features of MLflow

Experiment Tracking

  • Log parameters, metrics, and artifacts for every machine learning experiment.
  • Visualize and compare experiment results in a centralized UI.
  • Automatically store experiment metadata in Ilum's integrated storage solutions.

Model Registry

  • Manage models with version control.
  • Transition models through lifecycle stages: development, staging, and production.
  • Integrate directly with Ilum deployment pipelines.

Setting Up MLflow in Ilum

Helm Deployment

To enable MLflow in Ilum, use the following Helm command:

helm upgrade \
  --set mlflow.enabled=true \
  --reuse-values ilum ilum/ilum

This command will:

  • Deploy MLflow and the MLflow tracking server.
  • Integrate MLflow with Ilum's storage and tracking services.
  • Enable MLflow Tracking UI and Model Registry.

Using MLflow in Ilum

Configuration Setup

Global Configuration

To enable MLflow across the current Ilum instance:

  • Update the Spark image in the cluster settings to ilum/spark:3.5.3-mlflow or a newer version.

Python/Scala Jobs

Add the following parameter in the service configuration:

spark.kubernetes.container.image: ilum/spark:3.5.3-mlflow

Jupyter Notebook

Add the following parameter in the properties JSON under the conf array:

"spark.kubernetes.container.image": "ilum/spark:3.5.3-mlflow"

Regardless of the environment, set the Tracking URL to: http://ilum-mlflow-tracking:5000.
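
One way to apply this setting from Python is through the `MLFLOW_TRACKING_URI` environment variable, which the MLflow client reads when no tracking URI has been set explicitly. The sketch below assumes your job or notebook can reach the service name given above; the helper name `configure_tracking` is hypothetical and not part of Ilum or MLflow.

```python
import os

# Tracking URL from this section.
ILUM_TRACKING_URL = "http://ilum-mlflow-tracking:5000"


def configure_tracking(url: str = ILUM_TRACKING_URL) -> str:
    """Expose the tracking URL to MLflow through its standard environment variable.

    Hypothetical helper: it only sets MLFLOW_TRACKING_URI for this process.
    """
    os.environ["MLFLOW_TRACKING_URI"] = url
    return os.environ["MLFLOW_TRACKING_URI"]


# Call this once, before the first MLflow logging call in the process.
configure_tracking()
```

Calling `mlflow.set_tracking_uri(ILUM_TRACKING_URL)` in the script is an equivalent alternative; the environment variable is convenient when the same code should run unchanged in several environments.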

Tracking Experiments

MLflow tracking is preconfigured in Ilum. Once the Tracking URL is set, every run and experiment you log is recorded on the Ilum tracking server.

  1. Log metrics and parameters in your machine learning scripts:
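    import mlflow
    import mlflow.sklearn

    # Point the client at the Ilum tracking server.
    mlflow.set_tracking_uri("http://ilum-mlflow-tracking:5000")

    # `model` is assumed to be a scikit-learn estimator you have already trained.
    with mlflow.start_run():
        mlflow.log_param("param1", 5)
        mlflow.log_metric("accuracy", 0.85)
        mlflow.sklearn.log_model(model, "model")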
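
In practice the logged metric is usually computed from predictions rather than hard-coded. The following is a minimal sketch of that pattern, assuming classification labels held in plain Python sequences; `accuracy` and `log_evaluation` are hypothetical helper names, while `mlflow.start_run`, `mlflow.log_params`, and `mlflow.log_metric` are standard MLflow calls.

```python
from typing import Mapping, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def log_evaluation(run_name: str, params: Mapping[str, object],
                   y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Log hyperparameters and the computed accuracy as one MLflow run."""
    import mlflow  # imported here so accuracy() stays usable without MLflow

    score = accuracy(y_true, y_pred)
    with mlflow.start_run(run_name=run_name):
        mlflow.log_params(dict(params))       # all hyperparameters in one call
        mlflow.log_metric("accuracy", score)  # the value computed above
    return score
```

A call such as `log_evaluation("baseline", {"param1": 5}, y_test, predictions)` would then create one named run per evaluation, which makes runs easier to tell apart when comparing them in the tracking UI.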

Managing Models

  1. Register a model in the MLflow Model Registry:

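    from mlflow.tracking import MlflowClient

    client = MlflowClient()
    run_id = "<RUN_ID>"  # ID of the run that logged the model
    run_uri = f"runs:/{run_id}/model"
    model_name = "dev.MLTeam.model-name"
    # Create the registry entry, then add the run's model as a new version.
    client.create_registered_model(model_name)
    mv_src = client.create_model_version(model_name, run_uri, run_id=run_id)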
  2. Assign aliases to model versions:

    # Point the "champion" alias at the version created above.
    client.set_registered_model_alias(model_name, "champion", mv_src.version)
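
Once an alias is assigned, consumers can load whichever version it currently points to instead of pinning a version number. The sketch below assumes an MLflow release that supports aliases and the `models:/<name>@<alias>` URI form; `alias_model_uri` and `load_by_alias` are hypothetical helper names.

```python
def alias_model_uri(model_name: str, alias: str) -> str:
    """Build a registry URI that resolves to the version behind an alias."""
    return f"models:/{model_name}@{alias}"


def load_by_alias(model_name: str, alias: str = "champion"):
    """Load the aliased model version as a generic pyfunc model."""
    import mlflow.pyfunc  # imported here so the URI helper works without MLflow

    return mlflow.pyfunc.load_model(alias_model_uri(model_name, alias))
```

With this approach, promoting a new version only requires moving the alias; code that calls `load_by_alias("dev.MLTeam.model-name")` picks up the new version the next time it loads the model.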

Conclusion

MLflow's integration into Ilum covers the machine learning workflow from experiment tracking to model registration and deployment. Use the steps and configurations outlined here to get started with MLflow in Ilum.