Data Lineage
Data lineage covers where data originates, what happens to it, and where it moves over time. The Ilum package includes open source data lineage tools that improve visibility and make it easier to trace errors back to their root cause in a data analytics process.
Marquez
Marquez is an open source metadata service. It maintains data provenance, shows how datasets are consumed and produced, provides global visibility into job runtimes, and centralizes dataset lifecycle management. Visit the Marquez website for more details.
Note that Marquez is not bundled in the Ilum package by default.
You can enable Marquez with a helm upgrade command, for instance:
helm upgrade \
--set global.lineage.enabled=true \
--set ilum-marquez.marquez.db.password="CHOOSE PASSWORD" \
--set postgresql.enabled=true \
--set postgresql.auth.username=ilum \
--set postgresql.auth.password="CHOOSE PASSWORD" \
--set postgresql.auth.database=marquez \
--set job.openLineage.transport.serverUrl=http://ilum-marquez:9555/api/v1/namespaces/ilum \
--reuse-values ilum ilum/ilum
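If you prefer to keep passwords out of your shell history, the same settings can be kept in a values file instead of a chain of --set flags. The sketch below is one possible way to do that: it writes the keys from the command above as nested YAML. The file name lineage-values.yaml is a hypothetical choice, and the passwords are the same placeholders as above.

```shell
# Hypothetical file name; any path works.
VALUES_FILE="lineage-values.yaml"

# Same keys as the --set flags above, written as nested YAML.
# Replace both "CHOOSE PASSWORD" placeholders before use.
cat > "$VALUES_FILE" <<'EOF'
global:
  lineage:
    enabled: true
ilum-marquez:
  marquez:
    db:
      password: "CHOOSE PASSWORD"
postgresql:
  enabled: true
  auth:
    username: ilum
    password: "CHOOSE PASSWORD"
    database: marquez
job:
  openLineage:
    transport:
      serverUrl: http://ilum-marquez:9555/api/v1/namespaces/ilum
EOF

# Restrict permissions, since the file holds passwords.
chmod 600 "$VALUES_FILE"

echo "Wrote $VALUES_FILE"
```

The file could then be passed to Helm with the standard -f flag, for example: helm upgrade --reuse-values -f lineage-values.yaml ilum ilum/ilum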
You can access the Marquez UI from the Ilum UI or by using the port-forward command:
kubectl port-forward svc/ilum-marquez-web 9444:9444
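With the port-forward above running, the UI should be reachable at http://localhost:9444. To check that the Marquez API itself is responding, something like the following sketch may help. It assumes the API is exposed by a service named ilum-marquez on port 9555 (inferred from the serverUrl setting above, not confirmed here) and that this service has been forwarded in a separate terminal.

```shell
# Assumption: the Marquez API service was forwarded in another terminal with
#   kubectl port-forward svc/ilum-marquez 9555:9555
# The service name and port are inferred from the serverUrl setting.
MARQUEZ_API="http://localhost:9555/api/v1"

# Ask Marquez for its known namespaces; print a hint if the call fails.
if curl -fsS --max-time 5 "$MARQUEZ_API/namespaces"; then
  echo
else
  echo "Marquez API not reachable at $MARQUEZ_API (is the port-forward running?)"
fi
```

Once jobs have reported lineage, the response would be expected to include the ilum namespace configured in the server URL.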