Cost metrics configuration
When observability is enabled, Ilum attributes the cost of every Spark run across the datasets the job actually touched. Each job detail view gains a Cost tab showing a per-table breakdown, and a top-level Cost route aggregates across jobs.
The breakdown is composed of cost dimensions. Some dimensions are built in and always tracked. Others are counted request metrics, which an operator defines to control which storage and RPC operations count toward a job's cost. This guide covers both kinds and how to configure the counted metrics.
Built-in cost dimensions
The following dimensions are always tracked for every traced job. They are derived from the per-stage metrics Spark already reports, so they require no configuration:
| Dimension | Unit | What it measures |
|---|---|---|
| Executor run time | seconds | Total executor wall-clock time, the basis for the modeled compute cost. |
| Input bytes | bytes | Bytes read by the job's stages. |
| Output bytes | bytes | Bytes written by the job's stages. |
| Remote shuffle read | bytes | Shuffle data fetched from other executors over the network. |
| Shuffle write | bytes | Shuffle data written during redistribution. |
| Disk spill | bytes | Data spilled to disk when a stage exceeded available memory. |
| GC time | milliseconds | Time spent in JVM garbage collection. |
Each dimension is apportioned across the datasets a stage read or wrote, so the Cost tab can answer "which table drove the I/O and compute on this run?"
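The following minimal Python sketch makes the apportioning concrete. It makes three assumptions for illustration. Each stage reports the datasets it read or wrote together with a dictionary of dimension values. Every value is split evenly across those datasets. Dimension names follow a form such as executor_run_time_s. None of this is Ilum's documented weighting, which may differ (for example, it might weight by bytes).

```python
from collections import defaultdict

def apportion(stages):
    """Split each stage's dimension values evenly across the datasets it touched.

    Assumption: an even split per stage. This is an illustrative rule only.
    """
    per_table = defaultdict(lambda: defaultdict(float))
    for stage in stages:
        datasets = stage["datasets"]
        if not datasets:
            continue  # no dataset to attribute this stage's values to
        share = 1.0 / len(datasets)
        for dimension, value in stage["metrics"].items():
            for table in datasets:
                per_table[table][dimension] += value * share
    # Convert the nested defaultdicts to plain dicts for display
    return {table: dict(dims) for table, dims in per_table.items()}

# Hypothetical stage metrics for one run
stages = [
    {"datasets": ["sales", "customers"],
     "metrics": {"executor_run_time_s": 40.0, "input_bytes": 2_000_000}},
    {"datasets": ["sales_summary"],
     "metrics": {"executor_run_time_s": 10.0, "output_bytes": 500_000}},
]

result = apportion(stages)
print(result)
```

Each stage's shares sum back to its original value. As long as every stage names at least one dataset, the per-table totals therefore add up to the job totals, which is what a per-table breakdown needs.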
Counted request metrics
Beyond the built-in dimensions, an operator can define counted request metrics: metrics that count how many times the job invoked a particular set of storage or RPC operations. This is how object-storage request cost (which cloud providers bill per request, not only per byte) becomes visible per table.
A metric definition consists of:
- Key: a stable identifier (for example, `s3.put_requests`). The key becomes the column name in the Cost tab and in the cost rollup.
- Label and unit: display text shown in the UI (for example, "S3 PUT requests" and `requests`).
- Match type: which span attribute the metric reads. By default this is the RPC method name (`rpc.method`), which carries the operation issued against object storage or a remote service.
- Match values: the set of operations to count. Every operation in the run whose matched attribute value is in this set adds one to the metric.
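A metric definition is plain data. The sketch below models one as a Python dataclass to show how the four fields fit together. The class name MetricDefinition, its snake_case field names (the Helm values use matchValues), and the matches helper are hypothetical and are not Ilum's internal types.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricDefinition:
    """Hypothetical model of a counted request metric definition."""
    key: str                               # stable identifier, e.g. "s3.put_requests"
    label: str                             # display text, e.g. "S3 PUT requests"
    unit: str                              # display unit, e.g. "requests"
    attribute: str = "rpc.method"          # match type: the span attribute to read
    match_values: frozenset = frozenset()  # operations to count

    def matches(self, span_attributes: dict) -> bool:
        """True if the span's value for `attribute` is one of the match values."""
        return span_attributes.get(self.attribute) in self.match_values

puts = MetricDefinition(
    key="s3.put_requests",
    label="S3 PUT requests",
    unit="requests",
    match_values=frozenset({"PutObject", "CompleteMultipartUpload", "UploadPart"}),
)

print(puts.matches({"rpc.method": "UploadPart"}))  # True
print(puts.matches({"rpc.method": "GetObject"}))   # False
```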
Ilum ships two such metrics by default, which an operator can keep, edit, or remove:
- `s3.put_requests`: counts write operations (`PutObject`, `CompleteMultipartUpload`, and `UploadPart`).
- `s3.get_requests`: counts read operations (`GetObject`, `ListObjectsV2`, and `HeadObject`).
These illustrate the typical "writes" and "reads" pattern. Each metric groups the methods a provider uses for one logical action, so the request count surfaces as a single number per table. A metric definition is pure configuration, so the same approach extends to other providers without any code change. For example, a metric could match the method names GCS or Azure use for writes.
A metric with one or more match values is treated as a request counter, adding one for each matching operation. A metric defined with no match values is instead treated as a byte gauge: the numeric value of the named attribute is summed rather than counted. The two built-in metrics above are request counters.
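The counter-versus-gauge rule can be sketched as a single function. The sketch assumes that each span is a dictionary with two keys. A hypothetical "table" key names the table the span is attributed to, and an "attributes" key holds its span attributes. The io.bytes attribute in the gauge example is also hypothetical.

```python
from collections import Counter

def evaluate_metric(attribute, match_values, spans):
    """Return one metric's value per table for a run.

    With match values: request counter (one per matching span).
    Without match values: byte gauge (sum of the attribute's numeric value).
    """
    totals = Counter()
    for span in spans:
        value = span["attributes"].get(attribute)
        if value is None:
            continue  # span does not carry the attribute this metric reads
        if match_values:
            # Request counter: increment once per matching operation
            if value in match_values:
                totals[span["table"]] += 1
        else:
            # Byte gauge: add the attribute's numeric value
            totals[span["table"]] += float(value)
    return dict(totals)

PUT_METHODS = {"PutObject", "CompleteMultipartUpload", "UploadPart"}
GET_METHODS = {"GetObject", "ListObjectsV2", "HeadObject"}

spans = [
    {"table": "sales", "attributes": {"rpc.method": "PutObject"}},
    {"table": "sales", "attributes": {"rpc.method": "UploadPart"}},
    {"table": "sales", "attributes": {"rpc.method": "GetObject"}},
    {"table": "customers", "attributes": {"rpc.method": "GetObject", "io.bytes": 4096}},
]

print(evaluate_metric("rpc.method", PUT_METHODS, spans))  # {'sales': 2}
print(evaluate_metric("rpc.method", GET_METHODS, spans))  # {'sales': 1, 'customers': 1}
print(evaluate_metric("io.bytes", set(), spans))          # {'customers': 4096.0}
```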
Configuring metrics in the UI
Counted request metrics are managed in the Cost Settings view, alongside the pricing rate card. To add or change a metric:
- Open the Cost Settings view.
- Add a metric definition, supplying a key, a label and unit for display, a match type (the span attribute to read, the RPC method by default), and the set of operations (match values) to count.
- Save. New and edited definitions apply to traced jobs going forward.
Changes made in the UI are persisted and take precedence over the seeded defaults. Ilum never overwrites them on a later upgrade.
Configuring metrics via Helm
To ship a different default set with a fresh deployment, define the metrics under the ilum-core chart's observabilityDefaults.metricDefinitions values. Each entry mirrors the fields above, with attribute holding the match type:
```yaml
ilum-core:
  observabilityDefaults:
    metricDefinitions:
      - key: "s3.put_requests"
        label: "S3 PUT requests"
        unit: "requests"
        attribute: "rpc.method"
        matchValues: ["PutObject", "CompleteMultipartUpload", "UploadPart"]
      - key: "s3.get_requests"
        label: "S3 GET requests"
        unit: "requests"
        attribute: "rpc.method"
        matchValues: ["GetObject", "ListObjectsV2", "HeadObject"]
```
The Helm values seed the metric set only on first boot, when no settings have been saved yet. Once metrics exist (seeded or edited in the UI), the saved set is authoritative and subsequent Helm upgrades do not clobber it. To change metrics on a running deployment, edit them in the Cost Settings view rather than in Helm values.
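The first-boot seeding rule amounts to "use the saved set if one exists; otherwise persist the Helm defaults and use them." The sketch below models the settings store as a plain dictionary. The store, the function name, and the custom.example key are hypothetical stand-ins, not Ilum's persistence layer. In this sketch an empty saved list still counts as saved, which is also an assumption.

```python
def resolve_metric_definitions(store, helm_defaults):
    """Return the authoritative metric set, seeding from Helm only on first boot.

    `store` is a hypothetical settings store (a dict). The key
    "metricDefinitions" is absent until something has been saved.
    """
    if store.get("metricDefinitions") is None:
        # First boot: nothing saved yet, so persist the Helm-provided defaults
        store["metricDefinitions"] = list(helm_defaults)
    # From now on the saved set wins; Helm values passed later are ignored
    return store["metricDefinitions"]

store = {}
helm_v1 = [{"key": "s3.put_requests"}, {"key": "s3.get_requests"}]
first = resolve_metric_definitions(store, helm_v1)           # seeded from Helm

store["metricDefinitions"] = [{"key": "s3.get_requests"}]    # operator edits in the UI
helm_v2 = [{"key": "custom.example"}]                        # hypothetical new Helm default
after_upgrade = resolve_metric_definitions(store, helm_v2)

print([m["key"] for m in after_upgrade])  # ['s3.get_requests']
```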
Where configured metrics appear
Once defined, each metric becomes a column in the cost breakdown:
- In the Cost tab of every job detail view, each metric appears as a per-table value alongside the built-in dimensions, so a single table's executor time, bytes, and request counts sit on the same row.
- In the cost rollup that backs the cross-job Cost route, the metric key becomes a stored column, so the same counts aggregate across jobs over time.
Removing a metric stops it from being counted on future runs. Historical rows already written to the rollup are unaffected.
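The rollup behaviour can be sketched as summing stored per-job, per-table rows. In this hypothetical in-memory model, each row is a dictionary with job and table keys plus one entry per metric column. A row written after a metric was removed simply lacks that column, so earlier values still contribute to the cross-job totals.

```python
from collections import defaultdict

def rollup(rows):
    """Aggregate per-job, per-table cost rows into cross-job totals per table."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        for column, value in row.items():
            if column in ("job", "table"):
                continue  # identifiers, not metric columns
            totals[row["table"]][column] += value
    return {table: dict(cols) for table, cols in totals.items()}

rows = [
    {"job": "j1", "table": "sales", "s3.put_requests": 3, "s3.get_requests": 10},
    # Written after s3.put_requests was removed, so that column is absent
    {"job": "j2", "table": "sales", "s3.get_requests": 4},
    {"job": "j2", "table": "customers", "s3.get_requests": 2},
]

summary = rollup(rows)
print(summary)
```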