
Cost calculation

A quick guide on how Lakesight computes Databricks costs.

Last updated: May 2026

Lakesight calculates costs as accurately as possible from public list prices and data available through the Databricks REST API, but it will not replicate the bill to the penny. The goal is to provide a consistent cost signal that identifies what costs the most and drives optimization decisions.

Overview

All costs are calculated the same way based on the sum of two components:

VM cost

The cost of the underlying infrastructure, paid to the cloud provider.

Lakesight takes VM prices from the Azure public retail pricing page. They are retail rates and vary by region.

Prices are fetched daily so that they reflect the latest available rates.
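As an illustration of a daily price lookup, here is a minimal sketch built on the public Azure Retail Prices API. Whether Lakesight uses this exact endpoint is an assumption, as are the function names, the filter fields chosen, and the rule of skipping Windows, Spot and Low Priority meters to keep the Linux on-demand rate. No network call is made here: the first function only builds the query URL, and the second picks a price from the `Items` list of an already decoded response.

```python
from typing import Dict, List, Optional
from urllib.parse import quote, urlencode

# Public Azure Retail Prices endpoint (no authentication required).
AZURE_RETAIL_PRICES_URL = "https://prices.azure.com/api/retail/prices"


def build_price_query(region: str, vm_sku: str) -> str:
    """Build the query URL for the pay-as-you-go price of one VM size in one region."""
    odata_filter = (
        "serviceName eq 'Virtual Machines'"
        f" and armRegionName eq '{region}'"
        f" and armSkuName eq '{vm_sku}'"
        " and priceType eq 'Consumption'"
    )
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return f"{AZURE_RETAIL_PRICES_URL}?{urlencode({'$filter': odata_filter}, quote_via=quote)}"


def pick_linux_on_demand_price(items: List[Dict]) -> Optional[float]:
    """Return the first hourly retail price that is not a Windows, Spot or Low Priority meter."""
    for item in items:
        product = item.get("productName", "")
        sku = item.get("skuName", "")
        if "Windows" in product or "Spot" in sku or "Low Priority" in sku:
            continue
        return item.get("retailPrice")
    return None
```

A scheduled job could call `build_price_query` once per region and VM size, fetch the URL with any HTTP client, and store the value returned by `pick_linux_on_demand_price`.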

DBU cost

Databricks' pricing for the service provided.

Pricing depends on the workspace tier, the cloud provider, and whether Photon is used, as described on the official Databricks pricing page.

The DBU/h rate also varies with the type of virtual machine selected at cluster creation.

How costs are computed

All costs are calculated the same way, from cluster events retrieved through Databricks REST API calls.

The following cluster events are tracked and used to define cost segments, each priced individually according to the number of running workers:

  • CREATING
  • RUNNING
  • RESIZING
  • UPSIZE_COMPLETED
  • TERMINATING
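To make the event tracking concrete, the sketch below prepares a request body for the Databricks cluster events endpoint (POST /api/2.1/clusters/events) and flattens a decoded response. It is an assumption that Lakesight calls this endpoint in this way. The helper names and the output keys (`timestamp_ms`, `workers`) are hypothetical, and reading the worker count from `details.current_num_workers` is also an assumption.

```python
from typing import Dict, List

# Event types tracked for cost segmentation, as listed above.
TRACKED_EVENT_TYPES = ["CREATING", "RUNNING", "RESIZING", "UPSIZE_COMPLETED", "TERMINATING"]


def build_events_request(cluster_id: str, limit: int = 50) -> Dict:
    """JSON body asking for the tracked events of one cluster, oldest first."""
    return {
        "cluster_id": cluster_id,
        "event_types": TRACKED_EVENT_TYPES,
        "order": "ASC",
        "limit": limit,
    }


def parse_events(response: Dict) -> List[Dict]:
    """Keep only the fields needed for costing, sorted by timestamp."""
    parsed = []
    for event in response.get("events", []):
        details = event.get("details", {})
        parsed.append({
            "timestamp_ms": event["timestamp"],
            "type": event["type"],
            # May be None when the event carries no worker count.
            "workers": details.get("current_num_workers"),
        })
    return sorted(parsed, key=lambda e: e["timestamp_ms"])
```

Each pair of consecutive parsed events then delimits one cost segment.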

The driver node is considered always on from RUNNING to TERMINATING events and is priced for that segment.

VM cost = nodes × duration (h) × price/h

DBU cost = nodes × duration (h) × DBU/h × tier price
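The two formulas translate directly into code. This is a minimal sketch: the function and parameter names are illustrative, and the VM price ($0.20/h) and DBU rate (0.75 DBU/h) in the usage lines are hypothetical values chosen only to show the arithmetic. The $0.30 per DBU is the Jobs Premium price quoted in this guide.

```python
def vm_cost(nodes: float, duration_h: float, vm_price_per_h: float) -> float:
    """VM cost = nodes x duration (h) x price/h."""
    return nodes * duration_h * vm_price_per_h


def dbu_cost(nodes: float, duration_h: float, dbu_per_h: float, tier_price_per_dbu: float) -> float:
    """DBU cost = nodes x duration (h) x DBU/h x tier price."""
    return nodes * duration_h * dbu_per_h * tier_price_per_dbu


# One 30-minute segment with 4 nodes (hypothetical VM price and DBU rate).
segment_vm = vm_cost(nodes=4, duration_h=0.5, vm_price_per_h=0.20)
segment_dbu = dbu_cost(nodes=4, duration_h=0.5, dbu_per_h=0.75, tier_price_per_dbu=0.30)
print(f"VM ${segment_vm:.2f}, DBU ${segment_dbu:.2f}")
```

The total for a cluster is the sum of both values over all of its segments.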

Approximations

  • Between CREATING and RUNNING events: workers have not started yet, so only the driver's VM cost is counted and no DBU is charged.
  • During a cluster resize: the worker count is uncertain, so the average is used as an estimate.

Segment                            Workers  Dur.  VM     DBU
CREATING → RUNNING                 1        4m    $0.05  $0.00
CREATING → TERMINATING (driver)    1        32m   $0.40  $0.21
RUNNING → RESIZING                 1        5m    $0.06  $0.03
RESIZING → UPSIZE_COMPLETED        1–6      3m    $0.15  $0.08
UPSIZE_COMPLETED → RESIZING        6        1m    $0.05  $0.03
RESIZING → RESIZING                4–6      1m    $0.05  $0.03
RESIZING → RESIZING                2–4      2m    $0.06  $0.03
RESIZING → RESIZING                1–2      3m    $0.05  $0.03
RESIZING → UPSIZE_COMPLETED        1–6      3m    $0.14  $0.08
UPSIZE_COMPLETED → RESIZING        6        1m    $0.05  $0.02
RESIZING → RESIZING                4–6      1m    $0.08  $0.04
RESIZING → UPSIZE_COMPLETED        4–9      4m    $0.35  $0.18
UPSIZE_COMPLETED → RESIZING        9        1m    $0.07  $0.04
RESIZING → RESIZING                5–9      1m    $0.07  $0.04
RESIZING → TERMINATING             5–10     2m    $0.23  $0.12
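The segmentation and its two approximations can be sketched as follows. This is not Lakesight's actual implementation. The `Event` and `Segment` types are hypothetical, timestamps are simplified to minutes, and "average" is assumed to mean the mean of the worker counts at the two events that bound a segment. The driver is priced separately, as described above, and is left out of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    minute: float   # time of the event, in minutes (simplified unit)
    type: str       # e.g. "CREATING", "RUNNING", "RESIZING"
    workers: int    # worker count reported at this event


@dataclass
class Segment:
    start: str
    end: str
    duration_h: float
    billed_workers: float
    charge_dbu: bool


def build_segments(events: List[Event]) -> List[Segment]:
    """Turn time-ordered events into worker segments to be priced individually."""
    segments = []
    for prev, nxt in zip(events, events[1:]):
        duration_h = (nxt.minute - prev.minute) / 60
        if prev.type == "CREATING":
            # Workers not started yet: no worker nodes billed, no DBU charged.
            workers, charge_dbu = 0.0, False
        else:
            # Equal counts give a constant value; differing counts give the resize average.
            workers, charge_dbu = (prev.workers + nxt.workers) / 2, True
        segments.append(Segment(prev.type, nxt.type, duration_h, workers, charge_dbu))
    return segments
```

A resize from 1 to 6 workers over 3 minutes, for example, becomes a 0.05 h segment billed for 3.5 workers.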

Typical Databricks pricing

Lakesight uses the prices documented on the official Databricks public pricing page.

  • Jobs, Premium: $0.30 per DBU
  • Jobs, Standard *: $0.15 per DBU
  • Interactive, Premium: $0.55 per DBU
  • Interactive, Standard *: $0.40 per DBU

As documented on the Databricks public pricing page, enabling Photon raises the DBU emission rate to 2.5x the non-Photon rate for job clusters and 2x for interactive clusters.

* Standard tier will be deprecated on October 1st 2026.
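The per-DBU prices and Photon multipliers quoted in this section combine as in the sketch below. The lookup tables contain only the figures from this guide. The function name and the `base_dbu_per_h` parameter (the non-Photon DBU/h rate of the chosen VM type) are illustrative assumptions.

```python
# Per-DBU list prices quoted in this guide (USD).
PRICE_PER_DBU = {
    ("jobs", "premium"): 0.30,
    ("jobs", "standard"): 0.15,
    ("interactive", "premium"): 0.55,
    ("interactive", "standard"): 0.40,
}

# Photon multipliers on the DBU emission rate, per workload type.
PHOTON_MULTIPLIER = {"jobs": 2.5, "interactive": 2.0}


def hourly_dbu_price(workload: str, tier: str, base_dbu_per_h: float, photon: bool = False) -> float:
    """Price in USD of one node-hour of DBUs for the given workload and tier."""
    multiplier = PHOTON_MULTIPLIER[workload] if photon else 1.0
    return base_dbu_per_h * multiplier * PRICE_PER_DBU[(workload, tier)]
```

For a node emitting 1 DBU/h without Photon, a Premium job cluster with Photon enabled costs 1 × 2.5 × $0.30 = $0.75 per node-hour in DBUs.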

What is tracked

Job clusters

Job clusters are automatically created when a job is launched. One job equals one unique cluster created when the job starts and deleted when it ends or fails.

Lakesight calculates the cost of every job fetched through the Databricks REST API and displays the results in various charts to show what costs the most.

A job cluster that has a series of cluster events but no TERMINATING event yet represents a still-running job. Lakesight costs these clusters based on the available events and highlights them in a dedicated section for a near real-time overview of running jobs.

Interactive clusters

Interactive clusters are clusters manually created from the Compute section of a Databricks workspace. Their ID remains the same for the lifetime of the cluster, until it is deleted. Lakesight groups usage into sessions, where each session represents the period from the moment the cluster is started until it is stopped. Costs are calculated per session, and Lakesight displays a timeline of all sessions for each interactive cluster.

An interactive cluster with a CREATING event and no TERMINATING event yet represents a running UI cluster. Lakesight costs these sessions based on the events available after the last CREATING event and highlights them for an overview of running interactive clusters at any time.
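The session grouping described above can be sketched as a single pass over a cluster's events. This is a minimal illustration, not Lakesight's code. Events are assumed to be dictionaries with a `type` key, already sorted by time, and the `split_sessions` name and the `running` flag are hypothetical.

```python
from typing import Dict, List


def split_sessions(events: List[Dict]) -> List[Dict]:
    """Group an interactive cluster's events into sessions, one per CREATING event."""
    sessions: List[Dict] = []
    current = None
    for event in events:
        if event["type"] == "CREATING":
            if current is not None:
                # A new start closes any session that never logged TERMINATING.
                current["running"] = False
            current = {"events": [event], "running": True}
            sessions.append(current)
        elif current is not None:
            current["events"].append(event)
            if event["type"] == "TERMINATING":
                current["running"] = False
                current = None
    return sessions
```

Only the last session can still be flagged as running, which matches costing a running cluster from the events after its last CREATING event.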

Current limitations

Lakesight aims to provide the most accurate cost picture possible with the data available. However, there are known gaps:

Negotiated DBU pricing

Negotiated Databricks DBU discounts cannot be taken into account, so Lakesight may overestimate actual DBU costs for customers who have them.

Instance pool idle costs

When clusters use instance pools, VMs kept warm in the pool consume resources even when no workload is running. This idle cost is not yet tracked.

Azure Reserved Instances

Pre-purchased reserved VM capacity is priced lower than on-demand. Lakesight uses on-demand rates, so VM costs may be overestimated for capacity covered by reservations.

Serverless workloads

Databricks serverless compute does not expose cluster-level events. Serverless runs are detected and flagged but cannot be costed.

SQL warehouses

Databricks SQL warehouse costs are not currently included.

Questions or remarks on the methodology?

We'd love to hear from you if you think something could be improved.

Contact us