Deletion vectors accelerate DELETE, UPDATE, and MERGE operations on Delta Lake and Apache Iceberg tables. Without deletion vectors, modifying a single row requires rewriting the entire Parquet file containing that record. Deletion vectors mark rows as modified in metadata instead, and reads apply the deletion vector entries at query time to resolve the current table state.
Note
For predictive I/O updates, Photon uses deletion vectors to accelerate DELETE, MERGE, and UPDATE operations. See Use predictive I/O to accelerate updates.
Prerequisites
All Apache Iceberg v3 tables include deletion vectors by default. See Use Apache Iceberg v3 features. For Delta Lake tables, you must explicitly enable deletion vectors.
To write tables with deletion vectors using all optimizations, use Databricks Runtime 14.3 LTS and above. To read them, use Databricks Runtime 12.2 LTS and above.
In Databricks Runtime 14.2 and above, tables with deletion vectors support row-level concurrency. See Row-level concurrency.
Client compatibility
Azure Databricks uses deletion vectors to power predictive I/O for updates on Photon-enabled compute. See Use predictive I/O to accelerate updates.
Support for using deletion vectors for reads and writes varies by client.
The following table lists client versions required to read and write deletion vector tables:
| Client | Write deletion vectors | Read deletion vectors |
|---|---|---|
| Databricks Runtime with Photon | Supports MERGE, UPDATE, and DELETE using Databricks Runtime 12.2 LTS and above. | Requires Databricks Runtime 12.2 LTS or above. |
| Databricks Runtime without Photon | Supports DELETE using Databricks Runtime 12.2 LTS and above. Supports UPDATE using Databricks Runtime 14.1 and above. Supports MERGE using Databricks Runtime 14.3 LTS and above. | Requires Databricks Runtime 12.2 LTS or above. |
| OSS Apache Spark with OSS Delta Lake | Supports DELETE using OSS Delta 2.4.0 and above. Supports UPDATE using OSS Delta 3.0.0 and above. | Requires OSS Delta 2.3.0 or above. |
| OpenSharing recipients | Writes are not supported on OpenSharing tables. | Azure Databricks requires Databricks Runtime 14.1 or above. Open source Apache Spark requires delta-sharing-spark 3.1 or above. |
For support with other clients, see the OSS Delta Lake integrations documentation.
Enable deletion vectors
In the workspace settings, you can enable deletion vectors on new tables when you use a SQL warehouse or Databricks Runtime 14.3 LTS or above. Default settings vary by region. See Auto-enable deletion vectors.
Deletion vectors are not enabled by default for materialized views and streaming tables stored in Hive metastore.
To manually enable or remove deletion vectors on any table or view, use the enableDeletionVectors table property.
To enable deletion vectors on a table when you create or alter a table:
Delta Lake
CREATE TABLE <table-name> [options] TBLPROPERTIES ('delta.enableDeletionVectors' = true);
ALTER TABLE <table-name> SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);
Iceberg table
CREATE TABLE <table-name> [options] TBLPROPERTIES ('iceberg.enableDeletionVectors' = true);
ALTER TABLE <table-name> SET TBLPROPERTIES ('iceberg.enableDeletionVectors' = true);
You can't use an ALTER statement to enable or remove deletion vectors on a materialized view or streaming table. You must use a CREATE TABLE statement.
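To confirm that the property took effect, one option is to inspect the table's properties and metadata. The following is a minimal sketch for a Delta Lake table, not a prescribed procedure; main.default.events is a hypothetical table name, and the exact output columns depend on your runtime version.

```sql
-- Hypothetical table name; replace with your own.
-- Shows the current value of the property if it is set on the table.
SHOW TBLPROPERTIES main.default.events ('delta.enableDeletionVectors');

-- Shows table metadata, which includes protocol information for the table.
DESCRIBE DETAIL main.default.events;
```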
Warning
When you enable deletion vectors, Databricks upgrades the table protocol. After upgrading, clients without deletion vector support can't read the table. See Delta Lake feature compatibility and protocols.
In Databricks Runtime 14.1 and above, you can drop the deletion vectors table feature to enable compatibility with other clients. See Drop a Delta Lake table feature and downgrade table protocol.
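As a rough sketch, dropping the feature might look like the following. The table name main.default.events is hypothetical, and a full protocol downgrade can involve additional steps that are described in the linked article rather than shown here.

```sql
-- Hypothetical table name; replace with your own.
-- Drops the deletion vectors table feature (Databricks Runtime 14.1 and above).
ALTER TABLE main.default.events DROP FEATURE deletionVectors;
```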
Apply soft-deletes to data files
Deletion vectors mark changes to rows as soft-deletes that logically modify existing Parquet data files in the table. To physically rewrite the Parquet data files, do one of the following:
- Run OPTIMIZE on the table.
- Run REORG TABLE ... APPLY (PURGE) on the table. This command rewrites all data files containing records with deletion vector changes. See REORG TABLE.
- Run a write with auto-compaction, which triggers a rewrite of a data file with a deletion vector.
File compaction events don't have strict guarantees for resolving changes recorded in deletion vectors. Some changes recorded in deletion vectors might not be physically applied if target data files are not candidates for file compaction.
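The first two options above could be run as follows. This is a sketch only; main.default.events is a hypothetical table name, and you would normally choose one of the two commands depending on whether you need every soft-delete applied.

```sql
-- Hypothetical table name; replace with your own.
-- Option 1: compaction. Rewrites only files that are compaction candidates,
-- so some deletion vector changes might remain unapplied.
OPTIMIZE main.default.events;

-- Option 2: rewrites all data files containing records with deletion vector changes.
REORG TABLE main.default.events APPLY (PURGE);
```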
Physically delete old data
Modified data might still exist in a table's old data files after a purge operation. You might want to physically remove the data, for example, to reduce storage costs with your cloud provider or to comply with GDPR requests.
To physically delete old data:
- Run REORG TABLE ... APPLY (PURGE).
- Run VACUUM with the retention threshold set to the purge completion timestamp to physically remove files from previous table versions. See Purge metadata-only deletes to force data rewrite.
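The two steps above can be sketched as follows, assuming a hypothetical table named main.default.events. The sketch uses the default VACUUM retention threshold and adds an optional DRY RUN preview; it does not pick a retention value for you, so files newer than the threshold stay in storage until they age out or you adjust the threshold as described above.

```sql
-- Hypothetical table name; replace with your own.
-- Step 1: rewrite data files that contain soft-deleted rows.
REORG TABLE main.default.events APPLY (PURGE);

-- Optional preview: list files that VACUUM would remove, without deleting them.
VACUUM main.default.events DRY RUN;

-- Step 2: remove files that the table no longer references and that are
-- older than the retention threshold.
VACUUM main.default.events;
```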
Improve performance for large tables
To improve performance for purges on large tables, set spark.databricks.delta.reorg.purgeMode to rows.
For example, set this configuration when you purge data manually with REORG TABLE ... APPLY (PURGE) or when you remove deletion vectors with ALTER TABLE DROP FEATURE deletionVectors.
By default, spark.databricks.delta.reorg.purgeMode is set to all. On large tables, this operation might be slow because purge operations must scan all Parquet file footers to check for both dropped column data and soft-deleted rows.
The rows value limits the operation to handle only files with soft-deleted rows. On large tables, this might improve performance if many files don't contain soft-deleted rows and the table has no dropped columns.
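A minimal sketch of this setting, assuming the configuration is set at the session level with SET and that main.default.events is a hypothetical table with no dropped columns:

```sql
-- Limit purges in this session to files that contain soft-deleted rows.
SET spark.databricks.delta.reorg.purgeMode = rows;

-- Hypothetical table name; replace with your own.
REORG TABLE main.default.events APPLY (PURGE);
```

Because the rows value skips the check for dropped column data, it seems best suited to tables where you know no columns have been dropped.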
Limitations
- UniForm Iceberg v2 doesn't support deletion vectors. Apache Iceberg v3 supports deletion vectors on tables with UniForm enabled. See Use Apache Iceberg v3 features.
- You cannot use a GENERATE statement to generate a manifest file for a table that has files using deletion vectors. To generate a manifest, first run a REORG TABLE ... APPLY (PURGE) statement and then run the GENERATE statement. You must verify that no concurrent write operations are running when you submit the REORG statement.
- You cannot incrementally generate manifest files for a table with deletion vectors enabled (for example, by setting the table property delta.compatibility.symlinkFormatManifest.enabled=true).
- If you enable deletion vectors on a materialized view or streaming table and subsequently remove deletion vectors, deletion vectors don't apply to future writes to the view or table, but existing deletion vectors remain.
- You cannot downgrade the table protocol after enabling deletion vectors on a materialized view or streaming table, even if you subsequently turn off deletion vectors.
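For the manifest limitation above, the purge-then-generate sequence might look like the following sketch. The table name main.default.events is hypothetical, and the sketch assumes you have already confirmed that no concurrent writes are running.

```sql
-- Hypothetical table name; replace with your own.
-- Step 1: apply deletion vector changes so no data files rely on deletion vectors.
REORG TABLE main.default.events APPLY (PURGE);

-- Step 2: generate the manifest for the purged table.
GENERATE symlink_format_manifest FOR TABLE main.default.events;
```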