Bemærk
Adgang til denne side kræver godkendelse. Du kan prøve at logge på eller ændre mapper.
Adgang til denne side kræver godkendelse. Du kan prøve at ændre mapper.
Azure Databricks supports SQL standard DDL commands for dropping and replacing tables registered with either Unity Catalog or the Hive metastore. Drop and replace behavior differs by table type and metastore. Choose the right command to avoid data loss or concurrent operation failures.
When to drop a table
Databricks recommends that you use DROP TABLE to remove a table from the metastore when you want to permanently delete the table and have no intention of creating a new table in the same location. For example:
DROP TABLE table_name
DROP TABLE has different behaviors depending on the type of table and whether the table is registered to Unity Catalog or the legacy Hive metastore.
| Table type | Metastore | Behavior |
|---|---|---|
| Managed | Unity Catalog | The table is removed from the metastore and underlying data is marked for deletion. You can UNDROP a managed table within the configured recovery period (default 7 days). See Drop a managed table. |
| Managed | Hive | The table is removed from the metastore and the underlying data is deleted. |
| External | Unity Catalog | The table is removed from the metastore but the underlying data remains. URI access privileges are now governed by the external location that contains the data. |
| External | Hive | The table is removed from the metastore but the underlying data remains. Any URI access privileges are unchanged. |
Unity Catalog maintains a history of tables using an internal table ID. For all table types, after the drop operation completes, the previously registered table name no longer has an active link to data and table history from the metastore.
See DROP TABLE.
Note
Databricks doesn't recommend that you drop and then recreate a table using the same name for production pipelines or systems because this can result in unexpected results for concurrent operations. See Replace data with concurrent operations.
When to replace a table
Databricks recommends using CREATE OR REPLACE TABLE statements for use cases where you want to fully overwrite the target table with new data. For example, to overwrite a table with all data from a Parquet directory, run the following command:
CREATE OR REPLACE TABLE table_name
AS SELECT * FROM parquet.`/path/to/files`
CREATE OR REPLACE TABLE has the same semantics regardless of the table type or metastore in use. The following are important advantages of CREATE OR REPLACE TABLE:
- Table contents are replaced, but the table identity is maintained.
- The table history is retained, and you can revert the table to an earlier version with the
RESTOREcommand. - The operation is a single transaction, so there is never a time when the table doesn't exist.
- Concurrent queries reading from the table can continue without interruption. Because the version before and after replacement still exists in the table history, concurrent queries can reference either version of the table as necessary.
- If the original table included column masks, those masks are retained for any columns that still exist in the new table. This ensures that data access policies are preserved.
See CREATE TABLE [USING].
Replace data with concurrent operations
When you want to perform a full replacement of data in a table that might be used in concurrent operations, you must use CREATE OR REPLACE TABLE.
You shouldn't use the the following anti-pattern:
DROP TABLE IF EXISTS table_name;
CREATE TABLE table_name
AS SELECT * FROM parquet.`/path/to/files`;
For all table types, whether or not you're using Unity Catalog, using this pattern can result in an error, dropped records, or corrupted results.
Instead, Databricks recommends always using CREATE OR REPLACE TABLE, as in the following example:
CREATE OR REPLACE TABLE table_name
AS SELECT * FROM parquet.`/path/to/files`
Because atomic replacement preserves table history, concurrent transactions can validate the version of the source table they reference and fail or reconcile concurrent transactions without unexpected behavior.