Fastest way to delete massive records in SQL Server
When working with data, one of the typical tasks is deleting records. Some common reasons for deleting records are:
- Corrupted or useless data
- Sensitive customer data (data that a customer asks you to delete)
- Old data (usually more than 5 years old)
This task can be completed easily with a DELETE statement, but when the table has millions of records and relationships with other tables, you can run into the following scenarios:
- It takes hours to delete the requested records.
- The statement times out and does not delete any records.
This is caused by the way the DELETE statement works:
- DELETE checks every foreign key that references the affected records.
- Every deleted row is written to the transaction log.
- Triggers may fire before or after the DELETE statement.
- There may be many indexes to maintain on the affected rows, or missing indexes on foreign key columns that force scans of the referencing tables.
- Deadlocks or blocking can occur on the affected rows.
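A missing index on a referencing foreign key column is one of the easier problems to detect up front. As a sketch of my own (not from the original article, and simplified to single-column foreign keys where the column must lead an index), the system catalog can be queried like this:

```sql
/* Foreign key columns with no index that leads on that column
   (simplified sketch: single-column foreign keys only) */
SELECT fk.name AS foreign_key_name,
       OBJECT_NAME(fkc.parent_object_id) AS referencing_table,
       c.name AS referencing_column
FROM sys.foreign_keys AS fk
JOIN sys.foreign_key_columns AS fkc
     ON fkc.constraint_object_id = fk.object_id
JOIN sys.columns AS c
     ON c.object_id = fkc.parent_object_id
    AND c.column_id = fkc.parent_column_id
WHERE NOT EXISTS (SELECT 1
                  FROM sys.index_columns AS ic
                  WHERE ic.object_id = fkc.parent_object_id
                    AND ic.column_id = fkc.parent_column_id
                    AND ic.key_ordinal = 1);  -- column leads the index
```

Any row returned is a candidate for a supporting index before attempting a large delete on the referenced table.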
There are some options for dealing with this issue.
1. Demo Data
For this example we are going to generate a big dataset based on the WideWorldImporters database, using the Sales.CustomerTransactions table.
The table starts with 97,147 records. We are going to use a WHILE loop to insert that same set of rows 49 more times, for a total of 4,857,350 rows.
/* Making a backup of the CustomerTransactions table */
SELECT * INTO Sales.BK_CustomerTransactions
FROM Sales.CustomerTransactions;

/* Re-inserting the backup into Sales.CustomerTransactions 49 times */
DECLARE @COUNT INT = 0;
WHILE (@COUNT < 49)
BEGIN
    INSERT INTO Sales.CustomerTransactions
        (CustomerID, TransactionTypeID, InvoiceID, PaymentMethodID,
         TransactionDate, AmountExcludingTax, TaxAmount, TransactionAmount,
         OutstandingBalance, FinalizationDate, -- IsFinalized,
         LastEditedBy, LastEditedWhen)
    SELECT CustomerID, TransactionTypeID, InvoiceID, PaymentMethodID,
           TransactionDate, AmountExcludingTax, TaxAmount, TransactionAmount,
           OutstandingBalance, FinalizationDate, -- IsFinalized,
           LastEditedBy, LastEditedWhen
    FROM Sales.BK_CustomerTransactions;
    SET @COUNT = @COUNT + 1;
END
After the inserts, the table contains 4,857,350 records.
For this demo we are going to delete the records with TransactionTypeID = 1, affecting 3,525,500 rows.
2. Delete in chunks in SQL Server
When dealing with large amounts of data, one option is a divide-and-conquer strategy: split the delete into smaller chunks so that SQL Server can process each one without timing out.
In this example, instead of deleting all 3.5 million matching rows in one statement, each iteration of a loop deletes a chunk of 100,000 rows until no matching rows remain.
/* DEFINING VARIABLES TO TRACK CHUNKS AND ROWS DELETED */
DECLARE @NRO INT = 0;
DECLARE @Deleted INT = 1;

/* DELETING CHUNKS OF 100,000 RECORDS UNTIL NOTHING IS LEFT TO DELETE */
WHILE (@Deleted > 0)
BEGIN
    DELETE TOP (100000) FROM Sales.CustomerTransactions
    WHERE TransactionTypeID = 1;

    SET @Deleted = @@ROWCOUNT; /* capture before PRINT resets @@ROWCOUNT */
    SET @NRO = @NRO + 1;
    PRINT CONCAT(@NRO, ' chunks processed, ', @Deleted, ' rows deleted in the last chunk.');
END
We are using DELETE TOP (100000) so that each iteration removes at most 100,000 rows in its own short transaction, keeping lock duration and log usage per statement small.
All records with TransactionTypeID = 1 were deleted.
The delete took 2 minutes and 35 seconds.
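A variation worth noting (my own sketch, not part of the original timing test, and assuming the database is in the SIMPLE recovery model): issuing a CHECKPOINT between chunks lets the log space from each committed batch be reused, and a short WAITFOR DELAY gives concurrent workloads a chance to acquire locks between batches.

```sql
DECLARE @Deleted INT = 1;
WHILE (@Deleted > 0)
BEGIN
    DELETE TOP (100000) FROM Sales.CustomerTransactions
    WHERE TransactionTypeID = 1;
    SET @Deleted = @@ROWCOUNT;

    CHECKPOINT;                 -- SIMPLE recovery: allow log space reuse
    WAITFOR DELAY '00:00:01';   -- brief pause to reduce blocking
END
```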
3. Truncate and re-insert table data
Truncating and re-inserting a table's data works like a charm when deleting a massive share of its records. The method follows these steps:
- Copy the table's rows into another table (typically a temp table).
- Truncate the table.
- Insert the data back from the temp table into the original table, filtering out the rows that should be deleted.
- Drop the temp table.
The advantage of this approach is that TRUNCATE TABLE deallocates whole data pages and logs only the page deallocations instead of every row, and it does not check referential integrity row by row or fire DELETE triggers. The re-insert can also be minimally logged under the SIMPLE or BULK_LOGGED recovery model (under FULL recovery, every inserted row is fully logged).
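Since the logging behavior depends on the recovery model, it is worth checking which one the database is using before relying on this method; for example:

```sql
/* Check the recovery model of the demo database */
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'WideWorldImporters';
```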
/* MAKING A BACKUP OF THE Sales.CustomerTransactions TABLE */
SELECT * INTO ##BK_Sales_CustomerTransactions
FROM Sales.CustomerTransactions;
GO

/* TRUNCATING THE Sales.CustomerTransactions TABLE */
TRUNCATE TABLE Sales.CustomerTransactions;

/* RE-INSERTING THE DATA, FILTERING OUT TRANSACTION TYPE 1 */
INSERT INTO Sales.CustomerTransactions
    (CustomerTransactionID, CustomerID, TransactionTypeID, InvoiceID,
     PaymentMethodID, TransactionDate, AmountExcludingTax, TaxAmount,
     TransactionAmount, OutstandingBalance, FinalizationDate, -- IsFinalized,
     LastEditedBy, LastEditedWhen)
SELECT CustomerTransactionID, CustomerID, TransactionTypeID, InvoiceID,
       PaymentMethodID, TransactionDate, AmountExcludingTax, TaxAmount,
       TransactionAmount, OutstandingBalance, FinalizationDate, -- IsFinalized,
       LastEditedBy, LastEditedWhen
FROM ##BK_Sales_CustomerTransactions
WHERE TransactionTypeID != 1;
This approach took 1 minute and 3 seconds.
4. Performance comparison
For testing performance we start from the same baseline: back up the Sales.CustomerTransactions table, then delete the records with TransactionTypeID = 1 using a single plain DELETE.
SELECT * INTO ##bkSales_CustomerTransactions
FROM Sales.CustomerTransactions;

/* After creating the table backup, run the baseline single-statement delete */
DELETE FROM Sales.CustomerTransactions
WHERE TransactionTypeID = 1;
This query took 7 minutes and 41 seconds. With volumes like this, a single large DELETE can also simply time out.
As you can see in the examples above, the times for each case were:
- Simple DELETE: 7 minutes and 41 seconds.
- Delete in chunks: 2 minutes and 35 seconds.
- Truncate and insert: 1 minute and 3 seconds.
Truncate and insert was about 60% faster than deleting in chunks (63 seconds vs. 155 seconds).
When deleting millions of records, the truncate-and-insert method is the most effective, but there are cases where TRUNCATE is impossible due to constraints; in particular, TRUNCATE TABLE fails if any foreign key references the table. In those cases, use the delete-in-chunks method.
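To check up front whether TRUNCATE is even an option, you can list the foreign keys that reference the table. This catalog query is a sketch I am adding, not part of the original article; any row returned means TRUNCATE TABLE will fail on that table:

```sql
/* Foreign keys referencing Sales.CustomerTransactions
   (any row returned blocks TRUNCATE TABLE) */
SELECT fk.name AS foreign_key_name,
       OBJECT_NAME(fk.parent_object_id) AS referencing_table
FROM sys.foreign_keys AS fk
WHERE fk.referenced_object_id = OBJECT_ID('Sales.CustomerTransactions');
```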