Sql Server – Split column with delimited string into multiple columns

How to split a column with delimited string into multiple columns

Long time ago I found that data might come in many different formats, mostly related with the data source, Cellphone Numbers, User Names, addresses are some examples of data that commonly comes in delimited format.


You may want to split this delimited string columns and divide them into multiple columns for data analytics or maybe you want to split them to follow First Normal form, This is where this post is going to help you to see how to split this single delimited column into multiple ones (maintaining a certain order) by following this example.

There are multiple ways to ensure that you are going convert this single column into multiple columns with the requested information.

We are going to start by creating a table with some dummy data,

--CREATING TABLE
CREATE TABLE BOOKS(
ID VARCHAR(MAX) NULL
) 
/*
  BOOKS 
  PRODUCTID, NAME, PAGES, WEIGHT, SIZE, TYPE
*/
INSERT INTO BOOKS (ID)
VALUES('B001,INTRODUCTION TO SQL,500,100G,MID,TECH')
	  ,('B002,ADVANCED SQL SERVER PRACTICES,200,200G,BIG,TECH')
	  ,('B003,SQL SERVER PERFORMANCE,1000,500G,BIG,TECH')
	  ,('B004,SQL SERVER MANUAL,50,30G,SMALL,TECH')
          ,('B004,SQL SERVER MANUAL,50,30G,SMALL,TECH')

As you can see on the code, this query is going to create a table called Books and introduce data into it, this table contains books attributes separated by comma(,). So in this case what we want to do is to convert this single column into multiple by applying some functions.

Ways to Split delimited column into multiple rows

This function is available since Sql Server 2016 (quite new in my opinion), String Split is a table-valued function that splits a string into rows of substrings, based on a specific character. This function has the following syntax.

STRING_SPLIT (string, separator)

It looks easy but don’t forget that string split is a table-valued function this means that this function has to be applied on a table. as you can see in this example.

SELECT value 
FROM STRING_SPLIT('B001,INTRODUCTION TO SQL,500,100G,MID,TECH', ',');

But it can be applied for our example by using a CROSS APPLY function combined with our string split. On that way, we are going to be able to use this function in on our table.

SELECT ID
	  ,value 
FROM BOOKS BO
	CROSS APPLY STRING_SPLIT(ID, ',') AS BK

After applying our query a single column is converted into multiple rows, but that is not exactly what we want, yet.  The following step is going to complete the function by adding a row_number to distinguish between columns and by selecting those columns based on a row number.

/*PIVOTING TABLE, ASIGNING A RN TO COLUMNS AND GETTING BACK VALUES*/
WITH C AS(
SELECT ID
	  ,value 
	  ,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY (SELECT NULL)) as rn
FROM BOOKS BO
	CROSS APPLY STRING_SPLIT(ID, ',') AS BK
)
SELECT ID
	  ,[1] AS PRODUCTID
	  ,[2] AS NAME
	  ,[3] AS PAGES
	  ,[4] AS WEIGHT
	  ,[5] AS SIZE
	  ,[6] AS TYPE
FROM C
PIVOT(
	MAX(VALUE)
	FOR RN IN([1],[2],[3],[4],[5],[6])	
) as PVT

This function is going to return the following results.

As you can see our function worked and a single row now is divided in multiple rows, But one row is missing, and that row was a copy of column N°4 also something I was not expecting by using String_Split was the fact that the order is not guarantee. I found the following quote on it’s documentation.

“The order of the output may vary as the order is not guaranteed to match the order of the substrings in the input string”

Another way to do this is by applying XML and Nodes functions combined with a Cross Apply.

SELECT DISTINCT
    S.a.value('(/H/r)[1]', 'VARCHAR(100)') AS PRODUCTID
   ,S.a.value('(/H/r)[2]', 'VARCHAR(100)')  AS NAME
   , S.a.value('(/H/r)[3]', 'VARCHAR(100)') AS PAGES
   , S.a.value('(/H/r)[4]', 'VARCHAR(100)') AS WEIGHT
   , S.a.value('(/H/r)[5]', 'VARCHAR(100)') AS SIZE
   , S.a.value('(/H/r)[6]', 'VARCHAR(100)') AS TYPE
FROM
(
SELECT *
	 ,CAST (N'<H><r>' + REPLACE(ID, ',', '</r><r>')  + '</r></H>' AS XML) AS [vals]
FROM BOOKS
) d 
CROSS APPLY d.[vals].nodes('/H/r') S(a)

As you can see it gives the same result as the String_Split function, but the difference here is that the order is guaranteed.

This functions does not return duplicated values, so if you want those duplicated values into your query you can apply a row_number to make rows different.

This function is available since Sql Server 2016.

Another way to do this is by applying JSON_VALUE and VALUES functions combined with a Cross Apply as follows.

SELECT DISTINCT B.* 
FROM BOOKS A
CROSS APPLY (
      SELECT 
             JSON_VALUE(J,'$[1]') AS NAME
            ,JSON_VALUE(J,'$[2]') AS PAGES
            ,JSON_VALUE(J,'$[3]') AS WEIGHT
            ,JSON_VALUE(J,'$[4]') AS SIZE
            ,JSON_VALUE(J,'$[5]') AS TYPE
     FROM (VALUES ('["'+replace(replace(ID,'"','\"'),',','","')+'"]')) A(J)
) B

As you can see this is pretty similar than XML method, but more simpler.

This function uses charindex, right, left and len for splitting delimited columns.

CREATE FUNCTION dbo.GetColumnValue(
	@String VARCHAR(MAX),
	@Delimiter CHAR(1),
	@Column INT = 1
)
RETURNS VARCHAR(MAX)
AS     
BEGIN
DECLARE @idx INT
DECLARE @slice VARChar(MAX)     
SELECT @idx = 1     
    IF LEN(@String)<1 OR @String IS NULL
		RETURN NULL
DECLARE @ColCnt INT
    SET @ColCnt = 1
WHILE (@idx != 0)
BEGIN     
    SET @idx = CHARINDEX(@Delimiter,@String)     
    IF @idx!=0 
	BEGIN
        IF (@ColCnt = @Column) 
			RETURN LEFT(@String,@idx - 1)        
        SET @ColCnt = @ColCnt + 1
    END
    SET @String = RIGHT(@String,LEN(@String) - @idx)     
    IF LEN(@String) = 0 BREAK
END
RETURN @String  
END

The first parameter that you have to insert into the function is the delimited column that you want to divide, The second parameter is the delimiter that you have on the column and the last one is the number of string that you want to obtain from the delimited column.

After creating the function you can call it by using it in the SELECT statement as follows.

 
SELECT 
    DISTINCT dbo.GetColumnValue(ID, ',', 1) as PRODUCTID 
        ,dbo.GetColumnValue(ID, ',', 2) as NAME
        ,dbo.GetColumnValue(ID, ',', 3) as PAGES
	,dbo.GetColumnValue(ID, ',', 4) as WEIGHT
	,dbo.GetColumnValue(ID, ',', 5) as SIZE
	,dbo.GetColumnValue(ID, ',', 6) as TYPE
FROM BOOKS


This method works fine specially if you have to divide constantly delimited columns but it is less effective (in execution time) than executing the other methods.

For comparing our 4 queries for splitting delimited columns we are going to generate a demo table with 13 rows to split instead of 6 and 50000 rows by executing the following query.

/*CREATING DEMO TABLE*/
SELECT CONCAT_WS(',',ID,ID,abs(checksum(NewId()) % 10000000)) AS ID_TEST
INTO TEST_BOOKS
FROM books;
/*INSERTING DEMO RECORDS*/
DECLARE @cnt INT = 0;
WHILE @cnt &lt; 50000 
BEGIN
	INSERT INTO TEST_BOOKS(ID_TEST)
	SELECT  CONCAT_WS(',',ID,ID,abs(checksum(NewId()) % 10000000)) AS ID_TEST
	FROM books;
	
	SELECT @cnt = COUNT(*)
	FROM TEST_BOOKS;
END;

After creating the test table we are going to query the new table with each method we have.

We are going to obtain the following results for each case

String Split

String_Split method takes 4.8 seconds to run.

/*METHOD 1 STRING SPLIT*/
DBCC DROPCLEANBUFFERS; 

WITH C AS(
SELECT ID_TEST
      ,value 
      ,ROW_NUMBER() OVER(PARTITION BY ID_TEST ORDER BY (SELECT NULL)) as rn
FROM TEST_BOOKS BO
    CROSS APPLY STRING_SPLIT(ID_TEST, ',') AS BK
)
SELECT 
       [1] AS TEST_1
      ,[2] AS TEST_2
      ,[3] AS TEST_3
      ,[4] AS TEST_4
      ,[5] AS TEST_5
      ,[6] AS TEST_6
      ,[7] AS TEST_7
      ,[8] AS TEST_8
      ,[9] AS TEST_9
      ,[10] AS TEST_10
      ,[11] AS TEST_11
      ,[12] AS TEST_12
      ,[13] AS TEST_13
FROM C
PIVOT(
    MAX(VALUE)
    FOR RN IN([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13])  
) as PVT

XML

XML String Split method takes 6.09 minutes to run.

/*METHOD 2 XML*/
DBCC DROPCLEANBUFFERS; 

SELECT DISTINCT
    S.a.value('(/H/r)[1]', 'VARCHAR(100)') AS TEST_1
   ,S.a.value('(/H/r)[2]', 'VARCHAR(100)')  AS TEST_2
   , S.a.value('(/H/r)[3]', 'VARCHAR(100)') AS TEST_3
   , S.a.value('(/H/r)[4]', 'VARCHAR(100)') AS TEST_4
   , S.a.value('(/H/r)[5]', 'VARCHAR(100)') AS TEST_5
   , S.a.value('(/H/r)[6]', 'VARCHAR(100)') AS TEST_6
   , S.a.value('(/H/r)[7]', 'VARCHAR(100)') AS TEST_7
   , S.a.value('(/H/r)[8]', 'VARCHAR(100)') AS TEST_8
   , S.a.value('(/H/r)[9]', 'VARCHAR(100)') AS TEST_9
   , S.a.value('(/H/r)[10]', 'VARCHAR(100)') AS TEST_10
   , S.a.value('(/H/r)[11]', 'VARCHAR(100)') AS TEST_11
   , S.a.value('(/H/r)[12]', 'VARCHAR(100)') AS TEST_12
   , S.a.value('(/H/r)[13]', 'VARCHAR(100)') AS TEST_13
FROM
(
SELECT *
     ,CAST (N'<H><r>' + REPLACE(ID_TEST, ',', '</r><r>')  + '</r></H>' AS XML) AS [vals]
FROM TEST_BOOKS
) d 
CROSS APPLY d.[vals].nodes('/H/r') S(a);

JSON

Json method takes 9.8 Seconds to run.

/*METHOD 3 JSON*/
DBCC DROPCLEANBUFFERS; 

SELECT DISTINCT B.* 
FROM TEST_BOOKS A
CROSS APPLY (
      SELECT
             JSON_VALUE(J,'$[1]') AS TEST_1
            ,JSON_VALUE(J,'$[2]') AS TEST_2
            ,JSON_VALUE(J,'$[3]') AS TEST_3
            ,JSON_VALUE(J,'$[4]') AS TEST_4
            ,JSON_VALUE(J,'$[5]') AS TEST_5
            ,JSON_VALUE(J,'$[6]') AS TEST_6
            ,JSON_VALUE(J,'$[7]') AS TEST_7
            ,JSON_VALUE(J,'$[8]') AS TEST_8
            ,JSON_VALUE(J,'$[9]') AS TEST_9
            ,JSON_VALUE(J,'$[10]') AS TEST_10
	    ,JSON_VALUE(J,'$[11]') AS TEST_11
            ,JSON_VALUE(J,'$[12]') AS TEST_12
	    ,JSON_VALUE(J,'$[13]') AS TEST_13
     FROM (VALUES ('["'+replace(replace(ID_TEST,'"','\"'),',','","')+'"]')) A(J)
) B

Defined Function

The defined function takes 49.97 seconds to run

/*METHOD 4 FUNCTION*/
DBCC DROPCLEANBUFFERS; 

SELECT
    DISTINCT dbo.GetColumnValue(ID_TEST, ',', 1) as TEST_1 
    ,dbo.GetColumnValue(ID_TEST, ',', 2) as TEST_2
    ,dbo.GetColumnValue(ID_TEST, ',', 3) as TEST_3
    ,dbo.GetColumnValue(ID_TEST, ',', 4) as TEST_4
    ,dbo.GetColumnValue(ID_TEST, ',', 5) as TEST_5
    ,dbo.GetColumnValue(ID_TEST, ',', 6) as TEST_6
    ,dbo.GetColumnValue(ID_TEST, ',', 7) as TEST_7
    ,dbo.GetColumnValue(ID_TEST, ',', 8) as TEST_8
    ,dbo.GetColumnValue(ID_TEST, ',', 9) as TEST_9
    ,dbo.GetColumnValue(ID_TEST, ',', 10) as TEST_10
    ,dbo.GetColumnValue(ID_TEST, ',', 11) as TEST_11
    ,dbo.GetColumnValue(ID_TEST, ',', 12) as TEST_12
    ,dbo.GetColumnValue(ID_TEST, ',', 13) as TEST_13
FROM TEST_BOOKS;

For this test I used a table with 50,000 rows (49,969 were unique), I cleared query cache to avoid automatic SQL query optimization, so each query started in similar condition than the others.

Without a doubt based on query performance string split combined with a pivot function is the best option for splitting columns, also JSON and Values function works fine but there performance time is double compared with string split on the other hand it is more readable and easy to use, please avoid using XML function and use as an option JSON and Values function  as its performance is much better and it is syntax is pretty similar.

Do you know another method for doing string split in MSSQL?


3 Comments

    1. Good question, but for all these cases you would have to create a method to count the quantity of rows that are going to be return and create a dynamic sql for invoking all of them.

Leave a Reply

Your email address will not be published. Required fields are marked *