Sql Server – Split column with delimited string into multiple columns

Split a column with delimited string into multiple columns

There are many reasons why you would like to split a delimited column into multiple columns, one of them is because on certain systems you will find data storage on a coma-delimited format or other kind of formats and you want to process that information. Cellphone Numbers, User Names or other kind of concatenations are present in Databases today.


Today we are going to see how to split this single delimited column into multiple ones (maintaining a certain order) by following a simple example.

There are multiple ways to ensure that you are going convert this single column into multiple columns with the requested information.

We are going to start by creating a table with some dummy data,

--CREATING TABLE
CREATE TABLE BOOKS(
ID VARCHAR(MAX) NULL
) 
/*
  BOOKS 
  PRODUCTID, NAME, PAGES, WEIGHT, SIZE, TYPE
*/
INSERT INTO BOOKS (ID)
VALUES('B001,INTRODUCTION TO SQL,500,100G,MID,TECH')
	  ,('B002,ADVANCED SQL SERVER PRACTICES,200,200G,BIG,TECH')
	  ,('B003,SQL SERVER PERFORMANCE,1000,500G,BIG,TECH')
	  ,('B004,SQL SERVER MANUAL,50,30G,SMALL,TECH')
          ,('B004,SQL SERVER MANUAL,50,30G,SMALL,TECH')

As you can see on the code, this query is going to create a table called Books and introduce data into it, this table contains books attributes separated by comma(,). So in this case what we want to do is to convert this single column into multiple by applying some functions.

Ways to Split delimited column into multiple rows

This function is available since Sql Server 2016 (quite new in my opinion), String Split is a table-valued function that splits a string into rows of substrings, based on a specific character. This function has the following syntax.

STRING_SPLIT (string, separator)

It looks easy but don’t forget that string split is a table-valued function this means that this function has to be applied on a table. as you can see in this example.

SELECT value 
FROM STRING_SPLIT('B001,INTRODUCTION TO SQL,500,100G,MID,TECH', ',');

But it can be applied for our example by using a CROSS APPLY function combined with our string split. On that way, we are going to be able to use this function in on our table.

SELECT ID
	  ,value 
FROM BOOKS BO
	CROSS APPLY STRING_SPLIT(ID, ',') AS BK

After applying our query a single column is converted into multiple rows, but that is not exactly what we want, yet.  The following step is going to complete the function by adding a row_number to distinguish between columns and by selecting those columns based on a row number.

/*PIVOTING TABLE, ASIGNING A RN TO COLUMNS AND GETTING BACK VALUES*/
WITH C AS(
SELECT ID
	  ,value 
	  ,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY (SELECT NULL)) as rn
FROM BOOKS BO
	CROSS APPLY STRING_SPLIT(ID, ',') AS BK
)
SELECT ID
	  ,[1] AS PRODUCTID
	  ,[2] AS NAME
	  ,[3] AS PAGES
	  ,[4] AS WEIGHT
	  ,[5] AS SIZE
	  ,[6] AS TYPE
FROM C
PIVOT(
	MAX(VALUE)
	FOR RN IN([1],[2],[3],[4],[5],[6])	
) as PVT

This function is going to return the following results.

As you can see our function worked and a single row now is divided in multiple rows, But one row is missing, and that row was a copy of column N°4 also something I was not expecting by using String_Split was the fact that the order is not guarantee. I found the following quote on it’s documentation.

“The order of the output may vary as the order is not guaranteed to match the order of the substrings in the input string”

Another way to do this is by applying XML and Nodes functions combined with a Cross Apply.

SELECT DISTINCT
    S.a.value('(/H/r)[1]', 'VARCHAR(100)') AS PRODUCTID
   ,S.a.value('(/H/r)[2]', 'VARCHAR(100)')  AS NAME
   , S.a.value('(/H/r)[3]', 'VARCHAR(100)') AS PAGES
   , S.a.value('(/H/r)[4]', 'VARCHAR(100)') AS WEIGHT
   , S.a.value('(/H/r)[5]', 'VARCHAR(100)') AS SIZE
   , S.a.value('(/H/r)[6]', 'VARCHAR(100)') AS TYPE
FROM
(
SELECT *,CAST (N'<H><r>' + REPLACE(ID, ',', '</r><r>')  
               + '</r></H>' AS XML) AS [vals]
FROM BOOKS) d 
CROSS APPLY d.[vals].nodes('/H/r') S(a) 

As you can see it gives the same result as the String_Split function, but the difference here is that the order is guaranteed.

This functions does not return duplicated values, so if you want those duplicated values into your query you can apply a row_number to make rows different.

1 Comment

Leave a Reply

Your email address will not be published. Required fields are marked *