Tag Archives: XML

Grouping by Multiple Columns to Single Column as A String

This post is created in reference to question asked by Pinal Dave on SQLAuthority. So, If you have not gone through this post, I suggest to read that post first.

XML is good method to combine a string, but when we use the query as described in post and if some XML markup characters exists in the data, the values will be changed.  I am using the same sample data and query used in that post. Now if we insert row having > then it will be converted to >. So, we can use TYPE to create it as XML and then use value method to get it as a string datatype.

INSERT INTO TestTable (ID, Col)
SELECT 1, 'A>'

SELECT t.ID
    , STUFF( (SELECT ',' + s.Col FROM TestTable s WHERE s.ID = t.ID  
                   FOR XML PATH(''),type,root).value('root[1]','nvarchar(max)'),1,1,''
            ) AS CSV
FROM TestTable AS t
GROUP BY t.ID
GO

But this is not a full proof method, when some control char( between ASCII 1 to 31) exists in data, the xml serialization can fail and query will be aborted with error as shown below.

INSERT INTO TestTable (ID, Col)
SELECT 1, CHAR(1)

Now, if we run the above query, we get this error message “OR XML could not serialize the data for node ‘NoName’ because it contains a character (0x0001) which is not allowed in XML. To retrieve this data using FOR XML, convert it to binary, varbinary or image data type and use the BINARY BASE64 directiv” . But the query posted by Pinal Dave works successfully. Though it has some data which we do not want in final result. If you have legacy system, these chars might be there in data. Also, there are chances that some app is misbehaving and inserting this chars into data.

The alternative is to replace the control chars using some function. I created this function as suggested by Adam Machanic(blog) in this post‘s comment to use something similer to what he has done in sp_whoIsActive.

 if object_id('dbo.fn_removeControlCharacters') is not null
    drop function dbo.fn_removeControlCharacters
go
Create function dbo.fn_removeControlCharacters (@strValue nvarchar(1000))
returns nvarchar(1000)
as
begin
declare @rtStrValue nvarchar(1000)

SELECT @rtStrValue = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
                        (REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
                                (REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
                                        (REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
                                                (REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( 
                                                                                                                @strValue
                                                 ,NCHAR(1),N'?'),NCHAR(2),N'?'),NCHAR(3),N'?'),NCHAR(4),N'?'),NCHAR(5),N'?'),NCHAR(6),N'?')
                                         ,NCHAR(7),N'?'),NCHAR(8),N'?'),NCHAR(11),N'?'),NCHAR(12),N'?'),NCHAR(14),N'?'),NCHAR(15),N'?')
                                 ,NCHAR(16),N'?'),NCHAR(17),N'?'),NCHAR(18),N'?'),NCHAR(19),N'?'),NCHAR(20),N'?'),NCHAR(21),N'?')
                         ,NCHAR(22),N'?'),NCHAR(23),N'?'),NCHAR(24),N'?'),NCHAR(25),N'?'),NCHAR(26),N'?'),NCHAR(27),N'?')
                     ,NCHAR(28),N'?'),NCHAR(29),N'?'),NCHAR(30),N'?'),NCHAR(31),N'?');

return @rtStrValue 
end
go

Now, if we run below query, you can get the desired output.

SELECT t.ID
    , STUFF( (SELECT ',' + dbo.fn_removeControlCharacters(s.Col) FROM TestTable s 
    WHERE s.ID = t.ID  FOR XML PATH(''),type,root).value('root[1]','nvarchar(max)'),1,1,''
            ) AS CSV
FROM TestTable AS t
GROUP BY t.ID
GO

But, this is a workaround and the best method is to use CLR function for grouping string columns.