SSIS and Stored Procedures with Temp Tables in SQL 2012

In earlier versions of SQL Server, if we want to use stored procedures in OLEDB source command which contains temporary tables we take some workarounds mentioned below
1. http://www.sqlservercentral.com/articles/Integration+Services+%28SSIS%29/65112/
2. http://www.sqlservercentral.com/articles/Integration+Services+%28SSIS%29/61824/

In SQL 2012, we can include the result set definition to execute statement by using with result set syntax. By doing this we do not need to take workarounds described in above links. Below is the demo stored procedure

use AdventureWorks2012
go
if object_id('dbo.proc_testTempTableInSSIS','p') is not null
drop procedure dbo.proc_testTempTableInSSIS
go
Create procedure dbo.proc_testTempTableInSSIS
as
begin
create table #SSISTest ( c1 int, c2 nvarchar(100))

insert into #SSISTest (c1,c2)
select top 10 schema_id, type_desc from sys.all_objects

select c1
,c2
from #SSISTest
end
go

Now, we can use WITH RESULT SETS in OLEDB source as below

exec dbo.proc_testTempTableInSSIS
with result sets
(
(
c1 int,
c2  nvarchar(100)
)
);

image

Hope this helps you somewhere. Thanks for reading.

Distinct using Group By

Adding Distinct keyword hurt the query performance badly, but in real world we need to add Distinct may be because of not properly done normalization or may be we are working on a system which is intentionally designed that way. But, this post is not about that.

We use Distinct to get the unique rows , but we can achieve same goal by adding group by. Optimizer will create same plans for both in most of the cases . But, in case of some computation done on returned columns, the optimizer can choose a better plan for Group By compared to Distinct.

Lets have a look at example

--Create schema and dummy data
use tempdb
go
if object_id('dbo.addValues','fn') is not null
    drop function dbo.addValues
go
Create function dbo.addValues( @c1 smallint, @c2 smallint)
returns int
begin
    return @c1 + @c2
end
go
if object_id('dbo.dummyTable','u') is not null
    drop table  dbo.dummyTable
go
create table dbo.dummyTable
(
      c1 smallint
    , c2 smallint
)
go
insert into dbo.dummyTable(c1,c2)
select top (500000) sc.column_id
    ,sc.system_type_id
from master.sys.columns sc
cross apply master.sys.columns sc1
go
--Queries
use tempdb
go
select distinct c1,c2,cast(c1 as int)+c2 c3 from dbo.dummyTable
go
select distinct c1,c2,dbo.addValues(c1,c2) c3 from dbo.dummyTable
go
select c1,c2,cast(c1 as int)+c2 c3
from dbo.dummyTable group by c1,c2
go
select c1,c2,dbo.addValues(c1,c2) c3
from dbo.dummyTable group by c1,c2
go

If we run above queries, returned data will be same of each one. But, if we look at the execution plan, in case of distinct the sort is performed after the compute scalar, but in case of group by the sort is performed before compute scalar operator. This is gain in performance as in case of group by scalar function is executed less number of times.  For query with Distinct the scalar function gets executed each time for row, while in group by query scalar function gets executed only for distinct rows.  You can use profiler sp_count template or DMOs to confirm this. As a side note, this is the reason for why we should avoid scalar function if possible.

Distinct By Group By

Apart from the less number of executions, this will also impact the  memory grant of queries as the in case of query using group by, the aggregate operator  will need less memory. This can really helps in production environment where queries need to wait long for memory grant and also spill to tempdb can be avoided in some situations.

Distinct By Group By Memory Grant

Finally, I would like to thank Rob Farly’s this for this post, which I have refereed. 

Hope this helps you somewhere. Also, apart from performance gain, it can be a good coding practice to use Group BY in place of Distinct as we have one more change to look into column list to see we are returning any column which is not required in result set and if somebody using select * (which we should not use ever in production code), in group by column name needs to be specified.

Persisted Computed column behavior when arithabort is off

If we have persisted computed columns and the arithabort is set to ON, SQL needs to re-compute the column despite having column marked as computed. The rather bigger issue is the query always scans index, in spite of having proper index.

Let’s have a look at the example

--Create test data
use tempdb
go
if object_id('dbo.testPersistedComputedColumn') is not null
drop table dbo.testPersistedComputedColumn
go
Create table dbo.testPersistedComputedColumn
(
c1 int not null identity(1,1) primary key
, c2 char(4000)
, c3 as substring(c2,1,2) persisted
)
go

insert into dbo.testPersistedComputedColumn(c2)
select top 200 name from sys.all_columns
go

if object_id('dbo.testNonPersistedComputedColumn') is not null
drop table dbo.testNonPersistedComputedColumn
go
create table dbo.testNonPersistedComputedColumn
(
c1 int not null identity(1,1) primary key
, c2 char(4000)
, c3 as substring(c2,1,2)
)
go

insert into dbo.testNonPersistedComputedColumn(c2)
select top 200 name from sys.all_columns
go

Now, If you run the below query, first will perform scan while the second will seek the clustered index.

use tempdb
go
/*
Also need to set ansi_warning to off, as on database having compatibility
greater than 80, setting ansi_warnings ON automatically sets arithabort ON
and SSMS default setting have both these ON.
*/
set ansi_warnings off
go
set arithabort off
go
select tpc.c2 , tpc.c3
from dbo.testPersistedComputedColumn  tpc
where tpc.c1 = 1
go
select tpc.c2 , tpc.c3
from dbo.testNonPersistedComputedColumn   tpc
where tpc.c1 = 1
go

Below is the snap of execution plan for both queries. In case of persisted column clustered index scan is performed and filter is used to filtering rows, while second perform the index seek. Also, even in case of persisted column, value of C3 is calculated at run time.

Computed Column and Arithabort
This will not be issue for most of environments as apart from one which are using DB library,  all other client libraries have ansi_warnings to ON as a default and most of databases now days have database which have compatibilities grater than 80.

Remove the server name from MRU list from SSMS "connect to server" dialog

In SSMS 2005 and SSMS 2008 the method to clear the server name from “connect to server” dialog is to close SSMS and remove mru.dat ( in case of 2005) or SqlStudio.bin ( in case of 2008 and 2008 R2).  This method has been documented here.

Below are the default location of these files
C:\Documents and Settings\<USER>\Application Data\Microsoft\Microsoft SQL Server\90\Tools\Shell\mru.dat
C:\Documents and Settings\<USER>\Application Data\Microsoft\Microsoft SQL Server\100\Tools\Shell\SqlStudio.bin

The biggest problem of deleting this file is we also loose the server names which we do not want to remove.  In SSMS 2012 there is documented way to delete the server name from “connect to server” dialog. Now, we can remove the server name by selecting it and pressing DELETE. 

Now in SSMS “connect to server”  dialog, navigate to the particular server with keyboard

image

Press “DELETE”. Now this server will be removed from list.

image

 

Please note that when you again connect to the server you removed, it will again added to MRU list and you again need to delete it.

References:-

1. http://sqlblog.com/blogs/aaron_bertrand/archive/2009/03/18/managing-the-mru-list-for-the-connect-to-server-dialog-in-ssms.aspx

2. http://blogs.msdn.com/b/managingsql/archive/2011/07/13/deleting-old-server-names-from-quot-connect-to-server-quot-dialog-in-ssms.aspx

How Index_id are generated…

When we use index query hint, we can give the index number or index name. I use index hint for testing performance purpose only and almost never use it actual code. I tend to use index numbers in test code. But, till some time ago I use to wonder why the index id of newly created index is quite bigger, for one wide table the index_id of newly created index is 70 and We do not have these many indexes:). So, what’s the reason that newly created index is having ID 70?

After looking into sys.indexs catalog view, i am only able to find out the indexes on the table. But, When I looked into sys.stats catalog view, i found the answer. When we create a index, statistics with same index key columns also gets created. So, the index_id picked is the next value from sys.stats view.

Have a look at this example

use tempdb 
go 
if object_id('t1') is not null 
drop table t1 
go 
Create table t1 ( c1 int primary key 
, c2 nvarchar(10) 
, c3 nvarchar(10) ) 
go 
insert into t1 (c1,c2,c3) 
select top 100 number,type,cast(number as nvarchar(4)) + 'test'
from master.dbo.spt_values where type like 'p%' 
go 
Create index ix_t1_c2 on t1(c2) 
go 
select index_id,name 
from sys.indexes where object_id = object_id('t1') 

select stats_id,name from sys.stats 
where object_id = object_id('t1') 
go

Two rows will be returned from both queries. Now, we query the table with predicate on where column C3, so in database where we have auto create statistics turn on (default), SQL will create statistics for the column c3.

select c1,c2,c3 from t1 where c3 like '0%'
go
select index_id,name 
from sys.indexes where object_id = object_id('t1')
select stats_id,name 
from sys.stats where object_id = object_id('t1')
go

So now, sys.ststs will give 3 rows. So, if we create index now, the newly created index will have id of 4.

Create index ix_t1_c3 on t1(c3)
go
select index_id,name 
from sys.indexes where object_id = object_id('t1')

select stats_id,name 
from sys.stats where object_id = object_id('t1')
go

Now, we can query the either way as shown below

select c1,c2,c3 
from t1 with (index=4) where c3 like '0%'

select c1,c2,c3 
from t1 with (index=ix_t1_c3) where c3 like '0%'

Please note that explicit hints are not recommended but even if you need to use ever always use the index name in actual code.

More fun with External Tools in SSMS

Earlier I posted a way to open file folder from SSMS by creating external tool. But, it was only opening the file folder in windows explorer and then file needs to be located in that folder which needs some key strokes. So, I changed the command to open the containing folder with file selected. Below is the screen shot which shows the command and arguments.

image

So, now when we press Alt+T,O,Enter, windows explorer will be open explorer with file selected.

Also, a command can be added to send a mail through outlook with file attached. Below is the screen shot which shows the command and arguments.

image

Command:  C:\Program Files (x86)\Microsoft Office\Office14\outlook.exe (will be different on your box as per the office version and installed directory)

Arguments: /c ipm.note /m “&subject=Sending%20File%20$(ItemFileName)” /a $(ItemPath)

See if this is useful for you.

Grouping by Multiple Columns to Single Column as A String

This post is created in reference to question asked by Pinal Dave on SQLAuthority. So, If you have not gone through this post, I suggest to read that post first.

XML is good method to combine a string, but when we use the query as described in post and if some XML markup characters exists in the data, the values will be changed.  I am using the same sample data and query used in that post. Now if we insert row having > then it will be converted to &gt;. So, we can use TYPE to create it as XML and then use value method to get it as a string datatype.

INSERT INTO TestTable (ID, Col)
SELECT 1, 'A>'

SELECT t.ID
    , STUFF( (SELECT ',' + s.Col FROM TestTable s WHERE s.ID = t.ID  
                   FOR XML PATH(''),type,root).value('root[1]','nvarchar(max)'),1,1,''
            ) AS CSV
FROM TestTable AS t
GROUP BY t.ID
GO

But this is not a full proof method, when some control char( between ASCII 1 to 31) exists in data, the xml serialization can fail and query will be aborted with error as shown below.

INSERT INTO TestTable (ID, Col)
SELECT 1, CHAR(1)

Now, if we run the above query, we get this error message “OR XML could not serialize the data for node ‘NoName’ because it contains a character (0x0001) which is not allowed in XML. To retrieve this data using FOR XML, convert it to binary, varbinary or image data type and use the BINARY BASE64 directiv” . But the query posted by Pinal Dave works successfully. Though it has some data which we do not want in final result. If you have legacy system, these chars might be there in data. Also, there are chances that some app is misbehaving and inserting this chars into data.

The alternative is to replace the control chars using some function. I created this function as suggested by Adam Machanic(blog) in this post‘s comment to use something similer to what he has done in sp_whoIsActive.

 if object_id('dbo.fn_removeControlCharacters') is not null
    drop function dbo.fn_removeControlCharacters
go
Create function dbo.fn_removeControlCharacters (@strValue nvarchar(1000))
returns nvarchar(1000)
as
begin
declare @rtStrValue nvarchar(1000)

SELECT @rtStrValue = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
                        (REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
                                (REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
                                        (REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
                                                (REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( 
                                                                                                                @strValue
                                                 ,NCHAR(1),N'?'),NCHAR(2),N'?'),NCHAR(3),N'?'),NCHAR(4),N'?'),NCHAR(5),N'?'),NCHAR(6),N'?')
                                         ,NCHAR(7),N'?'),NCHAR(8),N'?'),NCHAR(11),N'?'),NCHAR(12),N'?'),NCHAR(14),N'?'),NCHAR(15),N'?')
                                 ,NCHAR(16),N'?'),NCHAR(17),N'?'),NCHAR(18),N'?'),NCHAR(19),N'?'),NCHAR(20),N'?'),NCHAR(21),N'?')
                         ,NCHAR(22),N'?'),NCHAR(23),N'?'),NCHAR(24),N'?'),NCHAR(25),N'?'),NCHAR(26),N'?'),NCHAR(27),N'?')
                     ,NCHAR(28),N'?'),NCHAR(29),N'?'),NCHAR(30),N'?'),NCHAR(31),N'?');

return @rtStrValue 
end
go

Now, if we run below query, you can get the desired output.

SELECT t.ID
    , STUFF( (SELECT ',' + dbo.fn_removeControlCharacters(s.Col) FROM TestTable s 
    WHERE s.ID = t.ID  FOR XML PATH(''),type,root).value('root[1]','nvarchar(max)'),1,1,''
            ) AS CSV
FROM TestTable AS t
GROUP BY t.ID
GO

But, this is a workaround and the best method is to use CLR function for grouping string columns.