SSIS and Stored Procedures with Temp Tables in SQL 2012

In earlier versions of SQL Server, if we wanted to use a stored procedure that contains temporary tables as the OLE DB Source command, we had to use one of the workarounds described below:
1. http://www.sqlservercentral.com/articles/Integration+Services+%28SSIS%29/65112/
2. http://www.sqlservercentral.com/articles/Integration+Services+%28SSIS%29/61824/

In SQL Server 2012, we can include the result set definition with the EXECUTE statement by using the WITH RESULT SETS syntax. With this, we no longer need the workarounds described in the links above. Below is a demo stored procedure.

use AdventureWorks2012
go
if object_id('dbo.proc_testTempTableInSSIS','p') is not null
drop procedure dbo.proc_testTempTableInSSIS
go
Create procedure dbo.proc_testTempTableInSSIS
as
begin
create table #SSISTest ( c1 int, c2 nvarchar(100))

insert into #SSISTest (c1,c2)
select top 10 schema_id, type_desc from sys.all_objects

select c1
,c2
from #SSISTest
end
go

Now, we can use WITH RESULT SETS in the OLE DB Source as shown below.

exec dbo.proc_testTempTableInSSIS
with result sets
(
(
c1 int,
c2  nvarchar(100)
)
);
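
As a side check (this part is my own addition, not from the original post), the 2012 metadata discovery is, as far as I understand, based on the same logic as sp_describe_first_result_set, so we can approximate in SSMS what the OLE DB Source sees: without the clause, the temp table breaks metadata discovery, while with the clause the declared columns are returned.

-- Without WITH RESULT SETS, metadata discovery is expected to fail because of the temp table.
exec sp_describe_first_result_set N'exec dbo.proc_testTempTableInSSIS';
go
-- With WITH RESULT SETS, the declared result set should be described.
exec sp_describe_first_result_set N'exec dbo.proc_testTempTableInSSIS with result sets ((c1 int, c2 nvarchar(100)))';
go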

[Screenshot: OLE DB Source using the stored procedure with WITH RESULT SETS]

Hope this helps you somewhere. Thanks for reading.

Distinct using Group By

Adding the DISTINCT keyword can hurt query performance badly, but in the real world we sometimes need it, perhaps because normalization was not done properly, or because we are working on a system that was intentionally designed that way. But this post is not about that.

We use DISTINCT to get unique rows, but we can achieve the same goal with GROUP BY. The optimizer creates the same plan for both in most cases. But when some computation is done on the returned columns, the optimizer can choose a better plan for GROUP BY than for DISTINCT.

Let's have a look at an example.

--Create schema and dummy data
use tempdb
go
if object_id('dbo.addValues','fn') is not null
    drop function dbo.addValues
go
Create function dbo.addValues( @c1 smallint, @c2 smallint)
returns int
begin
    return @c1 + @c2
end
go
if object_id('dbo.dummyTable','u') is not null
    drop table  dbo.dummyTable
go
create table dbo.dummyTable
(
      c1 smallint
    , c2 smallint
)
go
insert into dbo.dummyTable(c1,c2)
select top (500000) sc.column_id
    ,sc.system_type_id
from master.sys.columns sc
cross apply master.sys.columns sc1
go
--Queries
use tempdb
go
select distinct c1,c2,cast(c1 as int)+c2 c3 from dbo.dummyTable
go
select distinct c1,c2,dbo.addValues(c1,c2) c3 from dbo.dummyTable
go
select c1,c2,cast(c1 as int)+c2 c3
from dbo.dummyTable group by c1,c2
go
select c1,c2,dbo.addValues(c1,c2) c3
from dbo.dummyTable group by c1,c2
go

If we run the above queries, each one returns the same data. But if we look at the execution plans, with DISTINCT the sort is performed after the Compute Scalar operator, while with GROUP BY the sort is performed before the Compute Scalar operator. This is a performance gain, because with GROUP BY the scalar function is executed fewer times: for the DISTINCT query the scalar function is executed once per row, while for the GROUP BY query it is executed only once per distinct row. You can use the Profiler SP_Counts template or the DMOs to confirm this. As a side note, this is one reason why we should avoid scalar functions where possible.
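
As a rough way to confirm the counts (this check is my own sketch, not part of the original demo, and it assumes scalar UDF statements show up in sys.dm_exec_query_stats on your build), we can look at the execution_count recorded for the statement inside dbo.addValues after running the DISTINCT query and again after running the GROUP BY query. On a test instance, clearing the plan cache between the two runs makes the comparison easier.

-- Sketch: compare how many times the UDF statement ran for each query.
-- Assumes the UDF plan is cached and visible in sys.dm_exec_query_stats.
select object_name(st.objectid, st.dbid) as object_name
    , qs.execution_count
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) st
where st.objectid = object_id('dbo.addValues');
go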

[Screenshot: execution plans for the DISTINCT and GROUP BY queries]

Apart from the lower number of executions, this also affects the memory grant of the queries, as the aggregate operator in the GROUP BY query needs less memory. This can really help in production environments where queries have to wait a long time for a memory grant, and in some situations a spill to tempdb can also be avoided.
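
One way to see this (again my own addition, not from the original post) is to run either query in one session and, while it is still executing, check the grant from another session; the memory grant is also visible on the root operator of the actual execution plan.

-- While one of the queries above runs in another session, compare the
-- requested and granted workspace memory (values are in KB).
select session_id
    , requested_memory_kb
    , granted_memory_kb
    , used_memory_kb
from sys.dm_exec_query_memory_grants;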

[Screenshot: memory grants for the DISTINCT and GROUP BY queries]

Finally, I would like to thank Rob Farley for his post on this topic, which I referred to while writing this one.

Hope this helps you somewhere. Also, apart from the performance gain, using GROUP BY in place of DISTINCT can be good coding practice: it gives us one more chance to look at the column list and check whether we are returning any column that is not required in the result set, and if somebody uses SELECT * (which we should never use in production code), GROUP BY forces the column names to be spelled out.

Persisted Computed column behavior when arithabort is off

If we have persisted computed columns and ARITHABORT is set to OFF, SQL Server needs to re-compute the column despite it being marked as persisted. The bigger issue is that the query always scans the index, in spite of a proper index being available.

Let’s have a look at the example

--Create test data
use tempdb
go
if object_id('dbo.testPersistedComputedColumn') is not null
drop table dbo.testPersistedComputedColumn
go
Create table dbo.testPersistedComputedColumn
(
c1 int not null identity(1,1) primary key
, c2 char(4000)
, c3 as substring(c2,1,2) persisted
)
go

insert into dbo.testPersistedComputedColumn(c2)
select top 200 name from sys.all_columns
go

if object_id('dbo.testNonPersistedComputedColumn') is not null
drop table dbo.testNonPersistedComputedColumn
go
create table dbo.testNonPersistedComputedColumn
(
c1 int not null identity(1,1) primary key
, c2 char(4000)
, c3 as substring(c2,1,2)
)
go

insert into dbo.testNonPersistedComputedColumn(c2)
select top 200 name from sys.all_columns
go

Now, if you run the queries below, the first will perform a clustered index scan while the second will perform a clustered index seek.

use tempdb
go
/*
We also need to set ANSI_WARNINGS to OFF, because on a database with
compatibility level greater than 80, setting ANSI_WARNINGS ON automatically
sets ARITHABORT ON, and the SSMS default settings have both of these ON.
*/
set ansi_warnings off
go
set arithabort off
go
select tpc.c2 , tpc.c3
from dbo.testPersistedComputedColumn  tpc
where tpc.c1 = 1
go
select tpc.c2 , tpc.c3
from dbo.testNonPersistedComputedColumn   tpc
where tpc.c1 = 1
go

Below is a snapshot of the execution plans for both queries. For the persisted column, a clustered index scan is performed and a Filter operator is used to filter the rows, while the second query performs an index seek. Also, even for the persisted column, the value of c3 is calculated at run time.

[Screenshot: execution plans for the persisted and non-persisted computed column queries]

This will not be an issue for most environments, because apart from those using DB-Library, all other client libraries have ANSI_WARNINGS ON by default, and most databases nowadays have a compatibility level greater than 80.
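
For completeness, here is the counter-check I would run (my own addition to the demo): with both options switched back to ON, the SSMS defaults, the persisted value of c3 should be used and the query should go back to a clustered index seek.

set ansi_warnings on
go
set arithabort on
go
-- With both options ON, no Filter or runtime computation of c3 is expected.
select tpc.c2 , tpc.c3
from dbo.testPersistedComputedColumn tpc
where tpc.c1 = 1
go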

Remove the server name from MRU list from SSMS "connect to server" dialog

In SSMS 2005 and SSMS 2008, the way to clear server names from the “Connect to Server” dialog is to close SSMS and remove mru.dat (for 2005) or SqlStudio.bin (for 2008 and 2008 R2). This method is documented in the references below.

Below are the default locations of these files:
C:\Documents and Settings\<USER>\Application Data\Microsoft\Microsoft SQL Server\90\Tools\Shell\mru.dat
C:\Documents and Settings\<USER>\Application Data\Microsoft\Microsoft SQL Server\100\Tools\Shell\SqlStudio.bin

The biggest problem with deleting these files is that we also lose the server names we do not want to remove. In SSMS 2012 there is a documented way to delete a server name from the “Connect to Server” dialog: we can remove it simply by selecting it and pressing DELETE.

In the SSMS “Connect to Server” dialog, navigate to the particular server with the keyboard.

[Screenshot: server name selected in the Connect to Server dialog]

Press DELETE, and the server will be removed from the list.

[Screenshot: Connect to Server dialog after the server name is removed]

 

Please note that when you connect again to a server you removed, it will be added back to the MRU list and you will need to delete it again.

References:-

1. http://sqlblog.com/blogs/aaron_bertrand/archive/2009/03/18/managing-the-mru-list-for-the-connect-to-server-dialog-in-ssms.aspx

2. http://blogs.msdn.com/b/managingsql/archive/2011/07/13/deleting-old-server-names-from-quot-connect-to-server-quot-dialog-in-ssms.aspx

How index_id values are generated…

When we use the index query hint, we can give either the index number or the index name. I use index hints for performance testing only and almost never in actual code; in test code I tend to use index numbers. But until some time ago I used to wonder why the index_id of a newly created index can be quite large: for one wide table, the index_id of a newly created index was 70, and we do not have that many indexes :). So what is the reason the newly created index got ID 70?

Looking into the sys.indexes catalog view only shows the indexes on the table. But when I looked into the sys.stats catalog view, I found the answer. When we create an index, a statistics object with the same key columns also gets created, and auto-created column statistics live in sys.stats as well. So the index_id picked for a new index is the next value based on sys.stats.

Have a look at this example

use tempdb 
go 
if object_id('t1') is not null 
drop table t1 
go 
Create table t1 ( c1 int primary key 
, c2 nvarchar(10) 
, c3 nvarchar(10) ) 
go 
insert into t1 (c1,c2,c3) 
select top 100 number,type,cast(number as nvarchar(4)) + 'test'
from master.dbo.spt_values where type like 'p%' 
go 
Create index ix_t1_c2 on t1(c2) 
go 
select index_id,name 
from sys.indexes where object_id = object_id('t1') 

select stats_id,name from sys.stats 
where object_id = object_id('t1') 
go

Both queries return two rows. Now we query the table with a predicate on column c3, so in a database that has auto create statistics turned on (the default), SQL Server will create statistics for column c3.

select c1,c2,c3 from t1 where c3 like '0%'
go
select index_id,name 
from sys.indexes where object_id = object_id('t1')
select stats_id,name 
from sys.stats where object_id = object_id('t1')
go

Now sys.stats returns 3 rows. So if we create an index at this point, the newly created index will have an id of 4.

Create index ix_t1_c3 on t1(c3)
go
select index_id,name 
from sys.indexes where object_id = object_id('t1')

select stats_id,name 
from sys.stats where object_id = object_id('t1')
go

Now we can hint the index either way, as shown below.

select c1,c2,c3 
from t1 with (index=4) where c3 like '0%'

select c1,c2,c3 
from t1 with (index=ix_t1_c3) where c3 like '0%'

Please note that explicit index hints are not recommended, but if you ever need to use one, always use the index name in actual code.

More fun with External Tools in SSMS

Earlier I posted a way to open a file's folder from SSMS by creating an external tool. But it only opened the folder in Windows Explorer, and the file then had to be located in that folder, which takes a few more keystrokes. So I changed the command to open the containing folder with the file already selected. Below is the screenshot that shows the command and arguments.

[Screenshot: External Tools dialog with the Windows Explorer command and arguments]
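
For reference, since the screenshot is not reproduced here, the entries would look roughly like the ones below. These exact values are my reconstruction (/select is the standard Windows Explorer switch, and $(ItemPath) is the same SSMS token used further down), so verify them on your machine.

Command: explorer.exe

Arguments: /select,$(ItemPath)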

So now, when we press Alt+T, O, Enter, Windows Explorer opens the containing folder with the file selected.

Also, a command can be added to send a mail through Outlook with the file attached. Below is the screenshot that shows the command and arguments.

[Screenshot: External Tools dialog with the Outlook command and arguments]

Command:  C:\Program Files (x86)\Microsoft Office\Office14\outlook.exe (this will differ on your box depending on the Office version and install directory)

Arguments: /c ipm.note /m “&subject=Sending%20File%20$(ItemFileName)” /a $(ItemPath)

See if this is useful for you.

Grouping by Multiple Columns to Single Column as A String

This post was created in reference to a question asked by Pinal Dave on SQLAuthority, so if you have not gone through that post, I suggest reading it first.

FOR XML is a good method to combine strings, but when we use the query as described in that post and some XML markup characters exist in the data, the values get changed. I am using the same sample data and query used in that post. If we insert a row containing >, it will be converted to &gt;. So we can use TYPE to produce real XML and then use the value() method to get it back as a string data type.

INSERT INTO TestTable (ID, Col)
SELECT 1, 'A>'

SELECT t.ID
    , STUFF( (SELECT ',' + s.Col FROM TestTable s WHERE s.ID = t.ID  
                   FOR XML PATH(''),type,root).value('root[1]','nvarchar(max)'),1,1,''
            ) AS CSV
FROM TestTable AS t
GROUP BY t.ID
GO

But this is not a foolproof method: when a control character (ASCII 1 to 31) exists in the data, the XML serialization can fail and the query is aborted with an error, as shown below.

INSERT INTO TestTable (ID, Col)
SELECT 1, CHAR(1)

Now, if we run the above query, we get this error message: “FOR XML could not serialize the data for node ‘NoName’ because it contains a character (0x0001) which is not allowed in XML. To retrieve this data using FOR XML, convert it to binary, varbinary or image data type and use the BINARY BASE64 directive.” The original query posted by Pinal Dave still works successfully, though it returns some data we do not want in the final result. If you have a legacy system, these characters might already be in the data, and there is also a chance that some misbehaving app is inserting such characters.

The alternative is to replace the control characters using a function. I created the function below as suggested by Adam Machanic (blog) in that post's comments, using something similar to what he has done in sp_WhoIsActive.

 if object_id('dbo.fn_removeControlCharacters') is not null
    drop function dbo.fn_removeControlCharacters
go
Create function dbo.fn_removeControlCharacters (@strValue nvarchar(1000))
returns nvarchar(1000)
as
begin
declare @rtStrValue nvarchar(1000)

SELECT @rtStrValue = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
                        (REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
                                (REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
                                        (REPLACE(REPLACE(REPLACE(REPLACE(REPLACE
                                                (REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( 
                                                                                                                @strValue
                                                 ,NCHAR(1),N'?'),NCHAR(2),N'?'),NCHAR(3),N'?'),NCHAR(4),N'?'),NCHAR(5),N'?'),NCHAR(6),N'?')
                                         ,NCHAR(7),N'?'),NCHAR(8),N'?'),NCHAR(11),N'?'),NCHAR(12),N'?'),NCHAR(14),N'?'),NCHAR(15),N'?')
                                 ,NCHAR(16),N'?'),NCHAR(17),N'?'),NCHAR(18),N'?'),NCHAR(19),N'?'),NCHAR(20),N'?'),NCHAR(21),N'?')
                         ,NCHAR(22),N'?'),NCHAR(23),N'?'),NCHAR(24),N'?'),NCHAR(25),N'?'),NCHAR(26),N'?'),NCHAR(27),N'?')
                     ,NCHAR(28),N'?'),NCHAR(29),N'?'),NCHAR(30),N'?'),NCHAR(31),N'?');

return @rtStrValue 
end
go

Now, if we run the query below, we get the desired output.

SELECT t.ID
    , STUFF( (SELECT ',' + dbo.fn_removeControlCharacters(s.Col) FROM TestTable s 
    WHERE s.ID = t.ID  FOR XML PATH(''),type,root).value('root[1]','nvarchar(max)'),1,1,''
            ) AS CSV
FROM TestTable AS t
GROUP BY t.ID
GO

But this is a workaround; the best method is to use a CLR function for grouping string columns.

sp_helptext v/s OBJECT_DEFINITION

Today, my friend Nakul did a nice comparison of sp_helptext vs. OBJECT_DEFINITION in his post. As always, Nakul's post is well documented and easy to understand.

Before reading his post, I was not aware that sp_helptext can be used to get the definition of check constraints, defaults and computed columns. But below is the reason why I still prefer OBJECT_DEFINITION over sp_helptext.

Create one dummy stored procedure

use tempdb
 go
 Create procedure dbo.proc_dummyHighWidth
 as
 begin
 select * from dbo.dummy cross apply dbo.dummy cross apply dbo.dummy cross apply dbo.dummy cross apply dbo.dummy cross apply dbo.dummy cross apply dbo.dummy cross apply dbo.dummy cross apply dbo.dummy cross apply dbo.dummy cross apply dbo.dummy cross apply dbo.dummy cross apply dbo.dummy cross apply dbo.dummy cross apply dbo.dummy
 end
 go

Now, run the below code

exec sp_helpText 'dbo.proc_dummyHighWidth'
select OBJECT_DEFINITION(object_id('dbo.proc_dummyHighWidth'))

When you run the above code, sp_helptext wraps the code at around 255 characters. You can clearly see why it is wrapped if you look at the definition of sp_helptext itself:

use master;
exec sp_helptext 'sp_helptext'

In the definition of sp_helptext there is a defined line length of 255 characters, which is what causes the text to wrap. As a best practice we should not have lines wider than 150 characters, but many of us work with legacy code or with developers who have just begun their careers. So if there is a wide line in a module definition, sp_helptext's wrapping can cause some discomfort. For this reason I feel it is better to use OBJECT_DEFINITION than sp_helptext, even if we need to type a few more characters.

How to ignore rows with Unicode data

Below are sample data and a SQL query to ignore rows that contain Unicode data. The trick is that casting an nvarchar value to varchar converts any character that does not exist in the current code page into a question mark; so after masking genuine question marks with REPLACE, any '?' left in the cast value indicates Unicode data, and CHARINDEX can filter those rows out.

 
declare @t1 table
( c1 nvarchar(100)
)
insert into @t1
select N'Hello'
union
select N'你好'
union
select N'mix你好hello'
--Ignore rows with Unicode data: mask genuine '?' first so that any '?' left
--after the varchar cast must have come from a non-ASCII character.
select *
from @t1
where charindex('?',cast(replace(c1,'?','@') as varchar(100)))= 0

How to use fn_dblog to identify inserts from triggers

In one of our applications, a delta table is used to track changes. The delta table for one entity holds information about anything that is added, deleted or updated for that entity, so it gets updated by many related tables. For example, the delta table tracking employees depends on the Employee, Company, Agency, Salary, Contacts etc. tables, and all of those tables have triggers that maintain the employee tracking information. The design of the delta table is such that it only holds the EmployeeID and the date and time the operation occurred. The delta table design, and the method used to maintain it, could probably be much better, but this is a legacy application, so we cannot do much about it until we have the budget for it.

In the real scenario there are many more tables than mentioned above, and they belong to a completely different domain. Sometimes (twice or thrice a day) a bulk of rows gets added to the delta table, which looked unnecessary. The hard part is finding the root cause of these inserts.

This is a legacy application and lots of operations are undocumented. In some places the application uses inserts/deletes in place of updates, so it is virtually impossible to find the cause by tracing the application.

Some of the options to track this are:

  • One option is to add a SQL Server Audit, but this database is on SQL Server 2005.
  • Another is to use Change Data Capture or Change Tracking, but these technologies are also not available on 2005.
  • Another option is to add a column to the delta table to capture the table name and change the triggers to populate it. But this could break parts of the application, and a huge amount of testing of the dependent functionality would be needed.
  • More triggers/audit tables could be added to track changes, but that would be a big effort with some loss of performance.
  • SQL Trace could be used to find out what happened, but there is no fixed time for this behavior, so the trace could get very large, and even a server-side trace carries some performance penalty.

So I decided to use the undocumented fn_dblog function. Paul Randal (blog) has blogged about various scenarios where fn_dblog can be used to trace out the root cause.

The first step is to find out from the delta table the timestamp at which this occurred.
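
For example (the table and column names here are hypothetical, since the actual delta table schema is not shown in this post), the window of the suspicious bulk insert could be pulled out like this:

-- Hypothetical delta table and column names; adjust to the real schema.
select min(OperationTime) as WindowStart
    , max(OperationTime) as WindowEnd
from dbo.EmployeeDelta
where OperationTime >= '20120905 09:00'
and OperationTime < '20120905 10:00'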

Next, we can use the fn_dblog function with the [Transaction ID] and [Begin Time] columns to identify the transaction.

select top 100 cast([begin time] as datetime)
            , allocunitname
            , [transaction Id]
            , operation 
from fn_dblog(null,null)
where operation like 'LOP_BEGIN_XACT'
and convert(datetime,[begin time]) between '20120905 09:27:20.000'
                                and '20120905 09:27:20.980'

Next, we need to look into each of these transactions, using the query below to see the details.

select distinct allocunitname 
from fn_dblog(NULL,NULL) 
where [transaction Id] like '0000:0fc616e1'

We need to look into all the transactions this way until we find the relevant one. If needed, you can use the AllocUnitName column to filter the data further. Once done, it is just a matter of looking at the distinct AllocUnitName values (table and index). You can also group by AllocUnitName.

select allocunitname, count(*) as Cnt 
from fn_dblog(NULL,NULL) 
where [transaction Id] like '0000:0fc616e1'
Group by allocunitname

Using these steps, I was able to find the culprit trigger. Hope this helps you somewhere.