Saturday, 26 November 2011

Using CASE in an UPDATE Statement


Use CASE in the UPDATE statement

CREATE TABLE project   (project_no   CHAR(4)  NOT NULL,
                        project_name CHAR(15) NOT NULL,
                        budget FLOAT NULL)

insert into project values ('p1', 'Search Engine',        120000.00)
insert into project values ('p2', 'Programming',          95000.00)
insert into project values ('p3', 'SQL',                  186500.00)

select * from project
GO

(1 rows affected)

(1 rows affected)

(1 rows affected)
project_no project_name    budget
---------- --------------- ------------------------
p1         Search Engine                     120000
p2         Programming                        95000
p3         SQL                               186500

(3 rows affected)
-- Use CASE in the UPDATE statement.

UPDATE project SET budget = CASE
          WHEN budget > 0 and budget < 100000 THEN budget * 1.2
          WHEN budget >= 100000 and budget < 200000 THEN budget * 1.1
          ELSE budget * 1.05
          END
GO

(3 rows affected)
select * from project
GO
project_no project_name    budget
---------- --------------- ------------------------
p1         Search Engine                     132000
p2         Programming                       114000
p3         SQL                   205150.00000000003

(3 rows affected)
drop table project
GO

Sunday, 16 October 2011

SELECT Query Syntax

[ WITH <common_table_expression>]
SELECT select_list [ INTO new_table ]
[ FROM table_source ] [ WHERE search_condition ]
[ GROUP BY group_by_expression ]
[ HAVING search_condition ]
[ ORDER BY order_expression [ ASC | DESC ] ]
The UNION, EXCEPT and INTERSECT operators can be used between queries to combine or compare their results into one result set.
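For illustration, here is a hypothetical query that exercises most of these clauses against the project table used elsewhere on this blog (the grouping expression is just an assumption for the example):

-- Group projects by the first character of project_no and keep only
-- groups whose total budget exceeds 100000, largest totals first.
SELECT LEFT(project_no, 1) AS project_group,
       SUM(budget)         AS total_budget
FROM   project
WHERE  budget IS NOT NULL
GROUP  BY LEFT(project_no, 1)
HAVING SUM(budget) > 100000
ORDER  BY total_budget DESC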

Using the INSTEAD OF Trigger in SQL Server 2005


I, like probably a lot of people, am going to have to do a bit of DB refactoring in order to be able to take advantage of Linq to SQL and/or the ADO.NET Entity Framework.  While the framework is certainly flexible, there are plenty of databases out there that were designed without an O/R layer in mind, and there are definitely spots where you can pigeonhole yourself if you didn't take O/R into account during the initial design.
So, one great way to simplify your model is to create views that abstract the complicated underlying model.  Views work great for just about any read-only operation; however, when you get into inserting, updating, and deleting, you hit some heavy limitations.  From the SQL Server 2005 BOL:
You can modify the data of an underlying base table through a view, as long as the following conditions are true:
  • Any modifications, including UPDATE, INSERT, and DELETE statements, must reference columns from only one base table.
  • The columns being modified in the view must directly reference the underlying data in the table columns. The columns cannot be derived in any other way, such as through the following:
    • An aggregate function: AVG, COUNT, SUM, MIN, MAX, GROUPING, STDEV, STDEVP, VAR, and VARP.
    • A computation. The column cannot be computed from an expression that uses other columns. Columns that are formed by using the set operators UNION, UNION ALL, CROSSJOIN, EXCEPT, and INTERSECT amount to a computation and are also not updatable.
  • The columns being modified are not affected by GROUP BY, HAVING, or DISTINCT clauses.
  • TOP is not used anywhere in the select_statement of the view together with the WITH CHECK OPTION clause.
The first one is obviously the biggest issue (at least for me).  It's often likely that you've got more than one table that aggregates into a single entity.
So, one option you have then is the INSTEAD OF trigger.  Apparently this feature has been around since SQL 2000, but in my ignorance, I hadn't heard of it until recently.  An INSTEAD OF trigger can be placed on any view or table to replace the standard action of an INSERT, UPDATE, or DELETE statement. 
As with all triggers, this is one of those "Use with great caution" features, as it definitely convolutes the normal execution path.  That said, it's pretty powerful, and gets you around most of the updatable view limitations.
Now, I'm going to show a simple example.  I'm perfectly aware that this example is easy to do in the ADO.NET Entity Framework.  This example doesn't represent my own database issues, which are why I am investigating going this route.  I'm just trying to keep it simple here.  If you'd like to hear about my database issues, go ahead and comment, and I'd love to discuss and see if there are better options out there.
So, here's the schema:
db schema
Now, in my application, I want to represent a single entity called Employee, that is an aggregation of the Person attributes and the Employee attributes.  So, what I can do then is create a view.
CREATE VIEW [dbo].[Employees]
AS
SELECT dbo.Employee.EmployeeID, dbo.Employee.Salary,
       dbo.Person.FirstName, dbo.Person.LastName, dbo.Person.PersonID
FROM   dbo.Employee INNER JOIN
       dbo.Person ON dbo.Employee.PersonID = dbo.Person.PersonID
Now, this view is going to work great, except when I want to update the data in the tables.  Now, I can write some insert code using the new Linq To SQL functionality:
MyDatabaseClassesDataContext ctx = new MyDatabaseClassesDataContext();
 
Employee emp = new Employee();
emp.FirstName = "Jim";
emp.LastName = "Fiorato";
emp.Salary = 400000;
 
ctx.Employees.InsertOnSubmit(emp);
 
ctx.SubmitChanges();
However, that will throw an exception like the following: 
System.Data.SqlClient.SqlException: View or function 'dbo.Employees' is not updatable because the modification affects multiple base tables.
So, now what I can do is create the INSTEAD OF trigger on the view.  I'll just do an example of the insert trigger.  That trigger will look like this (note this doesn't support multiple items in the inserted rowset at all):
CREATE TRIGGER [dbo].[TR_INSERT_Employees]
   ON [dbo].[Employees]
   INSTEAD OF INSERT
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;
 
    DECLARE @PersonID AS UNIQUEIDENTIFIER
    DECLARE @EmployeeID AS UNIQUEIDENTIFIER
 
    SET @PersonID = NEWID();
    SET @EmployeeID = NEWID();
 
    -- Person Table
    INSERT INTO Person (PersonID, FirstName, LastName)
    SELECT @PersonID, FirstName, LastName FROM inserted;
 
    -- Employee Table
    INSERT INTO Employee (EmployeeID, PersonID, Salary)
    SELECT @EmployeeID, @PersonID, Salary FROM inserted;
 
END
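Since the trigger above generates a single @PersonID/@EmployeeID pair, a multi-row INSERT through the view would try to reuse the same keys for every row. As a minimal sketch (assuming the same Person/Employee schema as above), a multi-row-safe variant can stage the inserted rows with one generated key pair per row:

CREATE TRIGGER [dbo].[TR_INSERT_Employees_MultiRow]
   ON [dbo].[Employees]
   INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
 
    -- Generate a PersonID/EmployeeID pair per inserted row, so both
    -- base-table inserts can share the same generated keys.
    SELECT NEWID() AS PersonID,
           NEWID() AS EmployeeID,
           FirstName, LastName, Salary
    INTO #staged
    FROM inserted;
 
    INSERT INTO Person (PersonID, FirstName, LastName)
    SELECT PersonID, FirstName, LastName FROM #staged;
 
    INSERT INTO Employee (EmployeeID, PersonID, Salary)
    SELECT EmployeeID, PersonID, Salary FROM #staged;
END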
This all makes me wonder if we're just pushing the impedance mismatch work-around further into the database from the code.  Is that a good or bad thing?  It's likely bad, but just how bad?  I can't tell yet. 
I think people usually feel better about what's in their database vs. what's in their code, because relational databases are something you can wrap your head around, where boundaries are clear, and people make quick and mostly accurate assumptions about what is going on.  Whereas in your C# code, you'll often find it hard to grasp it all, because it could be doing so many things, in so many different ways.

Clustered and Non-Clustered Index

1. Introduction

We all know that data entered into tables is persisted on the physical drive in the form of database files. Think about a table, say Customer (for any leading bank in India), that has around 16 million records. When we try to retrieve the records of two or three customers based on their customer ids, all 16 million records are read and compared to find a match on the supplied customer ids. Think about how much time that will take if it is a web application and 25 to 30 customers want to access their data over the internet. Does the database server do 16 million x 30 searches? The answer is no, because all modern databases use the concept of an index.

2. What is an Index

An index is a database object that can be created on one or more columns (up to 16 columns in combination). When the index is created, it reads the column(s) and forms a suitable data structure to minimize the number of data comparisons. The index improves the performance of data retrieval but adds some overhead to data modifications such as inserts, deletes and updates. So the decision depends on how much data retrieval is performed on the table versus how many DML (INSERT, DELETE and UPDATE) operations.
In this article, we will look at creating indexes. The next two sections are taken from my previous article, as they are required here. If your database already has the tables from the next two sections, you can go directly to section 5.

3. First Create Two Tables

To explain these constraints, we need two tables. First, let us create them. Run the below scripts to create the tables: copy and paste the code into a new Query Editor window, then execute it.
CREATE TABLE Student(StudId smallint, StudName varchar(50), Class tinyint);
CREATE TABLE TotalMarks(StudentId smallint, TotalMarks smallint);
GO
Note that there are no constraints at present on these tables. We will add the constraints one by one.

4. Primary Key Constraint

A table column with this constraint is called the key column for the table. This constraint ensures that the column's values are never repeated and never null. We will mark the StudId column of the Student table as the primary key. Follow these steps:
  1. Right click the student table and click on the modify button.
  2. From the displayed layout, select the StudId row by clicking the Small Square like button on the left side of the row.
  3. Click on the Set Primary Key toolbar button to set the StudId column as primary key column.
Pic01.JPG
Now this column does not allow null values and duplicate values. You can try inserting values to violate these conditions and see what happens. A table can have only one Primary key. Multiple columns can participate on the primary key column. Then, the uniqueness is considered among all the participant columns by combining their values.
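For reference, the same constraint can be created in T-SQL (the constraint name PK_Student is just an assumed name; the column must be made NOT NULL first):

-- Make the column NOT NULL, then add the primary key constraint.
ALTER TABLE Student ALTER COLUMN StudId smallint NOT NULL;
ALTER TABLE Student ADD CONSTRAINT PK_Student PRIMARY KEY (StudId);
GO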

5. Clustered Index

The primary key created on the StudId column will create a clustered index on that column. A table can have only one clustered index on it.
When creating the clustered index, SQL Server 2005 reads the StudId column and forms a binary tree on it. This binary tree information is then stored separately on the disk. Expand the table Student and then expand Indexes. You will see the following index, created for you when the primary key was created:
Pic02.jpg
With the use of the binary tree, a search for a student based on StudId now needs far fewer comparisons. Let us assume that you had entered the following data in the Student table:
Pic03.jpg
The index will form the binary tree specified below. Note that each parent has at most two children. The left child always has a lesser value and the right child a greater value than the parent. The tree can also be constructed the other way around, with higher values on the left and lower values on the right.
Pic04.JPG
Now let us assume that we had written a query like below:
Select * from student where studid = 103;
Select * from student where studid = 107;
Without the index, the first query returns its value after the third comparison and the second query after the eighth comparison.
With the index, the first query returns its value at the first comparison, and the second query at the third comparison. Look below:
  1. Compare 107 vs 103 : Move to right node
  2. Compare 107 vs 106 : Move to right node
  3. Compare 107 vs 107 : Matched, return the record
If the number of records is small, you will not notice the difference. Now apply this technique to Yahoo email user accounts stored in a table called, say, YahooLogin. Let us assume 33 million users around the world have a Yahoo email id, all stored in YahooLogin. When a user logs in with a user name and password, the binary tree (that is, the clustered index) needs at most about 25 comparisons, since 2^25 is roughly 33 million.
Look at the above picture and imagine how quickly you reach level 25. Without the clustered index, up to 33 million comparisons may be required.
Got the usage of Clustered index? Let us move to Non-Clustered index.

6. Non-Clustered Index

A non-clustered index is useful for columns that have repeated values. For example, the AccountType column of a bank database may appear in 10 million rows, but have only 10-15 distinct values. A clustered index is created automatically when we create the primary key for the table; the creation of a non-clustered index is something we need to take care of ourselves.
Follow the steps below to create a non-clustered index on our table Student based on the column Class (a T-SQL equivalent follows the steps).
  1. After expanding the Student table, right-click on Indexes and click on New Index. Pic05.jpg
  2. From the displayed dialog, type the index name as shown below and then click on the Add button to select the column(s) that participate in the index. Make sure the index type is Non-Clustered. Pic06.jpg
  3. In the select column dialog, place a check mark against the column Class. This indicates that we need a non-clustered index on the column Student.Class. You can also combine more than one column to create the index. Once the column is selected, click on the OK button. You will return to the dialog shown above with the selected column marked in blue. Our index has only one column; if you selected more than one column, you could change the order of the indexed columns using the Move Up and Move Down buttons. When using a combination of columns, always put the highly repeated column first and the more unique columns further down the list. For example, let us assume the correct order for creating the non-clustered index is: Class, DateOfBirth, PlaceOfBirth. Pic07.jpg
  4. Click on the Index folder on the right side and you will see that the non-clustered index based on the column Class has been created for you. Pic08.jpg
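For reference, the equivalent T-SQL is a single statement (IX_Student_Class is an assumed name; use whatever you typed in the dialog):

CREATE NONCLUSTERED INDEX IX_Student_Class
    ON Student (Class);
GO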

7. How Does a Non-Clustered Index Work?

A table can have more than one non-clustered index, but it should have only one clustered index, which works on the binary tree concept. A non-clustered index always depends on the clustered index in the database.
This can be easily explained with the concept of a book and the index at its end. Let us assume that you go to a bookshop and find a big 1500-page C# book that covers all of C#. When you glance through it, it has beautiful colour pages and shiny paper. But that is not the only mark of a good book, right? Once you are impressed, you want to see how your favourite topic, Regular Expressions, is explained in the book. What will you do? I peeped at you from behind and recorded what you did, as below:
  1. You went to the index pages (25 pages in total). They are already sorted, so you easily picked out Regular Expressions, which appears on index page number 17.
  2. Next, you noted down the page numbers displayed next to it: 407, 816, 1200-1220.
  3. Your first target is page 407. You opened a page in the middle; the page number is greater than 500.
  4. Then you moved to a somewhat lower page, but it reads 310.
  5. Then you moved to a higher page. You are very lucky: you got exactly page 407. [Yes man, you got it. Otherwise I would need to write more. OK?]
  6. That's all; you started exploring what is written about Regular Expressions on that page, keeping in mind that you also need to find page 816.
In the above scenario, the index page is the non-clustered index and the page numbers are the clustered index arranged in a binary tree. See how you reached page 407 very quickly: your mind effectively traversed the binary tree left and right to get there.
Here, the Class column, with distinct values 1, 2, 3...12, will store the clustered index column's values along with it. For example, let us take only the class value 1. The index entry goes like this:
1: 100, 104, 105
So here, you can easily get all the records that have class = 1. Now map this to the book index example.
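As a small illustration, a query like the one below can use this non-clustered index: the index seek finds the entries for Class = 1, and each entry's clustered key (StudId) is then used to fetch the full row.

SELECT StudId, StudName, Class
FROM   Student
WHERE  Class = 1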

Friday, 7 October 2011

Tips to Avoid SQL Injection

  • Encrypt sensitive data.
  • Access the database using an account with the least privileges necessary.
  • Install the database using an account with the least privileges necessary.
  • Ensure that data is valid.
  • Do a code review to check for the possibility of second-order attacks.
  • Use parameterised queries (a sketch follows this list).
  • Use stored procedures.
  • Re-validate data in stored procedures.
  • Ensure that error messages give nothing away about the internal architecture of the application or the database.
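As a minimal sketch of a parameterised query in T-SQL (the dbo.Customer table and its columns are hypothetical, purely for illustration):

DECLARE @CustomerName nvarchar(50);
SET @CustomerName = N'O''Brien';  -- imagine this value came from user input
 
-- The value is passed separately from the SQL text, so it is always
-- treated as data, never as executable SQL.
EXEC sp_executesql
     N'SELECT CustomerId, CustomerName FROM dbo.Customer WHERE CustomerName = @name',
     N'@name nvarchar(50)',
     @name = @CustomerName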

Monday, 3 October 2011

TRANSACTION Isolation Levels in SQL Server

SQL Server 2005 has some unique features for dealing with transactions in the database world. It has distinct settings to take care of every kind of transaction. Technically, it gives us discrete ways to isolate transactions from deadlocks or crashes.
Before going deeper into the isolation levels that SQL Server provides, let's have a look at the definition of a TRANSACTION. What does a transaction mean in the real world and in a database scenario?


Transaction: When you give something to me and I take it, that's a transaction. When you withdraw money from an ATM and receive the money, that is also a kind of transaction. What I am trying to get at is a simple question: is the transaction above valid and consistent? What if I deny it, claiming you haven't given me anything, and that maybe you gave it to someone else instead of me? What if, after you withdraw money from your account, your account balance still shows the same amount as before? (Oh! In this case you just have to be lucky enough :) ). And what will happen if you and your partner withdraw all your money from the joint account at the same time from different ATMs?
So there must be some methodology to keep track of all these things and to manage them correctly; even in disastrous conditions, the database and the information regarding any transaction must remain in a consistent form.
To achieve this in the database system, we have the locking mechanism. It works like this: suppose there is a room that is electronically locked, and only a person who knows the password can enter, provided the room is empty; otherwise he has to wait until the room is vacated by the other person. But here we have a little complication: the person waiting outside may have a different task than the person already inside, and it may be that the two tasks do not interfere with each other at all, or interfere only slightly in a manageable way. So at the end of the discussion, we may conclude that the security system must provide different types of security codes or passwords to the corresponding people. Let's have a deeper look at this.
Suppose you are doing a transaction to withdraw money from an ATM, and at the same time the bank manager is doing a routine check of your transactions, which is a totally different operation, and the bank teller is checking your account for the remaining balance. All these operations are different but access the same entity or resource: your account information kept inside the database. Out of these operations, only you are performing a write operation in the database, as you are withdrawing money and the remaining balance has to be updated. So a proper security mechanism must be implemented here to ensure that these operations run smoothly without conflicts. The security can be ensured by putting locks (and, of course, the right types of locks) around each type of operation, which means isolating the resource from other transactions that may hamper its consistency. Here comes the role of isolation levels.
Isolation levels are categorized depending on the types of locks they use. At lower isolation levels, more users can access the same resource without conflict, but they may face concurrency-related issues such as dirty reads and data inaccuracy (described below). At higher isolation levels, these types of issues are eliminated, but only a limited number of users can access the resource.
Let's have a look at locks and the types of locks. A lock can be treated as a policy which prevents you or a process from performing any action (one that may conflict with other actions) on an object or resource if that object or resource is already occupied by another process or user. It's something like proposing to someone who is already with someone else. But the situation matters (maybe you are lucky enough): it depends on what you are going to do and on what the other person is doing. So for such situations, we have different types of locks.
Types of Locks:
  • Shared Locks (S): This lock is used when you are performing read operations and no manipulations such as write operations (update/delete/insert). It is compatible with other shared locks, update locks and intent shared locks. It prevents users from performing dirty reads (described below).
  • Exclusive Locks (X): These locks are the possessive type. They are not compatible with any other locks: an exclusive lock cannot be acquired if any other lock is already held on the resource, nor will it let other locks be created on the resource until it finishes its job. This lock is used for data-modification operations such as INSERT, UPDATE or DELETE.
  • Update Locks (U): This can be treated as a mixture, a collaboration of the above two locks (shared and exclusive). Let's take an example. You are going to perform an update operation on a table at row number 23. You are doing two types of operation here: one is searching for record 23, which can be achieved with a shared lock, and the other is updating the record once it is found, which requires an exclusive lock. So the shared lock transforms into an exclusive lock when it finds the target; otherwise, it remains just a shared lock. This prevents deadlocks to a great extent. This lock is compatible with intent shared and shared locks.
  • Intent Locks (also called demand locks): These are used to establish a lock hierarchy. They protect the placing of a shared (S) lock or exclusive (X) lock on a resource lower in the lock hierarchy. For example, suppose you are performing a read operation on a piece of data with a shared lock. At the same time another user wants to modify the data with an exclusive lock; but shared locks are compatible with other shared locks, so any number of shared locks can be obtained on a piece of data, and hence the user wanting the exclusive lock might wait indefinitely for all the shared-lock operations to complete. To avoid this kind of starvation, intent locks are very useful: if the second user comes in with an intent exclusive lock, no other transaction can grab a shared lock, and he can claim the exclusive lock after the first transaction completes.
There are basically three types of Intent Locks that are most popular:
a) Intent Shared Lock(IS)
b) Intent exclusive (IX)
c) Shared with intent exclusive (SIX)
To get more information on intent locks, refer to the link below:
http://msdn.microsoft.com/en-us/library/aa213039(SQL.80).aspx
  • Schema Locks: These locks protect the schema of the database. They deal with DDL (Data Definition Language) commands such as adding or dropping columns of a table, renaming a table or dropping a table, and with blocking any DDL operation during the execution of a query. There are two types of schema locks:
a) Schema modification (Sch-M): This lock is applied only when the SQL Server engine is modifying the structure of the schema like adding or dropping the columns of a table. During this period if any other transaction tries to access that object then that will be denied or delayed.
b) Schema stability (Sch-S): This lock is applied while a query using the table is being compiled. It does not block transactional locks, such as shared or exclusive locks, from performing operations on the data. But while the query is running, it prevents the execution of any DDL commands on that table.
  • Bulk Update Locks: This lock is useful while performing bulk operations on the table, such as BULK INSERT. It prevents any other types of normal T-SQL operations from being executed on the table, except bulk processing of the data.
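To see which locks are actually held or requested at any moment, one quick diagnostic (assuming SQL Server 2005 or later) is to query the sys.dm_tran_locks DMV from another session while a transaction is open:

-- Each row is one lock request: the resource it covers (resource_type),
-- the mode asked for (request_mode, e.g. S, X, U, IS, IX, Sch-S),
-- and whether it is granted or waiting (request_status).
SELECT resource_type,
       request_mode,
       request_status,
       request_session_id
FROM   sys.dm_tran_locks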





Now let us explore some buzzwords in Isolation Level:

Lost updates: This generally occurs when more than one transaction tries to update a specific record at the same time, i.e. one update is successfully written to the database, but then a second update from a different transaction accidentally overwrites the previous update. This is called a lost update.
Non-repeatable reads (also called inconsistent analysis): This means dealing with inconsistent data, i.e. suppose you read a value from a table and start working on it, but meanwhile some other process modifies the value at the source, resulting in a false output in your transaction; this is a non-repeatable read. Let's have a more practical example: suppose before withdrawing money from your account, you always perform a balance check and find 90$ in your account. Then you perform a withdrawal and try to take out 60$, but meanwhile the bank manager debits 50$ from your account as a penalty for not keeping the minimum balance (100$); as a result, you now have only 40$ in your account. So your transaction either fails, as the demanded amount (60$) is not in your account, or it may show -20$ (which is quite impossible given banking constraints :) ). More simply, we can say non-repeatable reads take place if a transaction reads the same row several times and gets a different value each time.
Repeatable Reads: This specifies that a transaction cannot read data that has been modified by other transactions but not yet committed, and that if the current transaction is reading some data, no other transaction can modify that data until the current transaction completes.
Phantom reads: Don't be afraid, we are not talking about ghosts or the Phantom of the Opera. Here phantom means unexpected or unrealistic. It occurs when two identical queries are executed and the set of rows returned by the second query is different from the first. Let's have a simple example: suppose your bank's policy changed so that the minimum balance for each account type should be 150$ instead of 100$; this is not a big deal for a database administrator. He will perform an update statement for each account type where the minimum balance is less than 150$ and set the value to 150$. But unfortunately, when the manager checks the database, he finds one record in the same table with a minimum balance of less than 150$. The DBA is surprised: how is this possible, when he performed the update statement on the whole table?
This is called a phantom read. Phantom reads are rare, as they need the right circumstances and timing: in the above example, someone inserted a new record with a minimum balance of less than 150$ at the very moment the DBA executed the UPDATE statement. As it was a new record, it did not interfere with the UPDATE transaction and was inserted successfully. This type of phantom read can be avoided by using a higher level of isolation, i.e. SERIALIZABLE (described below).
Dirty reads: This is often grouped with non-repeatable reads. It happens when a process reads a piece of data while some other process is still performing update operations on that piece of data and has not yet completed them.

Now coming to the main point of the article, the isolation levels: we have basically five isolation levels in SQL Server 2005. Each is described below:

Here we consider a simple example for all the cases below. The data shown in the table is assumed and is used only for example purposes; it may or may not be accurate in a real scenario. The table information is given below:
Database Name: OLAP
Table Name: dbo.car_info
Table Column Information:

Column_name     Type
-----------     -------
Car_Sl_No       int
CarCompany      varchar
CarBodyType     varchar
CarName         varchar
EngineType      varchar

Table Data:

Car_Sl_No  CarCompany  CarBodyType  CarName       EngineType
---------  ----------  -----------  ------------  ----------
1          Maruti      small        Maruti-800    petrol
2          Honda       sedan        City          petrol
3          Maruti      small        Maruti-800    petrol
4          Maruti      small        Waganor Duo   petrol
5          Honda       sedan        City          petrol
6          TATA        small        indica        diesel
7          Mahindra    SUV          Scorpio       diesel
8          TATA        SUV          Sumo          diesel
9          Maruti      sedan        SX4           petrol
10         Maruti      sedan        Swift-Dzire   diesel
11         TATA        small        Nano          petrol
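If you want to follow along, a minimal setup script could look like the one below. The varchar lengths are assumed (the article does not give them), and Car_Sl_No is assumed to be an IDENTITY column because the later INSERT examples omit it:

CREATE TABLE dbo.car_info (
    Car_Sl_No   int IDENTITY(1,1),   -- assumed IDENTITY; later inserts omit this column
    CarCompany  varchar(20),
    CarBodyType varchar(20),
    CarName     varchar(20),
    EngineType  varchar(20)
);
GO
INSERT INTO dbo.car_info VALUES ('Maruti',   'small', 'Maruti-800',  'petrol');
INSERT INTO dbo.car_info VALUES ('Honda',    'sedan', 'City',        'petrol');
INSERT INTO dbo.car_info VALUES ('Maruti',   'small', 'Maruti-800',  'petrol');
INSERT INTO dbo.car_info VALUES ('Maruti',   'small', 'Waganor Duo', 'petrol');
INSERT INTO dbo.car_info VALUES ('Honda',    'sedan', 'City',        'petrol');
INSERT INTO dbo.car_info VALUES ('TATA',     'small', 'indica',      'diesel');
INSERT INTO dbo.car_info VALUES ('Mahindra', 'SUV',   'Scorpio',     'diesel');
INSERT INTO dbo.car_info VALUES ('TATA',     'SUV',   'Sumo',        'diesel');
INSERT INTO dbo.car_info VALUES ('Maruti',   'sedan', 'SX4',         'petrol');
INSERT INTO dbo.car_info VALUES ('Maruti',   'sedan', 'Swift-Dzire', 'diesel');
INSERT INTO dbo.car_info VALUES ('TATA',     'small', 'Nano',        'petrol');
GO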

Assumption: In all our examples, two different transactions can be considered as being performed by two different users. For testing, you can achieve this with two separate query windows or two separate instances of SQL Server Management Studio (SSMS). But you have to be careful to run the queries on both connections simultaneously, or at least in quick succession.
1. READ UNCOMMITTED Isolation Level: This is very useful in case you need higher concurrency in the transactions. Here one transaction can access the data that has been modified by the second transaction even if the second transaction is not committed.
Syntax:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
Example: Suppose User1 is trying to update the EngineType from ‘petrol’ to ‘diesel’ for Car_Sl_No 2, and at the same time User2 is trying to read the data for Car_Sl_No 2. Under the default setting, User2 cannot read the data from that row. But if User2 sets the transaction isolation level to ‘Read Uncommitted’, it is possible to read the row with the updated information even though the transaction has not been committed by User1.
For User1:
USE OLAP
Go
BEGIN TRAN
UPDATE [OLAP].[dbo].[car_info]
   SET [EngineType] = 'diesel'
 WHERE Car_Sl_No = 2
Here, note that the transaction is still running, as there is no commit statement in the above code. Under the default setting, the query run by User2 would keep waiting until User1 commits the transaction.
For User2:
USE OLAP
Go
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
--The above statement is used to read the updated value even if the transaction is not committed.
SELECT [Car_Sl_No]
      ,[CarCompany]
      ,[CarBodyType]
      ,[CarName]
      ,[EngineType]
  FROM [OLAP].[dbo].[car_info]
where Car_Sl_No = 2
As in the above code, we set the transaction isolation level to ‘Read Uncommitted’; User2 can access that record with updated data.
Output:
Output 1
Although this increases the concurrency of transactions, did you notice the disadvantage? What if User1 rolls back his transaction, or User1's Management Studio crashes or hangs? (As the transaction is not committed yet, it will roll itself back, leaving User2 with a false or inconsistent value.)
Limitations:
  • Dirty-reads
  • Lost Updates
  • Phantom reads
  • Non-repeatable reads
Advantages:
  • Higher Concurrency
In SSIS (SQL Server Integration Service): To set this in SSIS, select the task or container on which you want to set the isolation level. Then go to Properties, and set the property named ‘IsolationLevel’ to “ReadUncommitted”.
SSIS ReadUncommitted
The benefit here is that more than one task can access the same table simultaneously in case of parallel execution of the package.
2. READ COMMITTED Isolation Level: This is the default level in SQL Server 2005 and the next level up from the READ UNCOMMITTED isolation level. It prevents transactions from reading data while some other transaction is performing an update operation on that data, and as a result eliminates dirty reads: uncommitted data is never read. But it still suffers from other problems, such as lost updates.
Syntax:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
Example: Continuing our previous example, suppose the EngineType for Car_Sl_No 2 is NULL and User1 is trying to update the EngineType to ‘petrol’, but at the same time User2 starts a new transaction, also sees the value as NULL, and updates the record to ‘diesel’ before User1's transaction is committed. As a result, User1's update is lost; it is overwritten by User2.
For User1:
USE OLAP
Go
BEGIN TRAN
 
DECLARE @EngineType varchar(20)
SELECT @EngineType = [EngineType] FROM [OLAP].[dbo].[car_info] where Car_Sl_No = 2
--The below WAITFOR statement stands in for the other operations User1 performs in this transaction.
WAITFOR DELAY '00:00:10' --For achieving real-time concurrency in this example
IF @EngineType IS NULL
BEGIN
UPDATE [OLAP].[dbo].[car_info]
   SET [EngineType] = 'petrol'
 WHERE Car_Sl_No = 2
END
ELSE
BEGIN
 Print 'Record is already updated'
END
 
COMMIT TRAN
For User2:
USE OLAP
Go
BEGIN TRAN
 
DECLARE @EngineType varchar(20)
SELECT @EngineType = [EngineType] FROM [OLAP].[dbo].[car_info] where Car_Sl_No = 2
--The WAITFOR statement is the same for User2 also
WAITFOR DELAY '00:00:10' --For achieving real-time concurrency in this example
IF @EngineType IS NULL
BEGIN
UPDATE [OLAP].[dbo].[car_info]
   SET [EngineType] = 'diesel'
 WHERE Car_Sl_No = 2
END
ELSE
BEGIN
 Print 'Record is already updated'
END
 
COMMIT TRAN
Here both users successfully update the value, but the value written by User2 persists; User1's update is lost.
Output: The final output for the record is
Output 2
Limitations:
  • Lower Concurrency than ReadUncommitted
  • Lost Updates
Advantage:
  • Eliminates Dirty Reads
In SSIS (SQL Server Integration Service): Select the task or container on which you want to set the isolation level. Then go to Properties, and set the property named ‘IsolationLevel’ to “ReadCommitted”.
SSIS ReadCommitted
3. REPEATABLE READ Isolation Level: This is the next level up from the previous isolation level, and the main point is that it does not release shared locks once the transaction has started reading data. In simple terms, a transaction cannot read data that has been modified by another transaction but not yet committed, and no other transaction can modify data that has been read by the current transaction until the current transaction completes. At this isolation level the concurrency rate is low. As a result, it eliminates lost updates, non-repeatable reads, etc. But it still has one big problem, called the phantom read. Let's have an example to elaborate on this.
Syntax:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
Example: Suppose the manager of a showroom declares that all cars manufactured by the Honda Company are to be transferred to another showroom, and that a proper record of this operation must be maintained. We need to add one more column, called ‘TransferredSatus’, to indicate whether each car has been transferred or not. The DBA will check for any Honda cars in the records that have not yet been transferred by checking the value of the ‘TransferredSatus’ column. If he finds some, the corresponding transfer operations are performed and the records are updated to ‘1’ (i.e. transferred). By using the ‘Repeatable Read’ isolation level here, we can eliminate lost updates, dirty reads and non-repeatable reads. But what if, at the time the database is being updated, someone from the inventory system inserts a record for a new Honda car that has just arrived at the showroom? Let's see the effect.
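A sketch of that schema change (the bit type and default are assumptions; the article does not give the column's definition):

-- Add the transfer-status flag; 0 = not yet transferred, 1 = transferred.
ALTER TABLE dbo.car_info ADD TransferredSatus bit NOT NULL DEFAULT 0;
GO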
For User1:
USE OLAP
Go
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
 
 --check for the existence of Honda company cars
 Declare @Car_Sl_No int
 Declare TransferingCarsCursor CURSOR FOR 
 Select Car_Sl_No from dbo.car_info where CarCompany = 'Honda' and TransferredSatus = 0
 
 OPEN TransferingCarsCursor
 
 FETCH NEXT FROM TransferingCarsCursor 
 INTO @Car_Sl_No
 WHILE @@FETCH_STATUS = 0
 BEGIN
  ----------------------------------
  ------Car transfering operations--
  ----------------------------------
 FETCH NEXT FROM TransferingCarsCursor 
  INTO @Car_Sl_No
 END 
 CLOSE TransferingCarsCursor
 DEALLOCATE TransferingCarsCursor
 
 WAITFOR DELAY '00:00:10' --For achieving real-time concurrency in this example
 -- This is when the other user inserts a new record for the new Honda car.
 
 Update dbo.car_info
  set TransferredSatus = 1 where CarCompany = 'Honda' and TransferredSatus = 0
 
COMMIT TRAN
Here it finds only 2 records from the Honda Company.
For User2:
USE OLAP
Go
BEGIN TRAN
INSERT INTO [OLAP].[dbo].[car_info]
           ([CarCompany]
           ,[CarBodyType]
           ,[CarName]
           ,[EngineType]
           ,[TransferredSatus])
     VALUES
           ('Honda','sedan','Civic GX','petrol',0)
 
COMMIT TRAN
But in between the execution of User1's transaction, User2 inserts a new record for the new Honda car. Assume the record is inserted before User1's UPDATE statement: as a result, instead of updating only 2 records, User1 updates the new record as well as the earlier ones, showing wrong information in the chart. This is called a phantom read. Even the ‘Repeatable Read’ isolation mode can't resolve this problem; for that, you need to implement the higher isolation level, i.e. SERIALIZABLE.
Output for User1:
(3 row(s) affected)
Limitations:
  • Lower Concurrency
  • Phantom Reads
Advantage:
  • Eliminates Dirty Reads
  • Eliminates Lost Updates
  • Eliminates Non-Repeatable Reads
In SSIS (SQL Server Integration Service): Select the task or container on which you want to set the isolation level. Then go to Properties, and set the property named ‘IsolationLevel’ to “RepeatableRead”.
SSIS RepeatableRead
4. SERIALIZABLE Isolation Level: This is the highest of the isolation levels, and as a result the concurrency rate is the lowest. But it eliminates all concurrency-related issues: dirty reads, non-repeatable reads, lost updates and even phantom reads. Under this isolation level:
  1. Statements cannot read data if other transactions are performing update operations on it and have not yet committed.
  2. No other transactions can perform any update operations until the current transaction completes its read operations.
  3. The important point here is that it takes range locks based on the filters used to get the data from the table, i.e. it locks not only the current records but also any new records that would fall under the current filter condition. In simple language, no other transaction can insert new rows that fall under the current filter condition until the transaction completes.
Considering our previous example, we will set the isolation level to Serializable.
Syntax:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
For User1:
USE OLAP
Go
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
 
 --check for the existence of Honda company cars
 Declare @Car_Sl_No int
 Declare TransferingCarsCursor CURSOR FOR 
 Select Car_Sl_No from dbo.car_info where CarCompany = 'Honda' and TransferredSatus = 0
 
 OPEN TransferingCarsCursor
 
 FETCH NEXT FROM TransferingCarsCursor 
 INTO @Car_Sl_No
 WHILE @@FETCH_STATUS = 0
 BEGIN
  ----------------------------------
  ------Car transfering operations--
  ----------------------------------
 FETCH NEXT FROM TransferingCarsCursor 
  INTO @Car_Sl_No
 END 
 CLOSE TransferingCarsCursor
 DEALLOCATE TransferingCarsCursor
 
 WAITFOR DELAY '00:00:10' --For achieving real-time concurrency in this example
 -- This is when the other user inserts a new record for the new Honda car.
 
 Update dbo.car_info
  set TransferredSatus = 1 where CarCompany = 'Honda' and TransferredSatus = 0
 
COMMIT TRAN
For User2:
USE OLAP
Go
BEGIN TRAN
INSERT INTO [OLAP].[dbo].[car_info]
           ([CarCompany]
           ,[CarBodyType]
           ,[CarName]
           ,[EngineType]
           ,[TransferredSatus])
     VALUES
           ('Honda','sedan','Civic GX','petrol',0)
 
COMMIT TRAN
Output for User1:
(2 row(s) affected)
Here User2's transaction waits until User1's transaction completes, avoiding phantom reads.
Limitations:
  • Lower Concurrency
Advantage:
  • Eliminates Dirty Reads
  • Eliminates Lost Updates
  • Eliminates Non-Repeatable Reads
  • Eliminates Phantom Reads
In SSIS (SQL Server Integration Service): Select the task or container on which you want to set the isolation level. Then go to Properties, and set the property named ‘IsolationLevel’ to “Serializable”.
SSIS Serializable
5. SNAPSHOT Isolation Level: This specifies that the data read by a transaction is consistent and valid for that particular transaction, and that the data stays the same throughout the whole transaction. It implements row versioning to isolate data for each transaction, i.e. it keeps a separate version of each modified row in the tempdb database, dedicated to that transaction. Any update of the original row will not affect the current transaction.
The ALLOW_SNAPSHOT_ISOLATION database option must be set to ON before you can start a transaction that uses the SNAPSHOT isolation level. It is by default kept as OFF because of performance issues.
To enable SNAPSHOT isolation level, use the below alter database command.
ALTER DATABASE OLAP SET ALLOW_SNAPSHOT_ISOLATION ON
We will consider a small example to illustrate the above condition.
Syntax:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
Example: User1 will try to insert a new record into the [car_info] table and, at the same time, User2 will try to fetch the records.
For User1:
USE OLAP
Go
BEGIN TRAN
INSERT INTO [OLAP].[dbo].[car_info]
           ([CarCompany]
           ,[CarBodyType]
           ,[CarName]
           ,[EngineType]
           ,[TransferredSatus])
     VALUES
           ('Honda','sedan','Civic Hybrid','petrol',0)
Note: The above transaction is not committed yet.
For User2:
USE OLAP
Go
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
BEGIN TRAN
Select * from dbo.car_info where CarCompany = 'Honda' 
COMMIT TRAN
Output for User1:
(1 row(s) affected)
Output for User2:
Output 3
One record is successfully inserted by User1, but a consistent version of the previous data was kept in the version store (in tempdb) before the start of the transaction. So User2 reads the data from the version store and cannot see the newly inserted record.
Now commit the transaction for User1 with the “COMMIT TRAN” command, and run the transaction for User2 again; the output will be as below:
Output 4
You can check the version store, along with other information regarding the current transaction, by running the below DMV (Dynamic Management View) before committing User1's transaction.
select * from sys.dm_tran_active_snapshot_database_transactions
Output:
Output 5
Limitations:
  • Low performance due to versioning in tempdb
Advantage:
  • Eliminates Dirty Reads
  • Eliminates Lost Updates
  • Eliminates Non-Repeatable Reads
  • Allows multiple updates by versioning
In SSIS (SQL Server Integration Service): Select the task or container on which you want to set the isolation level. Then go to Properties, and set the property named ‘IsolationLevel’ to “Snapshot”.
SSIS Snapshot
Other Isolation Levels in SSIS:
  • Chaos Isolation Level: Behaves the same way as ReadUncommitted, with additional features as stated below:
  1. It permits viewing uncommitted changes by other transactions.
  2. It checks other uncompleted update transactions running at more restrictive isolation levels to ensure that no conflicts are raised, i.e. pending changes from more highly isolated transactions cannot be overwritten.
  3. Rollback is not supported in this isolation level.
If you want to perform read operations over the data only once per transaction, then go for the Chaos isolation level.
In SSIS (SQL Server Integration Service): Select the task or container on which you want to set the isolation level. Then go to Properties, and set the property named ‘IsolationLevel’ to “Chaos”.
SSIS Chaos
  • Unspecified Isolation Level: When the isolation level of a transaction cannot be determined, it comes under the ‘Unspecified’ isolation level, i.e. a different isolation level than the ones above is in use. For example, when performing a custom transaction operation such as an ODBC transaction, if the isolation level is not set by the user, the transaction executes according to the isolation level associated with the ODBC driver.
In SSIS (SQL Server Integration Service): Select the task or container on which you want to set the isolation level. Then go to Properties, and set the property named ‘IsolationLevel’ to “Unspecified”.
SSIS Unspecified
Optimistic vs. Pessimistic:
Optimistic concurrency: Here SQL Server assumes that resource conflicts between different transactions are rare, but not impossible. So it allows transactions to execute without locking any resources. Only in the case of a data modification does it check for conflicts; if it finds any, it performs the locking operations accordingly. In simple terms, we assume that every transaction will carry on without any problem, except in some exceptional cases.
Pessimistic concurrency: Here resources are locked irrespective of the type of transaction, to ensure the successful completion of the transaction without deadlocks. We assume that conflicts are likely, and that some major steps have to be taken to avoid them.
Let's have an example of this:
Suppose that in a car showroom a customer wants to go for a test drive; but before the manager says anything, it has to be clear that the car is empty and ready for driving. What if another customer has already requested a test drive of the same car? If the manager allows both of them to drive the car simultaneously, counting on mutual understanding between the customers, we call it optimistic concurrency. But if the manager wants to be sure of no conflicts between the customers, he allows the customers to test drive one by one. This is what we call pessimistic concurrency.
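As a rough T-SQL sketch of the optimistic pattern (the RowVer column is an assumption added purely for illustration; it is not part of the article's table):

-- Add a rowversion column so any change to a row is detectable (assumed schema change).
ALTER TABLE dbo.car_info ADD RowVer rowversion;
GO
 
DECLARE @RowVerReadEarlier binary(8);
 
-- Read the row without holding a long-term lock, remembering its version.
SELECT @RowVerReadEarlier = RowVer
FROM   dbo.car_info
WHERE  Car_Sl_No = 2;
 
-- Later: write only if nobody changed the row in between.
UPDATE dbo.car_info
   SET EngineType = 'diesel'
 WHERE Car_Sl_No = 2
   AND RowVer = @RowVerReadEarlier;
 
IF @@ROWCOUNT = 0
    PRINT 'Conflict: the row was modified by another transaction.'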