In Part 1 of 2 I’ll show how to iterate over the databases in any given SQL Server Instance using SSIS. In Part 2 I’ll show how to extend the SSIS Package created in Part 1 to iterate over all databases across any number of Instances in your environment.
The verb iterate means “to perform repeatedly.” Within the context of this article it will specifically mean “running the same code for each database in a SQL Server Instance.”
I’ll be developing a sample SSIS 2012 Package using Visual Studio 2010 but these concepts are generic and apply to all currently available versions of SSIS so you can produce the sample Package using any version of SSIS by following the steps in the demo below.
Let’s get started.
SSIS Packages support three types of grouping Containers: Foreach Loop Container, For Loop Container and Sequence Container. The one in which we’re most interested for this task is the Foreach Loop Container which from this point forward I will mostly refer to by the abbreviated form FeLC. The FeLC does exactly what it sounds like it does, it loops over a collection of items and does something for each item, i.e. it iterates over a collection of items. This is exactly the type of flow we want for our process, but what types of collections can the FeLC iterate over? More on that later.
We can easily retrieve the names of all databases from an Instance by opening SQL Server Management Studio (SSMS) and issuing a SELECT statement against the sys.databases System View but how does knowing how to retrieve a list of databases from the Instance in SSMS help us work with a FeLC in SSIS if the FeLC cannot execute a SQL query on its own behalf ? More on that later too, and then it will all come together.
First, it is worth mentioning some background about how SSIS was constructed. SSIS is mostly built atop .NET which has very robust inroads to accessing data in SQL Server and working with that data in a disconnected way, i.e. after the data has been retrieved from the database and is still in memory in .NET. In short, the FeLC can natively iterate over a .NET data structure known as a DataSet. Additionally we can configure SSIS to capture the results from any SQL statement, the same statement we might write and execute in SSMS, into a SSIS Variable containing a .NET DataSet object. You can think of a DataSet as a memory-resident data structure containing one to many DataTables, where each DataTable represents a SQL Server resultset. SSIS can store a DataSet in a Variable so we can later work with it in other areas of our SSIS Package. Perfect! Just what we want, to iterate over a SQL Server resultset, namely our list of databases, and do something useful for each row.
Bringing it all together, we know how to craft a SQL statement to select the list of databases that reside on an Instance, we said we can capture the resultsets returned from a SQL batch as a .NET DataSet and store it in a SSIS Variable, and in theory we know an FeLC can iterate over a DataSet that is stored in a Variable, but to launch these three concepts into action we still need a way to execute the SQL statement and store the resultset it returns into a Variable. The Execute SQL Task will help us here, of course! The Execute SQL Task also happens to do exactly what its name implies. The Execute SQL Task can execute a SQL statement (or batch or statements) against a database connection and store the resultsets returned by the statements in a Variable. SQL statements executed by an Execute SQL Task do not have to return a resultset, i.e. it can also execute an UPDATE statement for example, but in our case we want a resultset that we can store in a Variable so we can later iterate over it using a Foreach Loop Container.
So, to recap, here is what we’re thinking:
- Use an Execute SQL Task to issue a SELECT statement that retrieves information about the databases on an Instance and store the resultset in a SSIS Variable of type Object.
- Use a Foreach Loop Container to iterate over the DataSet stored in a SSIS Variable and do something for each row in the resultset, i.e. do something for each database.
- The options for what you can do inside the Foreach Loop Container are up to your imagination but for purposes of this demo we’ll simply be writing the database names to the output window to show one way you might access each row in the resultset separately.
At this stage we have broken down the problem quite a bit so let’s begin diving into the development of the sample SSIS Package. If you want to follow along developing your own version of the sample Package open your preferred copy of Visual Studio or BIDS and create a new SSIS Package, else you can just follow along referring to the screenshots provided in each step. I’ll be using Visual Studio 2010 and SSIS 2012 but this sample is generic such that you can produce the sample Package using any version of Visual Studio or BIDS that supports SSIS development.
1. We will need a Variable to temporarily store our list of databases, i.e. our SQL Server resultset in the form of a .NET DataSet. Create a Variable named Databases and configure it as an Object data type.
I’ll do some hand-waving here, but only because SSIS does some hand-waving when it comes to storing a DataSet in a Variable. Notice that we created the Variable with a type of Object and not of type DataSet, primarily because DataSet is not in the list. Everything in .NET is ultimately an object so Object, in terms of an SSIS Variable data type, means any complex object that is not in the drop-down list of scalar data types. SSIS will “know” that a DataSet is being stored in the Variable and the FeLC will have no trouble later addressing it as one.
2. Create a new Connection Manager by right-clicking in the Connection Manager window and selecting New OLE DB Connection… Configure the Connection Manager to connect to your chosen database instance and make sure you click the Test Connection button to confirm it is a valid configuration.
3. Drag a new Execute SQL Task from the SSIS Toolbox onto the Control Flow. You’ll notice Visual Studio has kindly marked the Task with a red-X denoting a validation error exists in the Task. You can safely ignore the validation error for now. The validation error will be resolved once we configure the Task to connect to our Instance in the next step.
4. Double-click the Execute SQL Task to open the Editor and configure the General Page as follows:
SELECT database_id, name FROM sys.databases;
5. The setup done in the previous steps was tedious but we needed to work through it to get to the exciting part, namely capturing the SQL Server resultset into a SSIS Variable. Do you remember the Variable we created in Step 1? Configure the Result Set page as follows:
A quick note about the Result Name: because we are using an OLE DB driver to connect to SQL Server named resultsets are not supported meaning we must assign the Result Names ordinally beginning with 0. If you’re interested, this is the error you would receive if you named the result something other than 0 and executed the Package:
Error: 0xC00291E3 at Execute SQL Task - Get List of Databases, Execute SQL Task: The result binding name must be set to zero for full result set and XML results.
Click OK and notice how Visual Studio has now lifted the Red-X from our Task letting us know it appears to be configured in a valid way:
Baseline #1: At this point you can execute the Package. While it will not do much more than select the list of databases and store those in a Variable, if everything is configured correctly it should succeed giving you a baseline comfort level before we head into configuring the Foreach Loop Container.
6. We will need two Variables to store the database id and name of the databases as we iterate over our DataSet. Create a new Variable at the Package scope* named DatabaseId and configure the data type as Int32. Create a second new Variable also at the Package scope* named DatabaseName and configure the data type as String. Your Variables window should now look like this:
* SSIS Variables, as with variables in most other programming languages, have scope. Visual Studio will create new Variables in the scope that has the focus when the variable is created. We want our Variables created in the Package scope. To give the Package focus click any open space on the Control Flow Design Surface, i.e. we do want the Execute SQL Task to have focus when the Variable is created.
8. Double-click the Foreach Loop Container to open the Editor and configure the Collection Page as follows:
Notice we chose was the Foreach ADO Enumerator. This naming makes sense since we’ll be iterating over an ADO.NET DataTable contained within a DataSet and Databases Variable is of type Object. The Execute SQL Task placed the data it retrieved from the database into the Variable of type Object. When the FeLC is using the Foreach ADO Enumerator it treats the variable of type Object it is set to iterate over as an ADO.NET DataSet, which suits our needs quite well.
9. The last piece for us to complete the FeLC configuration is to setup the Variable Mappings Page. The resultset that is stored in the Databases Variable has two columns. Here is the SQL Statement we issued in the Execute SQL Task:
SELECT database_id, name FROM sys.databases;
If we want to use the values of these columns as we iterate over the DataSet, and why wouldn’t we want to access them both else we should not have included them in the SELECT statement, then we must map them into scalar Variables of their own.
The identification of the column-to-variable mapping is managed ordinally. In other words the first column in the resultset, database_id, has Index 0. The second column, name, has an index of 1. Knowing that we can easily map each column to the the appropriate variable.
Baseline #2: At this point you can execute the Package and while it won’t do much more than select the list of databases, store it in a Variable and then iterate over the rows in the first DataTable in the DataSet, if everything is configured correctly it should succeed giving you a second baseline.
This is where your imagination can takeover. What is is that you want to do for each database? Do you need additional columns from sys.databases?
10. Only for purposes of illustrating that the SSIS Package is in fact iterating over the rows returned by the SQL statement in our Execute SQL Task I will add a Script Task to write each row’s database name to the Output Window.
11. Double-click the Script Task to open the Editor. Set the Script Language to Visual Basic* and map User::DatabaseName into the Script as a ReadOnlyVariable:
* For the record I overwhelmingly prefer C# over Visual Basic. The fact is that SSIS 2005 Script Tasks only supported Visual Basic so the choice to use Visual Basic here is purely to allow people still using SSIS 2005 to continue to follow along without having to port C# code.
12. Click the Edit Script button and use this code to replace as your Main method:
Public Sub Main()
Dim tempBool As Boolean = False
Dts.Events.FireInformation(1, "DatabaseName (Id)", Dts.Variables("User::DatabaseName").Value.ToString() + " (" + Dts.Variables("User::DatabaseId").Value.ToString() + ")", "", 0, tempBool)
Dts.TaskResult = ScriptResults.Success
Save and close the script, then Click OK on the Script Task Editor to save the changes.
Development is complete and you are ready to execute the Package.
You should see the name and id of each database in your Instance shown in the Output Window, like this:
SSIS package “C:\SkyDrive\SSIS.IterateOverDatabases-Part1\SSIS.IterateOverDatabases-Part1\Package.dtsx” starting.
Information: 0x1 at Script Task, DatabaseName: master (1)
Information: 0x1 at Script Task, DatabaseName: tempdb (2)
Information: 0x1 at Script Task, DatabaseName: model (3)
Information: 0x1 at Script Task, DatabaseName: msdb (4)
SSIS package “C:\SkyDrive\SSIS.IterateOverDatabases-Part1\SSIS.IterateOverDatabases-Part1\Package.dtsx” finished: Success.