Let me start by saying this did not turn out to be as easy as I originally thought. I started with a giant assumption that I think many may have in the past:
With SSIS, Microsoft’s integration tool shipped with their flagship data platform, I should be able to easily manage connecting to MySQL, which according to their website is “the world’s most popular open source database“, and pull data into SQL Server.
Judging by the number of posts on the internet of people asking for help on this topic I know I am not the only one to have my hopes dashed. To be fair, I did need to pull 14GB of data from MySQL into SQL Server, not the typical chore, but who cares, right? 14GB or 14KB, this should have been a simple issue of gaining connectivity and fixing myself a cup of coffee to sip while the servers did all the work…not so fast.
- The MySQL DBA I was partnering with used mysqldump to generate a 15GB ANSI text file containing the schema statements necessary to build the tables in the database as well as the INSERT statements necessary to load those tables. Unfortunately mysqldump generates SQL consistent with the MySQL SQL-dialect which happens to be incompatible with the SQL Server SQL-dialect. Strike 1.
- Hit up Bing for some leads on how to get the two platforms to work together using mysqldump as the linchpin while staring down the prospect of using PowerShell to programmatically fixing up a 15GB text file to make the SQL contents run against SQL Server.
- I asked the MySQL DBA to re-run mysqldump providing the option –comptible=mssql to generate SQL code that can be run against SQL Server. He did, but the SQL code generated did not parse against SQL Server. Strike 2.
- I finally asked for the MySQL DBA to run mysqldump to only generate the schema, and not the data. I then manually fixed up the file in a text editor to run against SQL Server. At least I was able to get something working even if it was only an empty database in SQL Server that looked like the database in MySQL.
- Fire up the 64-bit Import/Export Wizard from your start menu, not from within SSMS 2012 which will launch the 32-bit version.
- Nothing in the Data Source dropdown looks suitable to connect to MySQL.
- Hit up Bing.
- Download and install the ADO.NET Driver for MySQL (Connector/NET) to my server running Windows Server 2008 R2.
- Re-launch the Import/Export Wizard from SSMS 2012.
- Select the newly installed “.NET Framework Data Provider for MySQL” from the Data Source Dropdown
- Plug in the necessary Connection section information, Server and Port, as well as the necessary Security section info, User Id and Password.
- Click Next, all looks good.
- Choose my destination.
- My first clue something wasn’t going well was on the “Specify a Table Copy or Query, the Table options was disabled. Specify a query, not a problem, I knew the tables I needed to import.
- Select the “Write a query” option and Plug in a select * in the offered field.
- Error…Column information for the source and destination data could not be retrieved.
- Hit up Bing.
- Linked Servers…maybe. Note that I am not a huge of Linked Servers for several reasons, not the least of which is that I think employing them leads to rigid application designs that are hard to maintain and difficult to tune. That said, in a bind like this where I need to get data across a wire and a suitable ETL tool is not presenting itself I will use them although I do my best to limit them to one-off use cases and not productionalize code that leverages them.
- Linked Servers can only use an OLE DB Provider and OLE DB drivers are no longer provided by MySQL. MySQL, as well as most vendors including Microsoft, have moved away from OLE DB and placed their efforts into providing a full-featured ODBC driver. So what about the OLE DB Provider for ODBC (MSDASQL)? I read somewhere that this provider was to be deprecated, however upon repeated customer requests it was ported to 64-bit platforms and is in fact available in SQL 2012, lucky me.
- Download and install the ODBC Driver for MySQL (Connector/ODBC).
- Setup a 64-bit System DSN that uses the points to the MySQL ODBC Driver.
- Create a Linked Server that leverages the Microsoft OLE DB Provider for ODBC Drivers provider and points to the DSN I just created.
- Testing it out…promising. Using some simple 4-part naming I can select data from tables in the MySQL database and insert that data into tables in SQL Server.
- Problem 1 presented itself pretty early. Some of the columns in MySQL were setup as variable character strings that are longer than 8000 characters. OLE DB has trouble handling these columns, always trying to fall them back to 8000 character data types in the conversion from ODBC causing string truncation.
- Let’s get this thing on the highway and open it up for real…time to brush off some of my OPENQUERY skills.
- With some simple LEFT function calls in a pass-through OPENQUERY to the Linked Server I can decompose the VARCHAR(20000) column in MySQL into two VARCHAR(8000) columns and a VARCHAR(4000) column which the OLE DB Provider wrapping the ODBC Driver for SQL Server will happily consume. Then I can cast them all as VARCHAR(MAX) in my SELECT column-list and slam them back together into my VARCHAR(MAX) column in SQL Server. No problem.
- So with all the fun stuff converted to proper data types and decomposed as needed, I ran the first INSERT-SELECT-FROM-OPENQUERY that tried transferring the data but it wasn’t meant to be. With Linked Servers it’s a single-transaction, no batches, so bringing over several tables with multiple gigabytes of data in each caused the log to bloat to the point where it filled the server disk. I was not running this on beefy hardware with tons of disk space to spare. Strikeout.
- ODBC got me a little farther than the .NET Provider for MySQL so back to the Import/Export Wizard…but this time let’s try ODBC.
- Make sure you launch the 64-bit version of the Import/Export Wizard from your start menu.
- Same as before, but this time select the “.NET Framework Data Provider for Odbc” from the Data Source Dropdown.
- Enter your DSN name into the Dsn field and when you tab out of the field, the UI will populate the Connection String for you.
- Choose your destination database on the next page.
- On the Specify Table Copy of Query notice this time, unlike our first attempt with the Wizard, that the Copy data from one or more tables or views option is available. Promising…it tells me the driver is exposing the metadata SSIS asked for in a way it can consume it.
- From here you can see things are working better than before, however still not great. Choosing the Copy data from one or more tables or views and choosing a table to bring data from and table to load data into eventually results in a metadata error at runtime.
- Fine…select the Write a query to specify the data transfer option and Plug in a select * into the offered field for one of the tables you want to transfer.
- Regardless of the data volume and speed of hardware (did I mention the MySQL database I was pulling from was hosted on a laptop?) you should achieve some success.
This post was written for therapeutic reasons, to rid myself of the frustration these four attempts brought to my life. Hopefully this will help at least one other person trying to do something similar.
My cynical side says that while Microsoft and Oracle (keeper of MySQL at the moment) say they want their products to integrate with other platforms, the truth is that while their words may ultimately be true it is not their number one priority to implement and make easily accessible. If each vendor could keep you in their closed ecosystem and sell you lots of licenses for their products that is better for them. The more worldly view is not to read too deeply into any of these difficulties and feel happy when attempt n does what is required.
Thanks for reading,
The SQL Dev