Felix Schwarz Diplom-Informatiker
Software-Entwicklung und Beratung

Migrating data from MS Access to MySQL easily

MySQL AB: If we are too slow, your bug must be invalid Most people will know the MySQL database. MySQL AB (recently bought by Sun) is the company behind that open source database and pays most developers working on the code.

Some years ago MySQL AB released a really nice tool which they named MySQL Migration Toolkit. Its a Java GUI application which can be used to copy data from other databases (like Microsoft Access) to a MySQL database. The tool even generates a lua script file so that you can automate the migration process afterwards.

In a recent project I was hired to build a replacement for a custom Access application. Part of the work was to create a new database schema. Of course the old data must be migrated so that it isn't lost. The chosen production database was PostgreSQL due to its solid track record when it comes to SQL features like transactions, referential integrity and all this boring stuff.

I figured out that it would be much easier for me converting data in the old database to an open source database and do the structure conversion in a second step with custom tools.

This was when the Migration Toolkit came into play: It copied the data from Access to MySQL and I wrote some scripts to get that data into Postgres. It worked really great after a very short time and everyone was happy. Sure there were some initial problems (like Migration Toolkit won't work with Java 1.6 but only 1.5) but these were solved quickly.

Some weeks later I discovered that some data in my new database was different from the Access data: In Access the value was 0.075, MySQL stored 0.08. This happened for all columns which were specified as "Single" with automatic precision in MS Access. The MySQL bugtracker revealed that others had the same problem: Access singles are converted to MySQL 7,2 double.

Unfortunately this bug was marked as 'invalid' when I found it two years after its creation (because the MySQL developer investigating it did not read the bug description correctly so he was not able to reproduce it). Bugs are far from unusual and so I was quite confident that I could jump in and help fixing it. Unfortunately, the bug was closed and therefore I was sure that no MySQL developer would ever look at it again. But their bug tracker won't let me open the bug as a normal user.

I pinged another MySQL dev directly (Susanne Ebrecht) and she reopened the bug :-) I followed her advise and tried to reproduce the problem which newer versions of MS Access. Clearly, the bug was still there...

Unfortunately, no MySQL developer had enough time to look at the report again and so an automatic process closed the bug again:

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Maybe I'm just too stupid but I was not able to re-open it. Maybe only the original reporter can. But the main problem is the crappy configuration of the MySQL bug tracker which closes bugs which are marked as 'need more info' after 30 days which is quite short. Of course you need some sort of 'timeout' in a bug tracker because else you get swamped with useless bug reports no one is able to reproduce so they stay open forever.

Other bug trackers implement that better: If a bug has the NEEDINFO status in Bugzilla there is a checkbox "I am providing the requested information for this bug." and afterwards the status is being set back to normal.

The problem with MySQL was that I provided all necessary information and the bug was closed only because MySQL AB worked too slowly!

This is an example how to not handle bug reports! If you don't have enough time to triage all bugs in a short period of time, then don't set the timeout to 30 days!

Which left me alone with the rounding problem... Luckily I was a able to use some kind of "dirty hack" to work around the whole issue: The problem is that the created MySQL database schemes are too imprecise. The rounding problem occurs not until the real data is copied which happens at a later stage.

So I extended the lua migration script to call a Python script of mine. This script just sets all columns with the type 'double(7,2) to 'DECIMAL(7,4)'. And this fixed the rounding errors for me. Of course it would be nicer if a proper fix was included in the Migration Toolkit itself but this seems to be out of reach...

Maybe my workaround can save you some trouble or time so I put my dirty hack online. Help yourself!

My lua adaptation:

(...)
  -- create database objects online
  print("Create database objects.")
  
  grtM.callFunction(
    grtV.toLua(targetConn.modules.TransformationModule), "executeSqlStatements", 
    {targetConn, "global::/migration/targetCatalog", "global::/migration/creationLog"}
  )
  grt.exitOnError("The SQL create script cannot be executed.")
end
  
-- added this line to execute my Python script
os.execute("C:\\Python24\\python.exe \"c:\\migration\\fix_access_single_migration.py\"")
  
-- ------------------------------------------------------------------
-- checkpoint 5
-- Bulk data transfer
(...)

And this is my Python script (fix_access_single_migration.py):

#!/usr/bin/env python
# License: MIT licee
# Copyright (c) 2008 Felix Schwarz <felix schwarz AT oss schwarz eu>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
# 
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
# 
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
 
import MySQLdb
 
hostname = 'localhost'
user = 'root'
password = ''
db = ''

def get_all_tables(cursor):
    tables = []
    cursor.execute('show tables')
    rows = cursor.fetchall()
    for row in rows:
        tables.append(row[0])
    return tables

def find_double_fields(cursor, tables):
    problematic_fields = []
    for table in tables:
        cursor.execute(u'show fields from `%s`' % table)
        rows = cursor.fetchall()
        for row in rows:
            col_name = row[0]
            col_type = row[1]
            if col_type == u"double(7,2)":
                problematic_fields.append({"table": table, "col_name": col_name})
    return problematic_fields

def fix_fields(cursor, problems):
    for problem in problems:
        table = problem["table"]
        col_name = problem["col_name"]
        sql = u'ALTER TABLE `%s` CHANGE COLUMN `%s` `%s` DECIMAL(7,4)'
        cursor.execute(sql_template % (table, col_name, col_name))

conn = MySQLdb.connect(host=hostname, user=user,
                       passwd=password, db=db,
                       use_unicode=True, charset="cp1250")
conn.begin()

cursor = conn.cursor()
tables = get_all_tables(cursor)
problems = find_double_fields(cursor, tables)
fix_fields(cursor, problems)

conn.commit()
conn.close()