Code Snippets, Examples and Technical News

Blog Posts


written 21 months ago (12/25/08 21:05:20)

Free Software is no silver bullet for niche markets

Many people think of solid application which causes not too much trouble when they talk about  Free Software. Most Free Software is given away for free, everyone can extend it as (s)he likes (presuming the necessary knowlegde) and there is no marketing department which urges for premature 'final' releases. Administrators often like that Free Software does not require any license management (if you just use it) and bug reports are openly shared because of the abundance of company politics which may try to maintain a zero-defect illusion even if the product is buggy like hell.

But the article's head line is 'Free Software is no silver bullet for niche markets' so you probably already expect that I will give you some counter-examples to the cliches above.

Pipe dreams meet real live

So today's example is  Lx-Office ( Wikipedia, sorry no English version available). Lx-Office is a web-based  ERP and  CRM system written in Perl and PHP and distributed under the GNU General Public License. It includes a module for financial accounting in companies. The ERP/accounting module of Lx-Office originated in  SQL-Ledger ( Wikipedia). After the fork from SQL-Ledger in 2003, Lx-Office was modified to meet German regulatory requirements.

In Germany, the majority of all tax advisors ( approximately 80% according to Financial Times Germany) uses software produced by a German cooperative named  DATEV ( Wikipedia). This software is really the heart of a tax advisor's services, different versions support almost everything from financial accounting and tax calculation to payroll accounting.

So you can imagine that for a serious accounting software, data exchange with tax advisors is really a must-have. Though DATEV does only offer properietary software (which only runs on MS Windows), they sell a  SDK which contains the complete specification of the KNE format ("Postversandformat"). With that format you can share accounting information with your tax advisor (e.g. the routine accounting is done in-house but the tax advisor should assemble the balance sheet at the end of the year).

The KNE format is an 'old-style' binary file format with a long track record (if you include the predecessor, the OBE format which was introduced in the late '80s). When I wrote libkne, a pure Python library to create and parse these binary KNE files, I tested my implementation against several real-world implementations. However, when I used the DATEV export using Lx-Office's demo server I was very surprised to see that the exported files where not valid according to the specification!

Export bug known for several years

It turned out that the problem is caused by different date string formatting. The export module  DATEV.pm transforms a string which is presumed to contain a date ('25.12.2008', which is 2008-12-25 in ISO notation) into the KNE binary date format ('251208' - yes, the KNE format is not y2k-safe). As soon you use a different date format (like '12/25/2008'), the export just puts in the date string '12/25/2008', ignoring the size restrictions (6 bytes) and the format specification ('DDMMYY') completely.

What really puzzled me was that this bug is known for 2 1/2 years and nothing happened so far. In June 2006 there was a  post in the user forum which explained the problem and even spotted the bug in the source code. Only thirty minutes later, another forum reader filed a  bug report with a reference to the post. Six months later, a developer requested more information (which looks stupid to me, all necessary information was given) from the submitter but no further action was taken until now (December 2008). That means that the DATEV export of Lx-Office is non-functional for 2 1/2 years if you don't use one specific date format!

Looking at the source

As always, we can have a look at the source to reveal the issue:

# Code taken from SL/DATEV.pm (r3482)
# https://lx-office.linet-services.de/svn/trunk/unstable/SL/DATEV.pm
# Licensed under the GPL v2+

sub datetofour {
  $main::lxdebug->enter_sub();

  my ($date, $six) = @_;

  ($day, $month, $year) = split(/\./, $date);

  if ($day =~ /^0/) {
    $day = substr($day, 1, 1);
  }
  if (length($month) < 2) {
    $month = "0" . $month;
  }
  if (length($year) > 2) {
    $year = substr($year, -2, 2);
  }

  if ($six) {
    $date = $day . $month . $year;
  } else {
    $date = $day . $month;
  }

  $main::lxdebug->leave_sub();

  return $date;
}

So the datetofour function gets a date string $date and a boolean flag $six and returns the KNE date string. The problem is that the function assumes a certain date format in line 10 in the snippet (line 481 in original file as of r3482): 'split(/\./, $date)' But this date formatting can be configured by the user so the function should query the configuration for the format to use.

Conclusions

I presented you one example of a Free Software application which has a serious known bug that was not fixed for more that 2 1/2 years (though it is probably only a matter of one or two lines). The project itself is  backed by two small companies and the software is used by quite a lot of people (Sourceforge counted 4468 downloads for Lx-Office ERP beta1 between 2008-08-12 and 2008-12-25). The project follows a traditional 'open source' development approach (public bug tracker and source control repository, community support through web forums) and uses a well-known license (GPL 2+). So this is clearly an example of a 'real' Free Software project.

Nevertheless the project failed to fix a simple bug so that users can not use functionality which is quite important in many enterprise use cases if you consider the extensive German regulatory policies. Obviously this functionality is not important enough for most users of Lx-Office. Probably (speculation!) many of them are in the 'low budget' segment which does not regularly exchange financial data with their tax advisors.

Admittedly financial accounting is really a niche use case (highly dependend on the country you're operating in, laws and policies change on a regular basis) so the user and developer number is quite small (compared to other software like Linux, Mozilla Firefox or OpenOffice). But it still gives evidence to my initial thesis that Free Software is not necessarily a solution for you. While Free Software in very common use cases can excel their proprietary competitors (e.g. Apache HTTP), you may find that you should evaluate Free Software in niche markets very carefully.

Brief remark: 'Batteries included' is the right approach for class libraries

While I described what code caused the bug on a technical level, the real (technical) problem for me is found below: In Perl 5 there is no built-in data type to represent dates. So the Lx-Office developers had to fall back on using strings to pass dates around. If Perl shipped some included module for date representation and manipulation, probably the bug would have never occured because things like date formatting would only influence the visual presentation.

So I think that on a class library level Python's approach of  batteries included is the right one. A decent(tm) programming language needs to ship modules for date manipulation (even a less-than-optimal API like in Python is better than a completely missing one). Just having  add-on modules for that is not sufficient because not all developers use them and it's hard to use multiple independently developed third-party modules with each other because they probably will use incompatible date representations...


written 22 months ago (11/29/08 13:18:13)

How to make input validation really complicated

In every web application you need to validate your input data very carefully. Data validation is a very common task and so it's surprising that there are several validation libraries in Python (like  formencode) which include validators for common tasks. However,  Trac does not integrate any of these libraries so every plugin developer has to write their own validation code.

Now look how complicated you can check if a string does only contain the characters a-z, hyphens ('-'), underscore ('_'):

# Only alphanumeric characters (and [-_]) allowed for custom fieldname
# Note: This is not pretty, but it works... Anyone have an easier way of checking ???
f_name, f_type = customfield[Key.NAME], customfield[Key.TYPE]
match = re.search("[a-z0-9-_]+", f_name)
length_matched_name = 0
if match != None:
    match_span = match.span()
    length_matched_name = match_span[1] - match_span[0]
namelen = len(f_name)
if (length_matched_name != namelen):
    raise TracError("Only alphanumeric characters allowed for custom field name (a-z or 0-9 or -_).")

Please note how deep the author digged into  Python's re library to find the span() method. So he first looks for an acceptable substring, computes the position of this substring, derives the length of the substring from that and checks if the length of the substring equals the length of the whole string.

At least the author had some doubts if his solution is the most elegant one (see the second line of the snippet above). So a simpler method of checking could be:

# Only alphanumeric characters (and [-_]) allowed for custom fieldname
f_name, f_type = customfield[Key.NAME], customfield[Key.TYPE]
match = re.search("([a-z0-9-_]+)", f_name)
if (match == None) or (match.group(1) != f_name):
    raise TracError("Only alphanumeric characters allowed for custom field name (a-z or 0-9 or -_).")

So now we got rid of all the index stuff. But still, you can do much easier than that by properly using regular expressions:

# Only alphanumeric characters (and [-_]) allowed for custom fieldname
f_name, f_type = customfield[Key.NAME], customfield[Key.TYPE]
if re.search('^[a-z0-9-_]+$', f_name) == None:
    raise TracError("Only alphanumeric characters allowed for custom field name (a-z or 0-9 or -_).")

Of course there is re.match but I try to avoid it due to personal issue with the method - I produced some bugs when using re.match previously.

You wonder which software does ship this mess?  This code is part of a plugin which helps you to manage custom fields for your Trac tickets ( CustomFieldAdminPlugin). Also the overall code quality of the module is quite poor so if you like to spend some time to learn refactoring, this is a good place to start.


written 22 months ago (11/16/08 15:46:18)

TurboMail 3.0 Alpha1 released

For quite a while no new versions of TurboMail were released and indeed TurboMail 2 just worked well for many users. But now there is an alpha release of the upcoming TurboMail 3.0 which is nearly a complete rewrite of TurboMail 2. With the new code structure we were able to add some nice features:

  • No strong coupling to  TurboGears anymore (use TurboMail in command line scripts if you like!)
  • Different delivery methods like SMTP, local mailbox etc.
  • Different delivery strategies (queuing and immediate delivery)

To get more people involved, we decided to release an alpha version. That means the most important functionality is already in, but some things are missing right now. Furthermore there are some small known bugs (sending pre-formatted messages won't work) but we can work on that later. Besides that, I'm using a this alpha (with some custom patches) on production servers since several months and it works really great so far.

Download

Documentation

I tried to built comprehensive documentation for every feature. This is still work in progress, but at least some docs are there:

If you have questions, patches or just want to rant about the alpha, please use the  mailing list. The  roadmap page shows the missing tasks. I'm currently writing a Python MTA to get real SMTP tests in again.


written 2 years ago (06/22/08 16:49:24)

Migrating data from MS Access to MySQL easily

MySQL AB: If we are too slow, your bug must be invalid

Most people will know the  MySQL database. MySQL AB ( recently bought by Sun) is the company behind that open source database and pays most developers working on the code.

Some years ago MySQL AB released a really nice tool which they named  MySQL Migration Toolkit. Its a Java GUI application which can be used to copy data from other databases (like Microsoft Access) to a MySQL database. The tool even generates a  lua script file so that you can automate the migration process afterwards.

In a recent project I was hired to build a replacement for a custom Access application. Part of the work was to create a new database schema. Of course the old data must be migrated so that it isn't lost. The chosen production database was  PostgreSQL due to its solid track record when it comes to SQL features like transactions, referential integrity and all this boring stuff.

I figured out that it would be much easier for me converting data in the old database to an open source database and do the structure conversion in a second step with custom tools.

This was when the Migration Toolkit came into play: It copied the data from Access to MySQL and I wrote some scripts to get that data into Postgres. It worked really great after a very short time and everyone was happy. Sure there were some initial problems (like Migration Toolkit won't work with Java 1.6 but only 1.5) but these were solved quickly.

Some weeks later I discovered that some data in my new database was different from the Access data: In Access the value was 0.075, MySQL stored 0.08. This happened for all columns which were specified as "Single" with automatic precision in MS Access. The MySQL bugtracker revealed that others had the same problem:  Access singles are converted to MySQL 7,2 double.

Unfortunately this bug was marked as 'invalid' when I found it two years after its creation (because the MySQL developer investigating it did not read the bug description correctly so he was not able to reproduce it). Bugs are far from unusual and so I was quite confident that I could jump in and help fixing it. Unfortunately, the bug was closed and therefore I was sure that no MySQL developer would ever look at it again. But their bug tracker won't let me open the bug as a normal user.

I pinged another MySQL dev directly (Susanne Ebrecht) and she reopened the bug :-) I followed her advise and tried to reproduce the problem which newer versions of MS Access. Clearly, the bug was still there...

Unfortunately, no MySQL developer had enough time to look at the report again and so an automatic process closed the bug again:

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Maybe I'm just too stupid but I was not able to re-open it. Maybe only the original reporter can. But the main problem is the crappy configuration of the MySQL bug tracker which closes bugs which are marked as 'need more info' after 30 days which is quite short. Of course you need some sort of 'timeout' in a bug tracker because else you get swamped with useless bug reports no one is able to reproduce so they stay open forever.

Other bug trackers implement that better: If a bug has the NEEDINFO status in  Bugzilla there is a checkbox "I am providing the requested information for this bug." and afterwards the status is being set back to normal.

The problem with MySQL was that I provided all necessary information and the bug was closed only because MySQL AB worked too slowly!

This is an example how to not handle bug reports! If you don't have enough time to triage all bugs in a short period of time, then don't set the timeout to 30 days!

Which left me alone with the rounding problem... Luckily I was a able to use some kind of "dirty hack" to work around the whole issue: The problem is that the created MySQL database schemes are too imprecise. The rounding problem occurs not until the real data is copied which happens at a later stage.

So I extended the lua migration script to call a Python script of mine. This script just sets all columns with the type 'double(7,2) to 'DECIMAL(7,4)'. And this fixed the rounding errors for me. Of course it would be nicer if a proper fix was included in the Migration Toolkit itself but this seems to be out of reach...

Maybe my workaround can save you some trouble or time so I put my dirty hack online. Help yourself!

My lua adaptation:

(...)
  -- create database objects online
  print("Create database objects.")
  
  grtM.callFunction(
    grtV.toLua(targetConn.modules.TransformationModule), "executeSqlStatements", 
    {targetConn, "global::/migration/targetCatalog", "global::/migration/creationLog"}
  )
  grt.exitOnError("The SQL create script cannot be executed.")
end
  
-- added this line to execute my Python script
os.execute("C:\\Python24\\python.exe \"c:\\migration\\fix_access_single_migration.py\"")
  
-- ------------------------------------------------------------------
-- checkpoint 5
-- Bulk data transfer
(...)

And this is my Python script (fix_access_single_migration.py):

#!/usr/bin/env python
# License: MIT licee
# Copyright (c) 2008 Felix Schwarz <felix schwarz AT oss schwarz eu>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
# 
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
# 
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
 
import MySQLdb
 
hostname = 'localhost'
user = 'root'
password = ''
db = ''
 
def get_all_tables(cursor):
    tables = []
    cursor.execute('show tables')
    rows = cursor.fetchall()
    for row in rows:
        tables.append(row[0])
    return tables
 
 
def find_double_fields(cursor, tables):
    problematic_fields = []
    for table in tables:
        cursor.execute(u'show fields from `%s`' % table)
        rows = cursor.fetchall()
        for row in rows:
            col_name = row[0]
            col_type = row[1]
            if col_type == u"double(7,2)":
                problematic_fields.append({"table": table, "col_name": col_name})
    return problematic_fields
 
def fix_fields(cursor, problems):
    for problem in problems:
        table = problem["table"]
        col_name = problem["col_name"]
        sql = u'ALTER TABLE `%s` CHANGE COLUMN `%s` `%s` DECIMAL(7,4)'
        cursor.execute(sql_template % (table, col_name, col_name))
 
conn = MySQLdb.connect(host=hostname, user=user,
                       passwd=password, db=db,
                       use_unicode=True, charset="cp1250")
conn.begin()
 
cursor = conn.cursor()
tables = get_all_tables(cursor)
problems = find_double_fields(cursor, tables)
fix_fields(cursor, problems)
 
conn.commit()
conn.close()