Felix Schwarz Diplom-Informatiker
Software-Entwicklung und Beratung

How to make input validation really complicated

Thanks to Vera Djuraskovic there is a Serbo-Croatian translation of this article available.

In every web application you need to validate your input data very carefully. Data validation is a very common task, so it's not surprising that there are several validation libraries in Python (like formencode) which include validators for common tasks. However, Trac does not integrate any of these libraries, so every plugin developer has to write their own validation code.

Now look at how complicated you can make it to check whether a string contains only the characters a-z, digits, hyphens ('-') and underscores ('_'):

# Only alphanumeric characters (and [-_]) allowed for custom fieldname
# Note: This is not pretty, but it works... Anyone have an easier way of checking ???
f_name, f_type = customfield[Key.NAME], customfield[Key.TYPE]
match = re.search("[a-z0-9-_]+", f_name)
length_matched_name = 0
if match != None:
    match_span = match.span()
    length_matched_name = match_span[1] - match_span[0]
namelen = len(f_name)
if (length_matched_name != namelen):
    raise TracError("Only alphanumeric characters allowed for custom field name (a-z or 0-9 or -_).")

Please note how deep the author dug into Python's re library to find the span() method. The code first searches for an acceptable substring, gets the start and end positions of that substring, derives its length from them, and finally checks whether that length equals the length of the whole string.
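To make the length comparison concrete, here is a minimal sketch that isolates the span() arithmetic from the plugin snippet. The helper name matched_length and the sample inputs are made up for illustration; only the pattern is taken from the original code.

```python
import re

# Same unanchored pattern as in the plugin snippet above.
PATTERN = "[a-z0-9-_]+"

def matched_length(name):
    """Return the length of the first acceptable substring, or 0 if none.

    Hypothetical helper, not part of the plugin: it only reproduces the
    span() arithmetic so the result can be compared with len(name).
    """
    match = re.search(PATTERN, name)
    if match is None:
        return 0
    start, end = match.span()
    return end - start

if __name__ == "__main__":
    for name in ("my_field", "my field", "My_field", "???"):
        # The name is accepted only when both numbers are equal.
        print(repr(name), matched_length(name), len(name))
```

Only the first sample is accepted: for the others, re.search finds a substring that is shorter than the whole string (or nothing at all), so the two lengths differ.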

At least the author had some doubts whether this solution is the most elegant one (see the second line of the snippet above). A simpler way of checking could be:

# Only alphanumeric characters (and [-_]) allowed for custom fieldname
f_name, f_type = customfield[Key.NAME], customfield[Key.TYPE]
match = re.search("([a-z0-9-_]+)", f_name)
if (match is None) or (match.group(1) != f_name):
    raise TracError("Only alphanumeric characters allowed for custom field name (a-z or 0-9 or -_).")

Now we have got rid of all the index arithmetic. Still, it gets much easier if you use regular expressions properly and anchor the pattern with ^ and $ so that it has to cover the whole string:

# Only alphanumeric characters (and [-_]) allowed for custom fieldname
f_name, f_type = customfield[Key.NAME], customfield[Key.TYPE]
if re.search('^[a-z0-9-_]+$', f_name) is None:
    raise TracError("Only alphanumeric characters allowed for custom field name (a-z or 0-9 or -_).")

Of course there is re.match, but I try to avoid it for personal reasons: I have produced some bugs when using re.match in the past.
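For illustration, here is a sketch of one way re.match can bite. It may or may not be the kind of bug meant above. re.match anchors the pattern only at the start of the string, so trailing garbage slips through unless you add the end anchor yourself. The sketch also shows a small caveat of the ^...$ form: $ matches just before a trailing newline as well. On Python 3.4 and later, re.fullmatch avoids both issues. The function names and sample inputs are assumptions for this example, and the character class is written as [a-z0-9_-], which accepts the same characters as the original.

```python
import re

# Same characters as the pattern in the article, with the hyphen last.
NAME = "[a-z0-9_-]+"

def ok_match(value):
    # re.match anchors only at the beginning of the string.
    return re.match(NAME, value) is not None

def ok_search(value):
    # Explicit anchors, as in the last snippet of the article.
    return re.search("^" + NAME + "$", value) is not None

def ok_fullmatch(value):
    # The whole string has to match (requires Python 3.4 or later).
    return re.fullmatch(NAME, value) is not None

if __name__ == "__main__":
    for value in ("my_field", "my field", "my_field\n"):
        print(repr(value), ok_match(value), ok_search(value), ok_fullmatch(value))
```

With these inputs, re.match accepts all three strings, the anchored re.search rejects "my field" but still accepts the name with a trailing newline, and re.fullmatch accepts only "my_field". If re.fullmatch is not available, ending the pattern with \Z instead of $ should give the same strictness.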

Wondering which software ships this mess? This code is part of a plugin which helps you manage custom fields for your Trac tickets (CustomFieldAdminPlugin). The overall code quality of the module is also quite poor, so if you would like to spend some time learning refactoring, this is a good place to start.