Incorporating Regex into ORM's and Database Abstractions

Almost on a daily basis I come across using regular expressions rather convenient. I first toyed with regular expressions in my Compiler Design classes during my undergrad studies, and little did I know they can be used for everyday things like submitting forms. I've even toyed with them to create my own json parser using an SQL like syntax. I think they're a great way to perform strict validation and searching among other things. Searching. There is something I need to work on. Not just strings, but database searches for account numbers, phone numbers, etc. Not just '%some string%'. Regular Expressions within SQL is something I have blogged about before. I've come to find that Regular Expression support is far from constant and in some cases not even supported(SQLite).

Incase you're curious what I'm talking about, some database engines allow you to write something like the following:

SELECT zip FROM zipcode WHERE REGEXP_LIKE(zip, '[^[:digit:]]’)

Sure, this process probably isn't very efficient, but it can be very powerful.

More and more frameworks are moving towards ORM's. You no longer have to write database specific code as the database layer will handle the specifics for you, but none that I am aware of will handle anything of this complexity and for obvious reason. Lets step back to a simpler solution that many of us deal with on a daily basis:


jQuery somewhat creates it's own language. It's not xpath, its somewhat CSS selectors(ok, mostly css selectors), but there's much more added to it's own engine. Perhaps this approach could be a simple solution to the problem. Support a simplified subset or maybe my own mini language, and use the driver specific code to handle generating the database specific Regular Expressions. Here's what such example may resemble in pseudo code:

$db->select('mytable')->where_regex("acct", "^\d{4}\-\d{8}$");

This code might search mytable for the acct that matches a pattern like "1234-12345678". This isn't terribly complex, however will suffice for the majority of everyday needs. Only supporting this level of complexity is more within the reach creating such if possible. This is my first post among others to come as this is mostly a brainstorm at this point. I openly welcome any pointers, advice, resources, or any other comments!