The birth of SimpleExpressions

March 3, 2014 · OpenSource Dev Regex

Searching for a good usage example to illustrate my Dynamics talk, I came up with the idea of imitating the structure of the Simple.Data Framework to make Regular Expressions easier to create... because, between you and me, having to write Regular Expressions is exactly what makes me avoid using them in the first place!

The idea is to use a functional programming style coupled with an English-like API to produce expressions that can be translated into Regular Expressions, while remaining easy to read and easy to understand a few months down the line (when the brain cells that used to store that regular expression knowledge have been overwritten by silly movie quotes and lullabies).

I chatted about it with some enthusiastic colleagues, and after a few hours I had clarified the basic semantics of such a domain-specific language (DSL).

Verbal Expressions

On the road to writing this DSL, I stumbled upon a similar project called VerbalExpressions. The original project was written in JavaScript, but a bunch of forks have been made for other languages (Ruby, C#, Python, Java, Groovy, PHP, Haskell, C++ and Objective-C).

The GitHub homepage of the project shows this JavaScript example:

var tester = VerEx()
        .startOfLine()
        .then( "http" )
        .maybe( "s" )
        .then( "://" )
        .maybe( "www." )
        .anythingBut( " " )
        .endOfLine();

This obviously builds an expression matching http:// and https://, with or without www., followed by a website domain (strictly speaking, by anything that contains no space). Pretty neat, isn't it?
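For comparison, here is a hand-written regular expression that should behave roughly like the chain above. It is an approximation written for this illustration, not necessarily the exact pattern VerbalExpressions generates; the sample URLs are made up.

```javascript
// Hand-written approximation of the VerEx chain (not the library's actual output):
//   ^            startOfLine()
//   http         then("http")
//   s?           maybe("s")
//   :\/\/        then("://")
//   (?:www\.)?   maybe("www.")
//   [^ ]*        anythingBut(" ")
//   $            endOfLine()
const urlPattern = /^https?:\/\/(?:www\.)?[^ ]*$/;

const samples = [
  "http://example.com",
  "https://www.example.com",
  "ftp://example.com",        // wrong protocol
  "https://example.com/a b",  // contains a space
];

for (const sample of samples) {
  console.log(sample, urlPattern.test(sample));
}
```

The one-liner is shorter, but it is precisely the kind of pattern that becomes unreadable once the regular expression knowledge has faded.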

I was saddened to find this project: it sounded so similar to my own idea. (Un)fortunately, there is not much more to VerbalExpressions than this example, and what little additional syntax there is, I find too terse and hard to use.

For example, matching http://, https:// or ftp:// goes like this:

var expression = VerEx()
             .find( "http" )
             .maybe( "s" )
             .then( "://" )
             .or()
             .then( "ftp://" );

I find the then().or().then() construct utterly disturbing. As it is written here, I would have expected the or() to apply only to the previous then(), creating these two expressions:

    .find( "http" )
    .maybe( "s" )
    .then( "://" )
or
    .find( "http" )
    .maybe( "s" )
    .then( "ftp://" ) // Matches a silly "httpftp://" or "httpsftp://"
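The two readings can be written out as plain regular expressions to show how much depends on where the alternation starts. Both patterns are hand-written for this illustration; neither is claimed to be what VerbalExpressions actually produces.

```javascript
// Reading 1: "or" covers everything before it.
//   (http or https)://   OR   ftp://
const wholePrefix = /^(?:https?:\/\/|ftp:\/\/)/;

// Reading 2: "or" covers only the previous then().
//   http or https, followed by (://  OR  ftp://)
const lastStepOnly = /^https?(?::\/\/|ftp:\/\/)/;

const inputs = ["http://a.com", "https://a.com", "ftp://a.com", "httpsftp://a.com"];

for (const input of inputs) {
  console.log(input, "reading 1:", wholePrefix.test(input), "reading 2:", lastStepOnly.test(input));
}
```

Same words, same order, yet only the first reading accepts ftp:// and only the second accepts the silly httpsftp://.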

The problem here is that natural language is not precise enough for us to tell the difference.

If you asked me "Could you please get me a burger and fries or a pizza?", you'd have to be really hungry. I could go with common sense: burger goes with fries, not with pizza. I could follow the intonation and tempo of your voice, which might hint that burger and fries belong together. Or I could follow the little devil perched on my shoulder and get you a burger and a pizza, just because it would be a lot more fun to watch!
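One way to take the guesswork out of "burger and fries or a pizza" is to let nesting, rather than the order of chained calls, decide what an alternative covers. The helpers below (escapeLiteral, seq, maybe, either, startsWith) are hypothetical names invented for this sketch; they are neither the SimpleExpressions nor the VerbalExpressions API.

```javascript
// Hypothetical helpers, invented for this illustration only. Each one returns an
// object holding a regex source fragment; nesting the calls decides the grouping.

// Escape characters that have a special meaning in regular expressions.
const escapeLiteral = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Plain strings are treated as literals; anything else is an already-built fragment.
const fragment = (part) => (typeof part === "string" ? escapeLiteral(part) : part.source);

// Concatenate the parts in order.
const seq = (...parts) => ({ source: parts.map(fragment).join("") });

// Make a part optional.
const maybe = (part) => ({ source: `(?:${fragment(part)})?` });

// Match exactly one of the alternatives; the scope is the argument list itself.
const either = (...alternatives) => ({ source: `(?:${alternatives.map(fragment).join("|")})` });

// Compile a fragment into a RegExp anchored at the start of the input.
const startsWith = (part) => new RegExp(`^${fragment(part)}`);

// "(http or https)://" or "ftp://": the parentheses of either() say what "or" covers.
const protocol = startsWith(either(seq("http", maybe("s"), "://"), "ftp://"));

for (const input of ["http://a.com", "https://a.com", "ftp://a.com", "httpsftp://a.com"]) {
  console.log(input, protocol.test(input));
}
```

Because either() receives its alternatives as arguments, the scope of the alternation is fixed by the call itself, and there is nothing left to infer from intonation.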

Other VerbalExpressions concepts, like the following one, do not light up a bulb over my head, to say the least.

VerEx().then( "." ).replace( my_paragraph, ". Stop." );

What is really intended here? I think it replaces every dot "." with ". Stop.", but I couldn't tell for sure without running it, which, at least for me, is a big design smell.
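For comparison, here is what the chain presumably boils down to in plain JavaScript, assuming VerEx escapes the "." passed to then() and builds a regular expression with the global flag (both are assumptions, not verified against the library). The paragraph text is made up.

```javascript
// Hypothetical input text standing in for my_paragraph.
const my_paragraph = "It works. It is fast. It is simple.";

// Assumption: a literal dot, matched globally, so every dot is replaced.
const result = my_paragraph.replace(/\./g, ". Stop.");
console.log(result); // "It works. Stop. It is fast. Stop. It is simple. Stop."

// Without the global flag, only the first dot would be replaced.
const firstOnly = my_paragraph.replace(/\./, ". Stop.");
console.log(firstOnly); // "It works. Stop. It is fast. It is simple."
```

Written this way, the one detail that decides between "first dot" and "every dot" is at least visible on the page.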

The VerbalExpressions syntax is thus a neat idea, but not mature enough. There is room for something else. That's how SimpleExpressions saw the light of day.

Head over to the second part for more details about the semantics of the DSL.
