But it's not enough to start off writing good code. Good code can still go bad, slowly, creeping bit by bit, till it's evil.
Nowhere is this more obvious than in regular expressions.
My friend had a regex for parsing HTML anchor tags to get the href out. It started with something like:
for which you might write:
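A minimal Perl sketch of that first version, assuming the tag is nothing but a single-quoted href. The sample tag and the helper name `href_simple` are made up for illustration; this is not my friend's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the href value, or undef if the tag doesn't match.
sub href_simple {
    my ($tag) = @_;
    # Lazily capture whatever sits between the single quotes.
    return $tag =~ /<a href='(.*?)'>/ ? $1 : undef;
}

print href_simple(q{<a href='menu.html'>}), "\n";    # menu.html
```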
But there's always other junk in those tags, so you put some filler in there as well:
<a href='menu.html' id=e1ch title="main menu">
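Roughly what the filler version could look like, again as a sketch with a hypothetical helper name: a lazy `.*?` skips whatever comes before the href, and a greedy `.*` swallows whatever follows the closing quote.

```perl
use strict;
use warnings;

# Hypothetical helper: tolerates other attributes before and after the href.
sub href_with_filler {
    my ($tag) = @_;
    # .*? = filler before href, .* = filler after the closing quote.
    return $tag =~ /<a.*?href='(.*?)'.*>/ ? $1 : undef;
}

print href_with_filler(q{<a href='menu.html' id=e1ch title="main menu">}), "\n";    # menu.html
```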
But then of course you run into cases where href uses double quotes, or might not even use quotes at all, so you try something like:
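One way that attempt might look, as a sketch: an optional capture group grabs the opening quote, and the backreference `\1` demands the same quote at the other end. The helper name is hypothetical, and the URL now lives in `$2` because the quote took over `$1`.

```perl
use strict;
use warnings;

# Hypothetical helper: accepts either quote style around the href value.
sub href_optional_quote {
    my ($tag) = @_;
    # (['"])? = optional opening quote, \1 = the same quote again.
    return $tag =~ /<a.*?href=(['"])?(.*?)\1.*>/ ? $2 : undef;
}

print href_optional_quote(q{<a href='menu.html' id=e1ch>}), "\n";    # menu.html
print href_optional_quote(q{<a href="menu.html" id=e1ch>}), "\n";    # menu.html
```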
And that gets you single quotes and double quotes, but some terrible things happen when there are no quotes. First of all, what is \1 if the first (['"])? got skipped because ['"] didn't match anything? Well, it turns out to be in an invalid state or, worse, it's left over from the last regex that ran, so you mess with that:
Haha, now $1 can be empty, so \1 works just fine. But wait, now the URL is coming back empty. WTF? Let's run through it and see what happened:
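A sketch of that tweak, assuming it amounts to moving the `?` inside the parentheses so the group always participates, even when it captures an empty string. The helper name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the quote group always matches, possibly as ''.
sub href_empty_quote {
    my ($tag) = @_;
    # (['"]?) captures a quote or nothing; \1 then repeats whatever was captured.
    return $tag =~ /<a.*?href=(['"]?)(.*?)\1.*>/ ? $2 : undef;
}

print href_empty_quote(q{<a href='menu.html' id=e1ch>}), "\n";    # menu.html
print href_empty_quote(q{<a href="menu.html" id=e1ch>}), "\n";    # menu.html
```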
<a href=menu.html id=e1ch title="main menu">
The href matched, then the first () matched nothing, so \1 is set to nothing. Then the non-greedy match asked, "how much do I need to match to move on?" The answer was "nothing", so it matched zero characters, and then the greedy .* at the end ate everything.
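To watch that happen, here is a small sketch that wraps the trailing `.*` in its own capture group so you can see what it swallowed. That extra group and the helper name are my additions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (quote, url, rest) so each piece is visible.
sub href_captures {
    my ($tag) = @_;
    # Same shape as before, with the greedy tail captured as $3.
    return $tag =~ /<a.*?href=(['"]?)(.*?)\1(.*)>/ ? ($1, $2, $3) : ();
}

my ($quote, $url, $rest) = href_captures(q{<a href=menu.html id=e1ch title="main menu">});
print "quote=[$quote] url=[$url] rest=[$rest]\n";
# quote=[] url=[] rest=[menu.html id=e1ch title="main menu"]
```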
So then you tell it to match something, by guarding the greedy match at the end with some whitespace:
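A sketch of the guarded version, assuming the guard is a single `\s` between the closing quote and the greedy tail. The helper name is hypothetical. The lazy group now has to keep growing until it reaches whitespace, so the unquoted URL comes back.

```perl
use strict;
use warnings;

# Hypothetical helper: requires whitespace right after the href value.
sub href_guarded {
    my ($tag) = @_;
    # \s forces (.*?) to stop at the first whitespace after the value.
    return $tag =~ /<a.*?href=(['"]?)(.*?)\1\s.*>/ ? $2 : undef;
}

print href_guarded(q{<a href=menu.html id=e1ch title="main menu">}), "\n";    # menu.html
print href_guarded(q{<a href='menu.html' id=e1ch>}), "\n";                    # menu.html
```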
But what happens if there isn't anything after the href? The pattern won't match. So now you have to make the whole thing conditional:
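One way that could end up looking, as a sketch: the whitespace guard and the greedy tail go into an optional non-capturing group. The `(?:...)` grouping and the helper name are my assumptions. The `//` fallback needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: trailing attributes are optional.
sub href_conditional {
    my ($tag) = @_;
    # (?:\s.*)? = optional "whitespace plus the rest" before the closing >.
    return $tag =~ /<a.*?href=(['"]?)(.*?)\1(?:\s.*)?>/ ? $2 : undef;
}

for my $tag (
    q{<a href=menu.html>},
    q{<a href="menu.html">},
    q{<a href=menu.html id=e1ch title="main menu">},
) {
    print href_conditional($tag) // 'no match', "\n";    # menu.html each time
}
```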
And heaven help the next person that tries to work on that.