But it's not enough to start off writing good code. Good code can still go bad, slowly, creeping bit by bit, till it's evil.
Nowhere is this more obvious than in regular expressions.
My friend had a regex for parsing HTML anchor tags to get the reference out. It started with something like:
<a href='menu.html'>
for which you might write:
/<a\s+href='(.*?)'>/
But there's always other junk in those tags, so you put some filler in there as well:
<a href='menu.html' id=e1ch title="main menu">
/<a.*?\bhref='(.*?)'.*>/
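To see why the filler is needed, here is a small Perl sketch that runs both patterns above against both sample tags. The helper name first_capture and the variable names are made up for this illustration; only the patterns and tags come from the text.

```perl
use strict;
use warnings;

# The two patterns from the text so far.
my $strict_re = qr/<a\s+href='(.*?)'>/;
my $loose_re  = qr/<a.*?\bhref='(.*?)'.*>/;

# Hypothetical helper: return the first capture, or undef when nothing matches.
sub first_capture {
    my ($re, $tag) = @_;
    my ($url) = $tag =~ $re;   # list-context match returns the captures
    return $url;
}

my $simple = q{<a href='menu.html'>};
my $busy   = q{<a href='menu.html' id=e1ch title="main menu">};

for my $tag ($simple, $busy) {
    my $strict = first_capture($strict_re, $tag) // '(no match)';
    my $loose  = first_capture($loose_re,  $tag) // '(no match)';
    print "$tag\n  strict: $strict\n  loose:  $loose\n";
}
```

The strict pattern only survives the bare tag, because it insists on '> right after the URL; the loose one gets menu.html out of both.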
But then of course you run into cases where href uses double quotes, or might not even use quotes at all, so you try something like:
/<a.*?\bhref=([\'\"])?(\S*?)\1.*>/
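A quick Perl sketch of how this pattern behaves, assuming a reasonably modern Perl 5. The second tag (index.html, id=e2) is invented for the example so that a stale value is easy to spot; it is not from the original regex's input.

```perl
use strict;
use warnings;

my $re = qr/<a.*?\bhref=([\'\"])?(\S*?)\1.*>/;

my $quoted   = q{<a href="menu.html" id=e1ch title="main menu">};
my $unquoted = q{<a href=index.html id=e2>};   # made-up tag for illustration

# Quoted href: group 1 grabs the quote, \1 finds the closing one.
my $quoted_matched   = ($quoted =~ $re) ? 1 : 0;
my $url_after_quoted = $2;

# Unquoted href: group 1 never takes part, so \1 cannot match and the
# whole pattern fails. We deliberately read $2 anyway, without checking.
my $unquoted_matched   = ($unquoted =~ $re) ? 1 : 0;
my $url_after_unquoted = $2;

print "quoted:   matched=$quoted_matched url=$url_after_quoted\n";
print "unquoted: matched=$unquoted_matched url=$url_after_unquoted\n";
```

The second line reports matched=0 but still shows menu.html, because a failed match leaves the capture variables from the previous successful match alone.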
And that gets you single quotes and double quotes, but some terrible things happen when there are no quotes. First of all, what is \1 if the first (['"])? got skipped because ['"] didn't match anything? Well, in Perl a backreference to a group that never took part in the match just fails, so the whole pattern fails to match. Or worse, if you don't check for that, $2 is left over from the last regex that did match. So you mess with that:
/<a.*?\bhref=([\'\"]?)(\S*?)\1.*>/
Haha, now $1 can be empty, so \1 works just fine. But wait, now the URL is coming back empty. WTF? Let's run through it and see what happened:
<a href=menu.html id=e1ch title="main menu">
The href matched, then the first () matched nothing, so \1 is set to nothing. Then the non-greedy match asked "how much do I need to match to move on?", the answer was "nothing", so it matched zero characters, and then the greedy .* at the end ate everything.
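That walk-through can be checked with a short Perl sketch. The variable names are made up for the illustration; @+ is Perl's built-in array of match end offsets, used here to show what the trailing .* swallowed.

```perl
use strict;
use warnings;

my $re  = qr/<a.*?\bhref=([\'\"]?)(\S*?)\1.*>/;
my $tag = q{<a href=menu.html id=e1ch title="main menu">};

my ($matched, $quote, $url, $tail) = (0, undef, undef, undef);
if ($tag =~ $re) {
    ($matched, $quote, $url) = (1, $1, $2);
    # $+[2] is the offset where group 2 ended; everything from there on
    # was consumed by the trailing .*> instead of the URL group.
    $tail = substr($tag, $+[2]);
}

print "matched: $matched\n";
print "quote:   [$quote]\n";
print "url:     [$url]\n";
print "tail:    [$tail]\n";
```

Both captures come back as empty strings, and the tail starts at menu.html, which is exactly the text the URL group was supposed to pick up.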
So then you tell it to match something, by guarding the greedy match at the end with some whitespace:
/<a.*?\bhref=([\'\"]?)(\S*?)\1\s+.*>/
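A small Perl sketch of the guarded pattern against the unquoted, single-quoted and double-quoted versions of the sample tag. The list of tags and the variable names are assumptions for the example.

```perl
use strict;
use warnings;

my $re = qr/<a.*?\bhref=([\'\"]?)(\S*?)\1\s+.*>/;

my @tags = (
    q{<a href=menu.html id=e1ch title="main menu">},
    q{<a href='menu.html' id=e1ch title="main menu">},
    q{<a href="menu.html" id=e1ch title="main menu">},
);

# Group 2 is the URL; undef marks a tag the pattern did not match.
my @urls = map { $_ =~ $re ? $2 : undef } @tags;

for my $i (0 .. $#tags) {
    my $shown = $urls[$i] // '(no match)';
    print "$tags[$i]\n  url: $shown\n";
}
```

With the \s+ in place, the lazy \S*? has to keep going until it reaches whitespace (or the closing quote followed by whitespace), so all three tags give back menu.html.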
But what happens if there isn't anything after the href? The pattern won't match. So now you have to make the whole tail optional:
/<a.*?\bhref=([\'\"]?)(\S*?)\1(?:\s+.*)?>/
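Here is a Perl sketch comparing the previous pattern with this one on tags that have nothing after the href. The final pattern is the same one as above, only spelled out with the /x flag so each piece can carry a comment; that layout, the helper name href_of and the list of tags are choices made for this example, not part of the original regex.

```perl
use strict;
use warnings;

# Previous step: whitespace is required after the URL.
my $guarded_re = qr/<a.*?\bhref=([\'\"]?)(\S*?)\1\s+.*>/;

# Final step: the same pieces, with the tail made optional.
my $final_re = qr/
    <a .*?            # tag start plus any leading attributes
    \b href=          # the attribute name
    ( [\'\"]? )       # group 1: optional opening quote, may be empty
    ( \S*? )          # group 2: the URL, as few non-space characters as possible
    \1                # the same quote again, or nothing
    (?: \s+ .* )?     # optional rest of the tag, guarded by whitespace
    >
/x;

# Hypothetical helper: return group 2 (the URL), or undef on no match.
sub href_of {
    my ($re, $tag) = @_;
    return $tag =~ $re ? $2 : undef;
}

my @tags = (
    q{<a href=menu.html>},
    q{<a href='menu.html'>},
    q{<a href=menu.html id=e1ch title="main menu">},
    q{<a href="menu.html" id=e1ch title="main menu">},
);

for my $tag (@tags) {
    my $guarded = href_of($guarded_re, $tag) // '(no match)';
    my $final   = href_of($final_re,   $tag) // '(no match)';
    print "$tag\n  guarded: $guarded\n  final:   $final\n";
}
```

The guarded pattern misses the first two tags, while the final one returns menu.html for all four. A table of sample tags like this is also a cheap way to notice when the next tweak breaks a case that used to work.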
And heaven help the next person who tries to work on that.
Labels: perl regex ballofmud