Regular Expression Madness

I just stumbled upon this great blog post about some uncommon uses of regular expressions. RapidMiner also makes a lot of use of those beasts, especially for defining filters, so I thought this post might be interesting to you.

Both examples are taken from the book The Unix Programming Environment by Kernighan and Pike (1984).

The first problem is to produce a list of all English words that contain all five vowels exactly once and in alphabetical order.

The book creates a regular expression

^[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*$

then uses it to filter a dictionary file. This produced 16 words ranging from abstemious to majestious.
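If you want to try this yourself, here is a minimal Python sketch of the same idea. The pattern is the one from the book; the function name `vowels_in_order` and the tiny sample list are only assumptions for illustration, and a real run would read a dictionary file instead.

```python
import re

# Pattern from the post: a, e, i, o, u in that order, each exactly once,
# with any number of non-vowel characters before, between, and after them.
VOWELS_IN_ORDER = re.compile(
    r"^[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*$"
)


def vowels_in_order(words):
    """Return the words that match the vowel pattern, keeping input order."""
    return [w for w in words if VOWELS_IN_ORDER.match(w)]


# Hypothetical sample list, not the dictionary file used in the book.
sample = ["abstemious", "facetious", "education", "sequoia", "regular"]
print(vowels_in_order(sample))  # prints ['abstemious', 'facetious']
```

Each `[^aeiou]*` soaks up the consonants between two vowels, so a word like education fails simply because its vowels show up in the wrong order.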

The second problem is to produce a list of all English words of at least six letters with letters appearing in increasing alphabetical order.

The book creates a regular expression

^a?b?c?d?e?f?g?h?i?j?k?l?m?n?o?p?q?r?s?t?u?v?w?x?y?z?$

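then uses it to filter a dictionary file as before, except there is an additional filter stage that keeps only words of at least six letters, since the expression itself matches words of any length.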

This produced 17 words including common words such as almost and ghosty. Some of the more interesting results were bijoux, chintz, and egilops. Kernighan and Pike explain that egilops is a disease that attacks wheat.
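Here is a rough Python sketch of the second filter. The pattern is the one from the book; I am assuming the additional filter stage is just a length check for six or more letters, and the function name `increasing_letters`, the `min_length` parameter, and the sample list are made up for illustration.

```python
import re

# Pattern from the post: every letter is optional, but the letters that do
# appear must follow alphabetical order, and none can appear twice.
INCREASING = re.compile(
    r"^a?b?c?d?e?f?g?h?i?j?k?l?m?n?o?p?q?r?s?t?u?v?w?x?y?z?$"
)


def increasing_letters(words, min_length=6):
    """Return words of at least min_length letters whose letters strictly increase."""
    # The length check is the assumed extra stage: the pattern alone also
    # matches short words and even the empty string.
    return [w for w in words if len(w) >= min_length and INCREASING.match(w)]


# Hypothetical sample list, not the dictionary file used in the book.
sample = ["almost", "chintz", "bijoux", "egilops", "abbey", "first", "banana"]
print(increasing_letters(sample))  # prints ['almost', 'chintz', 'bijoux', 'egilops']
```

Note that first does have its letters in increasing order and matches the pattern, but it is dropped by the length check because it has only five letters.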

For an explanation of those expressions, please refer to the original blog post. And have fun while you are creating similar expressions for your next example filter 😉

Ingo Mierswa


Ingo Mierswa is the founder and president of RapidMiner and an industry-veteran data scientist since starting to develop RapidMiner at the Artificial Intelligence Division of the TU Dortmund University in Germany. Mierswa, the scientist, has authored numerous award-winning publications about predictive analytics and big data. Mierswa, the entrepreneur, is the founder of RapidMiner. Under his leadership RapidMiner has grown up to 300% per year over the first seven years. In 2012, he spearheaded the go-international strategy with the opening of offices in the US as well as the UK and Hungary. After two rounds of fundraising, the acquisition of Radoop, and supporting the positioning of RapidMiner with leading analyst firms like Gartner and Forrester, Ingo takes a lot of pride in bringing the world’s best team to RapidMiner.