One of the next versions of RapidMiner (5.0.011 or the upcoming version 5.1) will provide a nice extension of the expression parser, which is used, for example, by the operator "Generate Attributes". The extension adds operations on nominal (text) values, which can be used directly within the expressions for the new attributes.

The supported functions include

  • Number to String [str(x)],
  • String to Number [parse(text)],
  • Substring [cut(text, start, length)],
  • Concatenation [concat(text1, text2, text3…)],
  • Replace [replace(text, what, by)],
  • Replace All [replaceAll(text, what, by)],
  • To lower case [lower(text)],
  • To upper case [upper(text)],
  • First position of string in text [index(text, string)],
  • Length [length(text)],
  • Character at position pos in text [char(text, pos)],
  • Compare [compare(text1, text2)],
  • Contains string in text [contains(text, string)],
  • Equals [equals(text1, text2)],
  • Starts with string [starts(text, string)],
  • Ends with string [ends(text, string)],
  • Matches with regular expression exp [matches(text, exp)],
  • Suffix of length [suffix(text, length)],
  • Prefix of length [prefix(text, length)],
  • Trim (remove leading and trailing whitespace) [trim(text)].
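
To give a feeling for how these functions combine, here is a minimal Python sketch that mimics a handful of them and chains them to turn a value like "  Smith, John  " into "john.smith". This is not RapidMiner code. The function names are taken from the list above, but the details are my assumptions: zero-based positions, index returning -1 when nothing is found, and matches testing the whole text. The sample value is made up.

```python
import re

# Python stand-ins for a few of the functions listed above.
# Assumptions (not confirmed by the post): positions are zero-based,
# index() returns -1 when the string is absent, matches() tests the whole text.

def cut(text, start, length):
    """Substring of the given length beginning at position start."""
    return text[start:start + length]

def concat(*parts):
    """Join any number of texts."""
    return "".join(parts)

def index(text, string):
    """First position of string in text."""
    return text.find(string)

def prefix(text, length):
    """First `length` characters of text."""
    return text[:length]

def suffix(text, length):
    """Last `length` characters of text."""
    return text[-length:] if length > 0 else ""

def matches(text, exp):
    """True if the regular expression exp matches the whole text."""
    return re.fullmatch(exp, text) is not None

# Hypothetical nominal value of the form "Last, First" with stray whitespace.
raw = "  Smith, John  "

name = raw.strip()                    # plays the role of trim(text)
comma = index(name, ",")              # position of the separator
last = prefix(name, comma)            # everything before the comma
first = cut(name, comma + 2, len(name) - comma - 2)  # skip ", "
label = concat(first.lower(), ".", last.lower())     # lower(text) + concat

print(label)
```

In "Generate Attributes" the same idea would be written as one nested expression over an attribute, assuming the parser lets you nest these calls the way it does for numerical functions.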

It is amazing how many new data transformations you can perform with this simple set of text operations. Until now, I often had to use the operator "Execute Script" for this type of operation, which is no longer necessary.

I have also just uploaded a process to myExperiment, which can be downloaded directly with our Community Extension (but of course you will need the RapidMiner update first 😉). The process is named "Extended Operations for Nominal Values" – just like this blog entry.