Beware of writing regex and string functions

Recently I was involved in an issue whose root cause took a week to find. In the end it was an eye-opener for anyone who does not pay much attention to string functions and regex. “Regular expressions and string functions are quite powerful in any language; however, the utmost care should be given to such code.”

The issue is very simple. A set of Java files needs to be processed to extract some annotations and other proprietary information, and to separate the main class names from the inner class names. The customer created business entities, which may contain inner classes and are passed through a pre-processor. The problem occurs in one particular case: when the file name is “BlaSomeClassName_Bla.java” and the file contains an inner class “SomeClass”.

-> The inner class name is a SUBSTRING of the main class name.

Let's look at the following code, especially line 6. This line tries to match the class names returned by QDox (a Java source parser) against the Java source file that is currently being processed.

1     if (classes.length == 1) {
2         _javaClass = classes[0];
3     } else {
4         for (int i = 0; i < classes.length; i++) {
5             JavaClass aClass = classes[i];
6             if (aSourceFile.getName().matches(".*" + aClass.getName() + ".*")) {
7                 _javaClass = classes[i];
8                 break;
9             }
10        }
11    }

This is the regular expression that took up my days and nights, because the failure rarely showed any consistency from one run to the next. In the above example the source being processed is “BlaSomeClassName_Bla.java”, and the class names you get from QDox are “BlaSomeClassName_Bla” and “SomeClass”. By now you have probably guessed it. If “SomeClass” comes first in the “classes” array, you are screwed. The pattern “.*SomeClass.*” matches the file name “BlaSomeClassName_Bla.java”, so the loop stops there and “SomeClass” is taken as the processing class, whereas the right processing class is “BlaSomeClassName_Bla”.
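To make the failure concrete, here is a minimal sketch that reproduces it with plain strings instead of QDox objects, next to one possible stricter check. The class name PrimaryClassMatcher and the methods pickWithRegex and pickExact are hypothetical names made up for this illustration, not part of the original pre-processor, and the exact comparison assumes the main class is named after its file, as is usual for a public top-level Java class.

```java
import java.io.File;

public class PrimaryClassMatcher {

    // Same idea as line 6 of the original code: a "contains"-style regex match.
    // Any class whose name is a substring of the file name will match.
    static String pickWithRegex(File sourceFile, String[] classNames) {
        for (String name : classNames) {
            if (sourceFile.getName().matches(".*" + name + ".*")) {
                return name;
            }
        }
        return null;
    }

    // Possible alternative (an assumption, not the original fix):
    // strip the ".java" extension and require an exact name match.
    static String pickExact(File sourceFile, String[] classNames) {
        String fileName = sourceFile.getName();
        String baseName = fileName.endsWith(".java")
                ? fileName.substring(0, fileName.length() - ".java".length())
                : fileName;
        for (String name : classNames) {
            if (baseName.equals(name)) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        File source = new File("BlaSomeClassName_Bla.java");
        // The inner class happens to come first in the array.
        String[] innerFirst = {"SomeClass", "BlaSomeClassName_Bla"};

        System.out.println(pickWithRegex(source, innerFirst)); // SomeClass
        System.out.println(pickExact(source, innerFirst));     // BlaSomeClassName_Bla
    }
}
```

With the inner class first, the regex version picks “SomeClass” while the exact comparison picks “BlaSomeClassName_Bla”. With the order reversed both return the main class, which is why the problem only showed up some of the time.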

It took quite a few days to really understand the issue and get to the bottom of the code. Many, many thanks to Eclipse, which makes this kind of debugging pleasant. Conditional breakpoints are very useful in scenarios like this, where you do not want to step through the code for a long time waiting for the special case. Instead, set the right condition on the breakpoint and Eclipse takes care of the rest. This is what makes Eclipse my favorite IDE.

Have you had any similar experiences with strings and regex?