Beware of writing regex and string functions

Recently I was involved in an issue that took a week to trace to its root cause. In the end it was an eye-opener for anyone who does not pay much attention to string functions and regex. “Regular expressions and string functions are quite powerful in any language; however, code that uses them deserves the utmost care.”

The issue is very simple. A set of Java files needs to be processed to extract some annotations and other proprietary information, and also to separate the main class names from the inner class names. The customer created Business Entities, which may contain inner classes and are passed through a pre-processor. The problem occurs in one particular case: when the file name is “BlaSomeClassName_Bla.java” and the file contains an inner class “SomeClass”.

–> The inner class name is SIMILAR to the main class name: “SomeClass” is a substring of “BlaSomeClassName_Bla”.

Let's look at the following code, especially line 6. This line tries to match the class names returned by qdox (a Java source parser) against the Java source file that is currently being processed.

1     if (classes.length == 1) {
2         _javaClass = classes[0];
3     } else {
4         for (int i = 0; i < classes.length; i++) {
5             JavaClass aClass = classes[i];
6             if (aSourceFile.getName().matches(".*" + aClass.getName() + ".*")) {
7                 _javaClass = classes[i];
8                 break;
9             }
10        }
11    }

This is the regular expression that took up my days and nights, and whose behavior was rarely consistent from one run to the next. In the above example the source file being processed is “BlaSomeClassName_Bla.java”, and the class names that you get from qdox are “BlaSomeClassName_Bla” and “SomeClass”. By now you have probably guessed the problem. If “SomeClass” comes first in the “classes” array, you are screwed: the pattern “.*SomeClass.*” matches the file name “BlaSomeClassName_Bla.java”, so the processing class is taken as “SomeClass”, whereas the right processing class is “BlaSomeClassName_Bla”.
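The failure is easy to reproduce in isolation. The following is a minimal sketch, not the original pre-processor: plain strings stand in for the qdox JavaClass objects, and the class name ClassNameMatchDemo and both helper methods are made up for illustration. It shows how the substring-style regex picks a different class depending on array order, while comparing the whole file name (minus the “.java” extension) does not. The exact comparison assumes the class to process is the one whose name equals the file's base name.

```java
import java.util.Arrays;
import java.util.List;

public class ClassNameMatchDemo {

    // Same check as line 6 of the listing: the class name only has to
    // appear somewhere inside the file name. Returns the first hit.
    public static String firstSubstringMatch(String fileName, List<String> classNames) {
        for (String name : classNames) {
            if (fileName.matches(".*" + name + ".*")) {
                return name;
            }
        }
        return null;
    }

    // Hypothetical alternative: strip the ".java" extension and require
    // the whole base name to equal the class name.
    public static String exactMatch(String fileName, List<String> classNames) {
        String suffix = ".java";
        String baseName = fileName.endsWith(suffix)
                ? fileName.substring(0, fileName.length() - suffix.length())
                : fileName;
        for (String name : classNames) {
            if (baseName.equals(name)) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String file = "BlaSomeClassName_Bla.java";
        List<String> innerFirst = Arrays.asList("SomeClass", "BlaSomeClassName_Bla");
        List<String> outerFirst = Arrays.asList("BlaSomeClassName_Bla", "SomeClass");

        // The regex result depends on the order of the list.
        System.out.println(firstSubstringMatch(file, innerFirst)); // SomeClass
        System.out.println(firstSubstringMatch(file, outerFirst)); // BlaSomeClassName_Bla

        // The exact comparison gives the same answer either way.
        System.out.println(exactMatch(file, innerFirst)); // BlaSomeClassName_Bla
        System.out.println(exactMatch(file, outerFirst)); // BlaSomeClassName_Bla
    }
}
```

Concatenating a name straight into a pattern has a second hazard: Java identifiers may contain “$”, which is a regex metacharacter, so any name that really must go into a regex is safer wrapped in Pattern.quote().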

This issue took quite a few days to really understand and get to the bottom of. Many, many thanks to Eclipse and its excellent debugger. Conditional debugging (conditional breakpoints) is very useful in scenarios like this, where you do not want to wait a long time for the special case to come up. Instead, set the right condition and Eclipse takes care of the rest. This is what makes Eclipse my favorite IDE.

Do you have any such experiences with strings and regex?

7 thoughts on “Beware of writing regex and string functions”

  1. cranley

    I’ll preface this comment by stating I haven’t programmed in Java since the previous millennium, so please pardon my ignorance.

    In Java, does the file name have to be the same as the name of the outermost class contained within? In other words, if I had the following class in a .java file:


    public class OutterClass{

    // bunch of stuff

    public class InnerClass{
    // more stuff
    }
    }

    Would the file have to be called OutterClass.java? If so, would using the actual file extension as part of your expression help? Something like the following:

    ...
    JavaClass aClass = classes[i];
    String extension = aSourceFile.Extension; // just pretend this actually exists
    if(aSourceFile.getName().matches(".*" + aClass.getName() + "." + extension)){
    ....

    At the very least you’d be able to finalise where the file name (pre-extension) ends.

    But like I said, I haven’t touched Java in 10 years, and I’ve no idea if I’m anywhere close to speaking coherently.

  2. Shams Mahmood

    @Cranley

    Only public classes need to be declared in a file with the same name.
    e.g. OuterClass.java can contain

    public class OuterClass {
    class InnerClass {
    }
    }

    class AnotherClass {
    class InnerClass {
    }
    }

  3. Shams Mahmood

    @sureshkrishna

    Could you explain more about what the code fragment wants to achieve?
    Why is the code using matches() rather than an equals() comparison against a file name built from the class name? I would think equals() performs better than matches().

  4. sureshkrishna

    @Shams

    It's legacy code that we have had for a long time. The only thing I know is that “somehow” this code exists.

    A business entity class might have some business rules as inner classes. qdox is used as the Java source parser, and it gives you all the class names from the source file.
    At that point you need to match the source file name against the class names that qdox gives, so that the “matching class” can be used for further processing.

    As you said, I would also imagine that an EXACT match with equals() is the right solution. But the code exists somehow, and the successors need to battle these issues out.

  5. Shams Mahmood

    @sureshkrishna

    I hope that, since you have found the bug, you have changed the code to use equals() 🙂
    Btw, thanks for introducing me to qdox, it seems a handy tool. I have used the Eclipse JDT to parse Java files previously, but this seems like a handy tool for simple parsing 🙂

  6. Fred

    if (aSourceFile.getName().matches(".*" + aClass.getName() + ".*"))

    should simply have been written as:

    if (aSourceFile.getName().indexOf( aClass.getName() ) > -1)

  7. ks

    How about just creating an interface that you can use to tag all the BlaSomeClassName_Bla classes? That way, there is no need to parse file names: just check if the class implements this BusinessEntity interface and the job is done. Or have I missed the point here? (I haven’t used qdox, so I am not aware of how this preprocessing works.)
