Tuesday, February 14, 2017

Regular expressions in Java

Ugh! I will probably be working in Java for a while. Or I will kill myself, one of the two. So far I hate Eclipse, I can't even write code and I have to blog simple stuff like how to do regular expressions. Well, take it as a learning experience! Exiting my comfort zone! 2017! Ha, ha haaaaa [sob! sob!]

In Java you use Pattern to do regex:
Pattern p = Pattern.compile("a*b");
Matcher m = p.matcher("aaaaab");
boolean b = m.matches();

So let's write some equivalent code, first in C# and then in Java:

var regDate=new Regex(@"^(?<year>(?:19|20)?\d{2})-(?<month>\d{1,2})-(?<day>\d{1,2})$",RegexOptions.IgnoreCase);
var match=regDate.Match("2017-02-14");
if (match.Success) {
  var year=match.Groups["year"];
  var month=match.Groups["month"];
  var day=match.Groups["day"];
  // do something with year, month, day
}

Pattern regDate=Pattern.compile("^(?<year>(?:19|20)?\\d{2})-(?<month>\\d{1,2})-(?<day>\\d{1,2})$", Pattern.CASE_INSENSITIVE);
Matcher matcher=regDate.matcher("2017-02-14");
if (matcher.find()) {
  String year=matcher.group("year");
  String month=matcher.group("month");
  String day=matcher.group("day");
  // do something with year, month, day
}


The first thing to note is that there is no verbatim literal support in Java (the @"string" format from .NET) and there is no "var" (one always has to specify the type of the variable, even if it's fucking obvious). Second, the regular expression object Pattern doesn't do things directly, instead it creates a Matcher object that then does operations. The two bits of code above are not completely equivalent, as the Success property in a .NET Match object holds the success of the already performed operation, while .find() in the Java Matcher object actually performs the match.
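To make the point about .find() concrete, here is a minimal sketch (the class and method names are made up for illustration). Every call to find() runs a new search that starts where the previous match ended, so calling it in a loop walks through all the matches in the input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {
    // Collects every run of digits in the input by calling find() repeatedly.
    static List<String> allNumbers(String input) {
        Pattern digits = Pattern.compile("\\d+");
        Matcher matcher = digits.matcher(input);
        List<String> found = new ArrayList<>();
        // Each find() call performs a search, starting after the previous match.
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(allNumbers("2017-02-14")); // prints [2017, 02, 14]
    }
}
```

The closest .NET counterpart of this loop would be iterating over Regex.Matches, where each Match object already holds its result.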

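Interestingly, a Pattern is always created through Pattern.compile, while in .NET compilation is something you have to ask for with RegexOptions.Compiled. The same term does not mean the same thing in the two frameworks, though: Pattern.compile only parses the expression into an internal representation, which is roughly what the .NET Regex constructor does anyway, while RegexOptions.Compiled goes further and generates IL code for the expression.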

Another important thing is that it is more efficient to reuse matchers rather than recreate them. So when you want to use the matcher on another string, use matcher.reset("newstring").
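A small sketch of that reuse, assuming a simplified date pattern and a made-up helper name (countDates). The Pattern is compiled once, a single Matcher is created on an empty string, and reset(CharSequence) points it at each new input:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo {
    // Compiled once and shared; Pattern objects are immutable.
    private static final Pattern DATE =
            Pattern.compile("^\\d{4}-\\d{1,2}-\\d{1,2}$");

    // Counts how many inputs look like dates, reusing one Matcher for all of them.
    static int countDates(String... inputs) {
        Matcher matcher = DATE.matcher("");
        int count = 0;
        for (String input : inputs) {
            // reset(input) discards the old state and returns the same matcher.
            if (matcher.reset(input).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countDates("2017-02-14", "not a date", "1999-1-1")); // prints 2
    }
}
```

One thing to keep in mind: a Matcher holds state and is not safe to share between threads, so this kind of reuse only makes sense within a single thread.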

And lastly, the String class itself has quick and dirty regular expression methods like .matches, .replaceFirst and .replaceAll. The matches method returns true only if the entire string matches the pattern (equivalent to a Pattern match with ^ at the beginning and $ at the end).
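Here is a quick sketch of those String methods in action (the helper names are invented for the example). Note how matches fails when the pattern only covers part of the string:

```java
class StringRegexDemo {
    // True only when the whole string is a yyyy-mm-dd style date.
    static boolean looksLikeDate(String s) {
        return s.matches("\\d{4}-\\d{2}-\\d{2}");
    }

    // Replaces every dash with a slash using a regex replacement.
    static String withSlashes(String s) {
        return s.replaceAll("-", "/");
    }

    public static void main(String[] args) {
        String date = "2017-02-14";
        System.out.println(date.matches("\\d{4}"));      // prints false, only part of the string fits
        System.out.println(looksLikeDate(date));         // prints true
        System.out.println(date.replaceFirst("-", "/")); // prints 2017/02-14
        System.out.println(withSlashes(date));           // prints 2017/02/14
    }
}
```

These methods build a new Pattern on every call, so for anything that runs in a loop a precompiled Pattern is probably the better choice.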