Posts tagged String Manipulation
A Simple Regex Matcher
0This is a classic technical interview question which tests knowledge of string manipulation and recursion.
Question:
Given a regex expression and a string, return true if the regex matches the string. The rules:
- Strings are composed of alphanumeric characters.
- Regexes are composed of alphanumeric characters and 2 special characters ‘.’ and ‘*’.
- A regex matches a string if every character in the regex matches every character in the string.
- Two single characters match if they are equal.
- The special character ‘.’ matches with any single character.
- If the special character ‘*’ appears in the regex, then for our string we can match 0 or more contiguous occurrences of the character preceding the ‘*’. For example, “a*bc” matches “aaabc”.
- For “.*”, we can match 0 or more contiguous occurrences of any character. For example, “.*bc” matches “xyzbc”.
- A pair of quotes (“”) denotes an empty string.
Input (input.txt):
a a
abc abc
a.c axc
a*bc aaabc
a*bc bc
.*bc xyzbc
a*aa aaaa
aa aaa
aaa aa
a* “”
Output (stdout):
true
true
true
true
true
true
true
false
false
true
Solution:
The basic idea is to attempt to match the start of the regex with the start of the string and recursively match the remaining regex against the remaining string.
public static void main(String[] args) throws IOException { Scanner in = new Scanner(new FileReader("src/input.txt")); while (in.hasNextLine()) { String regex = in.next(); String string = in.next(); //Special handling of empty strings if(regex.equals("\"\"")){ regex=""; } if(string.equals("\"\"")){ string=""; } System.out.println(regexMatchesString(regex, string)); } } static boolean regexMatchesString (String regex, String string) { if(regex.equals("") && string.equals("")) { //Base case when regex matches string. return true; } else if (regex.equals("")) { //Base case when string exceeds regex. return false; } else { //Attempt to match start of regex with start of string. if(regex.length()>1 && regex.charAt(1)=='*') { //Handle wild card: //a)Try ignoring wild card. boolean matched = regexMatchesString (regex.substring(2), string); if (matched) { return true; } //b)Try consuming one or more matching characters. for (int i=0;i<string.length();i++) { if (matches(regex.charAt(0), string.charAt(i))) { matched = regexMatchesString (regex.substring(2), string.substring(i+1)); if (matched) { return true; } } else { return false; } } //Remaining regex does not match remaining string. return false; } else if (string.equals("")) { //Base case when regex exceeds string. return false; } else if (matches(regex.charAt(0), string.charAt(0))) { //Try consuming a single character. return regexMatchesString (regex.substring(1), string.substring(1)); } else { //Start of regex does not match start of string. return false; } } } //Returns whether a single regex character matches a single string character. static boolean matches (char regexChar, char stringChar) { return regexChar == '.' || regexChar == stringChar; }
The trickiest part of this solution is how the wildcard is handled – we must test all n+1 cases of consuming 0 to n matching characters with our wild card since it not necessarily the case that we want to consume all matching characters. (Ex. “a*ab” should match “aab” – the “a*” only consumes a single “a” in this case.)