Backtracking is the powerful and popular feature provided by Java regex engine when the regex pattern contains optional quantifiers or alternations, in which the regex engine can return to the previous saved state (backtracking) to continue matching/finding other alternative when the current try fails. However it is also a "well known" issue that some not-well-written regex might trigger "exponential backtraking" in situation that there is no possible match can be found after exhaustive search, in the worst case the regex engine will just keep going forever, hangup the vm. We have various reports in the past as showed in RegexFun.java [1] for this issue. For example following is the one reported in JDK-6693451. regex: "^(\\s*foo\\s*)*$" input: "foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo fo", In which the "group" (\\s*foo\\s*)*$" can alternatively match any one of the "foo", "foo ", " foo" and " foo ". The regex works fine and runs normally if it can find a match (replace the last "fo" in the input to "foo" for example). But in this case because the input text ends with a "fo", the regex actually can never match the input, it always fails at last one, nothing in the regex can match the "fo". Given the nature of the "greedy quantifier", the engine will be backtracking/ bouncing among these 4 choices at each/every iteration/level and keep searching to the end of the input exhaustively, fail, backtrack, and try again recursively. Based on the number of "foo" in the input, it can run forever until the end of the world. There are lots of suggestions on how to avoid this kind of runaway regex. One is to use the possessive quantifier, if possible, to replace the greedy quantifier ( * -> *+ for example ) to workaround this kind of catastrophic backtracking, and it normally meets the real need and makes the exponential backtracking go away. The observation of these exponential backtracking cases indicates that most of these "exponential backtracking" however are actually not necessary in most cases. Even we can't solve all of these catastrophic backtracking, it might be possible, with a small cost, we can actually workaround most of the "cases", in particular (1) when the "exponential backtracking" happens at the top level group + greedy quantifier, and (2) there is no group ref in the pattern. The "group + greedy repetition" in j.u.regex is implemented via three nodes, Prolog -> Loop -> body node" in wihch the body takes care of (\\s*foo\\s*), the Loop takes care of the looping for "*", and the Prolog, as its name, just a starter to kick of the looping (read source for more details). The normal matching sequence works as Prolog.match() -> Prolog.loop.matchInit() -> body.match() -> loop.match() -> body.match() -> loop.match() -> body.match() -> loop.match() ... -> body.match() -> loop.match() -> loop.next.match() The "body.match() -> loop.match -> body.match() -> loop.match() ..." is looping on the runtime call stack "recursively" (the body.next is pointing to loop). If we take the "recursive looping" part that we are interested (we only care about the "count >= cmin" situation, as it is where the exponential backtracking happens) out of Loop.match(), the loop.match is as simple as boolean match(Matcher matcher, int i, CharSequence seq) { ---> Before boolean b = body.match(matcher, i, seq); // If match failed we must backtrack, so // the loop count should NOT be incremented if (b) return true; matcher.locals[countIndex] = count; ---> After return next.match(matcher, i, seq); } It appears that each of the repetitive "body.match() + loop.match()" looping (though actually recursive on the runtime stack) actually is matching the input text kind of "linearly" in the order of loop counter, if this is a "top level" group repetition (not an inner group inside another repetitive group) ... foo foo foo foo foo foo foo... |_ i = n, (body-loop)(body-loop)(body-loop).... ->next() given the nature of its "greediness", the engine will exhausts all the possible match of "body+loop+ loop.next" from any given starting position "i". We can actually assume that if we have failed to match A "body + loop + its down-stream matching" last time at position "i", next time if we come back here again (after backtracking the "up stream" and come down here again), and if we are at exactly the same position "i", we no longer need to try it again, the result will not change (failed). With the assumption that we DO NOT have a group ref in the pattern (the result/content of the group ref can change, if the down-stream contains the group ref, we can't assume the result going forward will be the same, even start at the same position "i"). for example for the above sample, when we backtrack to loop counter "n", we can dance among 4 choices "foo", " foo", "foo " and " foo ", but when we pick one and more on to the next iteration/loop at n + 1, the only possible choice is either "foo..." or " foo" (with a leading space character), if we have tried either of them (or both) last time and it failed, we no longer need to try the same "down stream" again . And the good thing is that this "position-lookup-table" can be shared by all cmin <= n <= cmax iterations, again, because the nature of greediness. In last failed try, the engine has tried/exhausted all possible combinations of body->loop->body->loop...body->loop->next at "i", including the possilble exhaustive backtracking done by the embedded inner nodes of this group. So we are here at the same position 'i" again, the result will be the same. So here is my propoased workaround solution. We introduce in a "hashmap" (home-made hashmap for "int", supposed to be cheap and faster) to memorize all the tried-and-failed position "i", and check this hashmap before starting a new round of loop. boolean match(Matcher matcher, int i, CharSequence seq) { ---------------- if (posIndex != -1 && matcher.localsPos[posIndex].contains(i)) { return next.match(matcher, i, seq); } ---------------- boolean b = body.match(matcher, i, seq); // If match failed we must backtrack, so // the loop count should NOT be incremented if (b) return true; matcher.locals[countIndex] = count; ---------------- if (posIndex != -1) { matcher.localsPos[posIndex].add(i); } ---------------- return next.match(matcher, i, seq); } This makes most of the exponential backtracking failures in RegexFun.java [1] go away, except the two issues. (1) the exponential backtracking starts/happens at the second inner group level, in which the proposed change might help, but does not make it go away regex: "(h|h|ih(((i|a|c|c|a|i|i|j|b|a|i|b|a|a|j))+h)ahbfhba|c|i)*" input: "hchcchicihcchciiicichhcichcihcchiihichiciiiihhcchicchhcihchcihiihciichhccciccichcichiihcchcihhicchcciicchcccihiiihhihihihichicihhcciccchihhhcchichchciihiicihciihcccciciccicciiiiiiiiicihhhiiiihchccchchhhhiiihchihcccchhhiiiiiiiicicichicihcciciihichhhhchihciiihhiccccccciciihhichiccchhicchicihihccichicciihcichccihhiciccccccccichhhhihihhcchchihihiihhihihihicichihiiiihhhhihhhchhichiicihhiiiiihchccccchichci" (2) it's a {n, m} greedy case. This one probably can be handled the same way as the */+ but need a little more testing. regex: "(\\w|_){1,64}@" input: "______________________________" There are another kind of "abusing" case that overly and repeatitively uses ".*" or ".+", such as reported in jdk-5014450 http://cr.openjdk.java.net/~sherman/regexBacktracking/RegexFun3.java regex: "^\\s*sites\\[sites\\.length\\+\\+\\]\\s*=\\s*new\\s+Array\\(.*" + "\\s*//\\s*(\\d+).*" + "\\s*//\\s*([^-]+).*" + "\\s*//\\s*([^-]+).*" + "\\s*//\\s*([^-]+).*" + "/(?sui).*$" text "\tsites[sites.length++] = new Array(\n" + "// 1079193366647 - creation time\n" + "// 1078902678663 1078852539723 1078753482632 0 0 0 0 0 0 0 0 0 0 0 - cre [1] http://cr.openjdk.java.net/~sherman/regexBacktracking/RegexFun.java