
Regex replacement of nested tokens

regex java

TCCV · Tech Community · 9 years ago

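I need to take a regex pattern and programmatically escape its curly braces. The input regex will contain tokens in the following forms (with arbitrary text before, after, and between the tokens):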

&{token1}
&{token1}&{token2}&{tokenN...}
&{token1&{token2&{tokenN...}}}

So far everything works except for nested tokens. This is what I have:

regex = regex.replaceAll("(&)(\\{)([^{}]+)(\\})", "$1\\\\$2$3\\\\$4");

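I have also tried iteration and recursion, but the problem I run into is that once the innermost token has been escaped, it breaks the match for the outer ones.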
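To make the stall concrete, here is a minimal sketch that re-applies the replacement above until the string stops changing. The class name IterativeEscapeDemo and the helper escapeUntilStable are made up for this illustration. After the first pass the escaped inner token still contains { and } characters, so the brace-free body [^{}]+ can never span it and no further pass matches.

```java
class IterativeEscapeDemo {
    // The pattern from the question: '&', '{', a brace-free body, '}'.
    static final String PATTERN = "(&)(\\{)([^{}]+)(\\})";
    static final String REPLACEMENT = "$1\\\\$2$3\\\\$4";

    // Hypothetical helper: re-applies the replacement until a fixed point is reached.
    static String escapeUntilStable(String input) {
        String current = input;
        while (true) {
            String next = current.replaceAll(PATTERN, REPLACEMENT);
            if (next.equals(current)) {
                return current; // nothing left to match
            }
            current = next;
        }
    }

    public static void main(String[] args) {
        // Only the innermost token ends up escaped, however many passes run.
        System.out.println(escapeUntilStable("&{token1&{token2&{tokenN...}}}"));
    }
}
```

Running it prints &{token1&{token2&\{tokenN...\}}}, the same partial result listed as "output" in the examples in this question.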

I tried negative lookarounds, but they did not do what I expected: the pattern still only matches/replaces the innermost token.

regex = regex.replaceAll("(&)(\\{)([^(?<!\\\\{)|(?<!\\\\})]+)(\\})", "$1\\\\$2$3\\\\$4");

Any suggestions? Thanks in advance.

Edit: sample input/output

&{token1}   //input
&\{token1\} //output

&{token1}&{token2}&{tokenN...}        //input
&\{token1\}&\{token2\}&\{tokenN...\}  //output

&{token1&{token2&{tokenN...}}}        //input
&{token1&{token2&\{tokenN...\}}}      //output
&\{token1&\{token2&\{tokenN...\}\}\}  //expected output

//To throw a wrench into it, normal quantifiers should not be escaped
text{1,2}&{token1&{token2&{tokenN...}}}        //input
text{1,2}&{token1&{token2&\{tokenN...\}}}      //output
text{1,2}&\{token1&\{token2&\{tokenN...\}\}\}  //expected output

Edit 2: an example of what happens outside this process: the tokens will be resolved to text, and the end result should be a valid regex.

a{2}&{token1&{token2&{tokenN...}}}        //input
a{2}&\{token1&\{token2&\{tokenN...\}\}\}  //expected output of this regex
a{2}foobarbaz                             //expected output after tokens are resolved (&{token1} = foo, &{token2} = bar, &{tokenN...} = baz)

2 answers

m.cekiera · 9 years ago

Try using:

regex = regex.replaceAll("(?<=&)(?=\\{)|(?<!\\{\\d{0,6},?(\\d{0,6})?)(?=\\})","\\\\");

where {0,6} determines how many digits a quantifier bound may have; I think 6 is enough. Java example:

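public class Main {
    public static void main(String[] args) {
        // Three sample lines: flat tokens, nested tokens, and tokens mixed with quantifiers.
        String regex = "&{token1}&{token2}&{tokenN}\n" +
                "&{token1&{token2&{tokenN}}}\n" +
                "text{1,2}&{token1{1}&{token2{1,}&{tokenN{0,2}}}}\n";
        // First alternative: the empty position between '&' and '{'.
        // Second alternative: the empty position before a '}' that does not close a quantifier like {1}, {1,} or {0,2}.
        regex = regex.replaceAll("(?<=&)(?=\\{)|(?<!\\{\\d{0,6},?(\\d{0,6})?)(?=\\})", "\\\\");
        System.out.println(regex);
    }
}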

Output:

&\{token1\}&\{token2\}&\{tokenN\}
&\{token1&\{token2&\{tokenN\}\}\}
text{1,2}&\{token1{1}&\{token2{1,}&\{tokenN{0,2}\}\}\}
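As a quick check, the sketch below runs the same replacement against the nested examples from the question. The class name LookaroundEscapeDemo and the helper escapeTokens are invented for this illustration; the pattern itself is the one from this answer.

```java
class LookaroundEscapeDemo {
    // The pattern from this answer, unchanged.
    static final String PATTERN =
            "(?<=&)(?=\\{)|(?<!\\{\\d{0,6},?(\\d{0,6})?)(?=\\})";

    // Hypothetical helper: inserts a backslash at every zero-width match.
    static String escapeTokens(String regex) {
        return regex.replaceAll(PATTERN, "\\\\");
    }

    public static void main(String[] args) {
        // Nested tokens next to ordinary quantifiers, taken from the question.
        System.out.println(escapeTokens("text{1,2}&{token1&{token2&{tokenN...}}}"));
        System.out.println(escapeTokens("a{2}&{token1&{token2&{tokenN...}}}"));
        // A token whose name is all digits looks like a quantifier to the lookbehind.
        System.out.println(escapeTokens("&{123}"));
    }
}
```

The first two lines come out as text{1,2}&\{token1&\{token2&\{tokenN...\}\}\} and a{2}&\{token1&\{token2&\{tokenN...\}\}\}, matching the expected output in the question. The third line comes out as &\{123}, with the closing brace left unescaped, so this approach appears to assume that token names are never purely numeric.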

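Pshemo · 9 years ago

I would avoid regex and write a simple state machine that stores, for each {, whether it was preceded by &. With that information, each time we find a } we can decide whether to escape it, and then discard the stored entry because we no longer need it.

So your code could look like this: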

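import java.util.Stack;

public static String myEscape(String text) {
    StringBuilder sb = new StringBuilder();

    char prev = '\0';
    // One entry per open '{': true if that brace was preceded by '&'.
    Stack<Boolean> stack = new Stack<>();

    for (char ch : text.toCharArray()) {
        if (ch == '{') {
            if (prev == '&') {
                sb.append('\\');
            }
            stack.push(prev == '&');
        } else if (ch == '}') {
            // The isEmpty() check avoids an EmptyStackException on an unmatched '}',
            // which is then copied through unescaped.
            if (!stack.isEmpty() && stack.pop()) {
                sb.append('\\');
            }
        }
        sb.append(ch);
        prev = ch;
    }
    return sb.toString();
}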

Example:

text{1,2}&{token1&{token2{foo}...}}

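We find the first { and, since it is not preceded by &, we push false onto the stack.
When we find } we see from the top of the stack (false) that it should not be escaped, and we pop that entry.
We see another { and, since it is preceded by &, we push true onto the stack.
We find another { and, since it is also preceded by &, we push another true.
We find another {, this time not preceded by &, so we push false.

So, as we can see, the stack records whether the next } should be escaped. Read from the top it currently holds false -> true -> true, which means the closing braces that follow should be emitted as } \} \} .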
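The walkthrough can be replayed with a small self-contained sketch. The method body follows the answer's myEscape; the wrapper class StateMachineEscapeDemo and the isEmpty() guard against an unmatched } are additions made for this illustration.

```java
import java.util.Stack;

class StateMachineEscapeDemo {
    static String myEscape(String text) {
        StringBuilder sb = new StringBuilder();
        char prev = '\0';
        // One entry per open '{': true if it was preceded by '&'.
        Stack<Boolean> stack = new Stack<>();
        for (char ch : text.toCharArray()) {
            if (ch == '{') {
                if (prev == '&') {
                    sb.append('\\');
                }
                stack.push(prev == '&');
            } else if (ch == '}') {
                // Assumption: an unmatched '}' is passed through unescaped.
                if (!stack.isEmpty() && stack.pop()) {
                    sb.append('\\');
                }
            }
            sb.append(ch);
            prev = ch;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The example string from the walkthrough above.
        System.out.println(myEscape("text{1,2}&{token1&{token2{foo}...}}"));
    }
}
```

This prints text{1,2}&\{token1&\{token2{foo}...\}\}: the quantifier {1,2} and the plain group {foo} are left alone, and only the two braces opened by &{ are escaped together with their matching closers.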