
Java: Replacing multiple different substrings in a string at once (or in the most efficient way)

  •  80
  • Yossale  · Tech Community  · 15 years ago

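    I need to replace many different substrings in a string in the most efficient way. Is there another way, other than the brute-force approach of replacing each field with string.replace?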

    10 answers  |  latest 6 years ago
        1
  •  88
  •   Todd Owen    15 years ago

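    If the string you are operating on is very long, or you are operating on many strings, then it could be worthwhile using a java.util.regex.Matcher. The pattern has to be compiled up front, so this is not efficient if your input is very small or your search pattern changes frequently.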

    Below is a full example, based on a list of tokens taken from a map. (It uses StringUtils from Apache Commons Lang.)

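    import java.util.HashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.apache.commons.lang3.StringUtils;
    
    Map<String,String> tokens = new HashMap<String,String>();
    tokens.put("cat", "Garfield");
    tokens.put("beverage", "coffee");
    
    String template = "%cat% really needs some %beverage%.";
    
    // Create pattern of the format "%(cat|beverage)%".
    // The keys are inserted as raw regex, so they must not contain regex metacharacters.
    String patternString = "%(" + StringUtils.join(tokens.keySet(), "|") + ")%";
    Pattern pattern = Pattern.compile(patternString);
    Matcher matcher = pattern.matcher(template);
    
    StringBuffer sb = new StringBuffer();
    while(matcher.find()) {
        // The value is interpreted too: '$' and '\' are special in a replacement string.
        matcher.appendReplacement(sb, tokens.get(matcher.group(1)));
    }
    matcher.appendTail(sb);
    
    System.out.println(sb.toString());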
    

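    Once the regex is compiled, scanning the input string is generally very quick (although if your regex is complex or involves backtracking, you would still need to benchmark to confirm this!).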
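    On Java 9 or later, the same single-pass alternation can be written with Matcher.replaceAll(Function<MatchResult, String>), which runs the find/appendReplacement/appendTail loop internally. This is a minimal sketch assuming Java 9+ and no Commons Lang; the class TokenReplacer is hypothetical. Each key is wrapped in Pattern.quote and each value in Matcher.quoteReplacement, so characters such as '.', '$' or '\' are treated literally.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public final class TokenReplacer {
    // Replaces every %key% in the template with tokens.get(key) in one scan.
    public static String replaceTokens(String template, Map<String, String> tokens) {
        if (tokens.isEmpty()) {
            return template; // an empty alternation would match "%%"
        }
        // Build "%(key1|key2|...)%" with each key quoted as a literal.
        String alternation = tokens.keySet().stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Pattern pattern = Pattern.compile("%(" + alternation + ")%");
        // replaceAll(Function) performs the append loop for us (Java 9+).
        return pattern.matcher(template)
                .replaceAll(m -> Matcher.quoteReplacement(tokens.get(m.group(1))));
    }

    public static void main(String[] args) {
        Map<String, String> tokens = Map.of("cat", "Garfield", "beverage", "coffee");
        System.out.println(replaceTokens("%cat% really needs some %beverage%.", tokens));
    }
}
```

    If the same token set is applied to many strings, compile the Pattern once and reuse it, since the compilation cost is the trade-off described above.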

        2
  •  40
  •   Community Egal    7 years ago

    Algorithm

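    One of the most efficient ways to replace matching strings (without regular expressions) is to use the Aho-Corasick algorithm with a performant Trie (pronounced "try"), a fast hashing algorithm, and an efficient collections implementation.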
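    For illustration, here is a minimal, self-contained Aho-Corasick replacer using only the JDK. It is a sketch, not Bor's library or this answer's code: the class AhoCorasickReplace is hypothetical, edges live in a plain HashMap<Character, Node> rather than a tuned trie, there is no whole-word option, and overlapping matches are resolved leftmost first, then longest, which is an assumed policy. It builds the trie, adds failure links in a breadth-first pass, then finds every occurrence of every key in a single scan of the text.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public final class AhoCorasickReplace {
    // A trie node: outgoing edges, a failure link, and the lengths of every
    // keyword ending here (its own plus those inherited via failure links).
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        final List<Integer> outputs = new ArrayList<>();
        Node fail;
    }

    public static String replace(String text, Map<String, String> definitions) {
        Node root = new Node();
        root.fail = root;

        // 1. Insert every non-empty key into the trie.
        for (String key : definitions.keySet()) {
            if (key.isEmpty()) continue;
            Node node = root;
            for (char c : key.toCharArray()) {
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.outputs.add(key.length());
        }

        // 2. Breadth-first pass: set failure links and inherit outputs.
        Queue<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.poll();
            for (Map.Entry<Character, Node> edge : node.next.entrySet()) {
                char c = edge.getKey();
                Node child = edge.getValue();
                Node f = node.fail;
                while (f != root && !f.next.containsKey(c)) f = f.fail;
                child.fail = f.next.getOrDefault(c, root);
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }

        // 3. One left-to-right scan collects every match as {start, length}.
        List<int[]> matches = new ArrayList<>();
        Node state = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (state != root && !state.next.containsKey(c)) state = state.fail;
            state = state.next.getOrDefault(c, root);
            for (int len : state.outputs) matches.add(new int[] {i - len + 1, len});
        }

        // 4. Keep the leftmost match (longest first on ties) and skip overlaps.
        matches.sort((a, b) -> a[0] != b[0]
                ? Integer.compare(a[0], b[0]) : Integer.compare(b[1], a[1]));
        StringBuilder sb = new StringBuilder(text.length());
        int prev = 0;
        for (int[] m : matches) {
            if (m[0] < prev) continue; // overlaps a match already replaced
            String key = text.substring(m[0], m[0] + m[1]);
            sb.append(text, prev, m[0]).append(definitions.get(key));
            prev = m[0] + m[1];
        }
        return sb.append(text, prev, text.length()).toString();
    }
}
```

    For example, with "she", "he" and "shells" mapped to X, Y and Z, replacing in "she sells seashells" yields "X sells seaZ": the leftmost "she" hides the overlapping "he", and at position 13 the longer "shells" wins over "she".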

    Simple Code

    Perhaps the simplest code leverages Apache's StringUtils.replaceEach as follows:

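      import java.util.Map;
      import org.apache.commons.lang3.StringUtils;
    
      // keys() and values() are helpers (not shown) that return the map's keys
      // and their corresponding values as index-aligned arrays.
      private String testStringUtils(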
        final String text, final Map<String, String> definitions ) {
        final String[] keys = keys( definitions );
        final String[] values = values( definitions );
    
        return StringUtils.replaceEach( text, keys, values );
      }
    

    This becomes slow on large texts.
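    The snippets call keys(definitions) and values(definitions) without showing them. Below is a minimal sketch of what they might look like; it is an assumption, not the answer's code, and the class DefinitionArrays is hypothetical. values() looks each value up by key, so the two arrays stay index-aligned, which replaceEach requires.

```java
import java.util.Map;

final class DefinitionArrays {
    // Hypothetical helper: the map's keys as an array (the search strings).
    static String[] keys(final Map<String, String> definitions) {
        return definitions.keySet().toArray(new String[0]);
    }

    // Hypothetical helper: values[i] is the replacement for keys(definitions)[i].
    static String[] values(final Map<String, String> definitions) {
        final String[] keys = keys(definitions);
        final String[] values = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            values[i] = definitions.get(keys[i]);
        }
        return values;
    }
}
```

    The alignment relies on the map not being modified between the two calls, so that keySet() iterates in the same order both times.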

    Fast Code

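    Bor's implementation of the Aho-Corasick algorithm introduces a bit more complexity, which becomes an implementation detail by using a façade with the same method signature: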

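      import java.util.Collection;
      import java.util.Map;
      import org.ahocorasick.trie.Emit;
      import org.ahocorasick.trie.Trie;
      import org.ahocorasick.trie.Trie.TrieBuilder;
    
      private String testBorAhoCorasick(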
        final String text, final Map<String, String> definitions ) {
        // Create a buffer sufficiently large that re-allocations are minimized.
        final StringBuilder sb = new StringBuilder( text.length() << 1 );
    
        final TrieBuilder builder = Trie.builder();
        builder.onlyWholeWords();
        builder.removeOverlaps();
    
        final String[] keys = keys( definitions );
    
        for( final String key : keys ) {
          builder.addKeyword( key );
        }
    
        final Trie trie = builder.build();
        final Collection<Emit> emits = trie.parseText( text );
    
        int prevIndex = 0;
    
        for( final Emit emit : emits ) {
          final int matchIndex = emit.getStart();
    
          sb.append( text.substring( prevIndex, matchIndex ) );
          sb.append( definitions.get( emit.getKeyword() ) );
          prevIndex = emit.getEnd() + 1;
        }
    
        // Add the remainder of the string (contains no more matches).
        sb.append( text.substring( prevIndex ) );
    
        return sb.toString();
      }
    

    Benchmarks

    For the benchmarks, the buffer was created using randomNumeric as follows:

      import static org.apache.commons.lang3.RandomStringUtils.randomNumeric;
    
      private final static int TEXT_SIZE = 1000;
      private final static int MATCHES_DIVISOR = 10;
    
      private final static StringBuilder SOURCE
        = new StringBuilder( randomNumeric( TEXT_SIZE ) );
    

    where MATCHES_DIVISOR dictates the number of variables to inject:

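      import static java.util.concurrent.ThreadLocalRandom.current;
    
      // randomKey( definitions ) is a helper (not shown) that picks one of the map's keys.
      private void injectVariables( final Map<String, String> definitions ) {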
        for( int i = (SOURCE.length() / MATCHES_DIVISOR) + 1; i > 0; i-- ) {
          final int r = current().nextInt( 1, SOURCE.length() );
          SOURCE.insert( r, randomKey( definitions ) );
        }
      }
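    The benchmark depends on randomNumeric from Commons Lang and on a randomKey(definitions) helper that is not shown. A JDK-only stand-in, assumed for illustration (the class BenchmarkData and the method digits are hypothetical): digits() plays the role of randomNumeric, and randomKey picks one key uniformly at random.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

final class BenchmarkData {
    // Stand-in for randomNumeric(count): `count` random decimal digits.
    static String digits(final int count) {
        final StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append((char) ('0' + ThreadLocalRandom.current().nextInt(10)));
        }
        return sb.toString();
    }

    // Hypothetical randomKey: one of the map's keys, chosen uniformly.
    static String randomKey(final Map<String, String> definitions) {
        final List<String> keys = new ArrayList<>(definitions.keySet());
        return keys.get(ThreadLocalRandom.current().nextInt(keys.size()));
    }
}
```

    Copying the key set on every call is wasteful, but this only runs while the test input is being built, not inside the timed section.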
    

    The benchmark code itself (JMH seemed like overkill):

    // elapsed( duration ) formats the nanosecond count for display (helper not shown).
    long duration = System.nanoTime();
    final String result = testBorAhoCorasick( text, definitions );
    duration = System.nanoTime() - duration;
    System.out.println( elapsed( duration ) );
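    The elapsed(duration) helper is not shown either. Judging by the result format below (whole seconds followed by total milliseconds), a plausible version is sketched here; the class Timing and this exact output format are assumptions, not the answer's code.

```java
import java.util.concurrent.TimeUnit;

final class Timing {
    // Hypothetical formatter: whole seconds, then the total in milliseconds.
    static String elapsed(final long nanos) {
        return TimeUnit.NANOSECONDS.toSeconds(nanos) + " seconds, "
            + TimeUnit.NANOSECONDS.toMillis(nanos) + " ms";
    }
}
```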
    

    1,000,000 : 1,000

    A simple micro-benchmark with 1,000,000 characters and 1,000 randomly placed strings to replace.

    • testStringUtils: 25 seconds, 25533 ms
    • testBorAhoCorasick: 0 seconds, 68 ms

    No contest.

    10,000 : 1,000

    Using 10,000 characters and 1,000 matching strings to replace:

    • testStringUtils: 1 second, 1402 ms
    • testBorAhoCorasick: 0 seconds, 37 ms

    The gap narrows.

    1,000 : 10

    Using 1,000 characters and 10 matching strings to replace:

    • testStringUtils: 0 seconds, 7 ms
    • testBorAhoCorasick: 0 seconds, 19 ms

    For short strings, the overhead of setting up Aho-Corasick eclipses the brute-force approach of StringUtils.replaceEach.

    A hybrid approach based on text length is possible, to get the best of both implementations.
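    A minimal sketch of such a hybrid, assuming both strategies are passed in as functions; the class HybridReplacer and any threshold value are hypothetical. The numbers above put the crossover somewhere between the 1,000- and 10,000-character runs, but those runs also differ in match count, so the threshold should be measured on real data.

```java
import java.util.Map;
import java.util.function.BiFunction;

final class HybridReplacer {
    private final int threshold; // hypothetical cut-over length, in characters
    private final BiFunction<String, Map<String, String>, String> shortText;
    private final BiFunction<String, Map<String, String>, String> longText;

    HybridReplacer(int threshold,
                   BiFunction<String, Map<String, String>, String> shortText,
                   BiFunction<String, Map<String, String>, String> longText) {
        this.threshold = threshold;
        this.shortText = shortText;
        this.longText = longText;
    }

    // Brute force below the threshold (cheap setup), automaton at or above it.
    String replace(String text, Map<String, String> definitions) {
        return text.length() < threshold
            ? shortText.apply(text, definitions)
            : longText.apply(text, definitions);
    }
}
```

    Inside the benchmark class it could be wired up as new HybridReplacer(5_000, this::testStringUtils, this::testBorAhoCorasick), where 5,000 is only a placeholder.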

    Starting Points

    Consider comparing other implementations for text longer than 1 MB, including:

    Papers
    
    Papers and information relating to the algorithm:

        3
  •  7
  •   Steve McLeod    15 years ago

    If you are going to change a String many times, it is usually more efficient to use a StringBuilder (but measure your performance to find out):

    String str = "The rain in Spain falls mainly on the plain";
    StringBuilder sb = new StringBuilder(str);
    // do your replacing in sb - although you'll find this trickier than simply using String
    String newStr = sb.toString();
    

    Because String is immutable, every replacement on a String creates a new String object. StringBuilder is mutable, that is, it can be changed as much as you like.
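    Doing the replacing inside the StringBuilder is indeed a little trickier, because StringBuilder.replace works on index ranges rather than on search strings. A minimal sketch (the class BuilderReplace and the method replaceAllInPlace are hypothetical) combines StringBuilder.indexOf with StringBuilder.replace(start, end, str). It applies one key at a time, so unlike a single-pass approach, text inserted for one key can be matched by a later key, and each replace still shifts the tail of the buffer.

```java
import java.util.Map;

final class BuilderReplace {
    // Replaces every occurrence of `target` in `sb` with `replacement`, in place.
    static void replaceAllInPlace(StringBuilder sb, String target, String replacement) {
        if (target.isEmpty()) return; // an empty target would match everywhere
        int index = sb.indexOf(target);
        while (index != -1) {
            sb.replace(index, index + target.length(), replacement);
            // Resume after the inserted text so it is not rescanned.
            index = sb.indexOf(target, index + replacement.length());
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("The rain in Spain falls mainly on the plain");
        for (Map.Entry<String, String> e : Map.of("rain", "snow", "plain", "hills").entrySet()) {
            replaceAllInPlace(sb, e.getKey(), e.getValue());
        }
        System.out.println(sb); // The snow in Spain falls mainly on the hills
    }
}
```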

        4
  •  4
  •   Brian Agnew    12 years ago

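    StringBuilder will perform replacements more efficiently, since its character array buffer can be sized to the required length. StringBuilder is designed for more than appending!
    
    Of course, the real question is whether this is an optimization too far. The JVM is very good at handling the creation of many objects and their subsequent garbage collection, and, as with all optimization questions, my first question is whether you have measured this and determined that it is actually a problem.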

        5
  •  2
  •   Avi    15 years ago

    How about using the replaceAll() method?
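    Chaining replace (or replaceAll) calls, one per key, is the simplest option, but the calls run one after another rather than simultaneously, so a later call can rewrite text produced by an earlier one. A small illustration, assuming the goal is to swap "a" and "b" (the class ChainedReplace is hypothetical):

```java
public final class ChainedReplace {
    // Applies the two replacements one after the other, as chained calls do.
    static String swapByChaining(String input) {
        return input.replace("a", "b").replace("b", "a");
    }

    public static void main(String[] args) {
        // Intended "ba", but the second call also rewrites the 'b' made by the first.
        System.out.println(swapByChaining("ab")); // aa
    }
}
```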

        6
  •  2
  •   Gelin Luo    12 years ago
    
    Rythm, a Java template engine, has now been released with a new feature called String interpolation mode, which allows you to do something like:

    String result = Rythm.render("@name is inviting you", "Diana");
    

    The above case shows that you can pass arguments to the template by position. Rythm also allows you to pass arguments by name:

    Map<String, Object> args = new HashMap<String, Object>();
    args.put("title", "Mr.");
    args.put("name", "John");
    String result = Rythm.render("Hello @title @name", args);
    

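    Note: Rythm is very fast, about 2 to 3 times faster than String.format and Velocity, because it compiles the template into Java byte code; its runtime performance is very close to concatenation with StringBuilder.
    
    Links: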

        7
  •  2
  •   bikram    6 years ago
    
    This worked for me:

    String result = input.replaceAll("string1|string2|string3","replacementString");
    

    Example:

    String input = "applemangobananaarefriuits";
    String result = input.replaceAll("mango|are|ts","-");
    System.out.println(result);
    

    Output: apple-banana-friui-

        8
  •  1
  •   Community Egal    13 years ago
    
    Check this out:
    
    String.format(String format, Object... args)
    
    For example:
    
    String.format("Put your %s where your %s is", "money", "mouth");

        9
  •  0
  •   Robin479    8 years ago
    import java.util.Collections;
    import java.util.Map;
    import java.util.TreeMap;
    
    public String replace(String input, Map<String, String> pairs) {
      // Reverse lexic-order of keys is good enough for most cases,
      // as it puts longer words before their prefixes ("tool" before "too").
      // However, there are corner cases, which this algorithm doesn't handle
      // no matter what order of keys you choose, eg. it fails to match "edit"
      // before "bed" in "..bedit.." because "bed" appears first in the input,
      // but "edit" may be the desired longer match. Depends which you prefer.
      final Map<String, String> sorted = 
          new TreeMap<String, String>(Collections.reverseOrder());
      sorted.putAll(pairs);
      sorted.remove(""); // an empty key would match everywhere and never advance
      final String[] keys = sorted.keySet().toArray(new String[sorted.size()]);
      final String[] vals = sorted.values().toArray(new String[sorted.size()]);
      final int lo = 0, hi = input.length();
      final StringBuilder result = new StringBuilder();
      int s = lo;
      for (int i = s; i < hi; i++) {
        for (int p = 0; p < keys.length; p++) {
          if (input.regionMatches(i, keys[p], 0, keys[p].length())) {
            /* TODO: check for "edit", if this is "bed" in "..bedit.." case,
             * i.e. look ahead for all prioritized/longer keys starting within
             * the current match region; iff found, then ignore match ("bed")
             * and continue search (find "edit" later), else handle match. */
            // if (better-match-overlaps-right-ahead)
            //   continue;
            result.append(input, s, i).append(vals[p]);
            i += keys[p].length();
            s = i--; // s = end of match; i-- offsets the outer loop's i++
            break;   // i has moved, so stop trying the remaining keys here
          }
        }
      }
      if (s == lo) // no matches? no changes!
        return input;
      return result.append(input, s, hi).toString();
    }
    
        10
  •  0
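  •   Community Egal    7 years ago
    
    Here is an alternative based on Todd Owen's answer. The problem with that solution is that if the replacements contain characters that have special meaning in regular expressions, you can get unexpected results. I also wanted to be able to optionally do a case-insensitive search. Here is what I came up with: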

    import java.util.HashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    /**
     * Performs simultaneous search/replace of multiple strings. Case Sensitive!
     */
    public static String replaceMultiple(String target, Map<String, String> replacements) {
      return replaceMultiple(target, replacements, true);
    }
    
    /**
     * Performs simultaneous search/replace of multiple strings.
     * 
     * @param target        string to perform replacements on.
     * @param replacements  map where key represents value to search for, and value represents replacement
     * @param caseSensitive whether or not the search is case-sensitive.
     * @return replaced string
     */
    public static String replaceMultiple(String target, Map<String, String> replacements, boolean caseSensitive) {
      if(target == null || "".equals(target) || replacements == null || replacements.size() == 0)
        return target;
    
      //if we are doing case-insensitive replacements, we need to make the map case-insensitive--make a new map with all-lower-case keys
      if(!caseSensitive) {
        Map<String, String> altReplacements = new HashMap<String, String>(replacements.size());
        for(String key : replacements.keySet())
          altReplacements.put(key.toLowerCase(), replacements.get(key));
    
        replacements = altReplacements;
      }
    
      StringBuilder patternString = new StringBuilder();
      if(!caseSensitive)
        patternString.append("(?i)");
    
      patternString.append('(');
      boolean first = true;
      for(String key : replacements.keySet()) {
        if(first)
          first = false;
        else
          patternString.append('|');
    
        patternString.append(Pattern.quote(key));
      }
      patternString.append(')');
    
      Pattern pattern = Pattern.compile(patternString.toString());
      Matcher matcher = pattern.matcher(target);
    
      StringBuffer res = new StringBuffer();
      while(matcher.find()) {
        String match = matcher.group(1);
        if(!caseSensitive)
          match = match.toLowerCase();
        // quoteReplacement keeps any '$' or '\' in the value literal
        matcher.appendReplacement(res, Matcher.quoteReplacement(replacements.get(match)));
      }
      matcher.appendTail(res);
    
      return res.toString();
    }
    

    Here are my unit test cases:

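    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertNull;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import org.junit.Test;
    
    @Test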
    public void replaceMultipleTest() {
      assertNull(ExtStringUtils.replaceMultiple(null, null));
      assertNull(ExtStringUtils.replaceMultiple(null, Collections.<String, String>emptyMap()));
      assertEquals("", ExtStringUtils.replaceMultiple("", null));
      assertEquals("", ExtStringUtils.replaceMultiple("", Collections.<String, String>emptyMap()));
    
      assertEquals("folks, we are not sane anymore. with me, i promise you, we will burn in flames", ExtStringUtils.replaceMultiple("folks, we are not winning anymore. with me, i promise you, we will win big league", makeMap("win big league", "burn in flames", "winning", "sane")));
    
      assertEquals("bcaacbbcaacb", ExtStringUtils.replaceMultiple("abccbaabccba", makeMap("a", "b", "b", "c", "c", "a")));
      assertEquals("bcaCBAbcCCBb", ExtStringUtils.replaceMultiple("abcCBAabCCBa", makeMap("a", "b", "b", "c", "c", "a")));
      assertEquals("bcaacbbcaacb", ExtStringUtils.replaceMultiple("abcCBAabCCBa", makeMap("a", "b", "b", "c", "c", "a"), false));
    
      assertEquals("c colon  backslash temp backslash  star  dot  star ", ExtStringUtils.replaceMultiple("c:\\temp\\*.*", makeMap(".", " dot ", ":", " colon ", "\\", " backslash ", "*", " star "), false));
    }
    
    private Map<String, String> makeMap(String ... vals) {
      Map<String, String> map = new HashMap<String, String>(vals.length / 2);
      for(int i = 1; i < vals.length; i+= 2)
        map.put(vals[i-1], vals[i]);
      return map;
    }