How do I capture a specific group with a JavaScript RegExp?

  • Andrew Li  · 6 years ago

    Given this sample text extracted from a PDF:

    Professional Learning - August 31  Labor Day - September 3 Intersession- October 19 Professional Learning - October 22 Thanksgiving Break - November 21-23 Winter Break - December 24 - January 4 Martin Luther King, Jr. Day - January 21 Presidents’ Day - February 18 Spring Break - March 18-22 Teacher Comp Day - April 19
    

    My goal is to capture every month-and-day date, including ranges. That is, it should capture all of the following:

    • August 31
    • October 19
    • March 18-22
    • December 24 - January 4
    • December 24-January 4

    The hard part is capturing ranges whose start and end months differ. I came up with this pattern:

    /(January|February|March|April|May|August|September|October|November|December)\s([0-9]*-?[0-9]+)(\s*-\s*(January|February|March|April|May|August|September|October|November|December)\s([0-9]*-?[0-9]+))?/g
    

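    It works for everything except the last two examples listed above. On regexr, the cross-month part shows up correctly in capture group 3, but I can't get at it in JavaScript. Take this snippet as an example: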

    const string = 'Professional Learning - August 31  Labor Day - September 3 Intersession- October 19 Professional Learning - October 22 Thanksgiving Break - November 21-23 Winter Break - December 24 - January 4 Martin Luther King, Jr. Day - January 21 Presidents’ Day - February 18 Spring Break - March 18-22 Teacher Comp Day - April 19';
    
    const subRegex = '(January|February|March|April|May|August|September|October|November|December)\\s([0-9]*-?[0-9]+)';
    const dateRegex = new RegExp(`${subRegex}(\s*-\s*${subRegex})?`, 'g');
    
    console.log(string.match(dateRegex));

    It looks like I can capture December 24 and January 4 separately, but not together. Is there a way to capture them as a single match?

    1 answer

  • CertainPerformance  · 6 years ago

    You just need to tweak (and perhaps simplify) your original regex a little:

    const str = 'Professional Learning - August 31  Labor Day - September 3 Intersession- October 19 Professional Learning - October 22 Thanksgiving Break - November 21-23 Winter Break - December 24 - January 4 Martin Luther King, Jr. Day - January 21 Presidents’ Day - February 18 Spring Break - March 18-22 Teacher Comp Day - April 19';
    // str2 has "December 24-January 4" instead - no spaces
    const str2 = 'Professional Learning - August 31  Labor Day - September 3 Intersession- October 19 Professional Learning - October 22 Thanksgiving Break - November 21-23 Winter Break - December 24-January 4 Martin Luther King, Jr. Day - January 21 Presidents’ Day - February 18 Spring Break - March 18-22 Teacher Comp Day - April 19';
    const re = /(January|February|March|April|May|August|September|October|November|December) [\d-]+([ -]*(January|February|March|April|May|August|September|October|November|December) \d+)?/g;
    console.log(str.match(re));
    console.log(str2.match(re));
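
    As a complementary sketch, not part of the answer above: the question's snippet builds the pattern inside an ordinary template literal, where `\s` is cooked into a plain `s`, so the joining part becomes `s*-s*` and never matches the spaces around the hyphen. The example below assumes `String.raw` is acceptable for building the pattern and uses `matchAll` to read the capture groups, since `match` with the `g` flag returns only the full matches. The names `month`, `part`, `text` and `ranges` are illustrative, and `text` is a shortened version of the sample string.

```javascript
// Month alternation copied from the question (note it lists no June or July).
const month = '(?:January|February|March|April|May|August|September|October|November|December)';

// String.raw keeps the backslashes, so \s and \d reach the RegExp constructor intact.
// In a plain template literal, `\s` would silently become the letter "s".
const part = String.raw`${month}\s\d+(?:-\d+)?`;
const dateRegex = new RegExp(String.raw`(${part})(?:\s*-\s*(${part}))?`, 'g');

// Shortened sample text (illustrative).
const text = 'Winter Break - December 24 - January 4 Spring Break - March 18-22 Teacher Comp Day - April 19';

// matchAll yields one match object per hit, including its capture groups.
const ranges = [...text.matchAll(dateRegex)].map((m) => ({
  full: m[0],        // the whole date or date range
  start: m[1],       // first month and day(s)
  end: m[2] ?? null, // second month and day, or null when there is none
}));

console.log(ranges);
```

    Group 2 is `undefined` whenever a date has no second month, which is why the sketch maps it to `null` before using it.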