Given this sample text extracted from a PDF:
Professional Learning - August 31 Labor Day - September 3 Intersession- October 19 Professional Learning - October 22 Thanksgiving Break - November 21-23 Winter Break - December 24 - January 4 Martin Luther King, Jr. Day - January 21 Presidents’ Day - February 18 Spring Break - March 18-22 Teacher Comp Day - April 19
My goal is to capture every month-and-day expression, i.e. the regex should match all of the following forms:
- August 31
- October 19
- March 18-22
- December 24 - January 4
- December 24-January 4
The hard part is capturing ranges that span two different months. I came up with this regex:
/(January|February|March|April|May|August|September|October|November|December)\s([0-9]*-?[0-9]+)(\s*-\s*(January|February|March|April|May|August|September|October|November|December)\s([0-9]*-?[0-9]+))?/g
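It works for every form listed above except the last two. On regexr, the cross-month part shows up correctly in capture group 3, but I can't get at it in JavaScript. Take this snippet: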
const string = 'Professional Learning - August 31 Labor Day - September 3 Intersession- October 19 Professional Learning - October 22 Thanksgiving Break - November 21-23 Winter Break - December 24 - January 4 Martin Luther King, Jr. Day - January 21 Presidents’ Day - February 18 Spring Break - March 18-22 Teacher Comp Day - April 19';
const subRegex = '(January|February|March|April|May|August|September|October|November|December)\\s([0-9]*-?[0-9]+)';
const dateRegex = new RegExp(`${subRegex}(\s*-\s*${subRegex})?`, 'g');
console.log(string.match(dateRegex));
It seems I can capture December 24 and January 4 separately, but not together as a single match. Is there a way to capture them together?
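One possible explanation, offered as a sketch rather than a confirmed diagnosis: inside a template literal, `\s` is an unrecognized escape and collapses to a plain `s`, so the optional range part of the pattern would become `(s*-s*...)` and could no longer match the spaces around the hyphen. Doubling the backslashes there, as is already done inside `subRegex`, should keep `\s` intact. The names `sample`, `brokenSource`, `fixedSource`, `matches`, and `ranges` below are illustrative, and `sample` is a shortened stand-in for the full text.

```javascript
// Shortened, hypothetical stand-in for the extracted PDF text.
const sample = 'Winter Break - December 24 - January 4 Martin Luther King, Jr. Day - January 21 ' +
  'Spring Break - March 18-22 Teacher Comp Day - December 24-January 4';

const subRegex = '(January|February|March|April|May|August|September|October|November|December)\\s([0-9]*-?[0-9]+)';

// As in the snippet above: the template literal turns "\s" into a literal "s".
const brokenSource = `${subRegex}(\s*-\s*${subRegex})?`;

// With doubled backslashes, "\s" reaches the RegExp constructor unchanged.
const fixedSource = `${subRegex}(\\s*-\\s*${subRegex})?`;

const dateRegex = new RegExp(fixedSource, 'g');

// With the g flag, match() returns only the full matches, not the capture groups.
const matches = sample.match(dateRegex);
console.log(matches);

// matchAll() yields one result per match with its capture groups;
// index 3 is the optional " - Month Day" tail, or undefined when absent.
const ranges = [...sample.matchAll(dateRegex)].map((m) => m[3]);
console.log(ranges);
```

Logging `brokenSource` next to `fixedSource` makes the difference visible. If the capture groups themselves are needed, `matchAll` (or an `exec` loop) is the usual route, since `match` with a global regex discards them.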