A parser that preserves comments and recovers from errors

parsec pyparsing parsing python

tungd · Tech Community · 7 years ago

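I have finished the parse-edit-write part, except for one thing:

The parsed data structure contains only the objects' attribute information, so comments and whitespace are lost when the data is written back.

https://pythonhosted.org/parsec/documentation.html However, any help and general guidance would be much appreciated.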

https://pythonhosted.org/pylens/

2 answers | as of 7 years ago

Dan Svoboda 7 years ago

You asked about typical approaches to this problem. Here are two projects that tackle challenges similar to the one you describe:

sketch-n-sketch

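Boomerang: uses lenses to "focus" on the abstract meaning of some concrete syntax, change the abstract model, and then reflect those changes back in the original source.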

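Both projects have produced several papers describing the approaches their authors took. As far as I know, the lens approach is popular: parsing and printing become the get and put functions of a lens, which takes some source code and focuses on the abstract concept that the code describes.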
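To make the get/put idea concrete, here is a minimal sketch of a lens over a made-up `key = value` format with `#` comments. It is not taken from Boomerang or sketch-n-sketch; the format, the `ENTRY` pattern, and the `get`/`put` function names are assumptions chosen only for illustration. `get` extracts the abstract model, and `put` writes an edited model back into the original text so that comments and spacing survive.

```python
import re

# Hypothetical concrete syntax: "key = value" lines, optional trailing "# comment".
# Groups: indent, key, separator, value, trailing whitespace/comment.
ENTRY = re.compile(r'^(\s*)([A-Za-z_]\w*)(\s*=\s*)(.*?)(\s*(?:#.*)?)$')


def get(source):
    """Forward direction: extract the abstract model (a dict) from the source."""
    model = {}
    for line in source.splitlines():
        match = ENTRY.match(line)
        if match:
            model[match.group(2)] = match.group(4)
    return model


def put(model, source):
    """Backward direction: write `model` into `source`, keeping everything
    the model does not describe (comments, blank lines, spacing)."""
    remaining = dict(model)
    out = []
    for line in source.splitlines():
        match = ENTRY.match(line)
        if not match:
            out.append(line)  # comment or blank line: copy through unchanged
            continue
        indent, key, sep, _old_value, trailer = match.groups()
        if key in remaining:
            out.append(indent + key + sep + remaining.pop(key) + trailer)
        # a key that is no longer in the model is dropped from the output
    for key, value in remaining.items():
        out.append('{} = {}'.format(key, value))  # new keys go at the end
    return '\n'.join(out)
```

With this sketch, `put(get(source), source)` gives back the source unchanged (for input without a trailing newline), and editing one value in the model changes only that value in the text. A value that itself contains `#` would be misread as a comment, which is one of the simplifications here.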

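tungd 7 years ago

The Text structure code is reusable, except that you have to incorporate it in every place with repeated results where the original parser might fail.

Here is the code, in case it helps anyone. It is written for Parsy.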

from parsy import Parser, Result


class Text(object):
    '''Holds every part of the input that the parser does not understand.
    A better name would be Whitespace.
    '''
    def __init__(self, text=''):
        self.text = text

    def __repr__(self):
        return 'Text(text={!r})'.format(self.text)

    def __eq__(self, other):
        # Text values are equal if they differ only in surrounding whitespace
        if not isinstance(other, Text):
            return NotImplemented
        return self.text.strip() == other.text.strip()


def many_skip_error(parser, skip=lambda t, i: i + 1, until=None):
    '''Repeat the original `parser`, collecting its results in `values`
    and any input it cannot parse in `Text` values.
    '''
    @Parser
    def _parser(stream, index):
        values, result = [], None

        while index < len(stream):
            result = parser(stream, index)
            # Original parser succeeded; requiring progress avoids an
            # endless loop on parsers that succeed without consuming input
            if result.status and result.index > index:
                values.append(result.value)
                index = result.index
            # Check for end condition, effectively `manyTill` in Parsec
            elif until is not None and until(stream, index).status:
                break
            # Append skipped text to the last `Text` value, or create a new one
            else:
                next_index = skip(stream, index)
                skipped = stream[index:next_index]  # all of it, not just one char
                if len(values) > 0 and isinstance(values[-1], Text):
                    values[-1].text += skipped
                else:
                    values.append(Text(skipped))
                index = next_index
        return Result.success(index, values).aggregate(result)
    return _parser


# Example usage: `original_parser` stands for any existing Parsy parser
skip_error_parser = many_skip_error(original_parser)

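On the other hand, I suppose the real problem here is that I used a parser combinator library rather than a proper two-stage parsing process. In traditional parsing, the tokenizer would take care of preserving or skipping any whitespace, comments, and syntax errors, effectively turning all of them into whitespace that is invisible to the parser.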
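The two-stage idea can be sketched roughly as follows, using only the standard library. This is an illustration under assumptions, not the code I actually use: the token kinds, the `TOKEN_RE` pattern, and the tiny assignment syntax are all made up. The tokenizer keeps comments, whitespace, and unrecognized characters as "trivia" tokens, the parser only looks at the significant ones, and writing the full token list back reproduces the source exactly.

```python
import re
from collections import namedtuple

Token = namedtuple('Token', 'kind text')

# Hypothetical token kinds for a tiny "name = number;" language.
# ERROR matches any single character that nothing else recognizes.
TOKEN_RE = re.compile(r'''
    (?P<COMMENT>\#[^\n]*)
  | (?P<SPACE>\s+)
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<NUMBER>\d+)
  | (?P<OP>[=;])
  | (?P<ERROR>.)
''', re.VERBOSE)

# Kinds that are preserved for output but hidden from the parser
TRIVIA = {'COMMENT', 'SPACE', 'ERROR'}


def tokenize(source):
    """Stage 1: split the whole input into tokens, losing no characters."""
    return [Token(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(source)]


def significant(tokens):
    """What stage 2 (the parser) sees: the tokens minus all trivia."""
    return [t for t in tokens if t.kind not in TRIVIA]


def unparse(tokens):
    """Write the tokens back out; trivia is emitted untouched."""
    return ''.join(t.text for t in tokens)
```

An edit then becomes a matter of replacing significant tokens in the full list and calling `unparse`, so comments and stray characters around the edited token stay where they were.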
