
Processing strings from a data file containing punctuation coding in Python

  •  2
  • falcoso  · Tech community  · 6 years ago

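    I am trying to make a simple program that helps build army lists for a popular tabletop wargame. It is more of an exercise for my own experience, since there are plenty of ready-made software packages that do this, but the idea behind it seems fairly straightforward.

    The program reads the data for all the units available to an army from a spreadsheet and creates a class for each unit. My main focus at the moment is the options/upgrades.

    In the file, I would like a simple syntax for each unit's options field. For example, the option string itemA, itemB/itemC-3, 2*itemD, itemE/itemF/itemG, itemH/itemI+itemJ would mean: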

        1. you may take itemA (X pts per model)
        2. for every 3 models, you may exchange itemB with 
             a) itemC (net X pts per model)
        3. each model may take 2 of itemD (X pts per model)
        4. each model may take one of either 
             a)itemE (X pts per model)
             b)itemF (X pts per model)
             c)itemG (X pts per model)
        5. each model may take either 
             a)itemH (X points per model)
             b)itemI and itemJ (X points per model)
    

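    At the moment I am processing the string with a lot of split calls and if statements, which makes it hard to keep track of the strings and assign them correctly once the user has entered their choice.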

        for index, option in enumerate(self.options):
            output = "{}.".format(index+1)
            if '-' in option:
                sub_option, no_models = option.split('-')
                no_models = int(no_models)
                print(sub_option)
                print(no_models)
                output += "For every {} models ".format(no_models)
                if '/' in sub_option:
                    temp_str, temp_options, points_list = exchange_option(sub_option)
    
                else:
                    temp_str, temp_options, points_list = standard_option(sub_option)
    
                index_points.append(points_list)
                temp_options.append(no_models)
                index_options.append(temp_options)
    
            else:
                if '/' in option:
                    temp_str, temp_options, points_list = exchange_option(option)
                else:
                    temp_str, temp_options, points_list = standard_option(option)
    
                index_points.append(points_list)
                index_options.append(temp_options)
    
            output += temp_str
    

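    The *_option() functions are additional helper functions that I have defined above this code; they have a similar structure, with even more if statements.

    My main question is: is there a simpler way to process encoded strings like this? The example above does produce the output, but handling the user input seems very cumbersome.

    My goal is first to output the list given in my example at the top of the question, and then, using the index of the option the user enters, to modify the associated Unit class so that it has the correct wargear and points values.

    I have thought about creating some sort of options class, but again, labelling and defining each option so that they interact with each other correctly seems just as complex. I feel there must be something more Pythonic, or simply a better general coding practice, for processing encoded strings like this.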

    1 reply  |  until 6 years ago
        1
  •  1
  •   ForceBru    6 years ago


    First, the string has to be split into tokens. Lexer.tokenize is a generator that yields Token objects:

    # FILE: lex.py
    
    import re
    import enum
    
    class Token:
        def __init__(self, type, value: str, lineno: int, pos: int):
            self.type, self.value, self.lineno, self.pos = type, value, lineno, pos
    
        def __str__(self):
            v = f'({self.value!r})' if self.value else ''
    
            return f'{self.type.name}{v} at {self.lineno}:{self.pos}'
    
        __repr__ = __str__
    
    
    class Lexer:
        def __init__(self, token_types: enum.Enum, tokens_regexes: dict):
            self.token_types = token_types
    
            # Build one alternation of named groups, e.g. (?P<item>...)|(?P<num>...)
            regex = '|'.join(f'(?P<{tok.name}>{pattern})' for tok, pattern in tokens_regexes.items())
            self.regex = re.compile(regex)
    
    
        def tokenize(self, string, skip=('space',)):  # tuple avoids a mutable default argument
            # TODO: detect invalid input
    
            lineno, pos = 0, 0
            skip = set(map(self.token_types.__getitem__, skip))
    
            for matchobj in self.regex.finditer(string):
                type_name = matchobj.lastgroup
                value = matchobj.groupdict()[type_name]
    
                Type = self.token_types[type_name]
    
                if Type == self.token_types.newline:
                    # lineno and pos are local variables here, not attributes of the lexer
                    lineno += 1
                    pos = 0
                    continue
    
                pos = matchobj.end()
    
                if Type not in skip:
                    yield Token(Type, value, lineno, pos)   
    
            yield Token(self.token_types.EOF, '', lineno, pos)
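
    The TODO about invalid input in the lexer above could be handled with a catch-all pattern. The following is only a minimal, self-contained sketch of that idea and not part of the original lex.py: the names Tok, TOKEN_SPEC and the MISMATCH group are assumptions made for illustration. Because re.finditer silently skips characters that no pattern matches, a final pattern that matches any single character turns such input into an explicit error.

```python
import re
from typing import Iterator, NamedTuple


class Tok(NamedTuple):
    """Hypothetical lightweight token: kind, matched text, start offset."""
    kind: str
    value: str
    pos: int


# Same token patterns as in the answer, plus an assumed catch-all MISMATCH
# group that matches any single character no other pattern accepted.
TOKEN_SPEC = [
    ('item', r'[a-zA-Z_]\w*'),
    ('num', r'0|[1-9][0-9]*'),
    ('plus', r'\+'),
    ('minus', r'-'),
    ('star', r'\*'),
    ('slash', r'/'),
    ('comma', r','),
    ('space', r'[ \t]+'),
    ('MISMATCH', r'.'),
]
# re.DOTALL lets the MISMATCH dot match newlines as well.
TOKEN_RE = re.compile(
    '|'.join(f'(?P<{name}>{pattern})' for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(text: str) -> Iterator[Tok]:
    """Yield tokens, skip spaces, and raise ValueError on unknown characters."""
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup
        if kind == 'space':
            continue
        if kind == 'MISMATCH':
            raise ValueError(
                f'unexpected character {m.group()!r} at position {m.start()}'
            )
        yield Tok(kind, m.group(), m.start())
    yield Tok('EOF', '', len(text))


print([(t.kind, t.value) for t in tokenize('itemB/itemC-3')])
```

    With this sketch, an input such as itemA & itemB raises a ValueError that names the offending character instead of quietly dropping it.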
    

    The tokens produced by lex.Lexer.tokenize are then parsed according to the following grammar:

    Opt_list -> Option Opt_list_
    Opt_list_ -> comma Option Opt_list_ | empty
    Option -> Choice | Mult
    Choice -> Compound More_choices Exchange
    Compound -> item Add_item
    Add_item -> plus item Add_item | empty
    More_choices -> slash Compound More_choices | empty
    Exchange -> minus num | empty
    Mult -> num star Compound
    

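    The token stream always ends with an EOF token. Each nonterminal of the grammar has a corresponding parse_<something> method, and Parser.parse is the entry point: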

    # FILE: parse.py
    
    import lex
    
    class Parser:
    
        def __init__(self, lexer):
            self.string, self.tokens = None, None
            self.lexer = lexer
            self.t = self.lexer.token_types
    
            self.__lookahead = None
    
        @property
        def lookahead(self):
            if not self.__lookahead:
                try:
                    self.__lookahead = next(self.tokens)
                except StopIteration:
                    self.__lookahead = lex.Token(self.t.EOF, '', 0, -1)
    
            return self.__lookahead
    
        def next(self):
            if self.__lookahead and self.__lookahead.type == self.t.EOF:
                return self.__lookahead
    
            self.__lookahead = None
            return self.lookahead
    
        def match(self, token_type):
            if self.lookahead.type == token_type:
                return self.next()
    
            raise SyntaxError(f'Expected {token_type}, got {self.lookahead.type}', ('<string>', self.lookahead.lineno, self.lookahead.pos, self.string))
    
        # THE PARSING STARTS HERE
        def parse(self, string):
            # setup
            self.string = string
            self.tokens = self.lexer.tokenize(string)
            self.__lookahead = None
            self.next()
    
            # do parsing
            ret = [''] + self.parse_opt_list()
    
            return ' '.join(ret)
    
        def parse_opt_list(self) -> list:
            ret = self.parse_option(1)
            ret.extend(self.parse_opt_list_(1))
    
            return ret
    
        def parse_opt_list_(self, curr_opt_number) -> list:
            if self.lookahead.type in {self.t.EOF}:
                return []
    
            self.match(self.t.comma)
    
            ret = self.parse_option(curr_opt_number + 1)
            ret.extend(self.parse_opt_list_(curr_opt_number + 1))
    
            return ret
    
        def parse_option(self, opt_number) -> list:
            ret = [f'{opt_number}.']
    
            if self.lookahead.type == self.t.item:
                ret.extend(self.parse_choice())
            elif self.lookahead.type == self.t.num:
                ret.extend(self.parse_mult())
            else:
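                # token_type is not defined in this method, so name the expected token types explicitly
                raise SyntaxError(f'Expected {self.t.item} or {self.t.num}, got {self.lookahead.type}', ('<string>', self.lookahead.lineno, self.lookahead.pos, self.string))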
    
            ret[-1] += '\n'
    
            return ret
    
        def parse_choice(self) -> list:
            c = self.parse_compound()
            m = self.parse_more_choices()
            e = self.parse_exchange()
    
            if not m:
                if not e:
                    ret = f'You may take {" ".join(c)}'
                else:
                    ret = f'for every {e} models you may take item {" ".join(c)}'
            elif m:
                c.extend(m)
    
                if not e:
                    ret = f'each model may take one of: {", ".join(c)}'
                else:
                    ret = f'for every {e} models you may exchange the following items with each other: {", ".join(c)}'
            else:
                ret = 'Semantic error!'
    
            return [ret]
    
    
        def parse_compound(self) -> list:
            ret = [self.lookahead.value]
    
            self.match(self.t.item)
            _ret = self.parse_add_item()
    
            return [' '.join(ret + _ret)]
    
        def parse_add_item(self) -> list:
            if self.lookahead.type in {self.t.comma, self.t.minus, self.t.slash, self.t.EOF}:
                return []
    
            ret = ['with']   
            self.match(self.t.plus)
    
            ret.append(self.lookahead.value)
            self.match(self.t.item)
    
            return ret + self.parse_add_item()
    
    
        def parse_more_choices(self) -> list:
            if self.lookahead.type in {self.t.comma, self.t.minus, self.t.EOF}:
                return []
    
            self.match(self.t.slash)
            ret = self.parse_compound()
    
            return ret + self.parse_more_choices()
    
    
        def parse_exchange(self) -> str:
            if self.lookahead.type in {self.t.comma, self.t.EOF}:
                return ''
    
            self.match(self.t.minus)
    
            ret = self.lookahead.value
            self.match(self.t.num)
    
            return ret
    
        def parse_mult(self) -> list:
            ret = [f'each model may take {self.lookahead.value} of:']
    
            self.match(self.t.num)
            self.match(self.t.star)
    
            return ret + self.parse_compound()
    

    # FILE: evaluate.py
    
    import enum
    
    from lex import Lexer
    from parse import Parser
    
    
    # these are all the types of tokens present in our grammar
    token_types = enum.Enum('Types', 'item num plus minus star slash comma space newline empty EOF')
    
    t = token_types
    
    # these are the regexes that the lexer uses to recognise the tokens
    terminals_regexes = {
        t.item: r'[a-zA-Z_]\w*',
        t.num: '0|[1-9][0-9]*',
        t.plus: r'\+',
        t.minus: '-',
        t.star: r'\*',
        t.slash: '/',
        t.comma: ',',
        t.space: r'[ \t]',
        t.newline: r'\n'
    }
    
    lexer = Lexer(token_types, terminals_regexes)
    parser = Parser(lexer)
    
    string = 'itemA, itemB/itemC-3, 2*itemD, itemE/itemF/itemG, itemH/itemI+itemJ'
    print(f'STRING FROM THE QUESTION: {string!r}\nRESULT:')
    print(parser.parse(string), '\n\n')
    
    
    string = input('Enter a command: ')
    
    while string and string.lower() not in {'q', 'quit', 'e', 'exit'}:
        try:
            print(parser.parse(string))
        except SyntaxError as e:
            print(f'    Syntax error: {e}\n    {e.text}\n' + ' ' * (4 + e.offset - 1) + '^\n')
    
        string = input('Enter a command: ')
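
    The error report in the loop above relies on the fact that SyntaxError accepts a (filename, lineno, offset, text) tuple as its second argument and exposes those values as attributes. A small sketch of just that part follows; caret_report is a hypothetical helper name, and the offset used here is chosen by hand for illustration.

```python
def caret_report(err: SyntaxError, indent: int = 4) -> str:
    """Render the offending text with a caret under column err.offset (1-based)."""
    pad = ' ' * indent
    return f'{pad}{err.text}\n{pad}{" " * (err.offset - 1)}^'


# The details tuple is (filename, lineno, offset, text).
err = SyntaxError('Expected item', ('<string>', 1, 7, 'itemA,,itemB'))
print(caret_report(err))
```

    In the parser above, the offset passed in is the token's pos, which the lexer sets to the end of the match, so the caret lands on the last character of the offending token.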
    

    # python3 evaluate.py
    
    STRING FROM THE QUESTION: 'itemA, itemB/itemC-3, 2*itemD, itemE/itemF/itemG, itemH/itemI+itemJ'
    RESULT:
     1. You may take itemA
     2. for every 3 models you may exchange the following items with each other: itemB, itemC
     3. each model may take 2 of: itemD
     4. each model may take one of: itemE, itemF, itemG
     5. each model may take one of: itemH, itemI with itemJ
    
    
    
    Enter a command: itemA/b/c/stuff
     1. each model may take one of: itemA, b, c, stuff
    
    Enter a command: 4 * anything
     1. each model may take 4 of: anything
    
    Enter a command: 5 * anything + more
     1. each model may take 5 of: anything with more
    
    Enter a command: a + b + c+ d
     1. You may take a with b with c with d
    
    Enter a command: a+b/c
     1. each model may take one of: a with b, c
    
    Enter a command: itemA/itemB-2
     1. for every 2 models you may exchange the following items with each other: itemA, itemB
    
    Enter a command: itemA+itemB/itemC - 5
     1. for every 5 models you may exchange the following items with each other: itemA with itemB, itemC
    
    Enter a command: q
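
    The parser above returns formatted text. Since the question also wants to map a user's selected option index back to items on a Unit, one possible variation is to have the same grammar produce structured data instead. The following is a self-contained sketch under that assumption: the Option dataclass, parse_options and the simplified tokenizer are hypothetical names and not part of the answer's code, and points costs are not modelled.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

EOF = '<EOF>'
TOKEN_RE = re.compile(r'\s*([A-Za-z_]\w*|[0-9]+|[-+*/,])')


@dataclass
class Option:
    # Each choice is a list of items taken together, e.g. ['itemI', 'itemJ'].
    choices: List[List[str]]
    per_models: Optional[int] = None  # the number after '-', if present
    count: int = 1                    # the number before '*', if present


def tokenize(text: str) -> List[str]:
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f'unexpected character at position {pos}')
        tokens.append(m.group(1))
        pos = m.end()
    return tokens + [EOF]


def parse_options(text: str) -> List[Option]:
    tokens, pos = tokenize(text), 0

    def peek() -> str:
        return tokens[pos]

    def advance() -> str:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expect_item() -> str:
        tok = advance()
        if not (tok[0].isalpha() or tok[0] == '_'):
            raise ValueError(f'expected an item name, got {tok!r}')
        return tok

    def compound() -> List[str]:          # Compound -> item Add_item
        items = [expect_item()]
        while peek() == '+':
            advance()
            items.append(expect_item())
        return items

    def option() -> Option:
        if peek().isdigit():              # Mult -> num star Compound
            count = int(advance())
            if advance() != '*':
                raise ValueError("expected '*' after a count")
            return Option(choices=[compound()], count=count)
        choices = [compound()]            # Choice -> Compound More_choices Exchange
        while peek() == '/':
            advance()
            choices.append(compound())
        per_models = None
        if peek() == '-':
            advance()
            if not peek().isdigit():
                raise ValueError(f"expected a number after '-', got {peek()!r}")
            per_models = int(advance())
        return Option(choices, per_models)

    options = [option()]                  # Opt_list -> Option Opt_list_
    while peek() == ',':
        advance()
        options.append(option())
    if peek() != EOF:
        raise ValueError(f'unexpected token {peek()!r}')
    return options


question = 'itemA, itemB/itemC-3, 2*itemD, itemE/itemF/itemG, itemH/itemI+itemJ'
for number, opt in enumerate(parse_options(question), start=1):
    print(number, opt)
```

    With a list of Option objects, the numbered menu can be generated from the data, and a user's answer such as "option 5, choice 2" resolves to options[4].choices[1] without parsing the string a second time.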