
Parsing a string in Ruby

  •  5
  • astropanic  · Tech community  · 14 years ago

    I have a string:

    input = "maybe (this is | that was) some ((nice | ugly) (day |night) | (strange (weather | time)))"
    

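    What is the best way to parse this string in Ruby?

    I mean the script should be able to build sentences like these:

    maybe this is some ugly night

    maybe that was some nice night

    maybe this is some strange time

    And so on, you get the point...

    Should I read the string character by character and build a state machine with a stack that stores the parenthesised values for later evaluation, or is there a better approach?

    Perhaps there is a ready-made, out-of-the-box library for this?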

    1 answer  |  14 years ago
  •  8
  •   molf    14 years ago

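    Try Treetop. It is a Ruby-like DSL for describing grammars. Parsing the string you have given should be quite easy, and by using a real parser you can easily extend the grammar later.

    An example grammar for the type of string you want to parse (save as sentences.treetop):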

    grammar Sentences
      rule sentence
        # A sentence is a combination of zero or more expressions.
        expression* <Sentence>
      end
    
      rule expression
        # An expression is either a literal or a parenthesised expression.
        parenthesised / literal
      end
    
      rule parenthesised
        # A parenthesised expression contains one or more sentences.
        "(" (multiple / sentence) ")" <Parenthesised>
      end
    
      rule multiple
        # Multiple sentences are delimited by a pipe.
        sentence "|" (multiple / sentence) <Multiple>
      end
    
      rule literal
        # A literal string consists of letters (a-z, A-Z) and/or spaces.
        # Expand the character class to allow other characters too.
        [a-zA-Z ]+ <Literal>
      end
    end
    

    The grammar above needs an accompanying file that defines the classes that give us access to the node values (save as sentence_nodes.rb):

    class Sentence < Treetop::Runtime::SyntaxNode
      def combine(a, b)
        return b if a.empty?
        a.inject([]) do |values, val_a|
          values + b.collect { |val_b| val_a + val_b }
        end
      end
    
      def values
        elements.inject([]) do |values, element|
          combine(values, element.values)
        end
      end
    end
    
    class Parenthesised < Treetop::Runtime::SyntaxNode
      def values
        elements[1].values
      end
    end
    
    class Multiple < Treetop::Runtime::SyntaxNode
      def values
        elements[0].values + elements[2].values
      end
    end
    
    class Literal < Treetop::Runtime::SyntaxNode
      def values
        [text_value]
      end
    end
    
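    The key step in these node classes is combine, which appends every value from one element to every value accumulated so far. A minimal stand-alone sketch of that step, with no Treetop involved, might look like the following; the parts array is made-up illustration data, not something the parser produces in exactly this shape.

```ruby
# Stand-alone version of the combine step (no Treetop required).
# `a` and `b` are arrays of partial sentences.
def combine(a, b)
  return b if a.empty?
  # Append every value in b to every value in a.
  a.flat_map { |val_a| b.map { |val_b| val_a + val_b } }
end

# Hypothetical input: one array of alternatives per sentence element.
parts = [["maybe "], ["this is", "that was"], [" some "], ["day", "night"]]

# Fold the elements left to right, as Sentence#values does.
sentences = parts.inject([]) { |acc, values| combine(acc, values) }
puts sentences
```

    Each element with two alternatives doubles the number of results, so the four elements above yield four sentences.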

    The following example program shows that parsing the example sentence you have given is quite simple.

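    require "treetop"
    # Load the node classes from this script's directory. Since Ruby 1.9 a
    # plain `require` no longer searches the current directory.
    require_relative "sentence_nodes"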
    
    str = 'maybe (this is|that was) some' +
      ' ((nice|ugly) (day|night)|(strange (weather|time)))'
    
    Treetop.load "sentences"
    if sentence = SentencesParser.new.parse(str)
      puts sentence.values
    else
      puts "Parse error"
    end
    

    The output of this program is:

    maybe this is some nice day
    maybe this is some nice night
    maybe this is some ugly day
    maybe this is some ugly night
    maybe this is some strange weather
    maybe this is some strange time
    maybe that was some nice day
    maybe that was some nice night
    maybe that was some ugly day
    maybe that was some ugly night
    maybe that was some strange weather
    maybe that was some strange time
    

    You can also access the syntax tree:

    p sentence
    

    The output is here.

    There you have it: a scalable parsing solution that should come quite close to what you want to do, in about 50 lines of code. Does that help?
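
    For comparison, the "read it yourself with a stack" idea from the question can be sketched without any gem, using a small recursive-descent parser in which the Ruby call stack plays the role of the explicit stack. This is only an illustrative alternative under stated assumptions, not how Treetop works internally: the SentenceExpander class and its method names are hypothetical, and it accepts the same restricted character set as the grammar above.

```ruby
require "strscan"

# Hypothetical helper, not part of Treetop: expands the same mini-language
# with a hand-written recursive-descent parser.
class SentenceExpander
  def initialize(input)
    @scanner = StringScanner.new(input)
  end

  # Returns every sentence the input can produce.
  def expand
    values = alternatives
    raise ArgumentError, "unexpected input at #{@scanner.pos}" unless @scanner.eos?
    values
  end

  private

  # alternatives := sequence ("|" sequence)*
  def alternatives
    values = sequence
    values += sequence while @scanner.scan(/\|/)
    values
  end

  # sequence := (literal | "(" alternatives ")")*
  def sequence
    values = [""]
    loop do
      if (text = @scanner.scan(/[a-zA-Z ]+/))
        # A literal is appended to every partial sentence.
        values = values.map { |v| v + text }
      elsif @scanner.scan(/\(/)
        inner = alternatives
        raise ArgumentError, "missing ')' at #{@scanner.pos}" unless @scanner.scan(/\)/)
        # Cross product of what we have so far with the group's options.
        values = values.product(inner).map(&:join)
      else
        return values
      end
    end
  end
end

str = "maybe (this is|that was) some" \
      " ((nice|ugly) (day|night)|(strange (weather|time)))"
sentences = SentenceExpander.new(str).expand
puts sentences
```

    For the input used above this prints the same twelve sentences in the same order. Like the grammar, it treats spaces as part of the literals, so spaces around a pipe would end up in the generated sentences.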