尝试
Treetop
. 它是一个类似于红宝石的DSL来描述语法。解析您给出的字符串应该非常容易,并且通过使用真正的解析器,您可以很容易地在以后扩展语法。
要分析的字符串类型的示例语法(另存为
sentences.treetop
):
grammar Sentences
rule sentence
# A sentence is a combination of one or more expressions.
expression* <Sentence>
end
rule expression
# An expression is either a literal or a parenthesised expression.
parenthesised / literal
end
rule parenthesised
# A parenthesised expression contains one or more sentences.
"(" (multiple / sentence) ")" <Parenthesised>
end
rule multiple
# Multiple sentences are delimited by a pipe.
sentence "|" (multiple / sentence) <Multiple>
end
rule literal
# A literal string contains of word characters (a-z) and/or spaces.
# Expand the character class to allow other characters too.
[a-zA-Z ]+ <Literal>
end
end
上面的语法需要一个附带的文件,该文件定义允许我们访问节点值的类(另存为
sentence_nodes.rb
)
class Sentence < Treetop::Runtime::SyntaxNode
def combine(a, b)
return b if a.empty?
a.inject([]) do |values, val_a|
values + b.collect { |val_b| val_a + val_b }
end
end
def values
elements.inject([]) do |values, element|
combine(values, element.values)
end
end
end
class Parenthesised < Treetop::Runtime::SyntaxNode
def values
elements[1].values
end
end
class Multiple < Treetop::Runtime::SyntaxNode
def values
elements[0].values + elements[2].values
end
end
class Literal < Treetop::Runtime::SyntaxNode
def values
[text_value]
end
end
下面的示例程序显示,解析您给出的示例语句非常简单。
require "rubygems"
require "treetop"
require "sentence_nodes"
str = 'maybe (this is|that was) some' +
' ((nice|ugly) (day|night)|(strange (weather|time)))'
Treetop.load "sentences"
if sentence = SentencesParser.new.parse(str)
puts sentence.values
else
puts "Parse error"
end
这个程序的输出是:
maybe this is some nice day
maybe this is some nice night
maybe this is some ugly day
maybe this is some ugly night
maybe this is some strange weather
maybe this is some strange time
maybe that was some nice day
maybe that was some nice night
maybe that was some ugly day
maybe that was some ugly night
maybe that was some strange weather
maybe that was some strange time
您还可以访问语法树:
p sentence
The output is here
.
有了它:一个可伸缩的解析解决方案,它应该非常接近您在大约50行代码中想要做的事情。有帮助吗?