We have a source file ("source-a") that looks like this (if you see blue text, it comes from Stack Overflow, not from the text file):
The container of white spirit was made of aluminium.
We will use an aromatic method to analyse properties of white spirit.
No one drank white spirit at stag night.
Many people think that a potato crisp is savoury, but some would rather eat mashed potato.
...
more sentences
Each sentence in "source-a" is on its own line and ends with a newline (\n).
We have a dictionary/conversion file ("converse-b") that looks like this:
aluminium<tab>aluminum
analyse<tab>analyze
white spirit<tab>mineral spirits
stag night<tab>bachelor party
savoury<tab>savory
potato crisp<tab>potato chip
mashed potato<tab>mashed potatoes
"converse-b" is a tab-delimited, two-column file.
Each equivalence mapping (left term<tab>right term) is on its own line and is terminated by a newline (\n).
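How can I read "converse-b" and replace terms in "source-a", so that each term in column 1 of "converse-b" is replaced by the term in column 2, and then write the result to an output file ("output-c")?
For example, "output-c" would look like this: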
The container of mineral spirits was made of aluminum.
We will use an aromatic method to analyze properties of mineral spirits.
No one drank mineral spirits at bachelor party.
Many people think that a potato chip is savory, but some would rather eat mashed potatoes.
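One possible starting point is sketched below. It is only a sketch under a few assumptions: the file names are the ones used in this question, the array names `term` and `repl` are made up for illustration, and the dictionary terms contain no regular-expression metacharacters (`gsub` treats each term as a regex, and `&` in a replacement has a special meaning). The sample files are recreated first so the commands can be run as-is.

```shell
# Recreate the sample files from the question (four sentences, seven mappings).
cat > source-a <<'EOF'
The container of white spirit was made of aluminium.
We will use an aromatic method to analyse properties of white spirit.
No one drank white spirit at stag night.
Many people think that a potato crisp is savoury, but some would rather eat mashed potato.
EOF

printf '%s\t%s\n' \
    'aluminium' 'aluminum' \
    'analyse' 'analyze' \
    'white spirit' 'mineral spirits' \
    'stag night' 'bachelor party' \
    'savoury' 'savory' \
    'potato crisp' 'potato chip' \
    'mashed potato' 'mashed potatoes' > converse-b

# First file (NR == FNR): load the mappings in file order, skipping blank lines.
# Second file: apply every mapping to the whole line, then print it.
awk -F'\t' '
NR == FNR {
    if ($1 != "") { term[++n] = $1; repl[n] = $2 }
    next
}
{
    for (i = 1; i <= n; i++) gsub(term[i], repl[i])
    print
}' converse-b source-a > output-c
```

Because this is plain substring replacement, a source line that already contains "mashed potatoes" would be rewritten again, so it only fits the unambiguous cases.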
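The tricky part is the word potato.
If a "simple" awk solution cannot handle both a singular term (potato) and a plural term (potatoes), we will use a manual substitution method. The awk solution can skip this use case.
In other words, an awk solution may stipulate that it only works for unambiguous words, or for terms made up of unambiguous words separated by spaces.
An awk solution will get us to about 90% completion; the remaining 10% will be done manually.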
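One way to read the "unambiguous words only" restriction is to replace a term only when it is not glued to another letter or digit, and to leave everything else for the manual pass. The sketch below does that with literal matching (`index` and `substr`) instead of a regex. The helper name `replace_whole`, the array names, and the three sample sentences are hypothetical and chosen for illustration; only the two dictionary entries come from the question.

```shell
# Small illustrative inputs: the second sentence is already plural.
cat > source-a <<'EOF'
Some would rather eat mashed potato.
Others would rather eat mashed potatoes.
A potato crisp is a snack.
EOF

printf '%s\t%s\n' \
    'mashed potato' 'mashed potatoes' \
    'potato crisp' 'potato chip' > converse-b

awk -F'\t' '
# Replace literal occurrences of old with rep, but only where the match
# is not directly preceded or followed by a letter or digit.
function replace_whole(line, old, rep,    out, pos, before, after) {
    out = ""
    while ((pos = index(line, old)) > 0) {
        # Character before the match; it may sit at the end of the text already emitted.
        before = (pos == 1) ? substr(out, length(out), 1) : substr(line, pos - 1, 1)
        after = substr(line, pos + length(old), 1)
        if (before ~ /[A-Za-z0-9]/ || after ~ /[A-Za-z0-9]/)
            out = out substr(line, 1, pos + length(old) - 1)   # part of a longer word: keep as-is
        else
            out = out substr(line, 1, pos - 1) rep             # standalone term: replace
        line = substr(line, pos + length(old))
    }
    return out line
}
NR == FNR {
    if ($1 != "") { term[++n] = $1; repl[n] = $2 }
    next
}
{
    line = $0
    for (i = 1; i <= n; i++) line = replace_whole(line, term[i], repl[i])
    print line
}' converse-b source-a > output-c
```

With these inputs the already-plural sentence is left untouched rather than being turned into "potatoeses". The same rule also skips forms such as "potato crisps", which would then fall into the share handled manually.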