I think you're looking for group by + having:
proc sql;
  create table out as
  select
      compged(fuzzy.fuzzy_title, clean.cleaned_title, 100) as comp
    , fuzzy.fuzzy_title
    , clean.cleaned_title
  from fuzzy inner join clean
    /* candidate pairs: generalized edit distance below the cutoff and spelling distance below 50 */
    on (compged(fuzzy.fuzzy_title, clean.cleaned_title, 100) < 100
        and spedis(clean.cleaned_title, fuzzy.fuzzy_title) < 50)
  /* for each fuzzy_title, keep only the row(s) with the smallest comp */
  group by fuzzy.fuzzy_title
  having calculated comp = min(compged(fuzzy.fuzzy_title, clean.cleaned_title, 100));
quit;
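If several fuzzy_title + cleaned_title pairs share the same minimum comp value, all of them will be in the output. It is possible to pick just one of them within a single query, but I think it is easier to separate these steps and select one row per fuzzy_title in another step (for example, with the first. BY-group variable in a DATA step).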
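A minimal sketch of that second step, assuming the query above has already created the table out. The tie-break rule (keep the alphabetically first cleaned_title among equally close matches) is my assumption, not something from the question, and best_match is a hypothetical output table name:

```sas
/* Order rows so that, within each fuzzy_title, the tied rows are in a  */
/* predictable order (assumed tie-break: alphabetical cleaned_title).   */
proc sort data=out;
    by fuzzy_title comp cleaned_title;
run;

/* Keep exactly one row per fuzzy_title.                               */
/* first.fuzzy_title is 1 on the first observation of each BY group.  */
data best_match;
    set out;
    by fuzzy_title;
    if first.fuzzy_title;
run;
```

Because out already holds only the minimum comp for each fuzzy_title, first. just discards the extra tied rows. If any tied row is acceptable, cleaned_title can be dropped from the PROC SORT BY list.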