
Counting unique entries with ts_stat in full-text search

postgresql sql

Panama Jack · 5 years ago

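I am struggling to use ts_stat to get the number of unique occurrences of tags in a table and sort them by highest count. What I need, though, is for each entry to be counted only once, so that only unique entries are counted. I tried grouping, but nothing has worked for me.

E.g., the table: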

user_id | tags         | post_date
===================================
2       | dog cat      | 1580049400
2       | dog          | 1580039400
3       | dog          | 1580038400
3       | dog dog cat  | 1580058400
4       | dog horse    | 1580028400

SELECT word, ndoc, nentry
FROM   ts_stat($$SELECT to_tsvector('simple', tags) FROM tags WHERE post_date > 1580018400$$) 
ORDER  BY ndoc DESC
LIMIT  10;

This currently produces:

word | ndoc | nentry
====================
dog  | 5    | 6
cat  | 2    | 2
horse| 1    | 1

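The result I am looking for is a unique count, so that no single user is counted more than once, even if they have more than one entry after the date given in the post_date condition (which may not be relevant). Like below: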

word | total_count_per_user
===========================
dog  | 3    (because there are 3 unique users with this term)
cat  | 2    (because there are 2 unique users with this term)
horse| 1    (because there are 1 unique users with this term)

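I changed the column name to reflect the output. The point is that it does not matter how many times a user entered a word; it only needs a unique count per user. E.g., if a user creates 100 entries with dog in the text, dog is counted once for that user, not 100 times.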


mkRabbani 5 years ago

If I understand you correctly, you can use a distinct count. An example query is below:

SELECT tags,COUNT(DISTINCT user_id)
FROM your_table
GROUP BY tags
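
Note that grouping by tags treats each whole string as one group, so 'dog cat' and 'dog' are counted separately. A minimal sketch of a per-word variant follows, assuming the tags column holds space-separated words and reusing the placeholder table name your_table from the query above. regexp_split_to_table is a built-in PostgreSQL function; the aliases w and word are only illustrative.

```sql
-- Sketch only: assumes your_table(user_id, tags text, post_date)
-- with space-separated words in the tags column.
SELECT w.word,
       COUNT(DISTINCT t.user_id) AS total_count_per_user
FROM   your_table AS t
-- Split each tags string into one row per word.
CROSS  JOIN LATERAL regexp_split_to_table(t.tags, '\s+') AS w(word)
WHERE  t.post_date > 1580018400
GROUP  BY w.word
ORDER  BY total_count_per_user DESC
LIMIT  10;
```

With the five sample rows from the question this should give dog 3, cat 2, horse 1. Leading or trailing whitespace in tags would produce empty words, so real data may need a trim first.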

Panama Jack 5 years ago

Update: This works without using a CTE. The cross join is also the key to filtering by user id.

SELECT DISTINCT (t.word) as tag, count(DISTINCT h.user_id) as posts 
FROM ts_stat($$SELECT hashtagsearch FROM tagstable WHERE post_date > 1580018400$$) t 
CROSS JOIN tagstable h WHERE hashtagsearch @@ to_tsquery('simple',t.word)
GROUP BY t.word HAVING count(DISTINCT h.user_id) > 1 ORDER BY posts DESC LIMIT 10;

https://stackoverflow.com/a/42704207/330987
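
To try the cross-join approach against the sample rows from the question, a setup along the following lines might work. The column types, and the idea that hashtagsearch is a tsvector built with to_tsvector('simple', tags), are assumptions; the thread does not show the real definition of tagstable. The variant query also repeats the post_date cutoff on the joined table and leaves out the HAVING clause, so that a tag used by a single user (horse here) is still listed.

```sql
-- Hypothetical schema: the thread does not show the real definition of tagstable.
CREATE TABLE tagstable (
    user_id       integer,
    tags          text,
    post_date     bigint,
    hashtagsearch tsvector   -- assumed to hold to_tsvector('simple', tags)
);

INSERT INTO tagstable (user_id, tags, post_date) VALUES
    (2, 'dog cat',     1580049400),
    (2, 'dog',         1580039400),
    (3, 'dog',         1580038400),
    (3, 'dog dog cat', 1580058400),
    (4, 'dog horse',   1580028400);

UPDATE tagstable SET hashtagsearch = to_tsvector('simple', tags);

-- Optional: a GIN index lets the @@ match use an index on larger tables.
CREATE INDEX tagstable_hashtagsearch_idx ON tagstable USING gin (hashtagsearch);

SELECT t.word AS tag,
       count(DISTINCT h.user_id) AS user_count
FROM   ts_stat($$SELECT hashtagsearch FROM tagstable WHERE post_date > 1580018400$$) AS t
CROSS  JOIN tagstable AS h
WHERE  h.hashtagsearch @@ to_tsquery('simple', t.word)
AND    h.post_date > 1580018400   -- same cutoff as inside ts_stat
GROUP  BY t.word
ORDER  BY user_count DESC
LIMIT  10;
```

Without the extra post_date condition on h, posts older than the cutoff would still contribute users to the count, because the filter inside ts_stat only limits which words are returned.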