How do I split sentences into a list?

python

Ahmad Latif · 4 years ago

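I'm trying to write a function that computes the number of words and the average word length in any given sentence. I can't seem to split the string into two sentences stored in a list, assuming that each sentence ends with a period.

Question marks and exclamation marks should be replaced with periods so that they are recognized as the start of a new sentence in the list.
For example: "Haven't you eaten 8 oranges today? I don't know if you did." would become: ["Haven't you eaten 8 oranges today", "I don't know if you did"]
The average word length for this example is 44/12 ≈ 3.67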

import string

def word_length_list(text):
    text = text.replace('--',' ')

    for p in string.punctuation + "“”‘’":
        text = text.replace(p,'')

    text = text.lower()
    words = text.split(".")
    word_length = []
    print(words)

    for i in words:
        count = 0
        for j in i:
            count = count + 1
        word_length.append(count)
    
    return word_length

testing1 = word_length_list("Haven't you eaten 8 oranges today? I don't know if you did.")
print(sum(testing1)/len(testing1))

1 reply | as of 4 years ago

Tim Biegeleisen 4 years ago

One option is to use re.split:

import re

inp = "Haven't you eaten 8 oranges today? I don't know if you did."
# Split on whitespace that directly follows ?, . or !
sentences = re.split(r'(?<=[?.!])\s+', inp)
print(sentences)

This prints:

["Haven't you eaten 8 oranges today?", "I don't know if you did."]
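To get all the way to what the question asks for (sentences without the trailing punctuation, plus the word count and average word length), something along these lines might work. This is only a sketch: the function name sentence_stats is made up for illustration, and it assumes that only letters and digits count toward a word's length, which is what makes the question's 44/12 come out.

```python
import re

def sentence_stats(text):
    """Return (sentences, word_count, average_word_length) for text."""
    # Split on whitespace that follows ., ? or !
    parts = re.split(r'(?<=[?.!])\s+', text.strip())
    # Drop the terminators so the list matches the one in the question
    sentences = [p.rstrip('?.!') for p in parts if p.rstrip('?.!')]
    # Assumption: only letters and digits count, so "Haven't" has length 6
    words = [re.sub(r'[^A-Za-z0-9]', '', w) for s in sentences for w in s.split()]
    words = [w for w in words if w]
    average = sum(len(w) for w in words) / len(words) if words else 0.0
    return sentences, len(words), average

sentences, count, average = sentence_stats(
    "Haven't you eaten 8 oranges today? I don't know if you did."
)
print(sentences)
print(count, round(average, 2))
```

For the example sentence this gives the two sentences from the question, 12 words, and an average of about 3.67.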

We could also use re.findall:

import re

inp = "Haven't you eaten 8 oranges today? I don't know if you did."
# Each match keeps the whitespace before the sentence, so strip it
sentences = [s.strip() for s in re.findall(r'.*?[?!.]', inp)]
print(sentences)  # prints same as above

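Note that in both cases we assume that a period only ever appears as a full stop, and never as part of an abbreviation. If a period can occur in more than one context, telling sentences apart can get tricky. For example: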

Jon L. Skeet earned more point than anyone.  Gordon Linoff also earned a lot of points.

Here it is not clear whether a period marks the end of a sentence or is part of an abbreviation.
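For this particular example, one rough heuristic would be to split only when the next token starts with a capital letter, and to skip periods that follow a single capital letter (an initial such as "L."). The sketch below assumes exactly that rule; the name SENTENCE_BREAK is made up for illustration, and this is not a general solution.

```python
import re

# Heuristic only: break after . ? ! when the next token starts with a capital
# letter, unless the period follows a single capital letter (an initial).
SENTENCE_BREAK = re.compile(r'(?<=[.?!])(?<!\b[A-Z]\.)\s+(?=[A-Z])')

inp = "Jon L. Skeet earned more point than anyone.  Gordon Linoff also earned a lot of points."
sentences = SENTENCE_BREAK.split(inp)
print(sentences)
```

This keeps "Jon L. Skeet" together and splits before "Gordon". It will still get abbreviations like "Dr." wrong, and it will miss a real sentence break right after an initial. For real-world text a trained sentence tokenizer, for example NLTK's sent_tokenize, is probably a safer choice than a regex.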

CryptoFool Sachithra Dilshan 4 years ago

An example of splitting with a regular expression:

import re
s = "Hello! How are you?"
# Split on runs of ., ? or !, then trim whitespace and drop empty pieces
print([x.strip() for x in re.split(r"[.?!]+", s) if x.strip()])
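It may also help to see why the code in the question never splits anything: string.punctuation already contains ".", "?" and "!", so the cleanup loop removes every sentence terminator before split(".") runs. A small check, reusing the question's sentence (the variable names here are just for illustration):

```python
import string

text = "Haven't you eaten 8 oranges today? I don't know if you did."

# All three sentence terminators are part of string.punctuation
print(all(mark in string.punctuation for mark in ".?!"))

# Same cleanup loop as in the question
stripped = text
for p in string.punctuation:
    stripped = stripped.replace(p, '')

# No periods are left, so split(".") returns the whole text as one piece
pieces = stripped.split(".")
print(len(pieces))
```

So the split has to happen before the punctuation is stripped, or the terminators have to be kept out of the set of characters being removed.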
