
How to split sentences into a list?

  • Ahmad Latif  · 4 years ago

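    I am trying to write a function that calculates the number of words and the average word length in any given text. I can't seem to split the string into a list of two sentences, assuming each sentence ends with a full stop.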

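    • Question marks and exclamation marks should be replaced with full stops so that each one is recognized as the end of a sentence in the list.
    • For example: "Haven't you eaten 8 oranges today? I don't know if you did." should become: ["Haven't you eaten 8 oranges today", "I don't know if you did"]
    • The average word length for this example is 44/12 ≈ 3.67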
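The 44/12 figure works out if a word is taken to be any run of letters, digits and apostrophes, and apostrophes are not counted toward its length ("Haven't" counts as 6, "don't" as 4). A minimal sketch of the word count and average under that assumption (word_stats is a hypothetical name, not from the question):

```python
import re


def word_stats(text):
    """Return (word_count, average_word_length) for text.

    Assumption: a word is a run of letters, digits and apostrophes
    (straight or curly), and apostrophes do not count toward its length.
    """
    words = re.findall(r"[A-Za-z0-9'’]+", text)
    lengths = [len(w.replace("'", "").replace("’", "")) for w in words]
    if not lengths:
        return 0, 0.0
    return len(lengths), sum(lengths) / len(lengths)


count, avg = word_stats("Haven't you eaten 8 oranges today? I don't know if you did.")
print(count, round(avg, 2))  # 12 3.67
```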
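    import string

    def word_length_list(text):
        text = text.replace('--',' ')

        # Remove ASCII punctuation and curly quotes; note that
        # string.punctuation also contains '.', '?' and '!'
        for p in string.punctuation + "‘’”“":
            text = text.replace(p,'')

        text = text.lower()
        # Attempt to split the text into sentences on '.'
        words = text.split(".")
        word_length = []
        print(words)

        # Count the characters in each piece
        for i in words:
            count = 0
            for j in i:
                count = count + 1
            word_length.append(count)

        return(word_length)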
    
    testing1 = word_length_list("Haven't you eaten 8 oranges today? I don't know if you did.")
    print(sum(testing1)/len(testing1))
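In word_length_list, the loop over string.punctuation also deletes '.', '?' and '!', so text.split(".") has nothing left to split on and returns a single piece. A minimal sketch of the order the bullet points describe, normalising '?' and '!' to '.' and splitting before any other cleanup (split_sentences is a hypothetical helper):

```python
def split_sentences(text):
    """Split text into sentences, treating '?' and '!' like '.'."""
    # Normalise the other terminators to a full stop first ...
    for mark in "?!":
        text = text.replace(mark, ".")
    # ... then split, trim surrounding spaces and drop empty pieces
    # (the trailing '.' leaves an empty string at the end).
    return [s.strip() for s in text.split(".") if s.strip()]


print(split_sentences("Haven't you eaten 8 oranges today? I don't know if you did."))
# ["Haven't you eaten 8 oranges today", "I don't know if you did"]
```

Word counts and lengths can then be computed per sentence, stripping the remaining punctuation from each one only after the split.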
    
    
  •   Tim Biegeleisen · 4 years ago

    One option is to use re.split:

    import re

    inp = "Haven't you eaten 8 oranges today? I don't know if you did."
    # Split on whitespace preceded by ?, . or ! (the lookbehind keeps the punctuation)
    sentences = re.split(r'(?<=[?.!])\s+', inp)
    print(sentences)
    

    This prints:

    ["Haven't you eaten 8 oranges today?", "I don't know if you did."]
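The (?<=[?.!]) part is a lookbehind: it requires a terminator immediately before the whitespace but does not consume it, so the punctuation stays attached to each sentence. For comparison, splitting on the terminator itself removes it and leaves an empty string after the final full stop:

```python
import re

inp = "Haven't you eaten 8 oranges today? I don't know if you did."

# Lookbehind: split only on the whitespace that follows ?, . or !
print(re.split(r'(?<=[?.!])\s+', inp))
# ["Haven't you eaten 8 oranges today?", "I don't know if you did."]

# Consuming split: the terminators (and following spaces) are removed
print(re.split(r'[?.!]\s*', inp))
# ["Haven't you eaten 8 oranges today", "I don't know if you did", '']
```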
    

    We could also use re.findall:

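    import re
    inp = "Haven't you eaten 8 oranges today? I don't know if you did."
    # Lazily match up to each terminator; strip() drops the space .*? picks up before later sentences
    sentences = [s.strip() for s in re.findall(r'.*?[?!.]', inp)]
    print(sentences)  # prints same as above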
    

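    Note that in both cases we assume that . only appears as a full stop and never as part of an abbreviation. If a period can appear in more than one context, telling sentences apart can get tricky. For example: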

    Jon L. Skeet earned more points than anyone.  Gordon Linoff also earned a lot of points.
    

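    Here a simple pattern cannot tell whether a period ends a sentence or belongs to an abbreviation: the one in "L." is an initial, but both patterns above would treat it as a sentence boundary.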
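One way to reduce such false splits, sketched here under the assumption that abbreviations are either single capital initials (like "L.") or entries in a hand-maintained set, is to split with the lookbehind pattern and then glue back any piece that ends in a known abbreviation. ABBREVIATIONS, ends_with_abbreviation and split_with_abbreviations are hypothetical names:

```python
import re

# Hypothetical, hand-maintained set of abbreviations that end in '.'
ABBREVIATIONS = {"Mr.", "Mrs.", "Dr.", "e.g.", "i.e.", "etc."}


def ends_with_abbreviation(piece):
    """True if the last word of piece looks like an abbreviation."""
    last = piece.split()[-1]
    # A single capital letter followed by '.' is treated as an initial
    return last in ABBREVIATIONS or re.fullmatch(r"[A-Z]\.", last) is not None


def split_with_abbreviations(text):
    """Split on whitespace after ?, . or !, then re-join false splits."""
    pieces = re.split(r"(?<=[?.!])\s+", text.strip())
    sentences = []
    for piece in pieces:
        if sentences and ends_with_abbreviation(sentences[-1]):
            # The previous piece ended in an abbreviation, not a sentence
            sentences[-1] += " " + piece
        else:
            sentences.append(piece)
    return sentences


text = "Jon L. Skeet earned more points than anyone.  Gordon Linoff also earned a lot of points."
print(split_with_abbreviations(text))
# ['Jon L. Skeet earned more points than anyone.', 'Gordon Linoff also earned a lot of points.']
```

This is only a heuristic: a sentence that genuinely ends with an initial or with "etc." will be merged with the next one. For real-world text, a trained sentence tokenizer such as NLTK's sent_tokenize is usually more robust.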

  •   CryptoFool Sachithra Dilshan · 4 years ago

    An example of splitting with a regular expression:

    import re

    s = "Hello! How are you?"
    # Split on runs of '.', '?' or '!', trim spaces and drop empty pieces
    print([x.strip() for x in re.split(r"[.?!]+", s) if x.strip()])
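The + after the character class treats a run of terminators such as "?!" or "..." as a single boundary, and the if filter discards the empty string that re.split produces after a trailing terminator. A small comparison on a made-up input shows what happens without the +:

```python
import re

s = "Wait... what?! Really."

# Without '+': every terminator is its own separator, leaving empty pieces
print(re.split(r"[.?!]", s))
# ['Wait', '', '', ' what', '', ' Really', '']

# With '+': runs of terminators act as one separator
print(re.split(r"[.?!]+", s))
# ['Wait', ' what', ' Really', '']
```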