
Removing duplicate words from a file using Python

  •  -2
  • Rivaldo Hater  · Tech Community  · 7 years ago

    I have a text file in which many words are repeated. I need each word to appear only once.

    import  codecs
    
    wordList = codecs.open('Arquivo.txt' , 'r')
    wordList2 = codecs.open('Arquivo2.txt', 'w')
    
    for x in range(len(wordList)) :
        for y in range(x + 1, len(wordList ) ):
            if wordList[x] == wordList[y]:
                wordList2.append(wordList[x] )
            for y in wordList2:
                wordList.remove(y)
    

    Error:

        wordList2 = codecs.open('File2.txt', 'w').readline()
    IOError: File not open for reading
    
    1 answer  |  last active 7 years ago
  •  0
  •   chowsai    7 years ago

    You might want to try this. It makes wordList a list rather than a file object. The same can be done for wordList2.

    .strip() removes leading and trailing whitespace, including the newline at the end of each line.

    wordList =[line.strip() for line in codecs.open('File.txt' , 'r').readlines()]
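
    To illustrate the difference, here is a minimal sketch. The temporary file and the helper name read_words are assumptions made for the example, not part of the question. It shows that a file object has no length, which is why range(len(wordList)) cannot work on it, while the list of stripped lines can be indexed and measured.

```python
import os
import tempfile


def read_words(path):
    """Return the stripped lines of a text file as a list (hypothetical helper)."""
    with open(path, 'r') as infile:
        return [line.strip() for line in infile]


# Build a small throwaway input file so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'File.txt')
with open(path, 'w') as f:
    f.write('apple\nbanana\napple\n')

# A file object does not support len(); calling it raises TypeError.
with open(path, 'r') as handle:
    try:
        len(handle)
        file_has_len = True
    except TypeError:
        file_has_len = False

wordList = read_words(path)
print(file_has_len)   # False
print(wordList)       # ['apple', 'banana', 'apple']
```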
    

    Edit: here is the complete code; I hope it works for you.

    import  codecs
    
    # Read the input through a context manager so the handle is closed promptly.
    with codecs.open('File.txt', 'r') as infile:
        wordList = [line.strip() for line in infile]
    wordList2 = []  # words that occur more than once in File.txt

    seen = set()
    uniqueWords = []
    for word in wordList:
        if word in seen:
            # Record each repeated word once.
            if word not in wordList2:
                wordList2.append(word)
        else:
            seen.add(word)
            uniqueWords.append(word)
    wordList = uniqueWords  # each word now appears exactly once
    
    # assuming the code above is working
    # now write your updated contents
    with open('outfile1.txt','w') as outfile1:
        for word in wordList:
            outfile1.write(word + '\n')
    
    with open('outfile2.txt','w') as outfile2:
        for word in wordList2:
            outfile2.write(word + '\n')
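
    If only the de-duplicated output is needed, a shorter route is possible. The following is a sketch under assumptions: it relies on Python 3.7 or later, where dicts preserve insertion order, and the names unique_in_order and dedupe_file are hypothetical helpers rather than anything from the question.

```python
def unique_in_order(words):
    """Return each word once, keeping first-occurrence order (hypothetical helper)."""
    # dict keys are unique, and dicts keep insertion order on Python 3.7+.
    return list(dict.fromkeys(words))


def dedupe_file(src_path, dst_path):
    """Write the unique non-empty lines of src_path to dst_path, one per line."""
    with open(src_path, 'r') as src:
        words = [line.strip() for line in src if line.strip()]
    with open(dst_path, 'w') as dst:
        for word in unique_in_order(words):
            dst.write(word + '\n')


sample = ['casa', 'gato', 'casa', 'sol', 'gato']
print(unique_in_order(sample))  # ['casa', 'gato', 'sol']
```

    Calling dedupe_file('File.txt', 'outfile1.txt') would then write each word of the input exactly once, assuming one word per line.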
    

    wordList = {line.strip():1 for line in codecs.open('File.txt' , 'r').readlines()}
    

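    In the dictionary version above, line.strip() is the key and 1 is the value. Because dictionary keys are unique, each word is stored only once, and you can later change a word's value, for example with wordList[word] = 0.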
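
    As a small illustration of the dictionary idea, the sketch below uses an in-memory list in place of readlines(). Treating a value of 0 as "leave this word out" is an assumption about the intent, since the answer does not say what the 0 is for.

```python
# Stand-in for codecs.open('File.txt', 'r').readlines()
lines = ['apple\n', 'banana\n', 'apple\n', 'cherry\n']

# Each stripped line becomes a key; repeated words overwrite the same entry.
wordList = {line.strip(): 1 for line in lines}
print(wordList)  # {'apple': 1, 'banana': 1, 'cherry': 1}

# Assumption: a value of 0 marks a word to be left out of the output.
wordList['banana'] = 0
kept = [word for word, flag in wordList.items() if flag == 1]
print(kept)  # ['apple', 'cherry']
```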