
Counting duplicates in a Python list based on a similarity function

duplicates dictionary list string python

Wolgan Ens · Tech community · 6 years ago

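I have a collection of strings (bibliographic references) for many researchers, stored as a dict that maps each name to a list of references. I need to count how many times each string (reference) appears in a profile, but the strings do not have to be exactly equal: I have a function that measures the similarity of two strings using the Levenshtein algorithm. I can already do this, but I am looking for a more efficient way, because this function takes up a large share of the total execution time: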

def get_references_with_count(self, references):
    ref_len = len(references)
    references_with_count = dict()
    # Repeated references
    for i in range(ref_len):
        references_with_count[references[i]] = 1
        for j in range(ref_len):
            if i != j and ratio(references[i], references[j]) >= 0.75:
                references_with_count[references[i]] += 1

    return references_with_count
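
One possible reduction, sketched under assumptions rather than taken from the post: if the similarity measure is symmetric (ratio(a, b) == ratio(b, a)), each pair only needs to be compared once, which roughly halves the number of similarity calls. The pure-Python `similarity` function below is a hypothetical stand-in for the post's `ratio`, included only so the sketch runs on its own; the names `count_similar_references`, `threshold`, and `sim` are also illustrative.

```python
from itertools import combinations


def levenshtein_distance(a, b):
    """Classic dynamic-programming edit distance (symmetric in a and b)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Hypothetical stand-in for ratio(): 1.0 for identical strings, lower otherwise."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein_distance(a, b) / longest


def count_similar_references(references, threshold=0.75, sim=similarity):
    """Count, for each reference, itself plus every other entry at or above the threshold.

    Assumes sim(a, b) == sim(b, a), so each unordered pair is scored once
    and the result is credited to both entries.
    """
    counts = [1] * len(references)
    for i, j in combinations(range(len(references)), 2):
        if sim(references[i], references[j]) >= threshold:
            counts[i] += 1
            counts[j] += 1
    # Keyed by string, like the original function; exact duplicates share one key.
    return dict(zip(references, counts))
```

For n references this makes n(n-1)/2 similarity calls instead of n(n-1). The result only matches the original loop if the real `ratio` is symmetric, which is worth checking before swapping it in.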

As the profiler output shows, this function dominates the run time:

1116978 function calls (1116108 primitive calls) in 28.169 seconds
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2073    0.024    0.000    0.159    0.000 Extractor.py:44(get_keywords)
     8292    0.020    0.000    0.024    0.000 Extractor.py:56(<listcomp>)
     2073    0.051    0.000    0.149    0.000 Extractor.py:62(get_references)
       98    0.000    0.000    0.000    0.000 Recommender.py:109(<listcomp>)
       98    0.000    0.000    0.000    0.000 Recommender.py:110(<listcomp>)
       99    0.112    0.001   17.301    0.175 y:129(get_references_with_count)

The references argument of the function can be a list like the following:

["ABADI, M. et al. TensorFlow: A System for Large-Scale Machine Learning TensorFlow: A system for largescale machine learning. In: 12TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDI '16) 2016, Anais... [s.l: s.n.]",
'ALADREN, A. et al. Navigation Assistance for the Visually Impaired Using RGB-D Sensor With Range Expansion. IEEE Systems Journal, [s. l.], v. 10, n. 3, p. 922-932, 2016. Disponível em: <http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6819807>. Acesso em: 31 maio. 2018.',
'BASHIRI, F. S. et al. MCIndoor20000: A fully-labeled image dataset to advance indoor objects detection. Data in Brief, [s. l.], v. 17, p. 71-75, 2018. Disponível em: <https://www.sciencedirect.com/science/article/pii/S2352340917307424>. Acesso em: 31 maio. 2018.',
'CHOLLET, F.; OTHERS. Keras: The Python Deep Learning library, 2015.',
'CONTINUUM ANALYTICS, I. Anaconda: Continuum Analystics. 2016.',
'DAVIES, E. R.; DAVIES, E. R. Deep-learning networks. In: Computer Vision. [s.l.] : Elsevier, 2018. p. 453-493.',
'DING, X. et al. Indoor object recognition using pre-trained convolutional neural network. In: 2017 23RD INTERNATIONAL CONFERENCE ON AUTOMATION AND COMPUTING (ICAC) 2017, Anais... : IEEE, 2017. Disponível em: <http://ieeexplore.ieee.org/document/8081986/>. Acesso em: 31 maio. 2018.',
'GARCIA-GARCIA, A. et al. A survey on deep learning techniques for image and video semantic segmentation. Applied Soft Computing, [s. l.], v. 70, p. 41-65, 2018. Disponível em: <https://www.sciencedirect.com/science/article/pii/S1568494618302813>. Acesso em: 31 maio. 2018.',
'GHARANI, P.; KARIMI, H. A. Context-aware obstacle detection for navigation by visually impaired. Image and Vision Computing, [s. l.], v. 64, p. 103-115, 2017. Disponível em: <https://www.sciencedirect.com/science/article/pii/S0262885617300987>. Acesso em: 31 maio. 2018.',
'PEDREGOSA, F. et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, [s. l.], 2012.',
'Python. [s.d.]. Disponível em: <https://www.python.org/>. Acesso em: 31 maio. 2018.',
'pyttsx. [s.d.]. Disponível em: <https://pypi.org/project/pyttsx/>. Acesso em: 1 jun. 2018.',
'SUK, H.-I. An Introduction to Neural Networks and Deep Learning. In: Deep Learning for Medical Image Analysis. [s.l.] : Elsevier, 2017. p. 3-24.',
'TAPU, R. et al. A Smartphone-Based Obstacle Detection and Classification System for Assisting Visually Impaired People. In: 2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS 2013, Anais... : IEEE, 2013. Disponível em: <http://ieeexplore.ieee.org/document/6755931/>. Acesso em: 31 maio. 2018.',
'TAPU, R.; MOCANU, B.; ZAHARIA, T. A computer vision system that ensure the autonomous navigation of blind people. In: 2013 E-HEALTH AND BIOENGINEERING CONFERENCE (EHB) 2013, Anais... : IEEE, 2013. Disponível em: <http://ieeexplore.ieee.org/document/6707267/>. Acesso em: 31 maio. 2018.',
'THEODORIDIS, S.; THEODORIDIS, S. Neural Networks and Deep Learning. In: Machine Learning. [s.l.] : Elsevier, 2015. p. 875-936.',
'VAN DER WALT, S. et al. scikit-image: image processing in Python. PeerJ, [s. l.], v. 2, p. e453, 2014. Disponível em: <https://peerj.com/articles/453>. Acesso em: 7 fev. 2018.']

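Sometimes the same work (a paper, etc.) is cited by strings that are not equal, so I need to count how many times a reference string appears in the list according to the ratio function, even when the strings are not identical.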

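The cumulative time of get_references_with_count is very high. I would like to know whether there is a better way to do this task more efficiently. Sorry for my English, and thank you.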
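
A hedged sketch of one way to avoid some of the expensive comparisons: a cheap length check before each similarity call. It assumes the similarity score has the form 2 * matches / (len(a) + len(b)), which is how `difflib.SequenceMatcher.ratio()` is defined; whether the post's `ratio` obeys the same upper bound is an assumption that should be verified. Under that form the score can never exceed 2 * min(len(a), len(b)) / (len(a) + len(b)), so pairs whose lengths differ too much can be skipped without scoring them. The names `stand_in_ratio`, `max_possible_ratio`, and `count_with_length_filter` are illustrative and do not come from the post.

```python
from difflib import SequenceMatcher


def stand_in_ratio(a, b):
    """Hypothetical stand-in for the post's ratio(), using the standard library."""
    return SequenceMatcher(None, a, b).ratio()


def max_possible_ratio(a, b):
    """Upper bound on a 2*matches/(len(a)+len(b)) score, from lengths alone."""
    total = len(a) + len(b)
    return 1.0 if total == 0 else 2.0 * min(len(a), len(b)) / total


def count_with_length_filter(references, threshold=0.75, ratio=stand_in_ratio):
    """Same counting loop as the post, but skips pairs that cannot reach the threshold."""
    counts = {}
    for i, ref in enumerate(references):
        counts[ref] = 1
        for j, other in enumerate(references):
            if i == j:
                continue
            if max_possible_ratio(ref, other) < threshold:
                continue  # lengths too different: the score cannot reach the threshold
            if ratio(ref, other) >= threshold:
                counts[ref] += 1
    return counts
```

With a threshold of 0.75 the bound rejects any pair in which the shorter string is less than 60% of the length of the longer one, so how much this helps depends on how varied the reference lengths are.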
