How can I get the noun phrase that is the object of a certain verb?

  •  8
  • max  · Tech community  · 6 years ago

    I am working with data from pharmaceutical labels. The text always uses the verb phrase "indicated for".

    For example:

    sentence = "Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis"
    

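    I have already used spaCy to filter down to only the sentences that contain the phrase "indicated for".

    I now need a function that takes a sentence and returns the phrase that is the object of "indicated for". For this example, the function, which I have called extract(), would work as follows: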

    extract(sentence)
    >> 'relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis'
    

    Is there functionality in spaCy for doing this?

    EDIT: Simply splitting after "indicated for" does not work for more complicated examples.

    Here are some examples:

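    '''buprenorphine and naloxone sublingual tablets are indicated for **the maintenance treatment of opioid dependence** and should be used as part of a complete treatment plan to include counseling and psychosocial support buprenorphine and naloxone sublingual tablets contain buprenorphine a partial opioid agonist and naloxone an opioid antagonist and are indicated for **the maintenance treatment of opioid dependence**'''

    '''ofloxacin ophthalmic solution is indicated for **the treatment of infections caused by susceptible strains of the following bacteria** in the conditions listed below conjunctivitis gram-positive bacteria gram-negative bacteria staphylococcus aureus staphylococcus epidermidis streptococcus pneumoniae enterobacter cloacae haemophilus influenzae proteus mirabilis pseudomonas aeruginosa corneal ulcers gram-positive bacteria gram-negative bacteria staphylococcus aureus staphylococcus epidermidis streptococcus pneumoniae pseudomonas aeruginosa serratia marcescens'''

    I only want the parts in bold (between the ** markers).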

    4 answers  |  as of 6 years ago
        1
  •  6
  •   Programmer_nltk    6 years ago
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import spacy

    # Load the small English pipeline (it has to be installed separately).
    nlp = spacy.load('en_core_web_sm')
    text = 'Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis.'
    doc = nlp(text)
    for word in doc:
        # Print the full subtree of every object of a preposition (pobj).
        if word.dep_ == 'pobj':
            subtree_span = doc[word.left_edge.i : word.right_edge.i + 1]
            print(subtree_span.text)
    

    Output:

    relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis
    the signs and symptoms of osteoarthritis and rheumatoid arthritis
    osteoarthritis and rheumatoid arthritis
    

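    Several phrases are printed because the sentence contains several pobj tokens, and the loop prints the subtree of each one.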
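    If only the widest phrase is wanted, one option is to drop every span that sits inside another one. The following is a minimal sketch in plain Python, not part of spaCy. It assumes the spans were collected as (start, end) token-index pairs from word.left_edge.i and word.right_edge.i + 1, and the helper name outermost_spans is made up for illustration.

```python
def outermost_spans(spans):
    """Keep only the spans that are not contained in another span.

    `spans` is an iterable of (start, end) token-index pairs, end exclusive.
    """
    kept = []
    # Sort by start ascending, then by end descending, so that a containing
    # span is always visited before the spans nested inside it.
    for start, end in sorted(set(spans), key=lambda s: (s[0], -s[1])):
        if kept and end <= kept[-1][1]:
            continue  # nested inside the last span we kept
        kept.append((start, end))
    return kept


# Hypothetical token indices for the three nested phrases printed above.
print(outermost_spans([(5, 16), (7, 16), (12, 16)]))  # [(5, 16)]
```

    This still keeps the object of every top-level preposition in the sentence, so a sentence with other prepositional phrases outside "indicated for" would need a further filter.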

    Edit 2:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import spacy

    nlp = spacy.load('en_core_web_sm')
    para = '''Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis.
    Ofloxacin ophthalmic solution is indicated for the treatment of infections caused by susceptible strains of the following bacteria in the conditions listed below.'''
    doc = nlp(para)

    # Keep only the sentences that contain the key phrase.
    indicated_for_sents = [sent for sent in doc.sents if 'indicated for' in sent.text]
    print(indicated_for_sents)
    print()
    # Print the subtree of every object of a preposition (pobj) in the doc.
    for word in doc:
        if word.dep_ == 'pobj':
            subtree_span = doc[word.left_edge.i : word.right_edge.i + 1]
            print(subtree_span.text)
    

    Output:

    [Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis.
    , Ofloxacin ophthalmic solution is indicated for the treatment of infections caused by susceptible strains of the following bacteria in the conditions listed below.]
    
    relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis
    the signs and symptoms of osteoarthritis and rheumatoid arthritis
    osteoarthritis and rheumatoid arthritis
    
    
    the treatment of infections caused by susceptible strains of the following bacteria in the conditions listed below
    infections caused by susceptible strains of the following bacteria in the conditions listed below
    susceptible strains of the following bacteria in the conditions listed below
    the following bacteria in the conditions listed below
    the conditions listed below
    

    See also this link:

    https://github.com/NSchrading/intro-spacy-nlp/blob/master/subject_object_extraction.py

        2
  •  1
  •   Adnan S    6 years ago

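    You need to use spaCy's dependency parsing. Run the selected sentences (those containing "indicated for") through spaCy's dependency parser to expose the relations between all the words. You can see a visualization of the dependency parse for the example sentence in the question here

    Once spaCy has returned the dependency parse, search for the token "indicated" used as a verb and look at its children in the dependency tree. See the example here . In your case you would match "indicated" as the verb and take its children, rather than the "xcomp" or "ccomp" relations used in the GitHub example.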
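    The steps above could be sketched roughly as follows. This is an illustration under assumptions, not a tested recipe: it assumes the parser attaches "for" to "indicated" with the label "prep" and labels the head of the wanted phrase "pobj" (the labels used by spaCy's English models), and the helper name extract_from_text is made up here. What the parser actually produces depends on the model and the sentence.

```python
import spacy


def extract(doc, verb="indicated", prep="for"):
    """Return the phrases governed by `verb` + `prep` in a parsed spaCy Doc."""
    phrases = []
    for token in doc:
        if token.text.lower() != verb:
            continue
        # Look among the verb's children for the preposition "for".
        for child in token.children:
            if child.dep_ == "prep" and child.text.lower() == prep:
                # The object of that preposition heads the phrase we want.
                for obj in child.children:
                    if obj.dep_ == "pobj":
                        span = doc[obj.left_edge.i : obj.right_edge.i + 1]
                        phrases.append(span.text)
    return phrases


def extract_from_text(text, model="en_core_web_sm"):
    """Hypothetical convenience wrapper: parse `text`, then call extract()."""
    nlp = spacy.load(model)  # the model has to be installed separately
    return extract(nlp(text))
```

    Because the whole subtree of the object is returned, trailing modifiers that the parser attaches to it (for example "in the conditions listed below") come along as well, so longer label texts may still need trimming.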

        3
  •  0
  •   Acccumulation    6 years ago

    You don't need spaCy. You can use a regular expression or a split:

    sentence = "Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis"
    sentence.split('indicated for ')[1]
    >>> relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis
    

    This relies on assumptions about the string, for example that "indicated for" occurs only once and that everything after it is what you want.
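    The regular-expression variant mentioned above might look like the sketch below. It makes the same assumptions as the split, and the function name after_indicated_for is invented for illustration. Unlike indexing the result of split(), it returns None instead of raising an error when the phrase is missing.

```python
import re


def after_indicated_for(sentence):
    """Return the text after the first "indicated for", or None if absent."""
    match = re.search(r"\bindicated for\s+(.*)", sentence,
                      flags=re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


sentence = ("Meloxicam tablet is indicated for relief of the signs and "
            "symptoms of osteoarthritis and rheumatoid arthritis")
print(after_indicated_for(sentence))
# relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis
```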

    Grammar note: what you are looking for is actually the indirect object, not the subject. The subject is "Meloxicam tablet".

        4
  •  0
  •   Euclidian    6 years ago

    Take a look at this: Noun phrases with spacy https://spacy.io/usage/linguistic-features#noun-chunks . I'm not an expert in spaCy, but this should help.