
How to extract img tags from a mail body in VB.NET

  •  3
  • Kai  · Tech Community  · 14 years ago

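    I store mail content (the mail body) in a database.
    I want to extract the value of the "src" attribute of every image tag (<img>) in that content.
    A mail body may contain one or more images.

    How can I do this in VB.NET?
    Thanks.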

    1 reply  |  as of 14 years ago
  •  6
  •   Ben H    14 years ago

    You can use a regular expression.

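    ' Requires at the top of the file: Imports System.Text.RegularExpressions
    ' SubjectString is the mail body (HTML) loaded from the database.
    Try
        Dim RegexObj As New Regex("<img[^>]+src=[""']([^""']+)[""']", RegexOptions.Singleline Or RegexOptions.IgnoreCase)
        Dim MatchResults As Match = RegexObj.Match(SubjectString)
        ' Walk through every <img> tag found in the body
        While MatchResults.Success
            ' The src attribute value is in MatchResults.Groups(1).Value
            MatchResults = MatchResults.NextMatch()
        End While
    Catch ex As ArgumentException
        ' Thrown only if the regular expression has a syntax error (this one does not)
    End Try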
    

    Here is how it works:

    <img[^>]+src=["']([^"']+)["']
    
    Match the characters "<img" literally «<img»
    Match any character that is not a ">" «[^>]+»
       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
    Match the characters "src=" literally «src=»
    Match a single character from the list "' (a double or a single quote) «["']»
    Match the regular expression below and capture its match into backreference number 1 «([^"']+)»
       Match a single character NOT in the list "' (anything except a quote) «[^"']+»
          Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
    Match a single character from the list "' (a double or a single quote) «["']»
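
    To make the breakdown above concrete, here is a minimal sketch that wraps the same idea in a function returning all src values. The names ImgSrcExtractor, ExtractImageSources and the sample body are hypothetical and not part of the original answer. The pattern is also a variation rather than the answer's exact regex: it assumes you want whitespace before src (so an attribute such as data-src is not picked up) and uses a backreference so the closing quote must match the opening one.

    ```vbnet
    Imports System
    Imports System.Collections.Generic
    Imports System.Text.RegularExpressions

    Module ImgSrcExtractor
        ' Hypothetical helper: returns the src value of every <img> tag in the HTML.
        Function ExtractImageSources(html As String) As List(Of String)
            Dim sources As New List(Of String)()
            If String.IsNullOrEmpty(html) Then Return sources

            ' <img\b      the tag name, as a whole word
            ' [^>]*?      any other attributes, staying inside the tag
            ' \ssrc\s*=   the src attribute, preceded by whitespace
            ' ([""'])     group 1: the opening quote
            ' (.*?)\1     group 2: the value, up to the same kind of quote
            Dim pattern As String = "<img\b[^>]*?\ssrc\s*=\s*([""'])(.*?)\1"
            Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline

            For Each m As Match In Regex.Matches(html, pattern, options)
                ' Skip empty src attributes
                If m.Groups(2).Value.Length > 0 Then
                    sources.Add(m.Groups(2).Value)
                End If
            Next
            Return sources
        End Function

        Sub Main()
            ' Sample body made up for illustration
            Dim body As String = "<p>Hi</p><img alt='a' src=""http://example.com/a.png""><IMG SRC='b.jpg' />"
            For Each src As String In ExtractImageSources(body)
                Console.WriteLine(src)
            Next
        End Sub
    End Module
    ```

    With the sample body, this should print http://example.com/a.png and then b.jpg. Like the original pattern, it only handles quoted src values, so an unquoted attribute such as src=a.png is not matched.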