
Fetching URL content with HtmlAgilityPack produces an error

  •  1
  • Learning  · Tech community  · 6 years ago

    I am using HtmlAgilityPack to scrape text from a URL. This works fine for most websites, but for some sites it started returning an error today.

    The error occurs at the line doc = webGet.Load(url); with the message: The underlying connection was closed: An unexpected error occurred on a send.


    I tried other https URLs, such as bbc.com, and it works for them. Any pointers would be appreciated if there is a problem with the code.

    HtmlDocument doc = new HtmlDocument();
    var url = txtGrabNewsURL.Text.Trim();

    var webGet = new HtmlWeb();
    doc = webGet.Load(url);
    var baseUrl = new Uri(url);
    //  doc.LoadHtml(response);

    String title = (from x in doc.DocumentNode.Descendants()
                    where x.Name.ToLower() == "title"
                    select x.InnerText).FirstOrDefault();

    String desc = (from x in doc.DocumentNode.Descendants()
                   where x.Name.ToLower() == "meta"
                   && x.Attributes["name"] != null
                   && x.Attributes["name"].Value.ToLower() == "description"
                   select x.Attributes["content"].Value).FirstOrDefault();

    String ogImage = (from x in doc.DocumentNode.Descendants()
                      where x.Name.ToLower() == "meta"
                      && x.Attributes["property"] != null
                      && x.Attributes["property"].Value.ToLower() == "og:image"
                      select x.Attributes["content"].Value).FirstOrDefault();

    List<String> imgs = (from x in doc.DocumentNode.Descendants()
                         where x.Name.ToLower() == "img"
                         && x.Attributes["src"] != null
                         select x.Attributes["src"].Value).ToList<String>();

    List<String> imgList = (from x in doc.DocumentNode.Descendants("img")
                            where x.Attributes["src"] != null
                            select x.Attributes["src"].Value.ToLower()).ToList<String>();
    

    Full error details

    System.Net.WebException was caught
      HResult=-2146233079
      Message=The underlying connection was closed: An unexpected error occurred on a send.
      Source=System
      StackTrace:
           at System.Net.HttpWebRequest.GetResponse()
           at HtmlAgilityPack.HtmlWeb.Get(Uri uri, String method, String path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) in D:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1355
           at HtmlAgilityPack.HtmlWeb.LoadUrl(Uri uri, String method, WebProxy proxy, NetworkCredential creds) in D:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1479
           at HtmlAgilityPack.HtmlWeb.Load(String url, String method) in D:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1106
           at HtmlAgilityPack.HtmlWeb.Load(String url) in D:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1061
           at _admin_News.btnGrabNews_Click(Object sender, EventArgs e) in c:\path\News.aspx.cs:line 361
      InnerException: System.IO.IOException
           HResult=-2146232800
           Message=Authentication failed because the remote party has closed the transport stream.
           Source=System
           StackTrace:
                at System.Net.Security.SslState.StartReadFrame(Byte[] buffer, Int32 readBytes, AsyncProtocolRequest asyncRequest)
                at System.Net.Security.SslState.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
                at System.Net.Security.SslState.CheckCompletionBeforeNextReceive(ProtocolToken message, AsyncProtocolRequest asyncRequest)
                at System.Net.Security.SslState.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
                at System.Net.Security.SslState.ForceAuthentication(Boolean receiveFirst, Byte[] buffer, AsyncProtocolRequest asyncRequest)
                at System.Net.Security.SslState.ProcessAuthentication(LazyAsyncResult lazyResult)
                at System.Net.TlsStream.CallProcessAuthentication(Object state)
                at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
                at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
                at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
                at System.Net.TlsStream.ProcessAuthentication(LazyAsyncResult result)
                at System.Net.TlsStream.Write(Byte[] buffer, Int32 offset, Int32 size)
                at System.Net.PooledStream.Write(Byte[] buffer, Int32 offset, Int32 size)
                at System.Net.ConnectStream.WriteHeaders(Boolean async)
           InnerException: 
    
    2 replies  |  as of 6 years ago
        1
  •  4
  •   Micah Epps Etienne Maheu    6 years ago

    If it only happens with HTTPS and you are targeting .NET 4, it may be related to the default SSL/TLS support. Try the following:

    using System.Net;

    static void Main()
    {
        // Place this anywhere in your code before the web request is made.
        // SSL 3.0 is left out of the list: it is deprecated and insecure.
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls12;
    }
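
The inner exception in the question ("Authentication failed because the remote party has closed the transport stream") is consistent with a TLS handshake that the server refused. A minimal sketch of two small helpers follows, assuming .NET Framework 4.5 or later; `TlsHelper`, `EnableTls12`, and `InnermostMessage` are hypothetical names introduced for illustration and are not part of HtmlAgilityPack or the framework.

```csharp
using System;
using System.Net;

// Hypothetical helper class, for illustration only.
public static class TlsHelper
{
    // Adds TLS 1.2 to the protocols that are already enabled instead of
    // replacing the whole set. Note: if the current value is SystemDefault (0),
    // the result of the OR is TLS 1.2 only.
    public static void EnableTls12()
    {
        ServicePointManager.SecurityProtocol |= SecurityProtocolType.Tls12;
    }

    // Walks the InnerException chain and returns the innermost message.
    // For the failure in the question, that is the IOException text
    // raised during SSL/TLS authentication.
    public static string InnermostMessage(Exception ex)
    {
        while (ex.InnerException != null)
        {
            ex = ex.InnerException;
        }
        return ex.Message;
    }
}
```

One way to use it would be to call `TlsHelper.EnableTls12()` once at startup, wrap `webGet.Load(url)` in a `try`/`catch (WebException ex)`, and log `TlsHelper.InnermostMessage(ex)` to see whether the failure came from the handshake. On a project compiled against .NET 4.0 the `Tls12` enum member is not defined; casting the numeric value, `(SecurityProtocolType)3072`, is a commonly used workaround, but it presumably only takes effect when .NET 4.5 or later is installed on the machine.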
    
        2
  •  0
  •   Anup Patil    6 years ago

    I think the website was down at that time and there was a connection problem.

    HtmlDocument doc = new HtmlDocument();
    var url = "https://m.gulfnews.com/business/sectors/banking/rebuilding-lives-10-years-after-lehman-s-fall-1.2277318";

    var webGet = new HtmlWeb();
    doc = webGet.Load(url);

    String title = (from x in doc.DocumentNode.Descendants()
                    where x.Name.ToLower() == "title"
                    select x.InnerText).FirstOrDefault();
    

    Output: Rebuilding lives, 10 ... etc.
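
Separately from the connection error, the LINQ queries in this thread read `x.Attributes["content"].Value` without checking that the attribute exists, which would throw on a `meta` tag that has no `content`. A null-safe variant could be sketched as below, assuming the HtmlAgilityPack NuGet package is referenced; `PageMeta`, `GetTitle`, and `GetMetaContent` are hypothetical names, not part of the library.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper class, for illustration only.
public static class PageMeta
{
    // Returns the trimmed <title> text, or null when the page has no title.
    public static string GetTitle(HtmlDocument doc)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//title");
        return node == null ? null : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }

    // Returns the content of the first <meta> whose attribute `attr`
    // equals `value` (case-insensitive), or null when there is no match
    // or the matching tag has no content attribute.
    public static string GetMetaContent(HtmlDocument doc, string attr, string value)
    {
        foreach (HtmlNode meta in doc.DocumentNode.Descendants("meta"))
        {
            string key = meta.GetAttributeValue(attr, "");
            if (string.Equals(key, value, StringComparison.OrdinalIgnoreCase))
            {
                return meta.GetAttributeValue("content", null);
            }
        }
        return null;
    }
}
```

With a document loaded as above, `PageMeta.GetMetaContent(doc, "name", "description")` and `PageMeta.GetMetaContent(doc, "property", "og:image")` would stand in for the `desc` and `ogImage` queries from the question.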