
C# HTTP GET request to http://sede.educacion.gob.es/

  •  1
  • REDMAN  · 6 years ago

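    I am trying to perform a GET request to https://sede.educacion.gob.es/publiventa/catalogo.action?cod=E . With the cod=E parameter, the site opens a menu under "Materias de educación" in the browser, but when I perform the request from C#, that menu is not loaded, and I need it. This is the code I use to read the HTML into a string so that I can parse it later with HtmlAgilityPack.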

    private string readHtml(string urlAddress)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
        request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0";
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        request.AutomaticDecompression = DecompressionMethods.GZip;

        // Dispose the response even when the status is not OK or reading throws.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            if (response.StatusCode != HttpStatusCode.OK)
            {
                return null;
            }

            // Use UTF-8 (the StreamReader default) when the server reports no charset.
            Encoding encoding = string.IsNullOrEmpty(response.CharacterSet)
                ? Encoding.UTF8
                : Encoding.GetEncoding(response.CharacterSet);

            using (Stream receiveStream = response.GetResponseStream())
            using (StreamReader readStream = new StreamReader(receiveStream, encoding))
            {
                return readStream.ReadToEnd();
            }
        }
    }
    
    1 answer
  •  1
  •   Jimi    6 years ago

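    The Uri you posted ( https://sede.educacion.gob.es/publiventa/catalogo.action?cod=E ) uses a JavaScript switch to show the menu content.
    When you connect to that Uri (without clicking a menu link), the site serves three different versions of the page in turn.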

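    1) Page with the menu closed and a suggestion to try the new version
    2) Page with the menu closed and the search engine fields
    3) Page with the menu open and the menu content available for selection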

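    This switch is driven by an internal procedure that records the current session. Unless you click a menu link (which is wired to an event listener), the JavaScript procedure shows the page in each of these states, one after the other.
    I had a look at it; the scripts are quite long (a whole multi-purpose library) and I had no time to go through them all (maybe you can) to find out which parameters the event listener passes.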

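    The three-state version switch is constant, though.
    What I mean is that you can call the page three times while preserving the Cookie container: the third time you connect, it streams the whole menu content and its links.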
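
    To check that the session really carries over between calls, the cookies held in the container can be listed after each request. A minimal sketch: CookieInspector and Describe are hypothetical names, and which cookies the site actually sets has not been verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class CookieInspector
{
    // Returns the cookies the container would send to the given Uri,
    // formatted as "name=value; name=value".
    public static string Describe(CookieContainer jar, Uri uri)
    {
        var parts = new List<string>();
        foreach (Cookie cookie in jar.GetCookies(uri))
        {
            parts.Add(cookie.Name + "=" + cookie.Value);
        }
        return string.Join("; ", parts);
    }
}
```

    If the string is empty after the first request, the container is probably not being reused between calls, and the site would then treat every request as a new session.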

    If you request the same page three times, the third Html page will contain all the
    Materias de educación links.

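    public async Task SomeMethodAsync()
    {
        Uri uri = new Uri("https://sede.educacion.gob.es/publiventa/catalogo.action?cod=E");

        // The first two responses are the closed-menu versions; the third one has the open menu.
        string HtmlPage = await GetHttpStream(uri);
        HtmlPage = await GetHttpStream(uri);
        HtmlPage = await GetHttpStream(uri);
    }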
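
    The three fixed calls can be made a little more defensive by repeating the request until the returned HTML looks complete. This is only a sketch: MenuPageFetcher, FetchUntilAsync and the isComplete check are hypothetical, and the marker that identifies the open-menu page is an assumption you would have to choose after inspecting the real response.

```csharp
using System;
using System.Threading.Tasks;

public static class MenuPageFetcher
{
    // Calls fetch until isComplete accepts the page or maxAttempts is reached.
    // Returns the last page received (which may still be incomplete).
    public static async Task<string> FetchUntilAsync(
        Func<Task<string>> fetch,
        Func<string, bool> isComplete,
        int maxAttempts = 3)
    {
        if (fetch == null) throw new ArgumentNullException(nameof(fetch));
        if (isComplete == null) throw new ArgumentNullException(nameof(isComplete));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        string html = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            html = await fetch();
            if (!string.IsNullOrEmpty(html) && isComplete(html))
            {
                break;
            }
        }
        return html;
    }
}
```

    It could be called as `await MenuPageFetcher.FetchUntilAsync(() => GetHttpStream(uri), html => html.Contains(marker))`, where `marker` is whatever text only appears in the open-menu version. The fetch delegate must reuse the same Cookie container on every call, otherwise the page state never advances.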
    

    This is, more or less, what I used to get that page:

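    // Requires: using System; using System.IO; using System.IO.Compression;
    //           using System.Net; using System.Text; using System.Threading.Tasks;
    // Shared by all requests so the session cookies survive between calls.
    CookieContainer CookieJar = new CookieContainer();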
    
    public async Task<string> GetHttpStream(Uri HtmlPage)
    {
        HttpWebRequest httpRequest;
        string Payload = string.Empty;
        httpRequest = WebRequest.CreateHttp(HtmlPage);
    
        try
        {
            httpRequest.CookieContainer = CookieJar;
            httpRequest.KeepAlive = true;
            httpRequest.ConnectionGroupName = Guid.NewGuid().ToString();
            httpRequest.AllowAutoRedirect = true;
            httpRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
            httpRequest.ServicePoint.MaxIdleTime = 30000;
            httpRequest.ServicePoint.Expect100Continue = false;
            httpRequest.UserAgent = "Mozilla/5.0 (Windows NT 10; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0";
            httpRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
            httpRequest.Headers.Add(HttpRequestHeader.AcceptLanguage, "es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3");
            httpRequest.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip, deflate;q=0.8");
            httpRequest.Headers.Add(HttpRequestHeader.CacheControl, "no-cache");
    
    
            using (HttpWebResponse httpResponse = (HttpWebResponse)await httpRequest.GetResponseAsync())
            {
                Stream ResponseStream = httpResponse.GetResponseStream();
    
                if (httpResponse.StatusCode == HttpStatusCode.OK)
                {
                    try
                    {
                        //ResponseStream.Position = 0;
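                        // CharacterSet can be null or empty when the server omits it; fall back to UTF-8.
                        Encoding encoding = string.IsNullOrEmpty(httpResponse.CharacterSet)
                            ? Encoding.UTF8
                            : Encoding.GetEncoding(httpResponse.CharacterSet);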
    
                        using (MemoryStream _memStream = new MemoryStream())
                        {
                            if (httpResponse.ContentEncoding.Contains("gzip"))
                            {
                                using (GZipStream _gzipStream = new GZipStream(ResponseStream, System.IO.Compression.CompressionMode.Decompress))
                                {
                                    _gzipStream.CopyTo(_memStream);
                                };
                            }
                            else if (httpResponse.ContentEncoding.Contains("deflate"))
                            {
                                using (DeflateStream _deflStream = new DeflateStream(ResponseStream, System.IO.Compression.CompressionMode.Decompress))
                                {
                                    _deflStream.CopyTo(_memStream);
                                };
                            }
                            else
                            {
                                ResponseStream.CopyTo(_memStream);
                            }
    
                            _memStream.Position = 0;
                            using (StreamReader _reader = new StreamReader(_memStream, encoding))
                            {
                                Payload = _reader.ReadToEnd().Trim();
                            };
                        };
                    }
                    catch (Exception)
                    {
                        Payload = string.Empty;
                    }
                }
            }
        }
        catch (WebException exW)
        {
            if (exW.Response != null)
            {
                //Handle WebException
            }
        }
        catch (System.Exception exS)
        {
            //Handle System.Exception
        }
    
        CookieJar = httpRequest.CookieContainer;
        return Payload;
    }
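
    The same idea can also be sketched with HttpClient instead of HttpWebRequest. This is a swapped-in API, not what the code above uses; CatalogClient, Create and GetThirdResponseAsync are hypothetical names, and it has not been run against the site. The important part is that one HttpClientHandler, and therefore one CookieContainer, serves all three requests.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class CatalogClient
{
    // Builds an HttpClient whose handler stores cookies in the supplied container.
    public static HttpClient Create(CookieContainer cookieJar)
    {
        var handler = new HttpClientHandler
        {
            CookieContainer = cookieJar,
            UseCookies = true,
            AllowAutoRedirect = true,
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };

        var client = new HttpClient(handler);
        client.DefaultRequestHeaders.TryAddWithoutValidation(
            "User-Agent",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0");
        client.DefaultRequestHeaders.TryAddWithoutValidation(
            "Accept-Language", "es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3");
        return client;
    }

    // Requests the same Uri three times and returns the body of the third response.
    public static async Task<string> GetThirdResponseAsync(HttpClient client, Uri uri)
    {
        string html = null;
        for (int i = 0; i < 3; i++)
        {
            html = await client.GetStringAsync(uri);
        }
        return html;
    }
}
```

    Here the handler decompresses the response itself, so no manual GZipStream or DeflateStream handling is needed. Keep a single client (and container) alive for the whole sequence; creating a new one per request would reset the session.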