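I'm new to Node.js and XMLHttpRequest, so please bear with me if this question has a simple answer.
I'm currently trying to scrape a friend's web page (with his permission, of course) that has videos and subtitles. I want to do this by writing a Node.js command-line application. For now, I'm just trying to find the links to the videos and the links to the subtitles. This is what I have so far: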
#!/usr/bin/env node
var XMLHttpRequest = require("xmlhttprequest").XMLHttpRequest;
var htmlparser = require("htmlparser2");
var xhttp = new XMLHttpRequest();
xhttp.onreadystatechange = function() {
    if (this.readyState == 4 && this.status == 200) {
        // HTML source
        var html = this.responseText;
        var season = 0;
        var episode = 0;
        var parser = new htmlparser.Parser({
            onopentag: function(name, attribs) {
                if (name === "li" && attribs.id === "season-1") {
                    season = 1;
                    console.log("In season 1");
                    for (var attr in attribs) {
                        console.log(attr);
                    }
                }
                if (name === "a" && season === 1) {
                    episode = 1;
                    var nextPage = attribs.data;
                    console.log("\""+nextPage+"\"");
                    // Go to "nextPage" here
                    xhttp.open("GET", "\""+nextPage+"\"", true);
                }
            },
            onattribute: function(name, value) {
                if (name === "data-url" && season === 1) {
                    if (value.includes("episode-")) {
                        episode = value.substr(8,1);
                    }
                    console.log(value);
                    console.log("Episode is: " + episode);
                }
            },
            ontext: function(text) {
            },
            onclosetag: function(tagname) {
                if (tagname === "li" && season === 1) {
                    season = 0;
                    console.log("Leaving season 1");
                }
            }
        }, {
            decodeEntities: true
        });
        parser.write(html);
        parser.end();
    }
};
xhttp.open("GET", "https://friendspage.org", true);
xhttp.send();
Running it prints:
In season 1
id
episode-1
Episode is: 1
"https://friendspage.org/episode-1"
episode-2
Episode is: 2
"https://friendspage.org/episode-2"
episode-3
Episode is: 3
"https://friendspage.org/episode-3"
episode-4
Episode is: 4
"https://friendspage.org/episode-4"
episode-5
Episode is: 5
"https://friendspage.org/episode-5"
episode-6
Episode is: 6
"https://friendspage.org/episode-6"
episode-7
Episode is: 7
"https://friendspage.org/episode-7"
episode-8
Episode is: 8
"https://friendspage.org/episode-8"
episode-9
Episode is: 9
"https://friendspage.org/episode-9"
Leaving season 1
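The code works the way I want, except that I want to go to nextPage. I will use an input variable from the command line to choose which page to go to, but for now I can't figure out how to go to that page at all. This is the part in question:
    // Go to "nextPage" here
    xhttp.open("GET", "\""+nextPage+"\"", true);
Trying to call xhttp.send() after that fails with the error "send has already been called". I'm guessing I either need to close the current request and open a new one, or just start another instance of XMLHttpRequest.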
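To make the second guess concrete, here is a minimal sketch of what "another instance of XMLHttpRequest" could look like. It is not a confirmed fix. The helper name fetchPage and its callback shape are made up for illustration, and the constructor is passed in as a parameter so the helper does not depend on any particular package. Note that it passes the URL as-is: wrapping it in literal quote characters, as the snippet above does, gives the request a string that is not a valid URL.

```javascript
// Hypothetical helper (not part of any library): fetch one URL using a
// brand-new request object, so an earlier request is never reused.
// XHR is assumed to be an XMLHttpRequest-compatible constructor, e.g.
// require("xmlhttprequest").XMLHttpRequest.
function fetchPage(XHR, url, onDone) {
    var req = new XHR(); // fresh instance for every page
    req.onreadystatechange = function() {
        if (req.readyState !== 4) {
            return; // request not finished yet
        }
        if (req.status === 200) {
            onDone(null, req.responseText);
        } else {
            onDone(new Error("HTTP " + req.status + " for " + url));
        }
    };
    req.open("GET", url, true); // raw URL, no added quote characters
    req.send();
}

// Possible use inside onopentag (assumes nextPage holds a full URL):
// fetchPage(XMLHttpRequest, nextPage, function(err, html) {
//     if (err) { console.error(err.message); return; }
//     // parse html for the video and subtitle links here
// });
```

Because each call creates its own request object, the outer xhttp request is left alone and its send() is only ever called once.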