
How can I download all posts from a phpBB3 forum if I am not an administrator? [closed]

  •  1
  • Andy  · Tech Community  · 14 years ago

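    I am used to posting my ideas on a forum, and I have started to worry that I will lose them if it is ever shut down. Do you know a good way to download an entire phpBB3 forum (other people's ideas are nice too!) into a database? Is there software already available, or do I have to write it myself?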

    Update 1:

    Update 2:

    How can I download an entire (active) phpbb forum?

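    But I preferred to write a Ruby script to back up the forum. It is not a complete solution, but it is enough for me. And yes, if you are that worried, it does not violate any ToS.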

    require 'rubygems'
    require 'hpricot'
    require 'open-uri'
    require 'uri'
    require 'cgi'
    #require 'sqlite3-ruby'
    
    class PHPBB
      def initialize base_url
        @base_url = base_url
        @forums, @topics = {}, {} # both are keyed by id
        self.parse_main_page 'main', 'index.php'
        @forums.keys.each do |f|
          self.parse_forum "forum.#{f}", "viewforum.php?f=#{f}"
        end
        @topics.keys.each do |t|
          self.parse_topic "topic.#{t}", "viewtopic.php?t=#{t}"
        end
      end
    
    
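      # Return the page from the local cache, downloading it on first use.
      def read_file cached, remote
        local = "%s.%s.html" % [__FILE__, cached]
        if File.exist? local
          return IO.read local
        else # download and save
          puts "load #{remote}"
          content = URI.open(@base_url + remote).read # URI.open comes from open-uri
          File.write(local, content) # writes and closes the cache file
          return content
        end
      end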
    
    
      def parse_main_page local, remote
        doc = Hpricot(self.read_file(local,remote))
        doc.search('ul.forums/li.row').each do |li|
          fa = li.search('a.forumtitle').first # forum anchor
          f = self.parse_anchor(fa)['f']
          @forums[f] = {
            forum_id: f,
            title: fa.inner_html,
            description: li.search('dl/dt').first.inner_html.split('<br />').last.strip
          }
          ua, pa = li.search('dd.lastpost/span/a') # user anchor, post anchor
          q = self.parse_anchor(pa)
          self.last_post f, q['p'] unless q.nil?
        end
      end
    
      def last_post f,p
        @last_post = {forum_id: f, post_id: p} if @last_post.nil? or p.to_i > @last_post[:post_id].to_i
      end
    
      def last_topic f,t
      end
    
    
      def parse_forum local, remote, start=nil
        doc = Hpricot(self.read_file(local,remote))
        doc.search('ul.topics/li.row').each do |li|
          ta = li.search('a.topictitle').first # topic anchor
          q = self.parse_anchor(ta)
          f = q['f']
          t = q['t']
          u = self.parse_anchor(li.search('dl/dt/a').last)['u']
          @topics[t] = {
            forum_id: f,
            topic_id: t,
            user_id: u,
            title: ta.inner_html
          }
        end
      end
    
    
      def parse_topic local, remote, start=nil
        doc = Hpricot(self.read_file(local,remote))
        if start.nil?
          doc.search('div.pagination/span/a').collect{ |p| self.parse_anchor(p)['start'] }.uniq.each do |p|
            self.parse_topic "#{local}.start.#{p}", "#{remote}&start=#{p}", true
          end
        end
        doc.search('div.postbody').each do |li|
          # do something
        end
      end
    
    
      def parse_url href
        r = CGI.parse URI.parse(href).query
        r.each_pair { |k,v| r[k] = v.last }
      end
    
    
      def parse_anchor hp
        self.parse_url hp.attributes['href'] unless hp.nil?
      end
    end
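
    The script relies on parse_url to pull ids such as f, t, p and start out of phpBB links, and that method raises when a link has no query string. Below is a minimal sketch of a nil-safe variant using only the standard library. The name query_params is hypothetical and not part of the script above, and the example links are made up for illustration.

```ruby
require 'uri'

# Hypothetical helper (not part of the script above): returns the query
# parameters of a phpBB-style link as a Hash of strings, or an empty Hash
# when the link has no query string.
def query_params(href)
  query = URI.parse(href).query
  return {} if query.nil?
  # decode_www_form yields [key, value] pairs; to_h keeps the last value
  # for a repeated key, which matches what parse_url does with v.last.
  URI.decode_www_form(query).to_h
end

# Example links are assumptions, shaped like the ones the script parses.
p query_params('./viewtopic.php?f=3&t=42&start=15')
p query_params('./index.php')
```

    With a helper like this, a caller can test the result with empty? instead of guarding against nil before indexing into it.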
    
    1 answer  |  last active 7 years ago
        1
  •  3
  •   shamittomar    14 years ago

    This would violate the terms of service and may also be illegal.