代码之家  ›  专栏  ›  技术社区  ›  Hadi

如何在Java中从Url中提取全名

  •  0
  • Hadi  · 技术社区  · 6 年前

    URL(Direct Download Link) . 我想要一个强大的图书馆。我用 FileNameUtils Apache commons ,但是这个类不支持很多 URLs .

    我想要一个支持以下URL的库:

    https://example.cdn.com/mp4/7/9/5/file_795f32460d111df334849ee8336e56ca.mp4?e=1535545105&h=4772d27a70cd9b1c665b712f62592c47&download=1

    http://example.cdn.comr/post/93/3/Jozve-Kamele-arbi.abp.zip

    姓名:Jozve Kamele-地址:arbi.abp.zip

    http://cdl.example.com/?b=dl-software&f=Windows.8.1.Enterprise.x86.Aug.2018_n.part1.rar

    名称:dl software&f=Windows.8.1.Enterprise.x86.2018年8月\u n.part1.rar

    https://www.google.com/url?sa=t&source=web&rct=j&url=http://www.pdf995.com/samples/pdf.pdf&ved=2ahUKEwjV096X-ZHdAhVQzlkKHTpUBV4QFjAAegQIARAB&usg=AOvVaw3HFvAQ7GNf5QjsUo05ot-j

    有人能帮我吗?谢谢。

    如果我的句子语法不正确,我提前道歉。因为我英语说不好。

    2 回复  |  直到 6 年前
        1
  •  1
  •   gil.fernandes    6 年前

    实际上,你也可以尝试用正则表达式来解决这个问题(比如 (?i)([^=/&?]+\\.(" + EXTENSIONS + "))\\b ),如果您有感兴趣的文件扩展名列表。

    private static final String EXTENSIONS = "ez|aw|atom|atomcat|atomsvc|ccxml|cdmia|cdmic|cdmid|cdmio|cdmiq|cu|davmount|dbk|dssc|xdssc|ecma|emma|epub|exi|pfr|gml|gpx|gxf|stk|ipfix|jar|ser|class|js|json|jsonml|lostxml|hqx|cpt|mads|mrc|mrcx|mathml|mbox|mscml|metalink|meta4|mets|mods|mp4s|mp4|mxf|oda|opf|ogx|omdoc|oxps|xer|pdf|pgp|prf|p10|p7s|p8|ac|cer|crl|pkipath|pki|pls|cww|pskcxml|rdf|rif|rnc|rl|rld|rs|gbr|mft|roa|rsd|rss|rtf|sbml|scq|scs|spq|spp|sdp|setpay|setreg|shf|rq|srx|gram|grxml|sru|ssdl|ssml|tfi|tsd|plb|psb|pvb|tcap|pwn|aso|imp|acu|air|fcdt|xdp|xfdf|ahead|azf|azs|azw|acc|ami|apk|cii|fti|atx|mpkg|m3u8|swi|iota|aep|mpm|bmi|rep|cdxml|mmd|cdy|cla|rp9|c11amc|c11amz|csp|cdbcmsg|cmc|clkx|clkk|clkp|clkt|clkw|wbs|pml|ppd|car|pcurl|dart|rdz|fe_launch|dna|mlp|dpg|dfac|kpxx|ait|svc|geo|mag|nml|esf|msf|qam|slt|ssf|ez2|ez3|fdf|mseed|gph|ftc|fnc|ltf|fsc|oas|oa2|oa3|fg5|bh2|ddd|xdw|xbd|fzs|txd|ggb|ggt|gxt|g2w|g3w|gmx|kml|kmz|gac|ghf|gim|grv|gtm|tpl|vcg|hal|zmm|hbci|les|hpgl|hpid|hps|jlt|pcl|pclxl|sfd-hdstx|mpy|irm|sc|igl|ivp|ivu|igm|i2g|qbo|qfx|rcprofile|irp|xpr|fcs|jam|rms|jisp|joda|karbon|chrt|kfo|flw|kon|ksp|htke|kia|sse|lasxml|lbd|lbe|123|apr|pre|nsf|org|scm|lwp|portpkg|mcd|mc1|cdkey|mwf|mfm|flo|igx|mif|daf|dis|mbk|mqy|msl|plc|txf|mpn|mpc|xul|cil|cab|xlam|xlsb|xlsm|xltm|eot|chm|ims|lrm|thmx|cat|stl|ppam|pptm|sldm|ppsm|potm|docm|dotm|wpl|xps|mseq|mus|msty|taglet|nlu|nnd|nns|nnw|ngdat|n-gage|rpst|rpss|edm|edx|ext|odc|otc|odb|odf|odft|odg|otg|odi|oti|odp|otp|ods|ots|odt|odm|ott|oth|xo|dd2|oxt|pptx|sldx|ppsx|potx|xlsx|xltx|docx|dotx|mgp|dp|esa|paw|str|ei6|efif|wg|plf|pbd|box|mgz|qps|ptid|bed|mxl|musicxml|cryptonote|cod|rm|rmvb|link66|st|see|sema|semd|semf|ifm|itp|iif|ipk|mmf|teacher|dxp|sfs|sdc|sda|sdd|smf|sgl|smzip|sm|sxc|stc|sxd|std|sxi|sti|sxm|sxw|sxg|stw|svd|xsm|bdm|xdm|tao|tmo|tpt|mxs|tra|utz|umj|unityweb|uoml|vcx|vis|vsf|wbxml|wmlc|wmlsc|wtb|nbp|wpd|wqd|stf|xar|xfdl|hvd|hvs|hvp|osf|osfpvg|saf|spf|cmp|zaz|vxml|wgt|hlp|wsdl|wspolicy|7z|abw|ace|dmg|aam|aas|bcpio|torrent|bz|vcd|cfs|chat|pgn|nsc|cpio|csh|dgc|wad|ncx|dtb|res|dvi|evy|eva|bdf|gsf|psf|pcf|snf|arc|spl|gca|ulx|gnumeric|gramps|gtar|hdf|install|iso|jnlp|latex|mie|application|lnk|wmd|wmz|xbap|mdb|obd|crd|clp|mny|pub|scd|trm|wri|nzb|p7r|rar|ris|sh|shar|swf|xap|sql|sit|sitx|srt|sv4cpio|sv4crc|t3|gam|tar|tcl|tex|tfm|obj|ustar|src|fig|xlf|xpi|xz|xaml|xdf|xenc|dtd|xop|xpl|xslt|xspf|yang|yin|zip|adp|s3m|sil|eol|dra|dts|dtshd|lvp|pya|ecelp4800|ecelp7470|ecelp9600|rip|weba|aac|caf|flac|mka|m3u|wax|wma|rmp|wav|xm|cdx|cif|cmdf|cml|csml|xyz|ttc|otf|ttf|woff|woff2|bmp|cgm|g3|gif|ief|ktx|png|btif|sgi|psd|sub|dwg|dxf|fbs|fpx|fst|mmr|rlc|mdi|wdp|npx|wbmp|xif|webp|3ds|ras|cmx|ico|sid|pcx|pnm|pbm|pgm|ppm|rgb|tga|xbm|xpm|xwd|dae|dwf|gdl|gtw|mts|vtu|appcache|css|csv|n3|dsc|rtx|tsv|ttl|vcard|curl|dcurl|mcurl|scurl|sub|fly|flx|gv|3dml|spot|jad|wml|wmls|java|nfo|opml|etx|sfv|uu|vcs|vcf|3gp|3g2|h261|h263|h264|jpgv|ogv|dvb|fvt|pyv|viv|webm|f4v|fli|flv|m4v|mng|vob|wm|wmv|wmx|wvx|avi|movie|smv|ice";
    
    private static final Pattern FILE_DETECT = Pattern.compile("(?i)([^=/&?]+\\.(" + EXTENSIONS + "))\\b");
    
    public static Optional<String> extractFileFrom(String url) {
        Matcher matcher = FILE_DETECT.matcher(url);
        return (matcher.find()) ? Optional.of(matcher.group(1)) : Optional.empty();
    }
    

    下面是一个演示如何使用上述方法的测试:

    public static void main(String[] args) throws ParseException {
        List<String> strings = Arrays.asList(
                "https://example.cdn.com/mp4/7/9/5/file_795f32460d111df334849ee8336e56ca.mp4?e=1535545105&h=4772d27a70cd9b1c665b712f62592c47&download=1",
                "http://example.cdn.comr/post/93/3/Jozve-Kamele-arbi.abp.zip",
                "http://cdl.example.com/?b=dl-software&f=Windows.8.1.Enterprise.x86.Aug.2018_n.part1.rar",
                "https://www.google.com/url?sa=t&source=web&rct=j&url=http://www.pdf995.com/samples/pdf.pdf&ved=2ahUKEwjV096X-ZHdAhVQzlkKHTpUBV4QFjAAegQIARAB&usg=AOvVaw3HFvAQ7GNf5QjsUo05ot-j",
                "https://www.google.com/url?sa=t&source=web&rct=j&url=http://www.pdf995.com/samples/pdf.PDF&ved=2ahUKEwjV096X-ZHdAhVQzlkKHTpUBV4QFjAAegQIARAB&usg=AOvVaw3HFvAQ7GNf5QjsUo05ot-j");
        strings.stream().map(s -> extractFileFrom(s)).collect(Collectors.toList())
            .forEach(System.out::println);
    }
    

    如果执行main方法,您将在控制台上看到:

    Optional[file_795f32460d111df334849ee8336e56ca.mp4]
    Optional[Jozve-Kamele-arbi.abp.zip]
    Optional[Windows.8.1.Enterprise.x86.Aug.2018_n.part1.rar]
    Optional[pdf.pdf]
    Optional[pdf.PDF]
    
        2
  •  1
  •   Khemraj Open To Work    6 年前

    我用这个方法,希望对你也有帮助。它也会从问号开始解析。

    public static String parseFileNameFromUrl(String url) {
        if (url == null) {
            return "";
        }
        try {
            URL res = new URL(url);
            String resHost = res.getHost();
            if (resHost.length() > 0 && url.endsWith(resHost)) {
                // handle ...example.com
                return "";
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
            return "";
        }
    
        int startIndex = url.lastIndexOf('/') + 1;
        int length = url.length();
    
        // find end index for ?
        int lastQuestionMarkPos = url.lastIndexOf('?');
        if (lastQuestionMarkPos == -1) {
            lastQuestionMarkPos = length;
        }
    
        // find end index for #
        int lastHashPos = url.lastIndexOf('#');
        if (lastHashPos == -1) {
            lastHashPos = length;
        }
    
        // calculate the end index
        int endIndex = Math.min(lastQuestionMarkPos, lastHashPos);
        return url.substring(startIndex, endIndex);
    }