
Visualizing differences in an HTML report

  • JBGruber  ·  6 years ago

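    I want to produce an HTML report that visually shows the differences between several very similar texts. I found the diffobj package, which does what I need when used in an interactive session: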

    [Screenshot: diffobj output in an interactive session]

    However, I can't figure out how to render it in a report. Here is what I have tried so far:

    ---
    title: "Repex"
    output: html_document
    ---
    
    ```{r cars}
    duplicates <- data.frame(text_original = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas laoreet risus et eros sagittis aliquam. Donec fringilla pharetra vestibulum. Fusce vestibulum imperdiet nibh ac rutrum. Aenean sollicitudin, tellus sed tempor varius, quam dolor ornare sapien, eu faucibus quam arcu vestibulum velit. Praesent maximus odio magna, in vulputate arcu cursus vitae. Praesent condimentum purus sit amet nisl vestibulum semper. Nunc quis eros ultricies, elementum eros sed, ullamcorper nunc. Nunc dictum commodo quam, et venenatis velit porta sit amet. Nunc et lorem et odio scelerisque vulputate sed at purus. Sed velit ipsum, consequat vel tristique tincidunt, semper in odio. Nullam pharetra laoreet velit quis sollicitudin. Fusce tellus felis, scelerisque id ipsum et, varius iaculis erat. Sed porttitor at quam sed rhoncus. Donec rutrum justo nec malesuada aliquam. Maecenas feugiat odio ac ante consequat, aliquet tempus magna tempus. Morbi convallis orci felis, ac ultricies ex dignissim in. Donec ornare vehicula ante eu interdum.",
                             text_duplicate = "Lorem dolor sit amet, consectetur elit. Maecenas laoreet risus et eros sagittis aliquam. Donec fringilla pharetra vestibulum. Fusce vestibulum imperdiet nibh ac rutrum. Aenean sollicitudin, tellus sed tempor varius, quam dolor ornare sapien, eu faucibus quam arcu vestibulum velit. Praesent maximus odio magna, in vulputate arcu cursus vitae. Praesent condimentum purus sit amet nisl vestibulum semper. Nunc quis eros ultricies, elementum sed, ullamcorper nunc. Nunc commodo quam, et venenatis velit porta sit amet. Nunc et lorem et odio scelerisque vulputate sed at purus. Sed velit ipsum, consequat vel tristique tincidunt, semper in odio. Nullam pharetra laoreet velit quis sollicitudin. Fusce tellus felis, scelerisque id ipsum et, varius iaculis erat. Sed porttitor at quam sed rhoncus. Donec rutrum justo nec malesuada aliquam. Maecenas feugiat odio ac ante consequat, aliquet tempus magna tempus. Morbi convallis orci felis, ac ultricies ex dignissim in. Donec ornare vehicula ante eu interdum.",
                             stringsAsFactors = FALSE)
    
    library(diffobj)
    for (i in 1) {
      orig <- unname(unlist(quanteda::tokens(duplicates$text_original[i], what = "sentence")))
      dup <- unname(unlist(quanteda::tokens(duplicates$text_duplicate[i], what = "sentence")))
      diff <- diffPrint(current = orig,
                        target = dup,
                        format = "html",
                        interactive = FALSE)
      print(diff)
    }
    ```
    

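    When I knit this file, the HTML code of each diff is shown as text instead of being rendered. Note that I use a for loop with a single iteration only for demonstration. The real task is to render about 50 diffs.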

    Is there a way to solve this, by adjusting either diffPrint() or rmarkdown?

    1 answer

  • TC Zhang  ·  6 years ago

    The problem is that diffPrint() does not output the CSS styles, so the HTML it produces has nothing to style it.

    1. In the chunk options, add results="asis" so that knitr passes the HTML through unescaped:

      ```{r cars, results="asis"}
      duplicates <- data.frame(text_original = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas laoreet risus et eros sagittis aliquam. Donec fringilla pharetra vestibulum. Fusce vestibulum imperdiet nibh ac rutrum. Aenean sollicitudin, tellus sed tempor varius, quam dolor ornare sapien, eu faucibus quam arcu vestibulum velit. Praesent maximus odio magna, in vulputate arcu cursus vitae. Praesent condimentum purus sit amet nisl vestibulum semper. Nunc quis eros ultricies, elementum eros sed, ullamcorper nunc. Nunc dictum commodo quam, et venenatis velit porta sit amet. Nunc et lorem et odio scelerisque vulputate sed at purus. Sed velit ipsum, consequat vel tristique tincidunt, semper in odio. Nullam pharetra laoreet velit quis sollicitudin. Fusce tellus felis, scelerisque id ipsum et, varius iaculis erat. Sed porttitor at quam sed rhoncus. Donec rutrum justo nec malesuada aliquam. Maecenas feugiat odio ac ante consequat, aliquet tempus magna tempus. Morbi convallis orci felis, ac ultricies ex dignissim in. Donec ornare vehicula ante eu interdum.",
                               text_duplicate = "Lorem dolor sit amet, consectetur elit. Maecenas laoreet risus et eros sagittis aliquam. Donec fringilla pharetra vestibulum. Fusce vestibulum imperdiet nibh ac rutrum. Aenean sollicitudin, tellus sed tempor varius, quam dolor ornare sapien, eu faucibus quam arcu vestibulum velit. Praesent maximus odio magna, in vulputate arcu cursus vitae. Praesent condimentum purus sit amet nisl vestibulum semper. Nunc quis eros ultricies, elementum sed, ullamcorper nunc. Nunc commodo quam, et venenatis velit porta sit amet. Nunc et lorem et odio scelerisque vulputate sed at purus. Sed velit ipsum, consequat vel tristique tincidunt, semper in odio. Nullam pharetra laoreet velit quis sollicitudin. Fusce tellus felis, scelerisque id ipsum et, varius iaculis erat. Sed porttitor at quam sed rhoncus. Donec rutrum justo nec malesuada aliquam. Maecenas feugiat odio ac ante consequat, aliquet tempus magna tempus. Morbi convallis orci felis, ac ultricies ex dignissim in. Donec ornare vehicula ante eu interdum.",
                               stringsAsFactors = FALSE)
      
      library(diffobj)
      for (i in 1) {
        orig <- unname(unlist(quanteda::tokens(duplicates$text_original[i], what = "sentence")))
        dup <- unname(unlist(quanteda::tokens(duplicates$text_duplicate[i], what = "sentence")))
        diff <- diffPrint(current = orig,
                          target = dup,
                          format = "html",
                          interactive = FALSE)
        print(diff)
      }
      ```
      
    2. Somewhere in the Rmd file, add the diffobj stylesheet:

      <style type="text/css">
      DIV.diffobj_container PRE {
        white-space: pre-wrap;
        margin: 0;
      }
      DIV.diffobj_container DIV.row {
        width: 100%;
        font-family: monospace;
        display: table;
        table-layout: fixed;
      }
      DIV.diffobj_container DIV.line {
        width: auto;
        display: table-cell;
        overflow: hidden;
      }
      DIV.diffobj_container DIV.line>DIV {
        width: 100%;
        display: table;
        table-layout: auto;
      }
      DIV.diffobj_container DIV.line.banner>DIV {
        display: table;
        table-layout: auto; /* set to fixed in JS */
      }
      DIV.diffobj_container DIV.text {
        display: table-cell;
        width: 100%;
      }
      DIV.diffobj_container DIV.gutter {
        display: table-cell;
        padding: 0 0.2em;
      }
      DIV.diffobj_container DIV.gutter DIV {
        display: table-cell;
      }
      #diffobj_content_meta DIV.diffobj_container DIV.row {
        width: auto;
      }
      #diffobj_banner_meta DIV.diffobj_container DIV.line.banner>DIV {
        table-layout: auto;
      }
      #diffobj_outer {
        overflow: hidden;
      }
      /* Summary -------------------------------------------------------------------*/ 
      
      DIV.diffobj_container DIV.summary DIV.map {
        word-wrap: break-word;
        padding-left: 1em;
      }
      DIV.diffobj_container DIV.summary DIV.detail {
        padding-left: 1em;
      }
      
      /* Common elements -----------------------------------------------------------*/
      
      DIV.diffobj_container DIV.line.banner {
        font-size: 1.2em;
        font-weight: bold;
        overflow: hidden;
      }
      /* truncate banners */
      DIV.diffobj_container DIV.line.banner DIV.text DIV{
        white-space: nowrap;
        overflow: hidden;
        text-overflow: ellipsis;
        width: 100%;             /* need to compute and set in JS */
      }
      DIV.diffobj_container DIV.gutter,
      DIV.diffobj_container DIV.guide,
      DIV.diffobj_container DIV.fill,
      DIV.diffobj_container DIV.context_sep,
      DIV.diffobj_container SPAN.trim {
        color: #999;
      }
      DIV.diffobj_container DIV.header {
        font-size: 1.1em;
      }
      DIV.diffobj_container DIV.text>DIV.match,
      DIV.diffobj_container DIV.text>DIV.guide {
        background-color: #ffffff;
      }
      DIV.diffobj_container DIV.text>DIV.fill {
        background-color: transparent;
      }
      DIV.diffobj_container DIV.text>DIV {
        padding-right: 3px;
      }
      DIV.diffobj_container DIV.text>DIV {
        border-left: 1px solid #888888;
      }
      DIV.diffobj_container DIV.line {
        background-color: #eeeeee;
      }
      DIV.diffobj_container DIV.text>DIV,
      DIV.diffobj_container DIV.header {
        padding-left: 0.5em;
      }
      DIV.diffobj_container DIV.line>DIV.match,
      DIV.diffobj_container DIV.line>DIV.fill,
      DIV.diffobj_container DIV.line>DIV.guide {
        border-left: 1px solid #888888;
      }
      /* github inspired color scheme - default ------------------------------------*/
      
      DIV.diffobj_container.light.rgb SPAN.word.insert,
      DIV.diffobj_container.light.rgb DIV.line>DIV.insert {
        background-color: #a6f3a6;
      }
      DIV.diffobj_container.light.rgb SPAN.word.delete,
      DIV.diffobj_container.light.rgb DIV.line>DIV.delete {
        background-color: #f8c2c2;
      }
      DIV.diffobj_container.light.rgb DIV.text>DIV.insert {
        background-color: #efffef;
      }
      DIV.diffobj_container.light.rgb DIV.text>DIV.insert,
      DIV.diffobj_container.light.rgb DIV.line>DIV.insert {
        border-left: 1px solid #33bb33;
      }
      DIV.diffobj_container.light.rgb DIV.text>DIV.delete {
        background-color: #ffefef;
      }
      DIV.diffobj_container.light.rgb DIV.text>DIV.delete,
      DIV.diffobj_container.light.rgb DIV.line>DIV.delete {
        border-left: 1px solid #cc6666;
      }
      DIV.diffobj_container.light.rgb DIV.header {
        background-color: #e0e6fa;
        border-left: 1px solid #9894b6;
      }
      /* Yellow Blue variation -----------------------------------------------------*/
      
      DIV.diffobj_container.light.yb SPAN.word.insert,
      DIV.diffobj_container.light.yb DIV.line>DIV.insert {
        background-color: #c0cfff;
      }
      DIV.diffobj_container.light.yb SPAN.word.delete,
      DIV.diffobj_container.light.yb DIV.line>DIV.delete {
        background-color: #e7e780;
      }
      DIV.diffobj_container.light.yb DIV.text>DIV.insert {
        background-color: #efefff;
      }
      DIV.diffobj_container.light.yb DIV.text>DIV.insert,
      DIV.diffobj_container.light.yb DIV.line>DIV.insert {
        border-left: 1px solid #3333bb;
      }
      DIV.diffobj_container.light.yb DIV.text>DIV.delete {
        background-color: #fefee5;
      }
      DIV.diffobj_container.light.yb DIV.text>DIV.delete,
      DIV.diffobj_container.light.yb DIV.line>DIV.delete {
        border-left: 1px solid #aaaa55;
      }
      DIV.diffobj_container.light.yb DIV.header {
        background-color: #afafaf;
        border-left: 1px solid #e3e3e3;
        color: #e9e9e9;
      }
      DIV.diffobj_container.light.yb DIV.line {
        background-color: #eeeeee;
      }
      </style>
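
    As an alternative to pasting the stylesheet by hand, the two steps above can be sketched in a single results="asis" chunk. This is a minimal sketch, not the answer's own method: it assumes that diffobj_css() returns the path of the default stylesheet shipped with diffobj and that the html.output style option accepts "diff.only", as described in the diffobj documentation. The pairs data frame is a hypothetical stand-in for the real texts.

```r
library(diffobj)

# Hypothetical stand-in data: two pairs of near-duplicate texts.
pairs <- data.frame(
  text_original  = c("Lorem ipsum dolor sit amet.", "Nunc dictum commodo quam."),
  text_duplicate = c("Lorem dolor sit amet.", "Nunc commodo quam."),
  stringsAsFactors = FALSE
)

# Emit the default diffobj stylesheet once, read from the file shipped
# with the package (assumption: diffobj_css() returns its path).
css <- readLines(diffobj_css())
cat("<style type=\"text/css\">", css, "</style>", sep = "\n")

# Render every diff as bare HTML; the stylesheet above applies to all of them.
for (i in seq_len(nrow(pairs))) {
  diff <- diffPrint(
    current = pairs$text_original[i],
    target = pairs$text_duplicate[i],
    format = "html",
    interactive = FALSE,
    style = list(html.output = "diff.only")
  )
  # as.character() gives the HTML as text; cat() writes it verbatim.
  cat(as.character(diff), sep = "\n")
}
```

    Writing the stylesheet once keeps the report small when there are many diffs. For a single diff, style = list(html.output = "diff.w.style") should embed the CSS together with the diff instead.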
      

    A working example

    source rmarkdown

    For the record, related discussion on github
