<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Dear all, <br>
    </p>
    <p>Many thanks for the great meeting today. There is one more thought
      that I wanted to write down so that I don't forget it: <br>
    </p>
    <p>Format 1 (statistical derivatives, e.g. a term-document matrix
      with word frequencies) and format 2 (transformational derivatives,
      e.g. documents with randomized word order) may not be as distinct
      as they seem, depending on the exact examples we consider for each
      of the two cases. The reason is that, under a few assumptions, some
      of them can be transformed into each other: <br>
    </p>
    <p>- We can use a term-document matrix to generate plain text with
      randomized word order; <br>
      - We can use a document with randomized word order to generate a
      term-document matrix. <br>
    </p>
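    <p>To make the two directions concrete, here is a minimal Python
      sketch. It assumes whitespace-tokenized text and represents one
      column of a t-d-m as a <code>collections.Counter</code> of absolute
      word counts; the function names <code>tdm_column</code> and
      <code>randomized_text</code> are hypothetical and only serve as an
      illustration, not as an existing tool.</p>
<pre>
```python
import random
from collections import Counter


def tdm_column(tokens):
    """Count absolute word frequencies: one column of a term-document matrix."""
    return Counter(tokens)


def randomized_text(column, seed=None):
    """Generate a document with randomized word order from absolute counts.

    Hypothetical helper: each word is repeated as often as its count says,
    then the whole token list is shuffled.
    """
    rng = random.Random(seed)
    tokens = [word for word, count in column.items() for _ in range(count)]
    rng.shuffle(tokens)
    return tokens


if __name__ == "__main__":
    doc = "the cat sat on the mat and the dog sat too".split()
    column = tdm_column(doc)                    # t-d-m column from the text
    shuffled = randomized_text(column, seed=1)  # randomized text from the column
    print(" ".join(shuffled))
    # Counting the randomized text again yields the same column.
    assert tdm_column(shuffled) == column
```
</pre>
    <p>Different seeds give different word orders with identical counts,
      so only the bag of words survives the round trip.</p>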
    <p>A few observations: <br>
    </p>
    <p>- The relationship is not fully symmetrical. From a document with
      randomized word order, where randomization happens within segments
      of a known size, we can either build a term-document matrix that
      maintains these segments (with a separate column for each segment)
      or merge the frequencies of all segments belonging to one document
      into a single column. Conversely, from a t-d-m whose texts were
      segmented before the t-d-m was calculated, we can either generate
      randomized texts that respect these segment boundaries or generate
      one set of randomized words for each entire document. But when the
      source format has no segmentation, we cannot reconstruct one in the
      other format: segmentation can be dropped during conversion, but
      never added. <br>
      - The transformation from a t-d-m to a document with randomized
      word order is not always exact. It is exact if the t-d-m contains
      absolute word frequencies; it is also exact if the t-d-m contains
      relative frequencies and we know the total number of words in each
      document; but it cannot be exact if we only have relative word
      frequencies and no information about the original documents'
      lengths (relative frequencies of 0.5 and 0.5, for instance, fit a
      two-word document just as well as a four-word one). <br>
      - For this reason, and because segmentation cannot be recovered
      once a t-d-m has been calculated or word order has been randomized,
      I advocate two things: first, t-d-m representations with some
      degree of segmentation built into them, since this is the more
      powerful representation (though, of course, also the one more
      amenable to reconstruction, depending on the segment size); and
      second, keeping absolute frequencies rather than calculating
      relative frequencies. <br>
      - There may, of course, be statistical descriptions of documents
      that cannot be used to generate a document with randomized word
      order, or transformational derivatives that are not suitable for
      building a term-document matrix. But at least the t-d-m is already
      a fairly standard form of text representation (e.g., the stylo
      package for R ships with several in-copyright corpora in the form
      of t-d-ms). <br>
      <br>
    </p>
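    <p>The segmentation and relative-frequency caveats can be sketched in
      the same way. Again a minimal Python illustration under the same
      assumptions (whitespace tokens, <code>Counter</code> columns); the
      fixed segment size and all function names are hypothetical. It
      shows that segment columns can be merged into one document column
      but not split back out, and that relative frequencies give exact
      counts only when the document length is known.</p>
<pre>
```python
from collections import Counter


def segment_columns(tokens, segment_size):
    """One t-d-m column per consecutive segment of segment_size tokens."""
    return [Counter(tokens[i:i + segment_size])
            for i in range(0, len(tokens), segment_size)]


def merge_segments(columns):
    """Collapse segment columns into one document column (not reversible)."""
    merged = Counter()
    for column in columns:
        merged.update(column)
    return merged


def relative(column):
    """Turn absolute counts into relative frequencies."""
    total = sum(column.values())
    return {word: count / total for word, count in column.items()}


def absolute(rel, n_tokens):
    """Recover absolute counts; exact only if n_tokens is the true length."""
    return Counter({word: round(freq * n_tokens) for word, freq in rel.items()})


if __name__ == "__main__":
    doc = "a b a c b a d a".split()
    segments = segment_columns(doc, 4)   # two columns of four tokens each
    merged = merge_segments(segments)    # one column for the whole document
    rel = relative(merged)
    # With the true length (8 tokens), the counts come back exactly.
    assert absolute(rel, len(doc)) == merged
    # Without it, a document twice as long fits the same relative frequencies.
    assert relative(Counter("a b".split())) == relative(Counter("a a b b".split()))
```
</pre>
    <p>Fixed-size token windows are used here only for simplicity;
      segmenting by chapter or paragraph would behave the same way.</p>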
    Best wishes,<br>
    Christof <br>
    <p><br>
    </p>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 18.06.24 15:27, Genêt, Philippe
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:414dd2556aa643aeb37efcfb91905195@dnb.de">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <meta name="Generator"
        content="Microsoft Word 15 (filtered medium)">
      <style>@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}@font-face
        {font-family:Verdana;
        panose-1:2 11 6 4 3 5 4 4 2 4;}p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}p.msonormal0, li.msonormal0, div.msonormal0
        {mso-style-name:msonormal;
        mso-margin-top-alt:auto;
        margin-right:0cm;
        mso-margin-bottom-alt:auto;
        margin-left:0cm;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}p.emailquote, li.emailquote, div.emailquote
        {mso-style-name:emailquote;
        mso-margin-top-alt:auto;
        margin-right:0cm;
        mso-margin-bottom-alt:auto;
        margin-left:1.0pt;
        border:none;
        padding:0cm;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}span.E-MailFormatvorlage19
        {mso-style-type:personal;
        font-family:"Verdana",sans-serif;
        color:#44546A;}span.E-MailFormatvorlage20
        {mso-style-type:personal-compose;
        font-family:"Verdana",sans-serif;
        color:windowtext;}.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}div.WordSection1
        {page:WordSection1;}</style>
      <div class="WordSection1">
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif">Liebe
            Kolleg*innen,<o:p></o:p></span></p>
        <div>
          <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif"> <o:p></o:p></span></p>
        </div>
        <div>
          <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif">die
              nächste
              <span style="color:#44546A">Sitzung der AG Legal </span>findet
              statt am kommenden
              <b>Dienstag, den 2<span style="color:#44546A">5</span>. <span
                  style="color:#44546A">
                  Juni </span>2024, um 11 Uhr</b> in diesem Zoom-Raum:
              <a
href="https://zoom.us/j/93357206007?pwd=SVVDNFFyTTJkYmp3cGlKeElTS3JqUT09"
                moz-do-not-send="true">
                <span style="color:#0563C1">https://zoom.us/j/93357206007?pwd=SVVDNFFyTTJkYmp3cGlKeElTS3JqUT09</span></a>
              <o:p></o:p></span></p>
        </div>
        <div>
          <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif"> <o:p></o:p></span></p>
        </div>
        <div>
          <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:black">Die
              <a href="https://textplus.sync.academiccloud.de/f/780031"
                moz-do-not-send="true">Agenda</a></span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif">
              <span style="color:black">entspricht der vom letzten Mal:
              </span>das nächste Deliverable, dessen Deadline
              <span style="color:black">nun noch näher gerückt ist</span>.<o:p></o:p></span></p>
        </div>
        <div>
          <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif"><o:p></o:p></span></p>
        </div>
        <div>
          <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif">Ergänzt
              wie immer gerne die Punkte, über die ihr noch sprechen
              wollt!<o:p></o:p></span></p>
        </div>
        <div>
          <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif"> <o:p></o:p></span></p>
        </div>
        <div>
          <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif">Bis
              dahin liebe Grüße<o:p></o:p></span></p>
        </div>
        <div>
          <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif">Philippe<o:p></o:p></span></p>
        </div>
        <div>
          <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif"> <o:p></o:p></span></p>
        </div>
        <div>
          <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif">--<br>
            </span><span
style="font-size:9.0pt;font-family:"Verdana",sans-serif">Philippe
              Genêt</span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif"><o:p></o:p></span></p>
        </div>
        <div>
          <p class="MsoNormal"><span
style="font-size:9.0pt;font-family:"Verdana",sans-serif">Koordinator
              DNB@Text+</span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif"><o:p></o:p></span></p>
        </div>
        <div style="margin-bottom:12.0pt">
          <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"><br>
            </span><span
style="font-size:9.0pt;font-family:"Verdana",sans-serif">Deutsche
              Nationalbibliothek</span><span style="font-size:9.0pt">
              <br>
            </span><span
style="font-size:9.0pt;font-family:"Verdana",sans-serif">Fachbereich
              Informationsinfrastruktur<br>
              Adickesallee 1</span><span style="font-size:9.0pt"> <br>
            </span><span
style="font-size:9.0pt;font-family:"Verdana",sans-serif">60322
              Frankfurt am Main</span><span style="font-size:9.0pt">
            </span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif"><o:p></o:p></span></p>
        </div>
        <div>
          <p class="MsoNormal"><span
style="font-size:9.0pt;font-family:"Verdana",sans-serif">Telefon:
              +49 69 1525-1847</span><span style="font-size:9.0pt">
            </span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif"><o:p></o:p></span></p>
        </div>
        <div>
          <p class="MsoNormal"><span
style="font-size:9.0pt;font-family:"Verdana",sans-serif">E-Mail:
              <a href="mailto:p.genet@dnb.de" moz-do-not-send="true"><span
                  style="color:#0563C1">p.genet@dnb.de</span></a>
            </span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif"><o:p></o:p></span></p>
        </div>
        <div>
          <p class="MsoNormal"><span
style="font-size:9.0pt;font-family:"Verdana",sans-serif"> </span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif"><o:p></o:p></span></p>
        </div>
        <div>
          <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"><a
                href="http://www.text-plus.org/" moz-do-not-send="true"><span
style="font-size:9.0pt;font-family:"Verdana",sans-serif;color:#0563C1">text-plus.org</span></a></span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif"><o:p></o:p></span></p>
        </div>
        <div style="margin-bottom:12.0pt">
          <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"><a
                href="http://www.dnb.de/" moz-do-not-send="true"><span
style="font-size:9.0pt;font-family:"Verdana",sans-serif;color:#0563C1">dnb.de</span></a></span><span
style="font-size:9.0pt;font-family:"Verdana",sans-serif">
            </span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif"><o:p></o:p></span></p>
        </div>
        <div>
          <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif"><o:p></o:p></span></p>
        </div>
        <div>
          <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif"><o:p></o:p></span></p>
        </div>
      </div>
      <br>
      <fieldset class="moz-mime-attachment-header"></fieldset>
    </blockquote>
    <div class="moz-signature">-- <br>
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div class="moz-signature">
        <meta http-equiv="content-type"
          content="text/html; charset=UTF-8">
        <title></title>
        <div class="moz-signature"> <small>
            <p><b>  Prof. Dr. Christof Schöch</b> <br>
                Professor for Digital Humanities, FB II <br>
                Co-Director, Trier Center for Digital Humanities <br>
              <img moz-do-not-send="false"
                src="cid:part1.gQwXLIeQ.NPU5RXS9@uni-trier.de"
                alt="Trier University, Germany" width="190"> <br>
                <a href="https://dh.uni-trier.de"
                class="moz-txt-link-freetext">https://dh.uni-trier.de</a>
              <br>
                <a href="https://tcdh.uni-trier.de"
                class="moz-txt-link-freetext">https://tcdh.uni-trier.de</a></p>
          </small> </div>
      </div>
    </div>
  </body>
</html>