A statistical analysis of texts that could support claims of common authorship or common theme among documents.
Using standard Unix utilities, each text is first divided into sentences. Each sentence is then broken into N-grams, that is, sequences of N consecutive tokens, for several values of N. N-grams never span sentence boundaries. For example, the simple text:
    My car goes fast. My dog is brown.

yields the 3-grams:

    my car goes
    car goes fast
    my dog is
    dog is brown
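The page does not give the Unix commands it used. Below is a minimal Python sketch of the same idea under simplifying assumptions: sentences end at `.`, `!` or `?`, and tokens are lowercased runs of letters and apostrophes. The function names are hypothetical. It reproduces the 3-grams from the example above.

```python
import re

def split_sentences(text):
    """Split on runs of '.', '!' or '?' (a simplifying assumption) and drop empty pieces."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def tokenize(sentence):
    """Lowercase word tokens: runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", sentence.lower())

def ngrams(text, n):
    """All N-grams in the text, never crossing a sentence boundary."""
    grams = []
    for sentence in split_sentences(text):
        words = tokenize(sentence)
        # len(words) - n + 1 is <= 0 for short sentences, which yields no N-grams
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams

print(ngrams("My car goes fast. My dog is brown.", 3))
# ['my car goes', 'car goes fast', 'my dog is', 'dog is brown']
```

Because each sentence is processed separately, a pair like "fast my" never appears. A real splitter would also need to handle abbreviations such as "Mr.", which this sketch treats as sentence ends.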
Statistics are then collected for each work's N-gram set and compared with the N-gram collection of the entire corpus. If one work has lower redundancy among its N-grams, its language might be considered richer at that scale, at least relative to other works of roughly the same overall size.
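The per-file figures listed further down agree with two simple ratios. "Redundant" is 1 - unique/total, the share of a work's N-grams that repeat one already seen. "% of corpus" is the work's unique N-grams divided by the corpus's unique N-grams. The following sketch is inferred from those numbers, not taken from the author's scripts, and the function names are hypothetical. It checks the ratios against the 1-gram figures for 0hfinn.txt.

```python
def redundancy_pct(total, unique):
    """Percentage of a work's N-grams that duplicate another N-gram in the same work."""
    if total == 0:
        return 0.0
    return 100.0 * (1 - unique / total)

def corpus_share_pct(unique, corpus_unique):
    """A work's unique N-grams as a percentage of all unique N-grams in the corpus."""
    return 100.0 * unique / corpus_unique

def work_stats(work_ngrams, corpus_unique):
    """(total, unique, % redundant, % of corpus) for one work at one value of N."""
    total = len(work_ngrams)
    unique = len(set(work_ngrams))
    return total, unique, redundancy_pct(total, unique), corpus_share_pct(unique, corpus_unique)

# 1-gram counts for 0hfinn.txt, and the corpus-wide unique 1-gram count
print(f"{redundancy_pct(111451, 6836):.3f}")   # 93.866
print(f"{corpus_share_pct(6836, 54056):.3f}")  # 12.646
```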
The collection of unique N-grams for the entire corpus is used to calculate a "feature vector" for each work. If there are M unique N-grams in the entire corpus, the feature vector for each text is a list of M floating-point values. The i'th value in a work's feature vector is the percentage of that work's N-grams that are identical to the i'th N-gram in the corpus-wide list.
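Here is a minimal sketch of the feature vector as the paragraph defines it. It assumes each work is already a list of N-gram strings and that the corpus-wide list is sorted alphabetically, though any fixed order would do. The names are hypothetical.

```python
from collections import Counter

def corpus_vocabulary(works_ngrams):
    """Sorted list of every unique N-gram across all works (the M entries)."""
    vocab = set()
    for grams in works_ngrams:
        vocab.update(grams)
    return sorted(vocab)

def feature_vector(work_ngrams, vocabulary):
    """M floats: the percentage of this work's N-grams equal to each vocabulary entry."""
    counts = Counter(work_ngrams)  # missing N-grams count as 0
    total = len(work_ngrams)
    return [100.0 * counts[g] / total for g in vocabulary]

a = ["my car", "car goes", "my car"]
b = ["my dog", "my car"]
vocab = corpus_vocabulary([a, b])   # ['car goes', 'my car', 'my dog']
print(feature_vector(b, vocab))     # [0.0, 50.0, 50.0]
```

With about 2.4 million unique 4-grams in this corpus, a dense list per work is large and almost entirely zeros. Storing only the nonzero entries, for example as a dict keyed by N-gram, is a common alternative.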
The "distance" between two works is then the Euclidean distance in that M-dimensional space. A distance of zero indicates that the two works have identical distributions of N-grams, and thus resemble each other in some statistical sense. It is tempting to say that a distance near zero means the two texts are strongly correlated, but this is not necessarily true in general. The distance is nevertheless a useful qualitative measure: works by the same author tend to have small inter-document distances.
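Here is a direct transcription of the distance definition, as a sketch. The page does not say whether the vectors were normalized further before the tables below were computed, so this illustrates the definition rather than reproducing the published numbers. The dense version takes equal-length lists. The sparse version, a hypothetical helper, takes `{ngram: value}` dicts and treats missing keys as zero.

```python
import math

def euclidean_distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def sparse_euclidean_distance(u, v):
    """Same distance for vectors stored as {ngram: value} dicts; absent N-grams count as 0."""
    keys = u.keys() | v.keys()
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))

print(euclidean_distance([3.0, 0.0], [0.0, 4.0]))          # 5.0
print(sparse_euclidean_distance({"a": 3.0}, {"b": 4.0}))   # 5.0
```

On Python 3.8 and later, `math.dist(u, v)` computes the same dense distance.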
Filename      Title and Author
------------  ---------------------------------------------------
0hfinn.txt    "The Adventures of Huckleberry Finn", by Mark Twain
0lmiss.txt    "Life on the Mississippi", by Mark Twain
0tramp.txt    "A Tramp Abroad", by Mark Twain, 1880
0yankee.txt   "A Connecticut Yankee in King Arthur's Court", by Mark Twain
1emma.txt     "Emma", by Jane Austen
1persua.txt   "Persuasion", by Jane Austen (1818)
1pride.txt    "Pride and Prejudice", by Jane Austen
1sense.txt    "Sense and Sensibility", by Jane Austen (1811)
2gmars.txt    "The Gods of Mars", by Edgar Rice Burroughs (1913)
2pmars.txt    "A Princess of Mars", by Edgar Rice Burroughs
2tarzan.txt   "Tarzan of the Apes", by Edgar Rice Burroughs
2timelf.txt   "The Land That Time Forgot", by Edgar Rice Burroughs
3alice.txt    "Alice's Adventures in Wonderland", by Lewis Carroll
3lglass.txt   "Through the Looking Glass", by Lewis Carroll
3snark.txt    "The Hunting of the Snark", by Lewis Carroll
4agent.txt    "The Secret Agent", by Joseph Conrad
4hdark.txt    "Heart of Darkness", by Joseph Conrad
4sshar.txt    "The Secret Sharer", by Joseph Conrad
5great.txt    "Great Expectations", by Charles Dickens
5oliver.txt   "Oliver Twist", by Charles Dickens
5pwprs.txt    "The Pickwick Papers", by Charles Dickens
5twocity.txt  "A Tale of Two Cities", by Charles Dickens
6callw.txt    "The Call of the Wild", by Jack London
6seawolf.txt  "The Sea Wolf", by Jack London
6whtfng.txt   "White Fang", by Jack London
7dmoro.txt    "The Island of Doctor Moreau", by H. G. Wells
7time.txt     "The Time Machine", by H(erbert) G(eorge) Wells [1898]
7warwrld.txt  "The War of the Worlds", by H(erbert) G(eorge) Wells [1898]
8human.txt    "Of Human Bondage", by Somerset Maugham
8moon.txt     "Moon and Sixpence", by Somerset Maugham
Pattrib.txt   "Attributes of the Mujahideen: Compliance with the Sunnah", by Mufti Khubaib Sahib, http://www.ummah.net.pk/harkat/jihad/attribut.htm
Pchechen.txt  "Chechen-Russo Conflict", by Abdullah Khan, Feb 2000, footnotes removed, http://www.amina.com/article/chechenrus_confl.html
Pcommun.txt   "The Communist Manifesto", by Karl Marx and Friedrich Engels
Pexpel.txt    "Declaration Of War Against The Americans Occupying The Land Of The Two Holy Places", by Usama bin Laden, from http://www.azzam.com/html/articlesdeclaration.htm
Pfunda.txt    "Fundamentalism", by Maulana Muhammad Masoud Azhar, http://www.ummah.net.pk/harkat/jihad/fundamen.htm
Pjihad.txt    "Jihad: The forgotten obligation", http://www.ummah.net.pk/harkat/jihad/o-jihad.htm
Pkoran.txt    "The Koran", M.H. Shakir translation
Pmiscon.txt   "7 Misconceptions In Fighting The Apostate Regime", by Al-Jama'ah Al-Islamiyyah (Islamic Group) in Egypt, from http://www.azzam.com/html/articlesmisconceptions.htm
Pshamyl.txt   "The Jihad of Imam Shamyl", by Kerim Fenari, http://www.amina.com/article/jihad_imamshamyl.html
Punabomb.txt  "Unabomber's Manifesto", by Ted Kaczynski

Overall Statistics
  54056 unique 1-grams for the corpus
 753606 unique 2-grams for the corpus
1854335 unique 3-grams for the corpus
2387282 unique 4-grams for the corpus
File          Sentences    Words  Characters
0hfinn.txt        14131   112552      565633
0lmiss.txt        15530   144653      813040
0tramp.txt        18398   153328      857005
0yankee.txt       15136   120622      642558
1emma.txt         16770   158080      887254
1persua.txt        8468    83309      467136
1pride.txt        14179   121756      686896
1sense.txt        14748   118575      672750
2gmars.txt        10216    82691      452178
2pmars.txt         7787    65884      363652
2tarzan.txt       11858    85426      479989
2timelf.txt        3774    37179      201350
3alice.txt         3580    26439      147993
3lglass.txt        4153    29268      167712
3snark.txt          820     5090       29596
4agent.txt        10406    91233      521937
4hdark.txt         3695    38242      211905
4sshar.txt         1839    16648       89249
5great.txt        20868   184420      998394
5oliver.txt       19931   156996      891646
5pwprs.txt        38464   298162     1739725
5twocity.txt      16038   135711      759010
6callw.txt         3266    31811      179206
6seawolf.txt      12191   106259      578582
6whtfng.txt        7797    72225      399769
7dmoro.txt         4888    43420      241523
7time.txt          3381    32345      181680
7warwrld.txt       7248    60374      343012
8human.txt        28585   259330     1410243
8moon.txt          9418    75036      407226
Pattrib.txt         116     1170        7463
Pchechen.txt        529     5577       35624
Pcommun.txt        1766    11448       75591
Pexpel.txt         1155    11956       68982
Pfunda.txt          500     4711       27975
Pjihad.txt         3637    35149      205581
Pkoran.txt        24158   162467      888435
Pmiscon.txt         882    10052       54280
Pshamyl.txt         536     5207       31330
Punabomb.txt       3603    34384      220786

For each file and each value of N, the listing below gives the total number of N-grams, the number of unique N-grams, the percentage of N-grams that are redundant (1 - unique/total), and the work's unique N-grams as a percentage of the corpus's unique N-grams, followed by the average number of words per sentence.
0hfinn.txt has:
  111451 1-grams, 6836 unique (93.866% redundant, 12.646% of corpus)
  105161 2-grams, 43791 unique (58.358% redundant, 5.811% of corpus)
  98871 3-grams, 79763 unique (19.326% redundant, 4.301% of corpus)
  92829 4-grams, 88461 unique (4.705% redundant, 3.706% of corpus)
  17 words/sentence, on average

0lmiss.txt has:
  145265 1-grams, 12733 unique (91.235% redundant, 23.555% of corpus)
  137786 2-grams, 71812 unique (47.881% redundant, 9.529% of corpus)
  130307 3-grams, 113919 unique (12.576% redundant, 6.143% of corpus)
  123089 4-grams, 119823 unique (2.653% redundant, 5.019% of corpus)
  18 words/sentence, on average

0tramp.txt has:
  154416 1-grams, 13431 unique (91.302% redundant, 24.846% of corpus)
  146961 2-grams, 76231 unique (48.128% redundant, 10.115% of corpus)
  139506 3-grams, 121791 unique (12.698% redundant, 6.568% of corpus)
  132293 4-grams, 128790 unique (2.648% redundant, 5.395% of corpus)
  19 words/sentence, on average

0yankee.txt has:
  119092 1-grams, 11101 unique (90.679% redundant, 20.536% of corpus)
  112731 2-grams, 58966 unique (47.693% redundant, 7.825% of corpus)
  106370 3-grams, 93735 unique (11.878% redundant, 5.055% of corpus)
  100245 4-grams, 98020 unique (2.220% redundant, 4.106% of corpus)
  18 words/sentence, on average

1emma.txt has:
  159674 1-grams, 7329 unique (95.410% redundant, 13.558% of corpus)
  149636 2-grams, 61088 unique (59.176% redundant, 8.106% of corpus)
  139598 3-grams, 112788 unique (19.205% redundant, 6.082% of corpus)
  129953 4-grams, 124459 unique (4.228% redundant, 5.213% of corpus)
  14 words/sentence, on average

1persua.txt has:
  83414 1-grams, 5913 unique (92.911% redundant, 10.939% of corpus)
  79618 2-grams, 39297 unique (50.643% redundant, 5.215% of corpus)
  75822 3-grams, 65688 unique (13.366% redundant, 3.542% of corpus)
  72117 4-grams, 70381 unique (2.407% redundant, 2.948% of corpus)
  21 words/sentence, on average

1pride.txt has:
  121296 1-grams, 6442 unique (94.689% redundant, 11.917% of corpus)
  114404 2-grams, 50429 unique (55.920% redundant, 6.692% of corpus)
  107512 3-grams, 90158 unique (16.141% redundant, 4.862% of corpus)
  100836 4-grams, 97638 unique (3.171% redundant, 4.090% of corpus)
  16 words/sentence, on average

1sense.txt has:
  119273 1-grams, 6471 unique (94.575% redundant, 11.971% of corpus)
  113507 2-grams, 50641 unique (55.385% redundant, 6.720% of corpus)
  107741 3-grams, 90794 unique (15.729% redundant, 4.896% of corpus)
  102178 4-grams, 99060 unique (3.052% redundant, 4.149% of corpus)
  19 words/sentence, on average

2gmars.txt has:
  82726 1-grams, 6970 unique (91.575% redundant, 12.894% of corpus)
  78648 2-grams, 39857 unique (49.322% redundant, 5.289% of corpus)
  74570 3-grams, 63623 unique (14.680% redundant, 3.431% of corpus)
  70643 4-grams, 67822 unique (3.993% redundant, 2.841% of corpus)
  19 words/sentence, on average

2pmars.txt has:
  65910 1-grams, 6506 unique (90.129% redundant, 12.036% of corpus)
  63536 2-grams, 34779 unique (45.261% redundant, 4.615% of corpus)
  61162 3-grams, 53880 unique (11.906% redundant, 2.906% of corpus)
  58818 4-grams, 57291 unique (2.596% redundant, 2.400% of corpus)
  27 words/sentence, on average

2tarzan.txt has:
  85627 1-grams, 7467 unique (91.280% redundant, 13.813% of corpus)
  81127 2-grams, 42378 unique (47.763% redundant, 5.623% of corpus)
  76627 3-grams, 66548 unique (13.153% redundant, 3.589% of corpus)
  72319 4-grams, 70263 unique (2.843% redundant, 2.943% of corpus)
  18 words/sentence, on average

2timelf.txt has:
  37338 1-grams, 4848 unique (87.016% redundant, 8.968% of corpus)
  35495 2-grams, 21277 unique (40.056% redundant, 2.823% of corpus)
  33652 3-grams, 30494 unique (9.384% redundant, 1.644% of corpus)
  31864 4-grams, 31252 unique (1.921% redundant, 1.309% of corpus)
  19 words/sentence, on average

3alice.txt has:
  26590 1-grams, 2641 unique (90.068% redundant, 4.886% of corpus)
  24721 2-grams, 13256 unique (46.378% redundant, 1.759% of corpus)
  22852 3-grams, 19515 unique (14.603% redundant, 1.052% of corpus)
  21089 4-grams, 20150 unique (4.453% redundant, 0.844% of corpus)
  13 words/sentence, on average

3lglass.txt has:
  29454 1-grams, 2823 unique (90.416% redundant, 5.222% of corpus)
  27122 2-grams, 14673 unique (45.900% redundant, 1.947% of corpus)
  24790 3-grams, 21579 unique (12.953% redundant, 1.164% of corpus)
  22563 4-grams, 21751 unique (3.599% redundant, 0.911% of corpus)
  12 words/sentence, on average

3snark.txt has:
  5063 1-grams, 1444 unique (71.479% redundant, 2.671% of corpus)
  4749 2-grams, 3669 unique (22.742% redundant, 0.487% of corpus)
  4435 3-grams, 4139 unique (6.674% redundant, 0.223% of corpus)
  4128 4-grams, 3965 unique (3.949% redundant, 0.166% of corpus)
  16 words/sentence, on average

4agent.txt has:
  90144 1-grams, 9328 unique (89.652% redundant, 17.256% of corpus)
  83967 2-grams, 45666 unique (45.614% redundant, 6.060% of corpus)
  77790 3-grams, 67928 unique (12.678% redundant, 3.663% of corpus)
  71866 4-grams, 69820 unique (2.847% redundant, 2.925% of corpus)
  14 words/sentence, on average

4hdark.txt has:
  38445 1-grams, 5531 unique (85.613% redundant, 10.232% of corpus)
  35970 2-grams, 22440 unique (37.615% redundant, 2.978% of corpus)
  33495 3-grams, 30797 unique (8.055% redundant, 1.661% of corpus)
  31147 4-grams, 30702 unique (1.429% redundant, 1.286% of corpus)
  14 words/sentence, on average

4sshar.txt has:
  16655 1-grams, 2805 unique (83.158% redundant, 5.189% of corpus)
  15488 2-grams, 10111 unique (34.717% redundant, 1.342% of corpus)
  14321 3-grams, 13263 unique (7.388% redundant, 0.715% of corpus)
  13211 4-grams, 13008 unique (1.537% redundant, 0.545% of corpus)
  13 words/sentence, on average

5great.txt has:
  185167 1-grams, 11334 unique (93.879% redundant, 20.967% of corpus)
  174889 2-grams, 74397 unique (57.460% redundant, 9.872% of corpus)
  164611 3-grams, 133273 unique (19.038% redundant, 7.187% of corpus)
  154963 4-grams, 148124 unique (4.413% redundant, 6.205% of corpus)
  17 words/sentence, on average

5oliver.txt has:
  156779 1-grams, 10805 unique (93.108% redundant, 19.989% of corpus)
  146469 2-grams, 70083 unique (52.152% redundant, 9.300% of corpus)
  136159 3-grams, 116097 unique (14.734% redundant, 6.261% of corpus)
  126666 4-grams, 122781 unique (3.067% redundant, 5.143% of corpus)
  13 words/sentence, on average

5pwprs.txt has:
  298167 1-grams, 16086 unique (94.605% redundant, 29.758% of corpus)
  277637 2-grams, 118656 unique (57.262% redundant, 15.745% of corpus)
  257107 3-grams, 207869 unique (19.151% redundant, 11.210% of corpus)
  238576 4-grams, 226667 unique (4.992% redundant, 9.495% of corpus)
  13 words/sentence, on average

5twocity.txt has:
  135868 1-grams, 10177 unique (92.510% redundant, 18.827% of corpus)
  127712 2-grams, 62334 unique (51.192% redundant, 8.271% of corpus)
  119556 3-grams, 102571 unique (14.207% redundant, 5.531% of corpus)
  111890 4-grams, 108314 unique (3.196% redundant, 4.537% of corpus)
  15 words/sentence, on average

6callw.txt has:
  31845 1-grams, 4843 unique (84.792% redundant, 8.959% of corpus)
  30146 2-grams, 19062 unique (36.768% redundant, 2.529% of corpus)
  28447 3-grams, 26010 unique (8.567% redundant, 1.403% of corpus)
  26778 4-grams, 26264 unique (1.919% redundant, 1.100% of corpus)
  18 words/sentence, on average

6seawolf.txt has:
  105623 1-grams, 9477 unique (91.028% redundant, 17.532% of corpus)
  98562 2-grams, 50245 unique (49.022% redundant, 6.667% of corpus)
  91501 3-grams, 78736 unique (13.951% redundant, 4.246% of corpus)
  84705 4-grams, 82109 unique (3.065% redundant, 3.439% of corpus)
  14 words/sentence, on average

6whtfng.txt has:
  71923 1-grams, 6996 unique (90.273% redundant, 12.942% of corpus)
  67164 2-grams, 34886 unique (48.058% redundant, 4.629% of corpus)
  62405 3-grams, 53318 unique (14.561% redundant, 2.875% of corpus)
  57712 4-grams, 55792 unique (3.327% redundant, 2.337% of corpus)
  15 words/sentence, on average

7dmoro.txt has:
  43585 1-grams, 5436 unique (87.528% redundant, 10.056% of corpus)
  40682 2-grams, 23482 unique (42.279% redundant, 3.116% of corpus)
  37779 3-grams, 33791 unique (10.556% redundant, 1.822% of corpus)
  34982 4-grams, 34148 unique (2.384% redundant, 1.430% of corpus)
  14 words/sentence, on average

7time.txt has:
  32457 1-grams, 4666 unique (85.624% redundant, 8.632% of corpus)
  30491 2-grams, 18603 unique (38.989% redundant, 2.469% of corpus)
  28525 3-grams, 25862 unique (9.336% redundant, 1.395% of corpus)
  26586 4-grams, 26065 unique (1.960% redundant, 1.092% of corpus)
  16 words/sentence, on average

7warwrld.txt has:
  60568 1-grams, 7254 unique (88.023% redundant, 13.419% of corpus)
  57250 2-grams, 32863 unique (42.597% redundant, 4.361% of corpus)
  53932 3-grams, 48131 unique (10.756% redundant, 2.596% of corpus)
  50736 4-grams, 49642 unique (2.156% redundant, 2.079% of corpus)
  17 words/sentence, on average

8human.txt has:
  259146 1-grams, 12549 unique (95.158% redundant, 23.215% of corpus)
  241651 2-grams, 88956 unique (63.188% redundant, 11.804% of corpus)
  224156 3-grams, 168036 unique (25.036% redundant, 9.062% of corpus)
  207214 4-grams, 191862 unique (7.409% redundant, 8.037% of corpus)
  14 words/sentence, on average

8moon.txt has:
  74825 1-grams, 7076 unique (90.543% redundant, 13.090% of corpus)
  69433 2-grams, 34896 unique (49.741% redundant, 4.631% of corpus)
  64041 3-grams, 54685 unique (14.609% redundant, 2.949% of corpus)
  58875 4-grams, 56847 unique (3.445% redundant, 2.381% of corpus)
  13 words/sentence, on average

Pattrib.txt has:
  1190 1-grams, 405 unique (65.966% redundant, 0.749% of corpus)
  1142 2-grams, 796 unique (30.298% redundant, 0.106% of corpus)
  1094 3-grams, 894 unique (18.282% redundant, 0.048% of corpus)
  1047 4-grams, 907 unique (13.372% redundant, 0.038% of corpus)
  21 words/sentence, on average

Pchechen.txt has:
  5658 1-grams, 1498 unique (73.524% redundant, 2.771% of corpus)
  5449 2-grams, 3980 unique (26.959% redundant, 0.528% of corpus)
  5240 3-grams, 4903 unique (6.431% redundant, 0.264% of corpus)
  5032 4-grams, 4953 unique (1.570% redundant, 0.207% of corpus)
  26 words/sentence, on average

Pcommun.txt has:
  11426 1-grams, 2235 unique (80.439% redundant, 4.135% of corpus)
  10939 2-grams, 7240 unique (33.815% redundant, 0.961% of corpus)
  10452 3-grams, 9419 unique (9.883% redundant, 0.508% of corpus)
  9971 4-grams, 9677 unique (2.949% redundant, 0.405% of corpus)
  22 words/sentence, on average

Pexpel.txt has:
  12261 1-grams, 2193 unique (82.114% redundant, 4.057% of corpus)
  11685 2-grams, 6915 unique (40.822% redundant, 0.918% of corpus)
  11109 3-grams, 8944 unique (19.489% redundant, 0.482% of corpus)
  10548 4-grams, 9189 unique (12.884% redundant, 0.385% of corpus)
  20 words/sentence, on average

Pfunda.txt has:
  4800 1-grams, 1155 unique (75.938% redundant, 2.137% of corpus)
  4550 2-grams, 3182 unique (30.066% redundant, 0.422% of corpus)
  4300 3-grams, 3926 unique (8.698% redundant, 0.212% of corpus)
  4054 4-grams, 3937 unique (2.886% redundant, 0.165% of corpus)
  18 words/sentence, on average

Pjihad.txt has:
  36418 1-grams, 3734 unique (89.747% redundant, 6.908% of corpus)
  34318 2-grams, 16551 unique (51.772% redundant, 2.196% of corpus)
  32218 3-grams, 24842 unique (22.894% redundant, 1.340% of corpus)
  30300 4-grams, 26600 unique (12.211% redundant, 1.114% of corpus)
  16 words/sentence, on average

Pkoran.txt has:
  166010 1-grams, 5327 unique (96.791% redundant, 9.855% of corpus)
  156660 2-grams, 39206 unique (74.974% redundant, 5.202% of corpus)
  147310 3-grams, 81927 unique (44.385% redundant, 4.418% of corpus)
  138588 4-grams, 103629 unique (25.225% redundant, 4.341% of corpus)
  16 words/sentence, on average

Pmiscon.txt has:
  9890 1-grams, 1230 unique (87.563% redundant, 2.275% of corpus)
  9460 2-grams, 4621 unique (51.152% redundant, 0.613% of corpus)
  9030 3-grams, 6590 unique (27.021% redundant, 0.355% of corpus)
  8617 4-grams, 7131 unique (17.245% redundant, 0.299% of corpus)
  21 words/sentence, on average

Pshamyl.txt has:
  5210 1-grams, 1698 unique (67.409% redundant, 3.141% of corpus)
  4944 2-grams, 4107 unique (16.930% redundant, 0.545% of corpus)
  4678 3-grams, 4602 unique (1.625% redundant, 0.248% of corpus)
  4413 4-grams, 4407 unique (0.136% redundant, 0.185% of corpus)
  19 words/sentence, on average

Punabomb.txt has:
  34507 1-grams, 4140 unique (88.002% redundant, 7.659% of corpus)
  32858 2-grams, 18676 unique (43.161% redundant, 2.478% of corpus)
  31209 3-grams, 26776 unique (14.204% redundant, 1.444% of corpus)
  29599 4-grams, 28017 unique (5.345% redundant, 1.174% of corpus)
  17 words/sentence, on average
Close matches are indicated in red and somewhat close matches in orange, using arbitrary cut-offs for "close" and "somewhat close". The tables are symmetric, so only the lower half of each one is calculated and displayed.
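Because d(i, j) = d(j, i) and d(i, i) = 0, only the n(n-1)/2 pairs below the diagonal need to be computed: 780 pairs for the 40 works here. The sketch below uses Python's `math.dist` (3.8+) for the Euclidean distance. The function names and the row layout, which loosely imitates the tables below, are illustrative assumptions.

```python
import math

def lower_triangle(vectors):
    """Distances d(i, j) for j < i only; symmetry implies the upper half."""
    return [[math.dist(vectors[i], vectors[j]) for j in range(i)]
            for i in range(len(vectors))]

def format_table(names, rows):
    """One text line per work: name, distances to earlier works, then '---' on the diagonal."""
    lines = []
    for name, row in zip(names, rows):
        cells = [f"{d:.3f}" for d in row] + ["---"]
        lines.append(" ".join([name] + cells))
    return lines

for line in format_table(["a", "b", "c"], lower_triangle([[0, 0], [3, 4], [6, 8]])):
    print(line)
# a ---
# b 5.000 ---
# c 10.000 5.000 ---
```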
1-gram distances
hfinn lmiss tramp yanke emma persu pride sense gmars pmars tarza timel alice lglas snark agent hdark sshar great olive pwprs twoci callw seawo whtfn dmoro time warwr human moon attri chech commu expel funda jihad koran misco shamy unabo
0hfinn --- 0hfinn
0lmiss 0.083 --- 0lmiss
0tramp 0.086 0.005 --- 0tramp
0yankee 0.048 0.019 0.019 --- 0yankee
1emma 0.133 0.114 0.109 0.085 --- 1emma
1persua 0.135 0.081 0.079 0.069 0.026 --- 1persua
1pride 0.146 0.105 0.102 0.084 0.018 0.020 --- 1pride
1sense 0.149 0.111 0.109 0.090 0.021 0.024 0.013 --- 1sense
2gmars 0.173 0.047 0.047 0.074 0.150 0.117 0.130 0.139 --- 2gmars
2pmars 0.146 0.045 0.046 0.060 0.131 0.101 0.112 0.120 0.012 --- 2pmars
2tarzan 0.170 0.045 0.048 0.079 0.150 0.100 0.125 0.138 0.051 0.061 --- 2tarzan
2timelf 0.103 0.038 0.036 0.041 0.120 0.097 0.111 0.118 0.026 0.022 0.070 --- 2timelf
3alice 0.125 0.077 0.076 0.084 0.115 0.104 0.114 0.117 0.110 0.111 0.105 0.095 --- 3alice
3lglass 0.113 0.085 0.085 0.086 0.112 0.111 0.115 0.118 0.126 0.125 0.122 0.105 0.013 --- 3lglass
3snark 0.140 0.051 0.049 0.071 0.158 0.127 0.147 0.157 0.091 0.102 0.065 0.091 0.097 0.103 --- 3snark
4agent 0.197 0.061 0.062 0.095 0.127 0.089 0.104 0.115 0.071 0.081 0.038 0.098 0.119 0.128 0.090 --- 4agent
4hdark 0.118 0.038 0.038 0.049 0.109 0.089 0.102 0.111 0.041 0.040 0.064 0.038 0.099 0.100 0.083 0.050 --- 4hdark
4sshar 0.129 0.069 0.069 0.072 0.125 0.115 0.119 0.128 0.056 0.051 0.100 0.048 0.122 0.119 0.110 0.087 0.022 --- 4sshar
5great 0.067 0.066 0.066 0.039 0.070 0.075 0.070 0.080 0.090 0.068 0.124 0.051 0.101 0.092 0.115 0.120 0.049 0.046 --- 5great
5oliver 0.115 0.030 0.031 0.048 0.109 0.077 0.093 0.105 0.061 0.062 0.032 0.063 0.073 0.079 0.051 0.042 0.050 0.077 0.071 --- 5oliver
5pwprs 0.139 0.043 0.045 0.066 0.129 0.099 0.113 0.130 0.071 0.073 0.051 0.078 0.090 0.098 0.071 0.050 0.061 0.089 0.086 0.018 --- 5pwprs
5twocity 0.102 0.022 0.023 0.031 0.084 0.057 0.068 0.079 0.052 0.050 0.035 0.050 0.074 0.079 0.053 0.041 0.039 0.063 0.050 0.014 0.030 --- 5twocity
6callw 0.145 0.058 0.063 0.082 0.193 0.132 0.168 0.179 0.110 0.108 0.040 0.107 0.140 0.156 0.081 0.080 0.101 0.139 0.146 0.053 0.075 0.057 --- 6callw
6seawolf 0.072 0.032 0.033 0.030 0.105 0.088 0.099 0.108 0.047 0.033 0.068 0.021 0.084 0.088 0.079 0.090 0.033 0.037 0.034 0.048 0.065 0.035 0.083 --- 6seawolf
6whtfng 0.161 0.069 0.073 0.092 0.180 0.124 0.154 0.167 0.112 0.115 0.036 0.114 0.145 0.158 0.087 0.064 0.094 0.129 0.144 0.054 0.079 0.055 0.022 0.089 --- 6whtfng
7dmoro 0.111 0.041 0.043 0.051 0.151 0.128 0.140 0.151 0.034 0.027 0.074 0.027 0.107 0.113 0.090 0.091 0.029 0.036 0.051 0.060 0.068 0.050 0.102 0.023 0.107 --- 7dmoro
7time 0.117 0.045 0.046 0.051 0.145 0.124 0.139 0.145 0.033 0.024 0.091 0.024 0.107 0.116 0.107 0.105 0.033 0.040 0.055 0.077 0.084 0.061 0.125 0.027 0.133 0.015 --- 7time
7warwrld 0.137 0.026 0.028 0.054 0.173 0.127 0.157 0.164 0.027 0.029 0.048 0.033 0.102 0.120 0.079 0.072 0.042 0.069 0.095 0.052 0.060 0.046 0.070 0.041 0.084 0.027 0.027 --- 7warwrld
8human 0.130 0.099 0.098 0.093 0.098 0.076 0.088 0.100 0.155 0.148 0.083 0.133 0.128 0.123 0.108 0.082 0.093 0.120 0.100 0.071 0.103 0.068 0.087 0.101 0.067 0.136 0.164 0.148 --- 8human
8moon 0.091 0.081 0.077 0.058 0.063 0.067 0.062 0.071 0.109 0.095 0.113 0.077 0.113 0.101 0.114 0.097 0.047 0.053 0.033 0.078 0.104 0.061 0.141 0.054 0.124 0.078 0.090 0.122 0.053 --- 8moon
Pattrib 0.387 0.205 0.209 0.267 0.346 0.286 0.312 0.319 0.175 0.196 0.182 0.232 0.283 0.314 0.235 0.191 0.238 0.276 0.326 0.213 0.211 0.214 0.228 0.254 0.235 0.232 0.237 0.181 0.332 0.344 --- Pattrib
Pchechen 0.294 0.125 0.141 0.193 0.309 0.239 0.276 0.283 0.138 0.152 0.125 0.169 0.205 0.238 0.162 0.166 0.191 0.218 0.261 0.149 0.155 0.154 0.143 0.177 0.158 0.173 0.178 0.119 0.257 0.280 0.193 --- Pchechen
Pcommun 0.327 0.125 0.129 0.194 0.291 0.224 0.255 0.260 0.101 0.122 0.113 0.164 0.211 0.244 0.168 0.119 0.163 0.209 0.266 0.148 0.145 0.143 0.161 0.184 0.171 0.158 0.156 0.097 0.279 0.283 0.128 0.125 --- Pcommun
Pexpel 0.265 0.100 0.108 0.159 0.267 0.206 0.235 0.247 0.091 0.105 0.095 0.135 0.176 0.208 0.138 0.129 0.155 0.191 0.225 0.117 0.120 0.116 0.127 0.144 0.144 0.131 0.138 0.080 0.249 0.256 0.111 0.090 0.062 --- Pexpel
Pfunda 0.340 0.145 0.146 0.202 0.277 0.219 0.245 0.252 0.125 0.145 0.137 0.181 0.221 0.254 0.179 0.153 0.186 0.227 0.272 0.163 0.167 0.160 0.185 0.202 0.198 0.186 0.184 0.131 0.283 0.284 0.153 0.148 0.088 0.073 --- Pfunda
Pjihad 0.268 0.108 0.115 0.162 0.251 0.204 0.224 0.236 0.110 0.129 0.107 0.150 0.182 0.204 0.126 0.128 0.150 0.181 0.214 0.119 0.127 0.119 0.144 0.153 0.150 0.142 0.156 0.111 0.222 0.225 0.150 0.113 0.113 0.057 0.093 --- Pjihad
Pkoran 0.185 0.146 0.145 0.144 0.202 0.188 0.192 0.207 0.201 0.197 0.181 0.197 0.207 0.209 0.165 0.216 0.216 0.255 0.197 0.153 0.176 0.144 0.183 0.182 0.205 0.206 0.227 0.196 0.210 0.213 0.272 0.267 0.252 0.154 0.209 0.163 --- Pkoran
Pmiscon 0.267 0.127 0.125 0.165 0.242 0.207 0.219 0.232 0.147 0.163 0.130 0.182 0.200 0.218 0.144 0.161 0.190 0.231 0.236 0.134 0.150 0.136 0.159 0.181 0.171 0.181 0.201 0.152 0.226 0.239 0.202 0.179 0.147 0.100 0.102 0.090 0.118 --- Pmiscon
Pshamyl 0.250 0.085 0.089 0.146 0.256 0.182 0.220 0.230 0.085 0.101 0.054 0.122 0.158 0.186 0.106 0.092 0.125 0.160 0.210 0.082 0.093 0.094 0.075 0.126 0.086 0.119 0.132 0.068 0.180 0.214 0.166 0.071 0.094 0.075 0.117 0.109 0.236 0.153 --- Pshamyl
Punabomb 0.268 0.106 0.102 0.143 0.190 0.154 0.171 0.175 0.124 0.145 0.131 0.160 0.185 0.203 0.129 0.124 0.154 0.192 0.210 0.130 0.141 0.121 0.181 0.181 0.186 0.177 0.176 0.135 0.218 0.209 0.188 0.164 0.111 0.121 0.108 0.115 0.184 0.114 0.146 --- Punabomb

2-gram distances
hfinn lmiss tramp yanke emma persu pride sense gmars pmars tarza timel alice lglas snark agent hdark sshar great olive pwprs twoci callw seawo whtfn dmoro time warwr human moon attri chech commu expel funda jihad koran misco shamy unabo
0hfinn --- 0hfinn
0lmiss 0.283 --- 0lmiss
0tramp 0.293 0.072 --- 0tramp
0yankee 0.226 0.125 0.106 --- 0yankee
1emma 0.466 0.339 0.310 0.301 --- 1emma
1persua 0.466 0.298 0.279 0.288 0.136 --- 1persua
1pride 0.466 0.312 0.285 0.291 0.098 0.137 --- 1pride
1sense 0.456 0.316 0.289 0.290 0.113 0.155 0.088 --- 1sense
2gmars 0.490 0.198 0.188 0.258 0.439 0.390 0.394 0.401 --- 2gmars
2pmars 0.476 0.209 0.198 0.254 0.427 0.386 0.380 0.388 0.084 --- 2pmars
2tarzan 0.475 0.198 0.193 0.267 0.395 0.329 0.346 0.369 0.166 0.201 --- 2tarzan
2timelf 0.410 0.195 0.183 0.222 0.410 0.370 0.383 0.388 0.137 0.154 0.213 --- 2timelf
3alice 0.496 0.455 0.437 0.422 0.469 0.468 0.470 0.461 0.542 0.550 0.509 0.518 --- 3alice
3lglass 0.439 0.423 0.408 0.386 0.452 0.457 0.455 0.447 0.535 0.538 0.502 0.505 0.238 --- 3lglass
3snark 0.663 0.560 0.549 0.572 0.632 0.624 0.610 0.619 0.627 0.631 0.591 0.628 0.682 0.668 --- 3snark
4agent 0.438 0.227 0.207 0.264 0.366 0.316 0.325 0.332 0.304 0.318 0.226 0.326 0.483 0.456 0.595 --- 4agent
4hdark 0.326 0.173 0.161 0.170 0.357 0.324 0.340 0.339 0.274 0.270 0.279 0.245 0.472 0.424 0.592 0.230 --- 4hdark
4sshar 0.381 0.246 0.243 0.242 0.405 0.381 0.385 0.386 0.308 0.303 0.334 0.283 0.528 0.480 0.625 0.287 0.194 --- 4sshar
5great 0.301 0.212 0.202 0.174 0.281 0.291 0.265 0.269 0.293 0.265 0.341 0.247 0.458 0.409 0.592 0.296 0.182 0.210 --- 5great
5oliver 0.392 0.214 0.201 0.244 0.322 0.295 0.283 0.299 0.295 0.309 0.221 0.311 0.384 0.380 0.547 0.223 0.252 0.310 0.232 --- 5oliver
5pwprs 0.414 0.211 0.198 0.260 0.358 0.329 0.317 0.336 0.283 0.297 0.223 0.310 0.410 0.413 0.558 0.227 0.265 0.322 0.254 0.096 --- 5pwprs
5twocity 0.360 0.149 0.130 0.176 0.268 0.239 0.227 0.242 0.239 0.249 0.186 0.256 0.423 0.399 0.529 0.184 0.197 0.259 0.174 0.126 0.139 --- 5twocity
6callw 0.424 0.232 0.230 0.279 0.460 0.382 0.412 0.424 0.317 0.338 0.216 0.338 0.538 0.514 0.587 0.258 0.276 0.346 0.357 0.260 0.271 0.218 --- 6callw
6seawolf 0.336 0.163 0.158 0.171 0.354 0.331 0.326 0.326 0.198 0.195 0.238 0.173 0.478 0.445 0.582 0.270 0.177 0.209 0.170 0.256 0.261 0.190 0.268 --- 6seawolf
6whtfng 0.451 0.271 0.270 0.305 0.459 0.386 0.418 0.433 0.353 0.376 0.242 0.366 0.554 0.541 0.613 0.278 0.303 0.369 0.377 0.293 0.306 0.247 0.188 0.305 --- 6whtfng
7dmoro 0.388 0.180 0.174 0.217 0.448 0.406 0.416 0.414 0.183 0.189 0.231 0.187 0.494 0.479 0.609 0.290 0.204 0.253 0.222 0.284 0.272 0.226 0.304 0.174 0.335 --- 7dmoro
7time 0.391 0.209 0.198 0.220 0.444 0.414 0.425 0.413 0.210 0.202 0.294 0.197 0.494 0.479 0.631 0.335 0.217 0.268 0.228 0.326 0.320 0.268 0.364 0.195 0.394 0.160 --- 7time
7warwrld 0.401 0.149 0.144 0.218 0.453 0.385 0.416 0.415 0.161 0.182 0.190 0.182 0.491 0.476 0.607 0.256 0.213 0.277 0.280 0.270 0.249 0.203 0.258 0.192 0.299 0.145 0.173 --- 7warwrld
8human 0.362 0.280 0.273 0.265 0.319 0.281 0.303 0.326 0.420 0.426 0.274 0.392 0.470 0.438 0.586 0.255 0.270 0.344 0.287 0.251 0.294 0.225 0.230 0.302 0.237 0.371 0.423 0.366 --- 8human
8moon 0.305 0.219 0.202 0.174 0.271 0.275 0.262 0.270 0.329 0.318 0.298 0.286 0.463 0.414 0.590 0.266 0.188 0.244 0.160 0.255 0.290 0.205 0.298 0.190 0.309 0.268 0.288 0.305 0.149 --- 8moon
Pattrib 0.847 0.692 0.697 0.753 0.819 0.779 0.799 0.805 0.658 0.681 0.668 0.696 0.835 0.844 0.859 0.728 0.763 0.779 0.797 0.750 0.718 0.721 0.732 0.727 0.753 0.698 0.726 0.665 0.805 0.802 --- Pattrib
Pchechen 0.707 0.455 0.489 0.565 0.689 0.627 0.651 0.657 0.483 0.502 0.487 0.527 0.723 0.731 0.760 0.555 0.586 0.614 0.639 0.569 0.542 0.529 0.542 0.549 0.570 0.530 0.565 0.477 0.638 0.637 0.750 --- Pchechen
Pcommun 0.664 0.356 0.355 0.474 0.629 0.550 0.579 0.586 0.326 0.359 0.338 0.402 0.670 0.677 0.721 0.423 0.488 0.524 0.566 0.461 0.421 0.411 0.450 0.439 0.479 0.388 0.432 0.316 0.596 0.572 0.651 0.501 --- Pcommun
Pexpel 0.654 0.349 0.357 0.467 0.619 0.557 0.576 0.588 0.317 0.356 0.332 0.396 0.656 0.675 0.714 0.447 0.501 0.527 0.568 0.470 0.423 0.407 0.454 0.430 0.480 0.381 0.429 0.312 0.589 0.568 0.598 0.480 0.308 --- Pexpel
Pfunda 0.748 0.529 0.518 0.599 0.687 0.646 0.657 0.665 0.534 0.557 0.545 0.582 0.756 0.755 0.784 0.596 0.624 0.658 0.673 0.607 0.591 0.562 0.603 0.598 0.632 0.575 0.602 0.529 0.695 0.672 0.735 0.648 0.538 0.474 --- Pfunda
Pjihad 0.688 0.511 0.503 0.567 0.658 0.634 0.627 0.640 0.546 0.566 0.550 0.585 0.731 0.722 0.749 0.580 0.592 0.632 0.617 0.573 0.566 0.527 0.587 0.576 0.607 0.567 0.593 0.535 0.643 0.621 0.746 0.663 0.588 0.493 0.580 --- Pjihad
Pkoran 0.619 0.483 0.470 0.516 0.588 0.593 0.558 0.579 0.528 0.546 0.522 0.560 0.696 0.683 0.712 0.581 0.579 0.636 0.581 0.544 0.544 0.481 0.564 0.541 0.596 0.549 0.582 0.527 0.595 0.562 0.765 0.694 0.604 0.495 0.625 0.595 --- Pkoran
Pmiscon 0.712 0.501 0.482 0.558 0.643 0.638 0.617 0.633 0.518 0.551 0.526 0.575 0.738 0.733 0.752 0.600 0.623 0.668 0.653 0.573 0.569 0.530 0.601 0.579 0.625 0.566 0.601 0.533 0.663 0.632 0.748 0.668 0.557 0.452 0.512 0.559 0.490 --- Pmiscon
Pshamyl 0.644 0.457 0.450 0.524 0.655 0.589 0.605 0.610 0.483 0.497 0.464 0.524 0.699 0.688 0.722 0.493 0.521 0.564 0.583 0.501 0.494 0.464 0.465 0.512 0.513 0.512 0.537 0.459 0.574 0.580 0.793 0.466 0.543 0.539 0.645 0.656 0.680 0.664 --- Pshamyl
Punabomb 0.642 0.373 0.360 0.450 0.534 0.510 0.508 0.515 0.406 0.432 0.416 0.457 0.657 0.647 0.687 0.463 0.510 0.552 0.553 0.465 0.453 0.420 0.511 0.475 0.541 0.462 0.488 0.410 0.588 0.547 0.715 0.555 0.399 0.404 0.539 0.564 0.571 0.506 0.594 --- Punabomb

3-gram distances
hfinn lmiss tramp yanke emma persu pride sense gmars pmars tarza timel alice lglas snark agent hdark sshar great olive pwprs twoci callw seawo whtfn dmoro time warwr human moon attri chech commu expel funda jihad koran misco shamy unabo
0hfinn --- 0hfinn
0lmiss 0.714 --- 0lmiss
0tramp 0.736 0.643 --- 0tramp
0yankee 0.708 0.678 0.659 --- 0yankee
1emma 0.863 0.768 0.725 0.750 --- 1emma
1persua 0.882 0.803 0.776 0.792 0.548 --- 1persua
1pride 0.876 0.790 0.757 0.784 0.502 0.607 --- 1pride
1sense 0.871 0.795 0.761 0.784 0.524 0.629 0.541 --- 1sense
2gmars 0.913 0.818 0.796 0.827 0.836 0.866 0.847 0.852 --- 2gmars
2pmars 0.907 0.812 0.794 0.824 0.831 0.866 0.839 0.848 0.691 --- 2pmars
2tarzan 0.895 0.824 0.812 0.837 0.801 0.830 0.817 0.834 0.807 0.815 --- 2tarzan
2timelf 0.868 0.809 0.792 0.809 0.845 0.867 0.855 0.862 0.776 0.793 0.833 --- 2timelf
3alice 0.876 0.870 0.867 0.858 0.858 0.877 0.871 0.869 0.927 0.925 0.914 0.917 --- 3alice
3lglass 0.864 0.861 0.858 0.850 0.859 0.879 0.876 0.875 0.925 0.928 0.908 0.912 0.771 --- 3lglass
3snark 0.980 0.973 0.970 0.972 0.966 0.972 0.967 0.967 0.981 0.983 0.978 0.983 0.984 0.983 --- 3snark
4agent 0.864 0.807 0.793 0.809 0.790 0.811 0.813 0.816 0.876 0.885 0.842 0.885 0.891 0.880 0.975 --- 4agent
4hdark 0.837 0.790 0.785 0.792 0.835 0.860 0.855 0.859 0.858 0.863 0.879 0.856 0.909 0.897 0.979 0.829 --- 4hdark
4sshar 0.885 0.844 0.842 0.841 0.864 0.887 0.884 0.884 0.885 0.894 0.905 0.883 0.928 0.918 0.982 0.856 0.841 --- 4sshar
5great 0.755 0.692 0.681 0.683 0.686 0.744 0.713 0.724 0.770 0.764 0.817 0.770 0.846 0.827 0.966 0.771 0.757 0.787 --- 5great
5oliver 0.815 0.762 0.758 0.779 0.737 0.775 0.751 0.762 0.845 0.854 0.800 0.861 0.860 0.846 0.971 0.785 0.844 0.868 0.677 --- 5oliver
5pwprs 0.799 0.741 0.735 0.762 0.722 0.765 0.736 0.753 0.840 0.846 0.803 0.853 0.849 0.842 0.970 0.777 0.831 0.866 0.669 0.557 --- 5pwprs
5twocity 0.824 0.747 0.735 0.746 0.718 0.761 0.741 0.761 0.833 0.834 0.801 0.842 0.871 0.858 0.969 0.782 0.821 0.858 0.650 0.691 0.686 --- 5twocity
6callw 0.893 0.850 0.846 0.854 0.888 0.890 0.896 0.898 0.903 0.913 0.857 0.901 0.940 0.931 0.979 0.868 0.895 0.921 0.864 0.865 0.861 0.853 --- 6callw
6seawolf 0.826 0.744 0.731 0.733 0.771 0.814 0.797 0.800 0.773 0.783 0.816 0.767 0.888 0.867 0.973 0.816 0.792 0.820 0.675 0.790 0.780 0.764 0.831 --- 6seawolf
6whtfng 0.855 0.796 0.789 0.789 0.830 0.835 0.849 0.850 0.856 0.875 0.793 0.859 0.908 0.898 0.973 0.804 0.839 0.880 0.801 0.807 0.802 0.789 0.752 0.764 --- 6whtfng
7dmoro 0.857 0.811 0.809 0.818 0.869 0.894 0.881 0.880 0.835 0.844 0.869 0.841 0.913 0.906 0.979 0.876 0.830 0.875 0.764 0.849 0.843 0.833 0.907 0.788 0.865 --- 7dmoro
7time 0.884 0.838 0.826 0.832 0.879 0.899 0.891 0.887 0.847 0.849 0.900 0.852 0.927 0.918 0.986 0.898 0.846 0.890 0.783 0.878 0.874 0.862 0.927 0.816 0.892 0.812 --- 7time
7warwrld 0.846 0.782 0.778 0.795 0.857 0.872 0.867 0.866 0.822 0.832 0.851 0.828 0.902 0.891 0.978 0.847 0.821 0.874 0.768 0.829 0.817 0.807 0.881 0.783 0.826 0.780 0.806 --- 7warwrld
8human 0.757 0.713 0.694 0.707 0.678 0.707 0.713 0.733 0.851 0.854 0.728 0.832 0.834 0.820 0.963 0.693 0.774 0.838 0.664 0.687 0.683 0.692 0.765 0.733 0.652 0.828 0.865 0.800 --- 8human
8moon 0.811 0.740 0.719 0.726 0.689 0.766 0.733 0.750 0.806 0.806 0.800 0.812 0.878 0.862 0.974 0.786 0.781 0.827 0.647 0.765 0.760 0.751 0.858 0.720 0.786 0.797 0.821 0.804 0.564 --- 8moon
Pattrib 0.994 0.992 0.994 0.993 0.993 0.992 0.994 0.995 0.997 0.996 0.998 0.996 0.997 0.995 0.999 0.995 0.994 0.998 0.993 0.996 0.993 0.996 0.996 0.994 0.996 0.995 0.997 0.995 0.996 0.998 --- Pattrib
Pchechen 0.978 0.959 0.966 0.970 0.974 0.976 0.976 0.974 0.980 0.970 0.979 0.978 0.983 0.986 0.996 0.979 0.983 0.989 0.975 0.976 0.972 0.974 0.980 0.975 0.973 0.983 0.985 0.976 0.968 0.979 0.999 --- Pchechen
Pcommun 0.981 0.955 0.957 0.962 0.968 0.967 0.967 0.966 0.967 0.967 0.973 0.977 0.986 0.983 0.995 0.963 0.977 0.983 0.967 0.964 0.958 0.961 0.977 0.968 0.967 0.978 0.976 0.964 0.974 0.973 0.999 0.987 --- Pcommun
Pexpel 0.975 0.946 0.945 0.953 0.956 0.963 0.962 0.960 0.952 0.965 0.971 0.976 0.984
0.983 0.996 0.967 0.972 0.982 0.962 0.958 0.958 0.953 0.973 0.963 0.972 0.967 0.973 0.961 0.972 0.968 0.994 0.974 0.976 --- Pexpel Pfunda 0.980 0.964 0.963 0.967 0.970 0.975 0.974 0.973 0.978 0.980 0.981 0.981 0.986 0.986 0.996 0.974 0.978 0.988 0.974 0.974 0.972 0.970 0.978 0.974 0.975 0.980 0.984 0.974 0.977 0.979 0.991 0.987 0.986 0.954 --- Pfunda Pjihad 0.979 0.963 0.959 0.966 0.964 0.972 0.967 0.969 0.970 0.973 0.973 0.978 0.987 0.986 0.996 0.975 0.978 0.985 0.966 0.970 0.970 0.964 0.982 0.971 0.974 0.979 0.982 0.978 0.967 0.968 0.991 0.989 0.991 0.952 0.957 --- Pjihad Pkoran 0.958 0.922 0.914 0.925 0.910 0.933 0.915 0.927 0.941 0.941 0.943 0.956 0.975 0.973 0.991 0.957 0.958 0.975 0.928 0.934 0.936 0.914 0.962 0.941 0.955 0.959 0.968 0.956 0.941 0.931 0.997 0.989 0.984 0.935 0.965 0.944 --- Pkoran Pmiscon 0.977 0.952 0.931 0.955 0.948 0.961 0.954 0.957 0.963 0.969 0.972 0.976 0.985 0.987 0.997 0.975 0.978 0.988 0.967 0.966 0.966 0.953 0.982 0.967 0.974 0.978 0.983 0.976 0.974 0.967 0.998 0.993 0.984 0.908 0.921 0.943 0.903 --- Pmiscon Pshamyl 0.980 0.966 0.969 0.971 0.979 0.977 0.977 0.978 0.972 0.975 0.969 0.976 0.987 0.987 0.997 0.976 0.979 0.987 0.975 0.972 0.972 0.970 0.972 0.974 0.966 0.981 0.984 0.975 0.969 0.979 0.998 0.960 0.988 0.988 0.990 0.991 0.994 0.995 --- Pshamyl Punabomb 0.962 0.922 0.913 0.931 0.920 0.936 0.928 0.929 0.953 0.955 0.957 0.960 0.974 0.968 0.993 0.945 0.963 0.976 0.947 0.942 0.940 0.938 0.972 0.949 0.960 0.965 0.966 0.945 0.944 0.945 0.998 0.978 0.948 0.961 0.969 0.977 0.958 0.945 0.989 --- Punabomb hfinn lmiss tramp yanke emma persu pride sense gmars pmars tarza timel alice lglas snark agent hdark sshar great olive pwprs twoci callw seawo whtfn dmoro time warwr human moon attri chech commu expel funda jihad koran misco shamy unabo 4-gram distances hfinn lmiss tramp yanke emma persu pride sense gmars pmars tarza timel alice lglas snark agent hdark sshar great olive pwprs twoci callw seawo whtfn dmoro time warwr human moon attri 
chech commu expel funda jihad koran misco shamy unabo 0hfinn --- 0hfinn 0lmiss 0.957 --- 0lmiss 0tramp 0.964 0.948 --- 0tramp 0yankee 0.962 0.959 0.956 --- 0yankee 1emma 0.985 0.972 0.964 0.974 --- 1emma 1persua 0.990 0.980 0.976 0.982 0.930 --- 1persua 1pride 0.987 0.975 0.971 0.978 0.914 0.946 --- 1pride 1sense 0.986 0.974 0.969 0.975 0.917 0.949 0.920 --- 1sense 2gmars 0.992 0.978 0.975 0.983 0.981 0.988 0.983 0.983 --- 2gmars 2pmars 0.991 0.977 0.975 0.983 0.981 0.989 0.982 0.983 0.943 --- 2pmars 2tarzan 0.989 0.980 0.978 0.984 0.977 0.986 0.980 0.982 0.968 0.969 --- 2tarzan 2timelf 0.988 0.979 0.976 0.982 0.985 0.990 0.986 0.985 0.965 0.974 0.977 --- 2timelf 3alice 0.984 0.984 0.985 0.985 0.983 0.987 0.983 0.982 0.992 0.993 0.990 0.994 --- 3alice 3lglass 0.985 0.983 0.983 0.985 0.983 0.988 0.984 0.985 0.992 0.992 0.989 0.991 0.943 --- 3lglass 3snark 0.998 0.997 0.995 0.996 0.998 0.998 0.997 0.998 0.998 0.998 0.998 0.999 0.998 0.998 --- 3snark 4agent 0.981 0.971 0.969 0.978 0.973 0.980 0.977 0.975 0.981 0.983 0.978 0.985 0.985 0.982 0.997 --- 4agent 4hdark 0.983 0.978 0.977 0.980 0.984 0.989 0.985 0.985 0.985 0.986 0.988 0.987 0.992 0.991 0.998 0.976 --- 4hdark 4sshar 0.989 0.982 0.983 0.986 0.987 0.993 0.989 0.989 0.988 0.990 0.990 0.989 0.992 0.990 0.999 0.979 0.985 --- 4sshar 5great 0.973 0.963 0.961 0.967 0.958 0.973 0.964 0.965 0.976 0.976 0.982 0.978 0.979 0.979 0.998 0.965 0.977 0.977 --- 5great 5oliver 0.979 0.973 0.972 0.978 0.966 0.974 0.967 0.967 0.979 0.982 0.977 0.985 0.979 0.976 0.996 0.967 0.984 0.986 0.950 --- 5oliver 5pwprs 0.971 0.962 0.961 0.970 0.951 0.963 0.953 0.956 0.971 0.974 0.972 0.979 0.974 0.972 0.994 0.956 0.977 0.980 0.938 0.904 --- 5pwprs 5twocity 0.986 0.976 0.975 0.979 0.971 0.981 0.975 0.977 0.985 0.983 0.982 0.988 0.987 0.986 0.997 0.972 0.987 0.987 0.954 0.963 0.954 --- 5twocity 6callw 0.989 0.981 0.982 0.985 0.991 0.992 0.989 0.987 0.988 0.989 0.986 0.990 0.995 0.994 0.999 0.983 0.990 0.992 0.990 0.987 0.979 0.991 --- 6callw 
6seawolf 0.983 0.969 0.965 0.972 0.974 0.983 0.976 0.974 0.968 0.973 0.977 0.971 0.988 0.984 0.996 0.972 0.978 0.981 0.962 0.973 0.963 0.978 0.978 --- 6seawolf 6whtfng 0.986 0.973 0.972 0.978 0.982 0.984 0.982 0.979 0.974 0.979 0.973 0.981 0.991 0.987 0.997 0.968 0.980 0.988 0.978 0.977 0.967 0.981 0.964 0.958 --- 6whtfng 7dmoro 0.986 0.982 0.981 0.986 0.986 0.993 0.989 0.988 0.984 0.985 0.986 0.984 0.992 0.990 0.998 0.986 0.984 0.991 0.977 0.985 0.980 0.987 0.994 0.980 0.985 --- 7dmoro 7time 0.990 0.982 0.981 0.985 0.988 0.992 0.988 0.988 0.982 0.985 0.990 0.985 0.993 0.992 0.999 0.988 0.985 0.992 0.978 0.985 0.983 0.988 0.994 0.981 0.989 0.975 --- 7time 7warwrld 0.987 0.978 0.975 0.982 0.986 0.990 0.987 0.985 0.977 0.979 0.980 0.980 0.991 0.989 0.999 0.980 0.983 0.989 0.977 0.981 0.975 0.983 0.988 0.973 0.978 0.974 0.974 --- 7warwrld 8human 0.961 0.954 0.950 0.957 0.941 0.957 0.952 0.951 0.975 0.977 0.959 0.976 0.966 0.964 0.991 0.937 0.967 0.973 0.943 0.946 0.933 0.958 0.962 0.953 0.934 0.977 0.981 0.969 --- 8human 8moon 0.981 0.970 0.967 0.974 0.949 0.975 0.964 0.967 0.976 0.976 0.974 0.980 0.985 0.983 0.997 0.972 0.977 0.984 0.961 0.972 0.966 0.978 0.988 0.968 0.978 0.979 0.978 0.976 0.906 --- 8moon Pattrib 1.000 0.999 0.999 0.999 1.000 1.000 0.999 0.999 1.000 0.999 0.999 1.000 1.000 1.000 1.000 0.999 1.000 1.000 1.000 0.999 0.997 1.000 1.000 1.001 1.000 1.000 1.000 1.000 0.997 0.999 --- Pattrib Pchechen 0.998 0.995 0.995 0.996 0.998 0.999 0.997 0.997 0.998 0.996 0.997 0.998 0.999 1.000 1.000 0.997 0.999 0.999 0.999 0.997 0.995 1.000 0.997 0.998 0.996 1.000 0.999 0.998 0.991 0.998 1.000 --- Pchechen Pcommun 0.999 0.994 0.994 0.994 0.997 0.997 0.995 0.995 0.996 0.994 0.996 0.996 1.000 0.999 1.000 0.995 0.997 0.998 0.998 0.995 0.992 0.997 0.998 0.997 0.994 0.998 0.998 0.997 0.994 0.997 1.000 0.998 --- Pcommun Pexpel 0.998 0.994 0.994 0.995 0.998 0.998 0.997 0.996 0.990 0.996 0.996 0.999 0.999 0.999 1.000 0.995 0.998 0.998 0.998 0.996 0.993 0.998 0.997 0.998 
0.998 0.999 0.997 0.998 0.994 0.998 0.999 0.997 0.996 --- Pexpel Pfunda 0.999 0.997 0.996 0.995 0.997 0.998 0.996 0.995 0.999 0.997 0.998 0.998 0.999 0.999 0.999 0.997 0.998 0.999 0.999 0.997 0.994 0.998 0.997 0.998 0.996 0.999 0.999 0.998 0.994 0.998 0.999 0.998 0.998 0.995 --- Pfunda Pjihad 0.998 0.996 0.995 0.996 0.997 0.999 0.997 0.997 0.996 0.997 0.996 0.997 0.999 0.999 1.000 0.996 0.998 0.998 0.997 0.997 0.995 0.998 0.998 0.998 0.997 0.999 0.998 0.999 0.994 0.997 0.998 0.998 0.999 0.987 0.986 --- Pjihad Pkoran 0.997 0.992 0.992 0.992 0.992 0.995 0.992 0.993 0.995 0.993 0.994 0.997 0.999 0.999 1.000 0.994 0.996 0.997 0.994 0.992 0.991 0.992 0.998 0.996 0.996 0.998 0.998 0.997 0.990 0.994 1.000 1.000 0.999 0.987 0.994 0.988 --- Pkoran Pmiscon 0.999 0.996 0.994 0.995 0.997 0.997 0.995 0.996 0.997 0.998 0.997 0.997 0.999 0.999 1.000 0.998 0.998 0.999 0.998 0.997 0.994 0.998 0.998 0.997 0.998 0.999 0.999 0.998 0.994 0.998 1.000 1.000 0.998 0.972 0.990 0.985 0.979 --- Pmiscon Pshamyl 0.999 0.997 0.997 0.997 0.999 0.999 0.997 0.998 0.997 0.997 0.997 0.998 0.999 1.000 1.000 0.998 0.999 0.999 0.999 0.997 0.994 0.998 0.998 0.999 0.997 0.999 0.999 0.998 0.994 0.998 1.000 0.986 0.998 0.999 0.999 0.999 1.000 0.999 --- Pshamyl Punabomb 0.997 0.992 0.992 0.992 0.991 0.995 0.992 0.991 0.996 0.995 0.995 0.996 0.998 0.998 1.000 0.993 0.996 0.999 0.996 0.994 0.991 0.996 0.997 0.996 0.995 0.998 0.997 0.995 0.989 0.993 1.000 0.998 0.993 0.995 0.994 0.998 0.998 0.997 1.000 --- Punabomb hfinn lmiss tramp yanke emma persu pride sense gmars pmars tarza timel alice lglas snark agent hdark sshar great olive pwprs twoci callw seawo whtfn dmoro time warwr human moon attri chech commu expel funda jihad koran misco shamy unabo
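As expected, works resemble each other less and less as N increases. The "distances" grow until each work is essentially isolated as a statistically unique text. Every 4-gram distance in the tables is at least 0.90, whereas the smallest 3-gram distance is 0.502 (between "Emma" and "Pride and Prejudice").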
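The mechanism behind this trend can be sketched in a few lines. The sketch assumes a naive sentence splitter and tokenizer; the original pipeline used standard Unix utilities instead. It builds N-grams that never cross sentence boundaries and compares two texts by the Euclidean distance between their N-gram frequency vectors. It uses fractions rather than percentages, which only rescales the distance by 100. The names `ngrams` and `distance` are illustrative. This section does not say how the table values were scaled, so the numbers produced here are not comparable with the tables.

```python
import math
import re
from collections import Counter


def ngrams(text, n):
    """Return all N-grams (tuples of lowercase words) that lie within one sentence."""
    grams = []
    # Naive sentence split on ., ! and ? (an assumption, not the original tooling).
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = re.findall(r"[a-z']+", sentence)
        grams.extend(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return grams


def distance(text_a, text_b, n):
    """Euclidean distance between the N-gram relative-frequency vectors of two texts.

    N-grams absent from both texts contribute zero to the sum, so summing over
    the union of the two texts' N-grams gives the same result as summing over
    every unique N-gram in a larger corpus.
    """
    ca, cb = Counter(ngrams(text_a, n)), Counter(ngrams(text_b, n))
    ta, tb = sum(ca.values()), sum(cb.values())
    if ta == 0 or tb == 0:
        raise ValueError("both texts need at least one N-gram of this length")
    return math.sqrt(sum((ca[g] / ta - cb[g] / tb) ** 2 for g in ca.keys() | cb.keys()))


a = "My car goes fast. My dog is brown."
b = "My car is fast. My dog goes home."
for n in range(1, 5):
    print(n, round(distance(a, b, n), 3))
```

For these two toy texts, the distance rises from about 0.18 at N = 1 to 1.0 at N = 4, where the texts share no N-grams at all. In texts this short, a larger N also leaves fewer N-grams per text, so the example shows only the direction of the effect, not its size.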
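This technique does yield relatively small "distances" for works by the same author, even when the texts are analyzed at different scales (N-grams of different lengths). The file names were chosen so that works by the same author, and the political writings (the "P" files), sort together, which makes these clusters easy to spot in the tables. For example, the 3-gram distances among the four Jane Austen novels (0.502 to 0.629) are smaller than the distance from any of them to any other work. The technique can also yield small distances for works on the same topic: "War of the Worlds" and Burroughs' two books about Mars are relatively close.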
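The same-author clustering can be checked mechanically from the tables. The sketch below uses a small excerpt: the first eight rows of the 3-gram table above (Twain and Austen), copied verbatim. It parses the row layout ("name d1 d2 ... --- name") into a symmetric lookup and reports each work's nearest neighbor. It assumes, as the file list indicates, that the leading digit of a file name identifies the author. The function names are hypothetical and not part of the original toolset.

```python
# First eight rows of the 3-gram distance table (Twain and Austen), verbatim.
ROWS = """\
0hfinn --- 0hfinn
0lmiss 0.714 --- 0lmiss
0tramp 0.736 0.643 --- 0tramp
0yankee 0.708 0.678 0.659 --- 0yankee
1emma 0.863 0.768 0.725 0.750 --- 1emma
1persua 0.882 0.803 0.776 0.792 0.548 --- 1persua
1pride 0.876 0.790 0.757 0.784 0.502 0.607 --- 1pride
1sense 0.871 0.795 0.761 0.784 0.524 0.629 0.541 --- 1sense
"""


def parse_lower_triangle(text):
    """Return (names, dist), where dist[(a, b)] == dist[(b, a)]."""
    names, dist = [], {}
    for line in text.strip().splitlines():
        fields = line.split()
        name = fields[0]
        # Values sit between the row label and the "---" diagonal marker;
        # the i-th value is the distance to the i-th earlier row.
        values = [float(v) for v in fields[1:fields.index("---")]]
        for other, d in zip(names, values):
            dist[(name, other)] = dist[(other, name)] = d
        names.append(name)
    return names, dist


def nearest_neighbors(names, dist):
    """Map each work to the other work at the smallest distance."""
    return {
        a: min((b for b in names if b != a), key=lambda b: dist[(a, b)])
        for a in names
    }


names, dist = parse_lower_triangle(ROWS)
for work, nearest in nearest_neighbors(names, dist).items():
    # Same leading digit in the file name means same author (assumption from the file list).
    label = "same author" if work[0] == nearest[0] else "different author"
    print(f"{work:8} -> {nearest:8} {dist[(work, nearest)]:.3f} ({label})")
```

On this excerpt, every work's nearest neighbor is another work by the same author. Running the same check over the full tables would show where the clustering breaks down.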
Disappointingly, the political writings did not appear particularly close to each other. The Koran was included in the corpus because some of the political works were expected either to quote from it or to imitate its phrasing. However, little weight should be given to any statistics involving the political texts, since they are the shortest texts in the corpus.
More importantly, many of the political works are translations rather than original texts. The technique should, in principle, apply to texts in other languages. However, word order is freer than in English in some languages and more rigid in others, so it is hard to predict how useful N-gram analysis of non-English texts would be. In any case, the corpus should include the works of interest in their original language(s), together with other works in the same language(s).
Each report lists the 100 most common N-grams and the number of occurrences of each. N-grams that occur only once are not reported.
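The reporting rule (keep the 100 most common N-grams and drop those that occur only once) can be sketched as follows. This is an illustration in Python under a simplifying assumption about sentence splitting and tokenization; the original reports were produced with standard Unix utilities. `ngram_report` is a hypothetical name. The tie-breaking order is also an assumption, since the section does not say how equally frequent N-grams were ordered.

```python
import re
from collections import Counter


def ngrams(text, n):
    """N-grams of lowercase words that do not cross sentence boundaries."""
    grams = []
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = re.findall(r"[a-z']+", sentence)
        grams.extend(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return grams


def ngram_report(grams, top=100):
    """Return up to `top` (N-gram, count) pairs, most common first.

    N-grams seen only once are dropped. Ties keep first-seen order, because
    Counter.most_common() sorts stably by count.
    """
    return [(g, c) for g, c in Counter(grams).most_common() if c > 1][:top]


text = "My dog is brown. My dog is fast. My cat is brown."
for gram, count in ngram_report(ngrams(text, 2)):
    print(count, " ".join(gram))
```

Here only "my dog", "dog is" and "is brown" occur more than once, so they are the only bigrams reported.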