CoV2D Browser™

Parikh vectors
7AFK_1 2PDB_1 3HIY_1 Letter Amino acid
0 15 16 T Threonine
0 6 3 W Tryptophan
0 11 31 R Arginine
352 7 5 C Cysteine
0 13 13 Q Glutamine
0 6 13 M Methionine
0 15 13 N Asparagine
486 16 22 G Glycine
0 19 18 I Isoleucine
0 9 15 H Histidine
0 25 17 K Lysine
0 17 24 S Serine
0 10 22 F Phenylalanine
0 21 17 P Proline
0 11 12 Y Tyrosine
0 25 34 V Valine
389 19 24 A Alanine
0 15 21 D Aspartic acid
0 23 34 E Glutamic acid
0 33 30 L Leucine
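
A Parikh vector simply counts how often each letter of a fixed alphabet occurs in a word. The following is a minimal sketch of that count, not the code behind this page; the function name `parikh_vector` and the constant `AMINO_ACIDS` (the 20 letters in the row order of the table above) are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical constant: the 20 amino-acid letters in the table's row order.
AMINO_ACIDS = "TWRCQMNGIHKSFPYVADEL"


def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return the number of occurrences in seq of each letter of alphabet."""
    counts = Counter(seq)
    # Letters of seq that are not in the alphabet are silently ignored.
    return [counts[letter] for letter in alphabet]
```

Under this sketch, letters outside the alphabet (such as U in an RNA sequence) are not counted, which would be consistent with the 7AFK_1 column being nonzero only for C, G and A.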

>7AFK_1|Chain A[auth 1]|16S rRNA (head domain of the 30S ribosome)|Escherichia coli (562)
>2PDB_1|Chain A|Aldose reductase|Homo sapiens (9606)
>3HIY_1|Chains A, B|Minor Editosome-Associated TUTase|Trypanosoma brucei (5691)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7AFK , Knot 283 1541 0.44 8 16 64
AAAUUGAAGAGUUUGAUCAUGGCUCAGAUUGAACGCUGGCGGCAGGCCUAACACAUGCAAGUCGAACGGUAACAGGAAGAAGCUUGCUUCUUUGCUGACGAGUGGCGGACGGGUGAGUAAUGUCUGGGAAACUGCCUGAUGGAGGGGGAUAACUACUGGAAACGGUAGCUAAUACCGCAUAACGUCGCAAGACCAAAGAGGGGGACCUUCGGGCCUCUUGCCAUCGGAUGUGCCCAGAUGGGAUUAGCUAGUAGGUGGGGUAACGGCUCACCUAGGCGACGAUCCCUAGCUGGUCUGAGAGGAUGACCAGCCACACUGGAACUGAGACACGGUCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUAUUGCACAAUGGGCGCAAGCCUGAUGCAGCCAUGCCGCGUGUAUGAAGAAGGCCUUCGGGUUGUAAAGUACUUUCAGCGGGGAGGAAGGGAGUAAAGUUAAUACCUUUGCUCAUUGACGUUACCCGCAGAAGAAGCACCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGGUGCAAGCGUUAAUCGGAAUUACUGGGCGUAAAGCGCACGCAGGCGGUUUGUUAAGUCAGAUGUGAAAUCCCCGGGCUCAACCUGGGAACUGCAUCUGAUACUGGCAAGCUUGAGUCUCGUAGAGGGGGGUAGAAUUCCAGGUGUAGCGGUGAAAUGCGUAGAGAUCUGGAGGAAUACCGGUGGCGAAGGCGGCCCCCUGGACGAAGACUGACGCUCAGGUGCGAAAGCGUGGGGAGCAAACAGGAUUAGAUACCCUGGUAGUCCACGCCGUAAACGAUGUCGACUUGGAGGUUGUGCCCUUGAGGCGUGGCUUCCGGAGCUAACGCGUUAAGUCGACCGCCUGGGGAGUACGGCCGCAAGGUUAAAACUCAAAUGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAUGCAACGCGAAGAACCUUACCUGGUCUUGACAUCCACGGAAGUUUUCAGAGAUGAGAAUGUGCCUUCGGGAACCGUGAGACAGGUGCUGCAUGGCUGUCGUCAGCUCGUGUUGUGAAAUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCUUAUCCUUUGUUGCCAGCGGUCCGGCCGGGAACUCAAAGGAGACUGCCAGUGAUAAACUGGAGGAAGGUGGGGAUGACGUCAAGUCAUCAUGGCCCUUACGACCAGGGCUACACACGUGCUACAAUGGCGCAUACAAAGAGAAGCGACCUCGCGAGAGCAAGCGGACCUCAUAAAGUGCGUCGUAGUCCGGAUUGGAGUCUGCAACUCGACUCCAUGAAGUCGGAAUCGCUAGUAAUCGUGGAUCAGAAUGCCACGGUGAAUACGUUCCCGGCCUUGUACACACCGCCCGUCACACCAUGGGAGUGGGUUGCAAAAGAAGUAGGUAGCUUAACCUUCGGGAGGGCGCUUACCACUUUGUGAUUCAUGACUGGGGUGAAGUCGUAACAAGGUAACCGUAGGGGAACCUGCGGUUGGAUCACCUCCUUA
2PDB , Knot 141 316 0.85 40 209 304
MASRILLNNGAKMPILGLGTWKSPPGQVTEAVKVAIDVGYRHIDCAHVYQNENEVGVAIQEKLREQVVKREELFIVSKLWCTYHEKGLVKGACQKTLSDLKLDYLDLYLIHWPTGFKPGKEPFPLDESGNVVPSDTNILDTWAAMEELVDEGLVKAIGISNFNHLQVEMILNKPGLKYKPAVNQIECHPYLTQEKLIQYCQSKGIVVTAYSPLGSPDRPWAKPEDPSLLEDPRIKAIAAKHNKTTAQVLIRFPMQRNLVVIPKSVTPERIAENFKVFDFELSSQDMTTLLSYNRNWRVCALLSCTSHKDYPFHEEF
3HIY , Knot 164 384 0.84 40 225 373
MVAKREFIRGMMAHYRASLPPPEHSVVIHELQKRVLDIGMLAVNKAHVELFGSHVSGFCTPHSDADISLTYRNFSPWLQGMERVDEQNNKRMTRFGKEASAMGMEDVRYIRARIPVVQFTDGVTGIHCDVSIGNIGGVENSKILCAIRQVFPDFYGAYIHLVKAWGKAREVIAPERSTFNSFTVTTMALMVLQELGLLPVFSKPTGEFGELTVADAEMLLQEFKLPPIYDSLHDDDEKLGEAVFFCLQRFAEYYAKYDFSAGTVSLIHPRRHRTVYERVVRRHLELLGSRKRLEWEKHIAEHKEDGPLDENDFSASMQNETTQRPSNSPYVVEDFVNYVNCGRRVQASRVRHIQQEFNRLREMLIDKESELKFDEVFRESDTVP
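
The fourth column of the table above divides the LZ-complexity by \(n/\log_{20} n\). A minimal sketch of that arithmetic follows; the function name `normalized_lz` is an assumption, and the listed values appear to be truncated (not rounded) to two decimals.

```python
import math


def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))


# (LZ-complexity, length) pairs taken from the table rows above.
for code, lz, n in [("7AFK", 283, 1541), ("2PDB", 141, 316), ("3HIY", 164, 384)]:
    print(code, normalized_lz(lz, n))
```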

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
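
The definitions above can be sketched directly with Python sets. The function names `subwords` and `subword_complexity` are hypothetical, chosen only for this illustration.

```python
def subwords(w, k):
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """p_w(k): the number of distinct subwords of length k in w."""
    return len(subwords(w, k))
```

For instance, the word `abab` has only the two length-2 subwords `ab` and `ba`, so `subword_complexity("abab", 2)` is 2.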

\(|P_{f(7AFK_1)}(2) \setminus P_{f(2PDB_1)}(2)|=12\), \(|P_{f(2PDB_1)}(2) \setminus P_{f(7AFK_1)}(2)|=205\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11100111111000110010110001110011101001101101110001101010101110011101101101111111110001000000010011011101101110111011101101000111111001000110111111111011001001111101101100110100101011010010111100111111111110000011100000010010011101010001110111100110011011101111011011000100011101101100000110011000111111101100110010100111100111101011000111000001011111101101101111110100101011011101011100011010110010100101010101111111100000111001011110100000110111111111111110111100110100000100010011010010001011111111010011001100001010011011001011011010111111010111010011001111001001110101111010101011101100010011100111010111100000111000110001111100101000110100110111000111000010111111111011110000111010110110111101010111110001111111010011011011111011000000111011111001101000111010111110101111110111011110011101000011011000101001011101101001100011111001010000011110101100000111100110101001110011001000111111010110010111100111100011101110011011111000101011101101111010101100011000110101101011111100001000110000110100010111110000011111011111010100000111110010111101110100101011001001001100010100101111010011100111000010110111010110000010000001001001101100011001111100011111111001001101101110011111111101111101101001110010010110000010110011110010101010100101101101010101111111110110000101111101110111000010111101010010110001110011110001011000110000101111001111001001101100101110011110100101101110101000001100001010101001000100101001011111011100101111111101110110001100000111111101000100100001011000101100111101111001011011110110010111111100010110011100100000001
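
The quantity \(Z_k(x,y)\) defined above is the size of the symmetric difference of the two subword sets. A minimal sketch, with hypothetical function names `subwords` and `z_k`:

```python
def subwords(w, k):
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)
```

With the two set differences reported above for 7AFK_1 and 2PDB_1, this sum is \(12+205=217\).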
Pair \(Z_2\) Length of longest common substring
7AFK_1,2PDB_1 217 3
7AFK_1,3HIY_1 231 2
2PDB_1,3HIY_1 168 4
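
The values in the last column are small enough that they presumably measure the longest contiguous block shared by the two sequences. Under that assumption, a standard dynamic-programming sketch is shown below; the function name is hypothetical and this is not necessarily how the page computes the column.

```python
def longest_common_substring_length(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    # prev[j] = length of the longest common suffix of the previous prefix
    # of x and y[:j]; cur is the same quantity for the current prefix of x.
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best
```

The running time is proportional to the product of the two lengths, which is unproblematic for sequences of a few thousand letters.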

Newick tree

 
(7AFK_1:11.95,(2PDB_1:84,3HIY_1:84):35.95);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1857}{\log_{20} 1857}-\frac{316}{\log_{20}316}\right)\approx 390.\)
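
The estimate above can be reproduced numerically. In this sketch the two leading constants and the lengths are taken verbatim from the formula; the function name `expected_distance` is hypothetical, and reading 1857 as \(1541+316\) (the length of the concatenation of 7AFK_1 and 2PDB_1) is an assumption.

```python
import math


def expected_distance(n_concat, n_single, a=0.85, b=0.8, base=20):
    """Evaluate a * b * (n_concat / log_base n_concat - n_single / log_base n_single)."""
    def term(n):
        return n / math.log(n, base)
    return a * b * (term(n_concat) - term(n_single))


print(expected_distance(1541 + 316, 316))
```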
Status Protein1 Protein2 d d1/2
Query variables 7AFK_1 2PDB_1 283 211

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
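
To make the interpolation concrete, here is a minimal sketch. It assumes that \(\min\) and \(\max\) range over the two complexity increments \(c(SQ)-c(S)\) and \(c(QS)-c(Q)\), following Otu and Sayood's definitions \(d=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1=c(SQ)-c(S)+c(QS)-c(Q)\), and it takes \(c\) to be the textbook LZ76 exhaustive-history phrase count. All function names and the choice of LZ variant are assumptions; the page's own computation may differ.

```python
def lz76(s):
    """Number of phrases in the LZ76 exhaustive-history parsing of s."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text
        # (the copy may overlap the phrase itself, hence the "- 1").
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def increments(s, q):
    """The pair (c(SQ) - c(S), c(QS) - c(Q))."""
    return lz76(s + q) - lz76(s), lz76(q + s) - lz76(q)


def delta(s, q, alpha):
    """alpha * min + (1 - alpha) * max over the two increments."""
    low, high = sorted(increments(s, q))
    return alpha * low + (1 - alpha) * high


def d(s, q):
    """Otu-Sayood distance d, i.e. delta at alpha = 0."""
    return delta(s, q, 0)


def d1(s, q):
    """Otu-Sayood distance d1, i.e. twice delta at alpha = 1/2."""
    return sum(increments(s, q))
```

The parsing loop is quadratic or worse in the sequence length, so it is meant for checking small examples rather than for large-scale use.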
