CoV2D Browser™

Parikh vectors
8OHD_1 5VFW_1 4RFE_1 Letter Amino acid
0 0 6 H Histidine
0 1 0 M Methionine
0 1 38 S Serine
0 1 23 T Threonine
0 5 8 E Glutamic acid
0 3 5 Q Glutamine
0 2 11 K Lysine
0 2 5 F Phenylalanine
0 0 14 P Proline
0 1 6 W Tryptophan
0 1 11 Y Tyrosine
0 3 25 V Valine
1695 0 5 C Cysteine
0 0 7 R Arginine
0 1 8 N Asparagine
0 0 5 D Aspartic acid
1823 0 22 G Glycine
0 1 20 L Leucine
798 2 10 A Alanine
0 1 4 I Isoleucine
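
The Parikh vector of a word simply counts how often each letter occurs. A minimal sketch follows, assuming the letter order of the table above and that letters outside the 20 amino-acid codes are ignored (this would explain why the 8OHD_1 column, an RNA sequence, shows counts only for C, G and A). The names `ALPHABET` and `parikh_vector` are illustrative and not taken from the CoV2D code.

```python
from collections import Counter

# Letter order copied from the table above; the order itself is arbitrary.
ALPHABET = "HMSTEQKFPWYVCRNDGLAI"

def parikh_vector(word, alphabet=ALPHABET):
    """Return {letter: count} for every letter of `alphabet`.

    Letters of `word` that are not in `alphabet` (e.g. U in RNA)
    are silently skipped; this is an assumption of the sketch.
    """
    counts = Counter(word)
    return {letter: counts[letter] for letter in alphabet}

seq_5vfw = "AMVSEFLKQAWFIENEEQEYVQTVK"
pv = parikh_vector(seq_5vfw)
print(pv["E"], pv["Q"], pv["V"], pv["H"])  # counts for the 5VFW_1 column
```

For 5VFW_1 the counts add up to the sequence length 25, since every letter of that sequence is one of the 20 codes.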

>8OHD_1|Chain A[auth 5]|28S rRNA|Homo sapiens (9606)
>5VFW_1|Chain A|Annexin A1|Homo sapiens (9606)
>4RFE_1|Chains A, C, E, G[auth H]|Fab heavy chain of ADCC-potent anti-HIV-1 antibody JR4|Macaca mulatta (9544)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8OHD , Knot 747 5070 0.41 8 16 64
CGCGACCUCAGAUCAGACGUGGCGACCCGCUGAAUUUAAGCAUAUUAGUCAGCGGAGGAGAAGAAACUAACCAGGAUUCCCUCAGUAACGGCGAGUGAACAGGGAAGAGCCCAGCGCCGAAUCCCCGCCCCGCGGCGGGGCGCGGGACAUGUGGCGUACGGAAGACCCGCUCCCCGGCGCCGCUCGUGGGGGGCCCAAGUCCUUCUGAUCGAGGCCCAGCCCGUGGACGGUGUGAGGCCGGUAGCGGCCCCCGGCGCGCCGGGCCCGGGUCUUCCCGGAGUCGGGUUGCUUGGGAAUGCAGCCCAAAGCGGGUGGUAAACUCCAUCUAAGGCUAAAUACCGGCACGAGACCGAUAGUCAACAAGUACCGUAAGGGAAAGUUGAAAAGAACUUUGAAGAGAGAGUUCAAGAGGGCGUGAAACCGUUAAGAGGUAAACGGGUGGGGUCCGCGCAGUCCGCCCGGAGGAUUCAACCCGGCGGCGGGUCCGGCCGUGUCGGCGGCCCGGCGGAUCUUUCCCGCCCCCCGUUCCUCCCGACCCCUCCACCCGCCCUCCCUUCCCCCGCCGCCCCUCCUCCUCCUCCCCGGAGGGGGCGGGCUCCGGCGGGUGCGGGGGUGGGCGGGCGGGGCCGGGGGUGGGGUCGGCGGGGGACCGUCCCCCGACCGGCGACCGGCCGCCGCCGGGCGCAUUUCCACCGCGGCGGUGCGCCGCGACCGGCUCCGGGACGGCUGGGAAGGCCCGGCGGGGAAGGUGGCUCGGGGGGCCCCGUCCGUCCGUCCGUCCGUCCUCCUCCUCCCCCGUCUCCGCCCCCCGGCCCCGCGUCCUCCCUCGGGAGGGCGCGCGGGUCGGGGCGGCGGCGGCGGCGGCGGUGGCGGCGGCGGCGGCGGCGGCGGGACCGAAACCCCCCCCGAGUGUUACAGCCCCCCCGGCAGCAGCACUCGCCGAAUCCCGGGGCCGAGGGAGCGAGACCCGUCGCCGCGCUCUCCCCCCUCCCGGCGCCCACCCCCGCGGGGAAUCCCCCGCGAGGGGGGUCUCCCCCGCGGGGGCGCGCCGGCGUCUCCUCGUGGGGGGGCCGGGCCACCCCUCCCACGGCGCGACCGCUCUCCCACCCCUCCUCCCCGCGCCCCCGCCCCGGCGACGGGGGGGGUGCCGCGCGCGGGUCGGGGGGCGGGGCGGACUGUCCCCAGUGCGCCCCGGGCGGGUCGCGCCGUCGGGCCCGGGGGAGGUUCUCUCGGGGCCACGCGCGCGUCCCCCGAAGAGGGGGACGGCGGAGCGAGCGCACGGGGUCGGCGGCGACGUCGGCUACCCACCCGACCCGUCUUGAAACACGGACCAAGGAGUCUAACACGUGCGCGAGUCGGGGGCUCGCACGAAAGCCGCCGUGGCGCAAUGAAGGUGAAGGCCGGCGCGCUCGCCGGCCGAGGUGGGAUCCCGAGGCCUCUCCAGUCCGCCGAGGGCGCACCACCGGCCCGUCUCGCCCGCCGCGCCGGGGAGGUGGAGCACGAGCGCACGUGUUAGGACCCGAAAGAUGGUGAACUAUGCCUGGGCAGGGCGAAGCCAGAGGAAACUCUGGUGGAGGUCCGUAGCGGUCCUGACGUGCAAAUCGGUCGUCCGACCUGGGUAUAGGGGCGAAAGACUAAUCGAACCAUCUAGUAGCUGGUUCCCUCCGAAGUUUCCCUCAGGAUAGCUGGCGCUCUCGCAGACCCGACGCACCCCCGCCACGCAGUUUUAUCCGGUAAAGCGAAUGAUUAGAGGUCUUGGGGCCGAAACGAUCUCAACCUAUUCUCAAACUUUAAAUGGGUAAGAAGCCCGGCUCGCUGGCGUGGAGCCGGGCGUGGAAUGCGAGUGCCUAGUGGGCCACUUUUGGUAAGCAGAACUGGCGCUGCGGGAUGAACCGAACGCCGGGUUAAGGCGCCCGAUGCCGACGCUCAUCAGACCCCAGAAAAGGUGUUGGUUGAUAUAGACAGCAGGACGGUGGCCAUGGAAGUCGGAAUCCGCUAAG
GAGUGUGUAACAACUCACCUGCCGAAUCAACUAGCCCUGAAAAUGGAUGGCGCUGGAGCGUCGGGCCCAUACCCGGCCGUCGCCGGCAGUCGAGAGUGGACGGGAGCGGCGGCGGCGGCGCGCGCGCGCGCGCGUGUGGUGUGCGUCGGAGGGCGGCGGCGGCGGCGGCGGCGGGGGUGUGGGGUCCUUCCCCCGCCCCCCCCCCCACGCCUCCUCCCCUCCUCCCGCCCACGCCCCGCUCCCCGCCCCCGGAGCCCCGCGGACGCUACGCCGCGACGAGUAGGAGGGCCGCUGCGGUGAGCCUUGAAGCCUAGGGCGCGGGCCCGGGUGGAGCCGCCGCAGGUGCAGAUCUUGGUGGUAGUAGCAAAUAUUCAAACGAGAACUUUGAAGGCCGAAGUGGAGAAGGGUUCCAUGUGAACAGCAGUUGAACAUGGGUCAGUCGGUCCUGAGAGAUGGGCGAGCGCCGUUCCGAAGGGACGGGCGAUGGCCUCCGUUGCCCUCGGCCGAUCGAAAGGGAGUCGGGUUCAGAUCCCCGAAUCCGGAGUGGCGGAGAUGGGCGCCGCGAGGCGUCCAGUGCGGUAACGCGACCGAUCCCGGAGAAGCCGGCGGGAGCCCCGGGGAGAGUUCUCUUUUCUUUGUGAAGGGCAGGGCGCCCUGGAAUGGGUUCGCCCCGAGAGAGGGGCCCGUGCCUUGGAAAGCGUCGCGGUUCCGGCGGCGUCCGGUGAGCUCUCGCUGGCCCUUGAAAAUCCGGGGGAGAGGGUGUAAAUCUCGCGCCGGGCCGUACCCAUAUCCGCAGCAGGUCUCCAAGGUGAACAGCCUCUGGCAUGUUGGAACAAUGUAGGUAAGGGAAGUCGGCAAGCCGGAUCCGUAACUUCGGGAUAAGGAUUGGCUCUAAGGGCUGGGUCGGUCGGGCUGGGGCGCGAAGCGGGGCUGGGCGCGCGCCGCGGCUGGACGAGGCGCCGCCGCCCCCCCCACGCCCGGGGCACCCCCCUCGCGGCCCUCCCCCGCCCCACCCCGCGCGCGCCGCUCGCUCCCUCCCCGCCCCGCGCCCUCUCUCUCUCUCUCUCCCCCGCUCCCCGUCCUCCCCCCUCCCCGGGGGAGCGCCGCGUGGGGGCGGCGGCGGGGGGAGAAGGGUCGGGGCGGCAGGGGCCGGCGGCGGCCCGCCGCGGGGCCCCGGCGGCGGGGGCACGGUCCCCCGCGAGGGGGGCCCGGGCACCCGGGGGGCCGGCGGCGGCGGCGACUCUGGACGCGAGCCGGGCCCUUCCCGUGGAUCGCCCCAGCUGCGGCGGGCGUCGCGGCCGCCCCCGGGGAGCCCGGCGGGCGCCGGCGCGCCCCCCCCCCCACCCCACGUCUCGUCGCGCGCGCGUCCGCUGGGGGCGGGGAGCGGUCGGGCGGCGGCGGUCGGCGGGCGGCGGGGCGGGGCGGUUCGUCCCCCCGCCCUACCCCCCCGGCCCCGUCCGCCCCCCGUUCCCCCCUCCUCCUCGGCGCGCGGCGGCGGCGGCGGCAGGCGGCGGAGGGGCCGCGGGCCGGUCCCCCCCGCCGGGUCCGCCCCCGGGGCCGCGGUUCCGCGCGGCGCCUCGCCUCGGCCGGCGCCUAGCAGCCGACUUAGAACUGGUGCGGACCAGGGGAAUCCGACUGUUUAAUUAAAACAAAGCAUCGCGAAGGCCCGCGGCGGGUGUUGACGCGAUGUGAUUUCUGCCCAGUGCUCUGAAUGUCAAAGUGAAGAAAUUCAAUGAAGCGCGGGUAAACGGCGGGAGUAACUAUGACUCUCUUAAGGUAGCCAAAUGCCUCGUCAUCUAAUUAGUGACGCGCAUGAAUGGAUGAACGAGAUUCCCACUGUCCCUACCUACUAUCCAGCGAAACCACAGCCAAGGGAACGGGCUUGGCGGAAUCAGCGGGGAAAGAAGACCCUGUUGAGCUUGACUCUAGUCUGGCACGGUGAAGAGACAUGAGAGGUGUAGAAUAAGUGGGAGGCCCCCGGCGCCCCCCCGGUGUCCCCG
CGAGGGGCCCGGGGCGGGGUCCGCCGGCCCUGCGGGCCGCCGGUGAAAUACCACUACUCUGAUCGUUUUUUCACUGACCCGGUGAGGCGGGGGGGCGAGCCCCGAGGGGCUCUCGCUUCUGGCGCCAAGCGCCCGGCCGCGCGCCGGCCGGGCGCGACCCGCUCCGGGGACAGUGCCAGGUGGGGAGUUUGACUGGGGCGGUACACCUGUCAAACGGUAACGCAGGUGUCCUAAGGCGAGCUCAGGGAGGACAGAAACCUCCCGUGGAGCAGAAGGGCAAAAGCUCGCUUGAUCUUGAUUUUCAGUACGAAUACAGACCGUGAAAGCGGGGCCUCACGAUCCUUCUGACCUUUUGGGUUUUAAGCAGGAGGUGUCAGAAAAGUUACCACAGGGAUAACUGGCUUGUGGCGGCCAAGCGUUCAUAGCGACGUCGCUUUUUGAUCCUUCGAUGUCGGCUCUUCCUAUCAUUGUGAAGCAGAAUUCACCAAGCGUUGGAUUGUUCACCCACUAAUAGGGAACGUGAGCUGGGUUUAGACCGUCGUGAGACAGGUUAGUUUUACCCUACUGAUGAUGUGUUGUUGCCAUGGUAAUCCUGCUCAGUACGAGAGGAACCGCAGGUUCAGACAUUUGGUGUAUGUGCUUGGCUGAGGAGCCAAUGGGGCGAAGCUACCAUCUGUGGGAUUAUGACUGAACGCCUCUAAGUCAGAAUCCCGCCCAGGCGGAACGAUACGGCAGCGCCGCGGAGCCUCGGUUGGCCUCGGAUAGCCGGUCCCCCGCCUGUCCCCGCCGGCGGGCCGCCCCCCCCCUCCACGCGCCCCGCGCGCGCGGGAGGGCGCGUGCCCCGCCGCGCGCCGGGACCGGGGUCCGGUGCGGAGUGCCCUUCGUCCUGGGAAACGGGGCGCGGCUGGAAAGGCGGCCGCCCCCUCGCCCGUCACGCACCGCACGUUCGUGGGGAACCUGGCGCUAAACCAUUCGUAGACGACCUGCUUCUGGGUCGGGGUUUCGUACGUAGCAGAGCAGCUCCCUCGCUGCGAUCUAUUGAAAGUCAGCCCUCGACACAAGGGUUUGUC
5VFW , Knot 18 25 0.77 28 24 23
AMVSEFLKQAWFIENEEQEYVQTVK
4RFE , Knot 100 233 0.78 38 138 219
HSEVQLVESGPGLVKPLETLSLTCAVPGGSIRRNYWSWIRQPPGKGLEWIGHSYGSGGSTNYNPSLESRVTLSVDTSKNLFSLKLTSVTAADTAVYYCARTVWYYTSGTHYFDHWGQGVLVTVSSASTKGPSVFPLAPSSRSTSESTAALGCLVKDYFPEPVTVSWNSGSLTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYVCNVNHKPSNTKVDKRVEIKTCGGGSK
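
The LZ-complexity and normalized-ratio columns can be sketched as follows. This assumes that \(\mathrm{LZ}(w)\) is the Lempel-Ziv 1976 (LZ76) phrase count with exhaustive history, which is an assumption about the site's implementation; it does reproduce the value 18 listed for 5VFW. The function names `lz76` and `normalized_lz` are illustrative.

```python
import math

def lz76(s):
    """Number of phrases in the LZ76 (exhaustive history) parsing of s."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while s[i:i+length] also occurs starting at
        # some position before i (the earlier copy may overlap the phrase).
        while i + length <= n and s.find(s[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_20 n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

seq_5vfw = "AMVSEFLKQAWFIENEEQEYVQTVK"
c = lz76(seq_5vfw)
print(c, round(normalized_lz(c, len(seq_5vfw)), 2))
```

Using base 20 for the logarithm follows the table header, even though 8OHD is an RNA sequence over a 4-letter alphabet.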

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
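
As an illustration of these definitions, the subword sets can be computed with a sliding window. This is a minimal sketch; the names `subwords` and `subword_complexity` are illustrative and not part of the CoV2D code.

```python
def subwords(w, k):
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """p_w(k) = |P_w(k)|."""
    return len(subwords(w, k))

seq_5vfw = "AMVSEFLKQAWFIENEEQEYVQTVK"
print(subword_complexity(seq_5vfw, 2), subword_complexity(seq_5vfw, 3))
```

For 5VFW every window of length 2 or 3 is distinct, so the counts equal the number of windows, 24 and 23, in agreement with the table.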

\(|P_{f(8OHD_1)}(2) \setminus P_{f(5VFW_1)}(2)|=16\), \(|P_{f(5VFW_1)}(2) \setminus P_{f(8OHD_1)}(2)|=24\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be a LZ76 style (set of subwords) Jaccard distance numerator for \(P(k)\).Hydrophobic-polar version of Sequence 1:01011000011100111010110110001001110001110101001100110111111111111100110011110000000110110110111011101111111110001101001110000010000101101111010111101010110101011111100010000001101001000101111110001110000000110011110001100010111011010111100110110110000011010100111000111000000011110011100100011111010110001111011101101110000100011110011101001101011110011011001101110100101111111110011111111000011111111110001111111010111100100111111011101110111100010101100010001111110001100011011011100011001010011011000110111000000001000000100000000110000000100010000000000000010010000000000000000001111111101110000110111010111110111011101111001111101111001101111110010000001100110110011001001001110101000001001011011010100101100110000111101100111111100011011111111011000111111000010001000100010001000000000000000100000100000011000010100000000011111110101011100111101101101101101101101101101101101101101101111001111000000001110100101100000001101101101000100111000011110011111110111100010010010100000000000000110100010000010111111000000101111111100000000101111101010011010000000101111111001110010000000010110101100100000001000000000000101000001000011011011111111010010101011100111111011110111001000001101010000111011100101001001110001111111100000001111001010101010000001111111111101101111011101010111100110110110100110010001000110001000011110101110011111100011010101010111001111100010101111100100101101011011111011111001101010001001100111101111000011110000000110001001111101010010011000100001000100101001111111011110101110101010100111100011111101101110010100011101111011110011111111000011011111000101101100001101010111001100100011000111010111110111111001100111001000110110011000000001111000000001111011001101000001011100011010
10000010010101100001000110111101110110011111000011110011110110000110001000001110000111011101111110001100010011010111100111010111101011101000110111001000001101110111100110100101111011100111010011100111101000110100110100010011100001111111101001100110101110110111101101100101111100111100010011111101010110110001000100111001100110000111110111011010011110100111000101000110010010011011001111101110111110110110110110101010101010101010110101010011111101101101101101101101111101011110000000000100000000000101000000000000000001000101000010000001000001111000010111010010100101101110111111100100101101110000111100011110101110001110111100100101110101110000110110110110111010001110111110000111110011110111111111000010101110110110011101011100110011000011111101110111010010000111111101110110110000010010000011001100111111111001110001110000011100011110110111110111010010111101000110101101101011001100001111111001101111100001111111100000000000001011111101111010000111101110001000011111111110001010000111111010010110000110110100011011100000100110000011111000111111111110101110000101001110010100010100010110111000001111011101100000110101001111011010111011111111001101110011100010110000111101111100110000111110011100110011100111101011110111100111010101001011001110111101001001000000001010001111010000000010110000000001000010000101010100100010000000000100001010000000000000000000000010000001000000000000000111111101001010111110110110111111111111100111101101111100110110110001001011110000110110111110101100000010111111110001110100011111100110110110110110000111010111001110000000010111001000011001011011101001011001000001111110001101110100110101000000000001000010100001001010101010001001111101111110110011101101101100110111011011110111101100010000000100001000000011000010001000000100000000000000001101010110110110110110111011011111110010111001100000000100111000100000111100101100001010110100001000011001101000110110011000111100110101110011111110001100100011001111011110100101111100010110111010011010110101100000100011010000
11101001111011111110001101111010111011101101111101100101100000001111011001110100001001000110011011010101011101110111011110000010010000010001001000110111100101100111111101110001101111001101111111111110000100111000110000110001101011011111110101111110101111011101111110000011010000000110100000101111110001111011110001001100001011100100110111101001001000011001000000010011000110111101111111011100001111110000010000011010011101000110010101001100111010110001000011111011010011101111110001100111101101010001001110110110101110100001111011100011111111011111000000101111011111110111110001000110000110000011010111010111001011111011110000101100000001100000011100001110111111010011111110010010111110110011000101101100111010001011011010010000001100000011010011000000001001001011110111100010011101001110010001000100110111111010111001110001110010010111101110011000010000100110110101001001001011011000010001101011111111001011100011101000110101010100011001111110011011110111100100100010111100101100111010000011100111100001000111011110110101101101001011110000110011000011101100110000001000100000100110111001000000000000101010000101010101111111010101000010010101001111001111000110101111010000001000011111101111010110011111110110010000000100010010101001010100010111111000110100111001000101110110001000001110011110000101010110111101100000001001011000100111110011000001101011111000100
Pair \(Z_2\) Length of longest common subword
8OHD_1,5VFW_1 40 1
8OHD_1,4RFE_1 144 4
5VFW_1,4RFE_1 146 2
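
The two columns of this table can be sketched as follows: \(Z_k\) is the size of the symmetric difference of the two subword sets, and the last column is computed here as the length of the longest common contiguous subword, which is what the listed values are consistent with. The function names are illustrative, and the RNA string below is only the first 20 letters of 8OHD_1, used to keep the example short.

```python
def subwords(w, k):
    """P_w(k): distinct contiguous subwords of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x, y):
    """Length of the longest contiguous block shared by x and y (O(|x||y|) DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1  # extend the match ending at (a, b)
                best = max(best, cur[j])
        prev = cur
    return best

rna_prefix = "CGCGACCUCAGAUCAGACGU"          # first 20 letters of 8OHD_1 only
seq_5vfw = "AMVSEFLKQAWFIENEEQEYVQTVK"
print(z_k(rna_prefix, seq_5vfw, 2), longest_common_subword(rna_prefix, seq_5vfw))
```

Because the RNA and 5VFW share only the letter A, they have no 2-letter subword in common, so \(Z_2\) is just the sum of the two counts; for the full sequences this gives \(16+24=40\), the first row of the table.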

Newick tree

 
[
	4RFE_1:82.91,
	[
		8OHD_1:20,5VFW_1:20
	]:62.91
]

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{5095}{\log_{20} 5095}-\frac{25}{\log_{20}25})\approx 1200\), where \(5095=5070+25\) is the combined length of the two query sequences.
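
The arithmetic behind this estimate can be checked directly. The sketch below only evaluates the stated expression; the factors 0.85 and 0.8 are taken from the text as given, and the helper name `lz_scale` is illustrative.

```python
import math

def lz_scale(n, base=20):
    """n / log_base(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, base)

# 5095 = 5070 + 25 (lengths of 8OHD_1 and 5VFW_1 concatenated).
estimate = 0.85 * 0.8 * (lz_scale(5095) - lz_scale(25))
print(round(estimate))
```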
Status Protein1 Protein2 d d1/2
Query variables 8OHD_1 5VFW_1 747 381.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
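
A minimal sketch of this interpolation, assuming the usual Otu–Sayood definitions \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\), with \(c\) taken to be the LZ76 phrase count. Both the choice of \(c\) and the function names are assumptions of the sketch, not confirmed details of the site.

```python
def lz76(s):
    """LZ76 phrase count (exhaustive history); assumed complexity measure c."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        while i + length <= n and s.find(s[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases

def increments(s, q, c=lz76):
    """The two one-sided quantities c(SQ) - c(S) and c(QS) - c(Q)."""
    return c(s + q) - c(s), c(q + s) - c(q)

def delta(s, q, alpha, c=lz76):
    """delta = alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(s, q, c)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

s, q = "ACGUACGGUCA", "AMVSEFLKQAW"   # toy inputs, not database entries
a, b = increments(s, q)
d, d1 = max(a, b), a + b
print(d, d1 / 2, delta(s, q, 0), delta(s, q, 0.5))
```

Under these definitions the table row \(d=747\), \(d_1/2=381.5\) would correspond to one-sided increments of 747 and 16.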
