CoV2D Browser™

Parikh vectors
1QXP_1 8VTU_1 6ULB_1 Letter Amino acid
22 0 4 M Methionine
26 0 1 Y Tyrosine
57 636 12 A Alanine
45 0 5 N Asparagine
52 0 8 I Isoleucine
58 0 14 R Arginine
89 0 25 L Leucine
36 0 8 Q Glutamine
61 1065 17 G Glycine
55 0 6 F Phenylalanine
21 0 6 H Histidine
48 0 7 K Lysine
26 0 19 P Proline
57 0 15 S Serine
42 0 11 T Threonine
60 0 15 D Aspartic acid
12 795 2 C Cysteine
70 0 11 E Glutamic acid
19 0 5 W Tryptophan
44 0 14 V Valine
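
A minimal sketch of how a Parikh vector like the columns above could be computed: count the occurrences of each one-letter code in a sequence. The name `parikh_vector` and the constant `ALPHABET` are illustrative, not taken from the page; the alphabet order simply follows the table rows.

```python
from collections import Counter

# One-letter amino acid codes, in the row order of the table above.
ALPHABET = "MYANIRLQGFHKPSTDCEWV"

def parikh_vector(sequence: str, alphabet: str = ALPHABET) -> list[int]:
    """Return the number of occurrences of each alphabet letter in `sequence`."""
    counts = Counter(sequence)
    # Letters of `sequence` that are not in `alphabet` are silently ignored.
    return [counts[letter] for letter in alphabet]

# Toy inputs (not entries from the table above).
print(parikh_vector("MAGIAM"))
print(parikh_vector("GUCA"))
```

Under this sketch, letters outside the 20-letter alphabet (such as U in an RNA sequence) are not counted, which is consistent with the 8VTU_1 column having non-zero entries only for A, G and C.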

>1QXP_1|Chains A, B|mu-like calpain|Rattus norvegicus (10116)
>8VTU_1|Chains A[auth 1A], EB[auth 2A]|23S Ribosomal RNA|Thermus thermophilus HB8 (300852)
>6ULB_1|Chain A|Sex hormone-binding globulin|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1QXP , Knot 334 900 0.84 40 311 813
MAGIAMKLAKDREAAEGLGSHERAIKYLNQDYETLRNECLEAGALFQDPAFPPVSHSLGFKELGPNSSKTYGIKWKRPTELLSNPQFIVDGATRTDICQGALGDSWLLAAIASLTLNETILHRVVPYGQSFQEGYAGIFHFQLWQFGEWVDVVVDDLLPTKDGKLVFVHSAQGNEFWSALLEKAYAKVNGSYEALSGGCTSEAFEDFTGGVTEWYDLQKAPSDLYQIILKALERGSLLGCSINISDIRDLEAITFKNLVRGHAYSVTDAKQVTYQGQRVNLIRMRNPWGEVEWKGPWSDNSYEWNKVDPYEREQLRVKMEDGEFWMSFRDFIREFTKLEICNLTPDALKSRTLRNWNTTFYEGTWRRGSTAGGCRNYPATFWVNPQFKIRLEEVDDADDYDSRESGCSFLLALMQKHRRRERRFGRDMETIGFAVYQVPRELAGQPVHLKRDFFLANASRAQSEHFINLREVSNRIRLPPGEYIVVPSTFEPNKEGDFLLRFFSEKKAGTQELDDQIQANLPDEKVLSEEEIDDNFKTLFSKLAGDDMEISVKELQTILNRIISKHKDLRTNGFSLESCRSMVNLMDRDGNGKLGLVEFNILWNRIRNYLTIFRKFDLDKSGSMSAYEMRMAIEAAGFKLPCQLHQVIVARFADDELIIDFDNFVRCLVRLEILFKIFKQLDPENTGTIQLDLISWLSFSVLGKLAAAIEHHHHHHMHYSNIEANESEEERQFRKLFVQLAGDDMEVSATELMNILNKVVTRHPDLKTDGFGIDTCRSMVAVMDSDTTGKLGFEEFKYLWNNIKKWQGIYKRFETDRSGTIGSNELPGAFEAAGFHLNQHIYSMIIRRYSDETGNMDFDNFISCLVRLDAMFRAFRSLDKNGTGQIQVNIQEWLQLTMYS
8VTU , Knot 480 2915 0.43 8 16 64
GUCAAGAUGGUAAGGGCCCACGGUGGAUGCCUCGGCACCCGAGCCGAUGAAGGACGUGGCUACCUGCGAUAAGCCAGGGGGAGCCGGUAGCGGGCGUGGAUCCCUGGAUGUCCGAAUGGGGGAACCCGGCCGGCGGGAACGCCGGUCACCGCGCUUUUGCGCGGGGGGAACCUGGGGAACUGAAACAUCUCAGUACCCAGAGGAGAGGAAAGAGAAAUCGACUCCCUGAGUAGCGGCGAGCGAAAGGGGACCAGCCUAAACCGUCCGGCUUGUCCGGGCGGGGUCGUGGGGCCCUCGGACACCGAAUCCCCAGCCUAGCCGAAGCUGUUGGGAAGCAGCGCCAGAGAGGGUGAAAGCCCCGUAGGCGAAAGGUGGGGGGAUAGGUGAGGGUACCCGAGUACCCCGUGGUUCGUGGAGCCAUGGGGGAAUCUGGGCGGACCACCGCCUAAGGCUAAGUACUCCGGGUGACCGAUAGCGCACCAGUACCGUGAGGGAAAGGUGAAAAGAACCCCGGGAGGGGAGUGAAAUAGAGCCUGAAACCGUGGGCUUACAAGCAGUCACGGCCCCGCAAGGGGUUGUGGCGUGCCUAUUGAAGCAUGAGCCGGCGACUCACGGUCGUGGGCGAGCUUAAGCCGUUGAGGCGGAGGCGUAGGGAAACCGAGUCCGAACAGGGCGCAAGCGGGCCGCACGCGGCCCGCAAAGUCCGCGGCCGUGGACCCGAAACCGGGCGAGCUAGCCCUGGCCAGGGUGAAGCUGGGGUGAGACCCAGUGGAGGCCCGAACCGGUGGGGGAUGCAAACCCCUCGGAUGAGCUGGGGCUAGGAGUGAAAAGCUAACCGAGCCCGGAGAUAGCUGGUUCUCCCCGAAAUGACUUUAGGGUCAGCCUCAGGCGCUGACUGGGGCCUGUAGAGCACUGAUAGGGCUAGGGGGCCCACCAGCCUACCAAACCCUGUCAAACUCCGAAGGGUCCCAGGUGGAGCCUGGGAGUGAGGGCGCGAGCGAUAACGUCCGCGUCCGAGCGCGGGAACAACCGAGACCGCCAGCUAAGGCCCCCAAGUCUGGGCUAAGUGGUAAAGGAUGUGGCGCCGCGAAGACAGCCAGGAGGUUGGCUUAGAAGCAGCCAUCCUUUAAAGAGUGCGUAAUAGCUCACUGGUCGAGUGGCGCCGCGCCGAAAAUGAUCGGGGCUUAAGCCCAGCGCCGAAGCUGCGGGUCUGGGGGAUGACCCCAGGCGGUAGGGGAGCGUUCCCGAUGCCGAUGAAGGCCGACCCGCGAGGGCGGCUGGAGGUAAGGGAAGUGCGAAUGCCGGCAUGAGUAACGAUAAAGAGGGUGAGAAUCCCUCUCGCCGUAAGCCCAAGGGUUCCUACGCAAUGGUCGUCAGCGUAGGGUUAGGCGGGACCUAAGGUGAAGCCGAAAGGCGUAGCCGAAGGGCAGCCGGUUAAUAUUCCGGCCCUUCCCGCAGGUGCGAUGGGGGGACGCUCUAGGCUAGGGGGACCGGAGCCAUGGACGAGCCCGGCCAGAAGCGCAGGGUGGGAGGUAGGCAAAUCCGCCUCCCAACAAGCUCUGCGUGGUGGGGAAGCCCGUACGGGUGACAACCCCCCGAAGCCAGGGAGCCAAGAAAAGCCUCUAAGCACAACCUGCGGGAACCCGUACCGCAAACCGACACAGGUGGGCGGGUGCAAGAGCACUCAGGCGCGCGGGAGAACCCUCGCCAAGGAACUCUGCAAGUUGGCCCCGUAACUUCGGGAGAAGGGGUGCUCCCUGGGGUGAUGAGCCCCGGGGAGCCGCAGUGAACAGGCUCUGGCGACUGUUUACCAAAAACACAGCUCUCUGCGAACUCGUAAGAGGAGGUAUAGGGAGCGACGCUUGCCCGGUGCCGGAAGGUCAAGGGGAGGGGUGCAAGCCCCGAACCGAAGCCCCGGUGAACGGCGGCCGUAACUAUAACGGUCCUAAGGUAGCGAAAUUCCUUGUCGGGUAAGUUCCGACCUGCACGAAAAGCGUAAC
GACCGGAGCGCUGUCUCGGCGAGGGACCCGGUGAAAUUGAACUGGCCGUGAAGAUGCGGCCUACCCGUGGCAGGACGAAAAGACCCCGUGGAGCUUUACUGCAGCCUGGUGUUGGCUCUUGGUCGCGCCUGCGUAGGAUAGGUGGGAGCCUGUGAACCCCCGCCUCCGGGUGGGGGGGAGGCGCCGGUGAAAUACCACCCUGGCGCGGCUGGGGGCCUAACCCUCGGAUGGGGGGACAGCGCUUGGCGGGCAGUUUGACUGGGGCGGUCGCCUCCUAAAAGGUAACGGAGGCGCCCAAAGGUCCCCUCAGGCGGGACGGAAAUCCGCCGGAGAGCGCAAGGGUAGAAGGGGGCCUGACUGCGAGGCCUGCAAGCCGAGCAGGGGCGAAAGCCGGGCCUAGUGAACCGGUGGUCCCGUGUGGAAGGGCCAUCGAUCAACGGAUAAAAGUUACCCCGGGGAUAACAGGCUGAUCUCCCCCGAGCGUCCACAGCGGCGGGGAGGUUUGGCACCUCGAUGUCGGCUCGUCGCAUCCUGGGGCUGAAGAAGGUCCCAAGGGUUGGGCUGUUCGCCCAUUAAAGCGGCACGCGAGCUGGGUUCAGAACGUCGUGAGACAGUUCGGUCUCUAUCCGCCACGGGCGCAGGAGGCUUGAGGGGGGCUCUUCCUAGUACGAGAGGACCGGAAGGGACGCACCUCUGGUUUCCCAGCUGUCCCUCCAGGGGCAUAAGCUGGGUAGCCAUGUGCGGAAGGGAUAACCGCUGAAAGCAUCUAAGCGGGAAGCCCGCCCCAAGAUGAGGCCUCCCACGGCGUCAAGCCGGUAAGGACCCGGGAAGACCACCCGGUGGAUGGGCCGGGGGUGUAAGCGCCGCGAGGCGUUGAGCCGACCGGUCCCAAUCGUCCGAGGUCUUGACCCCUCC
6ULB , Knot 93 205 0.80 40 140 201
LRPVLPTQSAHDPPAVHLSNGPGQEPIAVMTFDLTKITKTSSSFEVRTWDPEGVIFYGDTNPKDDWFMLGLRDGRPEIQLHNHWAQLTVGAGPRLDDGRWHQVEVKMEGDSVLLEVDGEEVLRLRQVSGPLTSKRHPIMRIALGGLLFPASNLRLPLVPALDGCLRRDSWLDKQAKISASAPTSLRSCDVESNPGIFLPPGTQAE
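
The LZ-complexity column and its normalization \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be sketched as follows. This assumes the standard left-to-right LZ76 parsing (each phrase is extended until it is no longer a copy of earlier text); the page does not state which variant it uses, so the phrase counts of this sketch may differ from the table. The function names are illustrative.

```python
import math

def lz76_complexity(w: str) -> int:
    """Count phrases in a left-to-right LZ76-style parsing of w.

    Each phrase is grown while it still occurs as a subword of the text
    that precedes its last letter; the final phrase may be a pure copy.
    Simple but slow (repeated substring search); fine for short inputs.
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_b n) with b = alphabet_size; requires n > 1."""
    return lz / (n / math.log(n, alphabet_size))

# Classic toy example: parses as 0|001|10|100|1000|101.
print(lz76_complexity("0001101001000101"))
# Normalization applied to the tabulated values for 1QXP (LZ = 334, n = 900).
print(round(normalized_lz(334, 900), 4))
```

Applied to the tabulated pairs (334, 900) and (93, 205), `normalized_lz` agrees with the listed 0.84 and 0.80 to two decimal places.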

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
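
As a small illustration of these definitions, reading \(P_w(k)\) as the set of distinct contiguous subwords of length \(k\), the following sketch computes \(P_w(k)\) and \(p_w(k)\). The function names are illustrative and not part of the page.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """p_w(k): the cardinality of P_w(k)."""
    return len(subwords(w, k))

# Toy example: the length-2 subwords of GUCAAG are GU, UC, CA, AA, AG.
print(sorted(subwords("GUCAAG", 2)))
print(subword_complexity("GUCAAG", 2))
```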

\(|P_{f(1QXP_1)}(2) \setminus P_{f(8VTU_1)}(2)|=306\), \(|P_{f(8VTU_1)}(2) \setminus P_{f(1QXP_1)}(2)|=11\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
111111011000011011100001100100000010000101111100111111000111001110000001101001001100101110110000100111100111111101010001100111010010010111101011011011011100111000101111001010011011100101010100011011000011001011100100100110010011101100101110010100100101101001101010010010010001001011010011101010111000000100101000001010100101110100110010010100101011000010010001001010010011100001101110101010100100100000000100111111000000000110010011111001100111011010001111010010000110100100010111100111100101000101110110000110001000101011000110000100010011001110010101001001100110000010001101000001101100010101111010111001000101100101000101010010111011110110010011110110001110100110011010111011001010001010101101101011101111100000001000010100000000100111011100101010011011001100010100011110000011111000001011100100110010010110001000001011000111110111101000100111000000010101001100110101110110010001010101010011010100
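
The hydrophobic-polar string can be reproduced, at least on the leading residues, by mapping each residue to 1 or 0. The classification used below (A, F, G, I, L, M, P, V, W as hydrophobic) is not stated by the page; it is inferred by comparing the first 80 residues of Sequence 1 with the binary string above, so treat it as an assumption.

```python
# Assumption: residues mapped to 1 (hydrophobic), inferred from the page's output.
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(sequence: str) -> str:
    """Map each residue to '1' if hydrophobic (per HYDROPHOBIC), else '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 20 residues of 1QXP_1.
print(hp_string("MAGIAMKLAKDREAAEGLGS"))
```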
Pair \(Z_2\) Length of longest common subword
1QXP_1,8VTU_1 317 4
1QXP_1,6ULB_1 213 4
8VTU_1,6ULB_1 148 3
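
The two columns of this table can be sketched as follows. \(Z_k\) is computed directly from its definition as the size of the symmetric difference of the two subword sets. The last column is read here as the length of the longest common contiguous subword, an assumption based on the small tabulated values (3 and 4). Function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y.

    Dynamic programming: cur[j] is the length of the longest common
    suffix of the prefixes of x and y ending at the current positions.
    """
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy inputs (not the sequences above).
print(z_distance("ABC", "BCD", 2))
print(longest_common_subword_length("MAGIAM", "GIAMKL"))
```

As a consistency check on the table, the first row satisfies \(306 + 11 = 317\), matching the two set differences reported earlier.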

Newick tree

 
[
	1QXP_1:14.94,
	[
		6ULB_1:74,8VTU_1:74
	]:75.94
]

Let \(d\) denote the Otu–Sayood distance of that name.
Let \(d_1\) denote the Otu–Sayood distance of that name. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{3815}{\log_{20} 3815}-\frac{900}{\log_{20} 900}\right)\approx 672.\)
Status Protein1 Protein2 d d1/2
Query variables 1QXP_1 8VTU_1 474 402
Was not able to put for d
Was not able to put for d1
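
The rough expected-distance estimate quoted above can be checked numerically. In this sketch the constants 0.85 and 0.8 are taken from the text as given, and 3815 is read as 900 + 2915, the combined length of the two query sequences (an interpretation, not something the page states). The helper name `lz_scale` is illustrative.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """n / log_b(n), the normalizing scale used for LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)

n1, n2 = 900, 2915          # lengths of 1QXP_1 and 8VTU_1 from the table
estimate = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n1))
print(round(estimate, 2))   # between 672 and 673
```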

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
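
A small sketch of this interpolation follows. The argument names `lo` and `hi` stand for the quantities written \(\min\) and \(\max\) in the formula and are illustrative. The value 474 is the tabulated \(d\); the value 330 is not on the page and is back-solved from \(d_1/2 = 402\) under the assumption that 402 is exact rather than rounded.

```python
def delta(alpha: float, lo: float, hi: float) -> float:
    """Convex combination alpha * min + (1 - alpha) * max."""
    return alpha * lo + (1 - alpha) * hi

hi = 474            # d for the pair 1QXP_1, 8VTU_1 (from the table)
lo = 2 * 402 - hi   # assumption: back-solved from d1/2 = 402, giving 330

print(delta(0, lo, hi))    # alpha = 0 recovers d
print(delta(0.5, lo, hi))  # alpha = 1/2 recovers d1/2
```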