CoV2D Browser™

Parikh vectors
7GQS_1 7MBU_1 2UXD_1 Letter Amino acid
16 37 0 N Asparagine
14 28 0 H Histidine
30 68 0 S Serine
21 50 0 Q Glutamine
27 69 0 E Glutamic acid
43 137 0 L Leucine
15 34 0 M Methionine
20 63 0 T Threonine
22 87 312 A Alanine
28 64 0 R Arginine
15 20 428 C Cysteine
15 31 0 Y Tyrosine
31 68 0 I Isoleucine
26 67 0 K Lysine
16 58 0 F Phenylalanine
5 26 0 W Tryptophan
26 77 0 V Valine
23 67 0 D Aspartic acid
29 61 545 G Glycine
17 53 0 P Proline
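
Each column above is a Parikh vector: the number of occurrences of each letter in a sequence, ignoring order. A minimal sketch of that count, assuming the sequence is available as a plain string; the helper name `parikh_vector` and the constant `AMINO_ACIDS` are hypothetical and not taken from the site.

```python
from collections import Counter

# The 20 one-letter amino acid codes that label the rows of the table.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return {letter: number of occurrences in w} for each letter of alphabet."""
    counts = Counter(w)
    # Counter returns 0 for letters that never occur in w.
    return {a: counts[a] for a in alphabet}

# First 20 residues of the 7GQS_1 sequence, used only as a small illustration.
prefix = "MNEGEEDDDKDFLWPAPNEE"
vec = parikh_vector(prefix)
print(vec["E"], vec["D"], vec["P"])
```

Letters outside the alphabet, such as U in the RNA sequence 2UXD_1, are not counted here, which is consistent with that column being nonzero only for A, C and G.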

>7GQS_1|Chain A|Bifunctional 3'-5' exonuclease/ATP-dependent helicase WRN|Homo sapiens (9606)
>7MBU_1|Chains A, B, C, D|Transient receptor potential melastatin 5|Danio rerio (7955)
>2UXD_1|Chain A|16S RIBOSOMAL RNA|THERMUS THERMOPHILUS (300852)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7GQS , Knot 187 439 0.86 40 247 424
MNEGEEDDDKDFLWPAPNEEQVTCLKMYFGHSSFKPVQWKVIHSVLEERRDNVAVMATGYGKSLCFQYPPVYVGKIGLVISPLISLMEDQVLQLKMSNIPACFLGSAQSENVLTDIKLGKYRIVYVTPEYCSGNMGLLQQLEADIGITLIAVDEAHCISEWGHDFRDSFRKLGSLKTALPMVPIVALTATASSSIREDIVRCLNLRNPQITCTGFDRPNLYLEVRRKTGNILQDLQPFLVKTSSHWEFEGPTIIYCPSRKMTQQVTGELRKLNLSCGTYHAGMSFSTRKDIHHRFVRDEIQCVIATIAFGMGINKADIRQVIHYGAPKDMESYYQEIGRAGRDGLQSSCHVLWAPADINLNRHLLTEIRNEKFRLYKLKMMAKMEKYLHSSRCRRQIILSHFEDKQVQKASLGIMGTEKCCDNCRSRLDHGGRLEVLFQ
7MBU , Knot 425 1165 0.85 40 343 1038
MVEKSSERFDKQMAGRLGDIDFTGVSRTRGKFVRVTSSTDPAEIYQILTKQWGLAPPHLVVALMGGDEVAQLKPWLRDTLRKGLVKAAQSTGAWILTSGLRFGITKNLGQAVRDHSLASTSPKVRVVAIGIAPWNMIQNRDLLLSAKPDHPATYPTEDLPYGAVYSLDCNHSHFILVDEDPKRPGATGEMRVKMLKHISLQRTGYGGTGSIEIPVLCLLVHGEPRILQKMYKNIQNSIPWLILAGSGGVADILVTLMDRGCWDADIVQELLINTFPDGLHSTEITSWTKLIQRILDHGHLLTVHDPEQDSELDTVILKALVKACKSQSQEAQDFLDALKLAVAWNRVDIAKSEIFSGDVQWSAQDLEEVMMEALVNDKPDFVRLFVDNGVNIKQFLTYGRLQELYCSVSEKNLLHTLLLKKNQERQAQLARKRMSGNPNNELGDRKFRFTFHEVSKVLKDFLDDTCKGFYQKLPAERMGKGRLFHSQKNLPDMDRRCEHPWRDLFLWAILQNRQEMANYFWAMGPEAVAAALVGCKIMKEMAHLATEAESARSMKNAKYEQFAMDLFSECYSNSEDRAYSLLVRKTCCWSKATVLNIATLAEAKCFFAHDGVQALLTKVWWGAMRTDTSISRLVLTFFIPPLVWTSLIKFNPEEQVSKDEGEPFAELDSLETEQALLLTDGDPVAGEGSAETAARSCSATFIRVVLRRWNRFWSAPVTVFMGNVIMYFAFLILFSYVLLLDFRPPPPYGPSAAEIILYFWVFTLVLEEIRQSFFTDEDMSILKKMKLYVEDNWNKCDMVAISLFVVGLSCRMAMSTYEAGRTVLALDFMVFTLRLIHIFAIHKQLGPKIIIVERMIKDVFFFLFFLSVWLIAYGVTTQALLHPNDPRIDWVFRRALYRPYLHIFGQIPLEEIDAAKMPDDNCTTDVQEIILGTLPPCPNIYANWLVILLLVIYLLVTNVLLLNLLIAMFSYTFQVVQENADIFWKFQRYNLIVEYHSRPALAPPFIIISHITQALLSFIKKTENTQDLLERELPSGLDQKLMTWETVQKENYLAKLEHEHRESSGERLRYTSSKVQTLLRMVGGFKDQEKRMATVETEVRYCGEVLSWIAECFHKSTLKCDRDAPKAPRSIAGSSRDQQPQGAKRQQPGGHPAYGTDKKLPFIDH
2UXD , Knot 274 1523 0.44 8 16 63
UUUGUUGGAGAGUUUGAUCCUGGCUCAGGGUGAACGCUGGCGGCGUGCCUAAGACAUGCAAGUCGUGCGGGCCGCGGGGUUUUACUCCGUGGUCAGCGGCGGACGGGUGAGUAACGCGUGGGUGACCUACCCGGAAGAGGGGGACAACCCGGGGAAACUCGGGCUAAUCCCCCAUGUGGACCCGCCCCUUGGGGUGUGUCCAAAGGGCUUUGCCCGCUUCCGGAUGGGCCCGCGUCCCAUCAGCUAGUUGGUGGGGUAAUGGCCCACCAAGGCGACGACGGGUAGCCGGUCUGAGAGGAUGGCCGGCCACAGGGGCACUGAGACACGGGCCCCACUCCUACGGGAGGCAGCAGUUAGGAAUCUUCCGCAAUGGGCGCAAGCCUGACGGAGCGACGCCGCUUGGAGGAAGAAGCCCUUCGGGGUGUAAACUCCUGAACCCGGGACGAAACCCCCGACGAGGGGACUGACGGUACCGGGGUAAUAGCGCCGGCCAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGGCGCGAGCGUUACCCGGAUUCACUGGGCGUAAAGGGCGUGUAGGCGGCCUGGGGCGUCCCAUGUGAAAGACCACGGCUCAACCGUGGGGGAGCGUGGGAUACGCUCAGGCUAGACGGUGGGAGAGGGUGGUGGAAUUCCCGGAGUAGCGGUGAAAUGCGCAGAUACCGGGAGGAACGCCGAUGGCGAAGGCAGCCACCUGGUCCACCCGUGACGCUGAGGCGCGAAAGCGUGGGGAGCAAACCGGAUUAGAUACCCGGGUAGUCCACGCCCUAAACGAUGCGCGCUAGGUCUCUGGGUCUCCUGGGGGCCGAAGCUAACGCGUUAAGCGCGCCGCCUGGGGAGUACGGCCGCAAGGCUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGCAUGUGGUUUAAUUCGAAGCAACGCGAAGAACCUUACCAGGCCUUGACAUGCUAGGGAAACCCGGGUGAAAGCCUGGGGUGCCCCGCGAGGGGAGCCCUAGCACAGGUGCUGCAUGGCCGUCGUCAGCUCGUGCCGUGAGGUGUUGGGUUAAGUCCCGCAACGAGCGCAACCCCCGCCGUUAGUUGCCAGCGGUUCGGCCGGGCACUCUAACGGGACUGCCCGCGAAAGCGGGAGGAAGGAGGGGACGACGUCUGGUCAGCAUGGCCCUUACGGCCUGGGCGACACACGUGCUACAAUGCCCACUACAAAGCGAUGCCACCCGGCAACGGGGAGCUAAUCGCAAAAAGGUGGGCCCAGUUCGGAUUGGGGUCUGCAACCCGACCCCAUGAAGCCGGAAUCGCUAGUAAUCGCGGAUCAGCCAUGCCGCGGUGAAUACGUUCCCGGGCCUUGUACACACCGCCCGUCACGCCAUGGGAGCGGGCUCUACCCGAAGUCGCCGGGAGCCUACGGGCAGGCGCCGAGGGUAGGGCCCGUGACUGGGGCGAAGUCGUAACAAGGUAGCUGUACCGGAAGGUGCGGCUGGAUCACCUCCUUUCU
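
The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below shows that normalization together with one common way to count phrases, the LZ76 exhaustive-history parsing. Whether the site uses exactly this LZ variant is an assumption, and the function names are hypothetical.

```python
import math

def lz76_phrase_count(w):
    """Number of phrases in the LZ76 parsing of w.

    Each phrase is the shortest block starting at position i that does not
    occur starting at any earlier position (overlap with the block allowed).
    """
    i, count, n = 0, 0, len(w)
    while i < n:
        l = 1
        # w.find(sub, 0, i + l - 1) succeeds only if sub starts before i.
        while i + l <= n and w.find(w[i:i + l], 0, i + l - 1) != -1:
            l += 1
        count += 1
        i += l
    return count

def normalized_lz(lz, n, base=20):
    """LZ(w) / (n / log_base n), the ratio shown in the table."""
    return lz / (n / math.log(n, base))

# Using the tabulated values for 7GQS: LZ = 187, n = 439.
print(normalized_lz(187, 439))
```

With the tabulated values the ratio comes out close to the 0.86 shown for 7GQS. The base 20 matches the amino acid alphabet size; the table applies the same base to the RNA row.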

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
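
A short sketch of these definitions, taking \(P_w(n)\) to be the set of contiguous subwords of length \(n\). The function names `subwords` and `p` are hypothetical.

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    # range() is empty when n > len(w), so the result is then the empty set.
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w, n):
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))

print(sorted(subwords("abab", 2)), p("abab", 2))
```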

\(|P_{f(7GQS_1)}(2) \setminus P_{f(7MBU_1)}(2)|=25\), \(|P_{f(7MBU_1)}(2) \setminus P_{f(7GQS_1)}(2)|=121\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1001000000011111100001001010110001011010110011000000111110101001010011101101111101110110001101010011101110100001100101100011010100001011110010101110111100100100110010001001101001111111111010100010001100101001010001100101010100001011001011110000010101101100100010001010100101001000111010000010001100010011101111111001010011001110010000001101100110000011111101010001100100001010010111010001000000001110010000100101111100000000000100110101110
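
The binary string above appears to mark each residue of 7GQS_1 as hydrophobic (1) or polar (0). The sketch below uses a residue set inferred by aligning the start of the binary string with the start of the sequence; the set `HYDROPHOBIC` and the function name are assumptions, not a mapping documented by the site.

```python
# Residues that line up with a 1 at the start of the binary string (inferred).
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(w, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' if it is in the hydrophobic set, else '0'."""
    return "".join("1" if a in hydrophobic else "0" for a in w)

# First 20 residues of 7GQS_1.
print(hp_string("MNEGEEDDDKDFLWPAPNEE"))
```

Other hydrophobic-polar conventions classify residues such as C, Y or T differently, so the set should be treated as a parameter.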
Pair \(Z_2\) Length of longest common subword
7GQS_1,7MBU_1 146 4
7GQS_1,2UXD_1 251 2
7MBU_1,2UXD_1 343 3
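
The \(Z_2\) column can be checked against the two set differences quoted earlier: \(25+121=146\), the value in the first row. A minimal sketch of \(Z_k\) and of the Jaccard distance it is the numerator of, with hypothetical function names:

```python
def subwords(w, k):
    """P_w(k): distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """Z_k(x, y) divided by |P_x(k) union P_y(k)|; 0.0 if both sets are empty."""
    union = subwords(x, k) | subwords(y, k)
    return z(x, y, k) / len(union) if union else 0.0

print(z("abab", "abba", 2), jaccard_distance("abab", "abba", 2))
```

Here `"abab"` has 2-subwords {ab, ba} and `"abba"` has {ab, bb, ba}, so only `bb` is unshared.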

Newick tree

 
[
	2UXD_1:16.32,
	[
		7GQS_1:73,7MBU_1:73
	]:95.32
]

Let \(d\) and \(d_1\) be the Otu–Sayood distances of the same names. (The latter makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1604}{\log_{20} 1604}-\frac{439}{\log_{20}439})\approx 295\), where \(1604=439+1165\) is the combined length of the two sequences.
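
The arithmetic in this estimate can be reproduced directly. In the sketch below the constants 0.85 and 0.8 are copied from the formula as given; their interpretation is not stated in the text, and the helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n, base=20):
    """n / log_base(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, base)

n1, n2 = 439, 1165  # lengths of 7GQS_1 and 7MBU_1
expected = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n1))
print(expected)
```

The result lies between 295 and 296, in line with the 295 quoted in the text.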
Status Protein1 Protein2 d d1/2
Query variables 7GQS_1 7MBU_1 377 258
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
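
As a worked check of this interpolation, the tabulated values \(d=377\) and \(d_1/2=258\) determine both endpoints. The sketch assumes that \(d\) is the larger of two one-sided quantities and that \(d_1\) is their sum, so that \(d_1/2\) is their average; this reading matches the two cases of the formula, but it is an assumption about how the page computes them.

```python
def delta(alpha, lo, hi):
    """delta = alpha * min + (1 - alpha) * max."""
    return alpha * lo + (1 - alpha) * hi

d, half_d1 = 377, 258   # values from the 7GQS_1 / 7MBU_1 row
hi = d                  # alpha = 0 gives delta = max = d
lo = 2 * half_d1 - hi   # alpha = 1/2 gives (min + max) / 2 = d1 / 2

print(lo, delta(0, lo, hi), delta(0.5, lo, hi))
```

Under that assumption the smaller quantity is 139, and the two tabulated distances are recovered at \(\alpha=0\) and \(\alpha=1/2\).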