CoV2D BrowserTM

Parikh vectors
9HRH_1 5ZAP_1 4ZMP_1 Letter Amino acid
15 50 6 Q Glutamine
30 98 16 G Glycine
32 116 19 V Valine
51 196 12 A Alanine
12 26 0 C Cysteine
26 59 7 F Phenylalanine
9 9 2 W Tryptophan
15 31 4 M Methionine
28 45 16 S Serine
17 71 19 T Threonine
13 43 6 Y Tyrosine
22 77 19 D Aspartic acid
33 64 15 E Glutamic acid
26 30 14 I Isoleucine
24 18 11 K Lysine
22 95 14 P Proline
28 96 6 R Arginine
14 56 8 N Asparagine
16 48 5 H Histidine
47 146 14 L Leucine
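
A Parikh vector records how many times each letter occurs in a word. As a minimal sketch (the function name `parikh_vector` and the toy input are illustrative assumptions, not the site's own code), a column of the table above could be recomputed from a sequence like this:

```python
from collections import Counter

# The 20 standard amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "QGVACFWMSTYDEIKPRNHL"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter."""
    counts = Counter(sequence)
    # Counter yields 0 for letters that never occur, so every row is present.
    return {letter: counts[letter] for letter in AMINO_ACIDS}

if __name__ == "__main__":
    # Toy input: the first ten residues of the 4ZMP_1 sequence.
    for letter, count in parikh_vector("DWVVAPISVP").items():
        print(letter, count)
```

The entries of a Parikh vector sum to the length of the word; for instance, the 4ZMP_1 column above sums to 213, the length listed for that sequence.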

>9HRH_1|Chain A|Aromatic-L-amino-acid decarboxylase|Homo sapiens (9606)
>5ZAP_1|Chains A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P|Major capsid protein|Human herpesvirus 2 (10310)
>4ZMP_1|Chain A|Cadherin-3|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9HRH , Knot 195 480 0.83 40 254 454
MNASEFRRRGKEMVDYVANYMEGIEGRQVYPDVEPGYLRPLIPAAAPQEPDTFEDIINDVEKIIMPGVTHWHSPYFFAYFPTASSYPAMLADMLCGAIGCIGFSWAASPACTELETVMMDWLGKMLELPKAFLNEKAGEGGGVIQGSASEATLVALLAARTKVIHRLQAASPELTQAAIMEKLVAYSSDQAHSSVERAGLIGGVKLKAIPSDGNFAMRASALQEALERDKAAGLIPFFMVATLGTTTCCSFDNLLEVGPICNKEDIWLHVDAAYAGSAFICPEFRHLLNGVEFADSFNFNPHKWLLVNFDCSAMWVKKRTDLTGAFRLDPTYLKHSHQDSGLITDYQHWQIPLGRRFRSLKMWFVFRMYGVKGLQAYIRKHVQLSHEFESLVRQDPRFEICVEVILGLVCFRLKGSNKVNEALLQRINSAKKIHLVPCHLRDKFVLRFAICSRTVESAHVQRAWEHIKELAADVLRAERE
5ZAP , Knot 475 1374 0.83 40 334 1110
MAAPARDPPGYRYAAAMVPTGSILSTIEVASHRRLFDFFARVRSDENSLYDVEFDALLGSYCNTLSLVRFLELGLSVACVCTKFPELAYMNEGRVQFEVHQPLIARDGPHPVEQPVHNYMTKVIDRRALNAAFSLATEAIALLTGEALDGTGISLHRQLRAIQQLARNVQAVLGAFERGTADQMLHVLLEKAPPLALLLPMQRYLDNGRLATRVARATLVAELKRSFCDTSFFLGKAGHRREAIEAWLVDLTTATQPSVAVPRLTHADTRGRPVDGVLVTTAAIKQRLLQSFLKVEDTEADVPVTYGEMVLNGANLVTALVMGKAVRSLDDVGRHLLDMQEEQLEANRETLDELESAPQTTRVRADLVAIGDRLVFLEALEKRIYAATNVPYPLVGAMDLTFVLPLGLFNPAMERFAAHAGDLVPAPGHPEPRAFPPRQLFFWGKDHQVLRLSMENAVGTVCHPSLMNIDAAVGGVNHDPVEAANPYGAYVAAPAGPGADMQQRFLNAWRQRLAHGRVRWVAECQMTAEQFMQPDNANLALELHPAFDFFAGVADVELPGGEVPPAGPGAIQATWRVVNGNLPLALCPVAFRDARGLELGVGRHAMAPATIAAVRGAFEDRSYPAVFYLLQAAIHGSEHVFCALARLVTQCITSYWNNTRCAAFVNDYSLVSYIVTYLGGDLPEECMAVYRDLVAHVEALAQLVDDFTLPGPELGGQAQAELNHLMRDPALLPPLVWDCDGLMRHAALDRHRDCRIDAGGHEPVYAAACNVATADFNRNDGRLLHNTQARAADAADDRPHRPADWTVHHKIYYYVLVPAFSRGRCCTAGVRFDRVYATLQNMVVPEIAPGEECPSDPVTDPAHPLHPANLVANTVNAMFHNGRVVVDGPAMLTLQVLAHNMAERTTALLCSAAPDAGANTASTANMRIFDGALHAGVLLMAPQHLDHTIQNGEYFYVLPVHALFAGADHVANAPNFPPALRDLARHVPLVPPALGANYFSSIRQPVVQHARESAAGENALTYALMAGYFKMSPVALYHQLKTGLHPGFGFTVVRQDRFVTENVLFSERASEAYFLGQLQVARHETGGGVNFTLTQPRGNVDLGVGYTAVAATATVRNPVTDMGNLPQNFYLGRGAPPLLDNAAAVYLRNAVVAGNRLGPAQPLPVFGCAQVPRRAGMDHGQDAVCEFIATPVATDINYFRRPCNPRGRAAGGVYAGDKEGDVIALMYDHGQSDPARPFAATANPWASQRFSYGDLLYNGAYHLNGASPVLSPCFKFFTAADITAKHRCLERLIVETGSAVSTATAASDVQFKRPPGCRELVEDPCGLFQEAYPITCASDPALLRSARDGEAHARETHFTQYLIYDASPLKGLSL
4ZMP , Knot 99 213 0.83 38 145 202
DWVVAPISVPENGKGPFPQRLNQLKSNKDRDTKIFYSITGPGADSPPEGVFAVEKETGWLLLNKPLDREEIAKYELFGHAVSENGASVEDPMNISIIVTDLNDHKPKFTQDTFRGSVLEGVLPGTSVMQVTATDEDDAIYTYNGVVAYSIHSQEPKDPHDLMFTIHRSTGTISVISSGLDREKVPEYTLTIQATDMDGDGSTTTAVAVVEILD
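
The normalized column in the table divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below assumes the exhaustive-history parsing of Lempel and Ziv (1976) for \(\mathrm{LZ}(w)\); the page does not say which variant it uses, so `lz76` may not reproduce the counts listed above. Both function names are illustrative.

```python
import math

def lz76(w: str) -> int:
    """Count phrases in an LZ76-style exhaustive-history parsing (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it already occurs in the text
        # that precedes its last symbol.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_k n); the table uses k = 20."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    # Classic example, parsed as 0|001|10|100|1000|101.
    print(lz76("0001101001000101"))
    # Ratio for the 9HRH row, using the LZ value and length from the table.
    print(normalized_lz(195, 480))
```

Applied to the three (LZ, length) pairs in the table, `normalized_lz` gives values between 0.83 and 0.84, which matches the listed 0.83 if the page truncates to two decimals.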

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
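
As a small illustration of these definitions, the following sketch collects the contiguous subwords of a given length and counts them. The names `subwords` and `subword_complexity` are hypothetical helpers, and the input is a toy word rather than one of the protein sequences.

```python
def subwords(w: str, n: int) -> set[str]:
    """Return P_w(n): the distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w: str, n: int) -> int:
    """Return p_w(n), the number of distinct subwords of length n."""
    return len(subwords(w, n))

if __name__ == "__main__":
    word = "ABAAB"
    for n in (1, 2, 3):
        print(n, sorted(subwords(word, n)), subword_complexity(word, n))
```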

\(|P_{f(9HRH_1)}(2) \setminus P_{f(5ZAP_1)}(2)|=30\), \(|P_{f(5ZAP_1)}(2) \setminus P_{f(9HRH_1)}(2)|=110\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), so that \(Z_2(f(9HRH_1),f(5ZAP_1))=30+110=140\).

Hydrophobic-polar version of Sequence 1:
101001000100110011001011010010101011010111111110010010011001001111110010010111011010001111101101111011101110110001001110111011011011100011011111010100101111111000110010110101001111001110000010001001111111010111001011101011001100001111111111101100000010011011110000011101011011011101010011011011001010100111101000111100000101110101001000000011100000101111001001011111010110110101000101000100110001010101011111101010100010011100100100101110010001110111000010010100110010011101101000
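
The first 100 symbols of the binary string above are consistent with mapping A, F, G, I, L, M, P, V and W to 1 (hydrophobic) and every other residue to 0 (polar). That classification is inferred from the displayed output rather than taken from the site's code, so the set `HYDROPHOBIC` in this sketch is an assumption.

```python
# Assumed hydrophobic residues, inferred from the displayed string.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

if __name__ == "__main__":
    # First 30 residues of the 9HRH_1 sequence.
    print(hp_encode("MNASEFRRRGKEMVDYVANYMEGIEGRQVY"))
```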
Pair \(Z_2\) Length of longest common subword
9HRH_1,5ZAP_1 140 5
9HRH_1,4ZMP_1 195 4
5ZAP_1,4ZMP_1 215 5
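
The \(Z_2\) column can be read as the size of the symmetric difference between two sets of length-2 subwords. A minimal sketch follows, with the hypothetical names `z_distance` and `jaccard_distance`; dividing by the size of the union, as the second function does, is the usual Jaccard normalization and is not something the page itself reports.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Return Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x: str, y: str, k: int) -> float:
    """Return Z_k(x, y) divided by |P_x(k) union P_y(k)| (0.0 if both sets are empty)."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0

if __name__ == "__main__":
    # Toy words: P_x(2) = {AB, BA}, P_y(2) = {AB, BB, BA}.
    print(z_distance("ABAB", "ABBA", 2))
    print(jaccard_distance("ABAB", "ABBA", 2))
```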

Newick tree

 
(
	4ZMP_1:11.39,
	(
		9HRH_1:70,5ZAP_1:70
	):41.39
);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1854}{\log_{20} 1854}-\frac{480}{\log_{20}480}\right)\approx 343\), where \(1854=480+1374\) is the combined length of the two sequences.
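
The arithmetic in this estimate can be checked directly. In the sketch below the function name and parameter names are illustrative, and the factors 0.85 and 0.8 are simply copied from the formula above; the page does not explain where they come from.

```python
import math

def expected_distance(n_short: int, n_long: int, k: int = 20,
                      factors: tuple[float, float] = (0.85, 0.8)) -> float:
    """Evaluate f1 * f2 * (N / log_k N - n / log_k n) with N = n_short + n_long."""
    def scale(n: int) -> float:
        return n / math.log(n, k)
    total = n_short + n_long
    return factors[0] * factors[1] * (scale(total) - scale(n_short))

if __name__ == "__main__":
    # Lengths of 9HRH_1 (480) and 5ZAP_1 (1374) from the table.
    print(expected_distance(480, 1374))
```

The result lies between 343 and 344, in line with the stated value of 343.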
Status Protein1 Protein2 d d1/2
Query variables 9HRH_1 5ZAP_1 429 289.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
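
The interpolation can be sketched as follows, assuming the usual Otu–Sayood definitions \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=c(xy)-c(x)+c(yx)-c(y)\), and assuming an LZ76-style phrase count for \(c\). The page does not state which complexity measure it uses, and all function names here are hypothetical.

```python
def lz76(w: str) -> int:
    """Count phrases in an LZ76-style exhaustive-history parsing (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def conditional_costs(x: str, y: str, c=lz76) -> tuple[int, int]:
    """Return (c(xy) - c(x), c(yx) - c(y))."""
    return c(x + y) - c(x), c(y + x) - c(y)

def interpolate(lo: float, hi: float, alpha: float) -> float:
    """Return alpha * min + (1 - alpha) * max for given min and max."""
    return alpha * lo + (1 - alpha) * hi

def delta(x: str, y: str, alpha: float, c=lz76) -> float:
    """alpha = 0 gives d(x, y); alpha = 1/2 gives d_1(x, y) / 2."""
    a, b = conditional_costs(x, y, c)
    return interpolate(min(a, b), max(a, b), alpha)

if __name__ == "__main__":
    x, y = "ABABABBA", "BBBABAAB"  # toy words
    print(delta(x, y, 0), delta(x, y, 0.5))
```

Under these assumed definitions, the table values \(d=429\) and \(d_1/2=289.5\) would correspond to a maximum of 429 and a minimum of \(2(289.5)-429=150\).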