CoV2D Browser™

Parikh vectors
9DGK_1 8PCU_1 7BSR_1 Letter Amino acid
10 22 38 G Glycine
48 27 36 L Leucine
18 13 13 P Proline
3 3 6 W Tryptophan
18 5 3 Y Tyrosine
16 13 22 D Aspartic acid
21 11 5 N Asparagine
8 1 4 C Cysteine
15 18 6 Q Glutamine
28 11 12 I Isoleucine
17 29 17 T Threonine
20 15 33 R Arginine
11 2 6 H Histidine
13 5 8 M Methionine
18 4 11 F Phenylalanine
19 15 41 V Valine
29 11 24 E Glutamic acid
29 10 4 K Lysine
31 13 15 S Serine
18 35 56 A Alanine
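
The Parikh vector of a sequence records how many times each letter occurs in it. A minimal Python sketch of that count follows; the name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative choices, not taken from the CoV2D code.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "GLPWYDNCQITRHMFVEKSA"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter."""
    counts = Counter(sequence)
    # Letters absent from the sequence get an explicit zero.
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# Toy input (a made-up fragment, not one of the three proteins).
vector = parikh_vector("GGLAAA")
print(vector["G"], vector["L"], vector["A"])  # 2 1 3
```

The entries of each column sum to the sequence length, which gives a quick consistency check against the lengths 390, 263 and 360 reported for the three proteins.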

>9DGK_1|Chains A, B|Retinoblastoma-associated protein|Homo sapiens (9606)
>8PCU_1|Chain A|Beta-lactamase|Klebsiella pneumoniae (573)
>7BSR_1|Chain A|4-hydroxymandelate oxidase|Amycolatopsis orientalis (31958)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9DGK , Knot 168 390 0.85 40 226 376
GEFNTIQQLMMILNSASDQPSENLISYFNNCTVNPKESILKRVKDIGYIFKEKFAKAVGQGCVEIGSQRYKLGVRLYYRVMESMLKSEEERLSIQNFSKLLNDNIFHMSLLACALEVVMATYSRSTSQNLDSGTDLSFPWILNVLNLKAFDFYKVIESFIKAEGNLTREMIKHLERCEHRIMESLAWLSDSPLFDLIKQSKDREGPTDHLESACPLNLPLQNNHTAADMYLEPVRAPKKKSTSLSLFYKKVYRLAYLRLNTLCERLLSEHPELEHIIWTLFQHTLQNEYELMRDRHLDQIVMCSMYGICKVKNIDLKFKIIVTAYKDLPHAVQETFKRVLIKEEEYDSIIVFYNSVFMQRLKTNILQYASTRPPTLAPIPHIPRSPYKFP
8PCU , Knot 117 263 0.82 40 163 250
QTSAVQQKLAALEKSSGGRLGVALIDTADNTQVLYRGDERFPMCSTSKVMAAAAVLKQSETQKQLLNQPVEIKPADLVNYNPIAEKHVNGTMTLAELSAAALQYSDNTAMNKLIAQLGGPGGVTAFARAIGDETFRLDRTEPTLNTAIPGDPRDTTTPRAMAQTLRQLTLGHALGETQRAQLVTWLKGNTTGAASIRAGLPTSWTVGDKTGSGDYGTTNDIAVIWPQGRAPLVLVTYFTQPQQNAESRRDVLASAARIIAEGL
7BSR , Knot 143 360 0.78 40 167 316
GSHMTYVSLADLERAARDVLPGEIFDFLAGGSGTEASLVANRTALERVFVIPRMLRDLTDVTTEIDIFGRRAALPMAVAPVAYQRLFHPEGELAVARAARDAGVPYTICTLSSVSLEEIAAVGGRPWFQLFWLRDEKRSLDLVRRAEDAGCEAIVFTVDVPWMGRRLRDMRNGFALPEWVTAANFDAGTAAHRRTQGVSAVADHTAREFAPATWESVEAVRAHTDLPVVLKGILAVEDARRAVDAGAGGIVVSNHGGRQLDGAVPGIEMLGEIVAAVSGGCEVLVDGGIRSGGDVLKATALGASAVLVGRPVMWALAAAGQDGVRQLLELLAEEVRDAMGLAGCESVGAARRLNTKLGVV
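
The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below shows one common way to count LZ76 components (the exhaustive history) together with that normalization. Whether the site uses exactly this variant is an assumption, and the function names are hypothetical.

```python
import math

def lz76_complexity(w: str) -> int:
    """Count the components of the LZ76 exhaustive history of w."""
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the candidate while it already occurs in the text that
        # precedes its own last symbol (overlapping copies are allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_{alphabet_size} n)."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example: 0|001|10|100|1000|101 has six components.
print(lz76_complexity("0001101001000101"))  # 6
# Table values for 9DGK: LZ(w) = 168, n = 390.
print(normalized_lz(168, 390))
```

With the table values for 9DGK the ratio comes out at roughly 0.858, which agrees with the 0.85 shown if the page truncates to two decimals.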

Let \(P_w(k)\) be the set of distinct subwords (contiguous intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
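
As an illustration of these definitions, the set of subwords of a given length and its cardinality can be computed directly. This is a minimal sketch; the function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Return the number of distinct subwords of length k in w."""
    return len(subwords(w, k))

# Toy input (not one of the three proteins).
print(sorted(subwords("GEFGEF", 2)))    # ['EF', 'FG', 'GE']
print(subword_complexity("GEFGEF", 3))  # 3
```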

\(|P_{f(9DGK_1)}(2) \setminus P_{f(8PCU_1)}(2)|=118\), \(|P_{f(8PCU_1)}(2) \setminus P_{f(9DGK_1)}(2)|=55\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\).

Hydrophobic-polar version of Sequence 1:
101001001111100100010001100100001010001100100110110001101110101011000001110100011001100000010100100110001101011101101111000000000100100101111101101011010011001101010100011001000000110011110001110110000000110001001011011100000110101011011000000101100010011010100100011000101001110110001000001100001001110010110010010101011101000110110001001110000000111100011100100011001000110111110110010011
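
The hydrophobic-polar string replaces each residue by 1 (hydrophobic) or 0 (polar). The page does not state which residues count as hydrophobic. The set used below is an assumption, chosen because it agrees with the opening residues of the 9DGK_1 sequence and the binary string shown above.

```python
# Assumption: these nine residues map to 1 (hydrophobic); every other
# residue maps to 0 (polar). The set is inferred, not documented.
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(sequence: str) -> str:
    """Map an amino-acid sequence to its hydrophobic-polar binary string."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 20 residues of the 9DGK_1 sequence.
print(hp_string("GEFNTIQQLMMILNSASDQP"))  # 10100100111110010001
```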
Pair \(Z_2\) Length of longest common subword (contiguous)
9DGK_1,8PCU_1 173 3
9DGK_1,7BSR_1 185 4
8PCU_1,7BSR_1 156 4
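
The \(Z_2\) column is the sum of the two set differences; for the first pair, 118 + 55 = 173. A minimal sketch of \(Z_k\) follows. It also computes the last column, read here as the length of the longest common contiguous subword, which is an interpretation suggested by the small values. All function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Return the size of the symmetric difference of the length-k subword sets."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword_length(x: str, y: str) -> int:
    """Return the largest k for which x and y share a subword of length k."""
    k = 0
    # A common subword of length k + 1 contains one of length k,
    # so the first empty intersection ends the search.
    while subwords(x, k + 1) & subwords(y, k + 1):
        k += 1
    return k

# Toy inputs (not the protein sequences).
print(z_distance("GEFNT", "EFNQ", 2))                  # 3
print(longest_common_subword_length("GEFNT", "EFNQ"))  # 3
```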

Newick tree

 
(9DGK_1:93.08,(8PCU_1:78,7BSR_1:78):15.08);

Let \(d\) denote the Otu–Sayood distance \(d\).
Let \(d_1\) denote the Otu–Sayood distance \(d_1\). (This choice makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{653 }{\log_{20} 653}-\frac{263}{\log_{20}263})\approx 109.\)
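
The arithmetic behind this estimate can be checked directly. In the sketch below, 653 is taken to be the combined length 390 + 263 of the query pair, which is an inference from the table rather than something the page states. The factors 0.85 and 0.8 are used as given, and the helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_{alphabet_size} n."""
    return n / math.log(n, alphabet_size)

n_first, n_second = 390, 263   # lengths of 9DGK_1 and 8PCU_1
n_concat = n_first + n_second  # 653 (assumed to be the concatenated length)

expected = 0.85 * 0.8 * (lz_scale(n_concat) - lz_scale(n_second))
print(round(expected))  # 109
```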
Status Protein1 Protein2 d d1/2
Query variables 9DGK_1 8PCU_1 139 116.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d, & \alpha=0,\\ d_1/2, & \alpha=1/2. \end{cases} \]
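
The interpolation can be checked against the query row (\(d=139\), \(d_1/2=116.5\)). In the sketch below the two underlying quantities are called `larger` and `smaller`. Recovering `smaller` from the table assumes only the two cases of the formula, namely that \(d\) is the max and \(d_1\) is min plus max.

```python
def delta(alpha: float, smaller: float, larger: float) -> float:
    """Return alpha * min + (1 - alpha) * max."""
    return alpha * smaller + (1 - alpha) * larger

d = 139.0        # table value for the pair 9DGK_1, 8PCU_1
half_d1 = 116.5  # table value of d1/2 for the same pair

larger = d                      # alpha = 0 gives the max
smaller = 2 * half_d1 - larger  # from d1 = min + max

print(smaller)                      # 94.0
print(delta(0.0, smaller, larger))  # 139.0
print(delta(0.5, smaller, larger))  # 116.5
```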