CoV2D Browser™

Parikh vectors
6RWI_1 4FDL_1 9IRP_1 Letter Amino acid
10 9 6 Y Tyrosine
8 14 9 N Asparagine
9 11 11 Q Glutamine
3 7 9 H Histidine
10 17 22 I Isoleucine
24 21 17 L Leucine
5 17 5 F Phenylalanine
7 12 6 P Proline
27 18 18 A Alanine
33 20 14 E Glutamic acid
20 21 11 S Serine
12 18 8 V Valine
17 18 14 G Glycine
2 2 0 W Tryptophan
13 15 9 R Arginine
15 27 12 D Aspartic acid
2 11 1 C Cysteine
17 25 11 K Lysine
9 7 6 M Methionine
10 15 12 T Threonine
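A Parikh vector just counts how often each letter occurs in a sequence, so each column above sums to the sequence length (253, 305 and 201). The following is a minimal standard-library sketch. The helper name `parikh_vector` and the `AMINO_ACIDS` ordering (copied from the table rows) are illustrative choices, not taken from the site's code.

```python
from collections import Counter

# One-letter amino-acid codes in the row order of the table above.
AMINO_ACIDS = "YNQHILFPAESVGWRDCKMT"

def parikh_vector(seq: str) -> dict[str, int]:
    """Count the occurrences of each amino-acid letter in seq (hypothetical helper)."""
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}

seq_9irp = ("MNLIPTVIETTNRGERAYDIYSRLLKDRIIMLGSQIDDNVANSIVSQLLFLQAQDSEKDIYLYINSPGG"
            "SVTAGFAIYDTIQHIKPDVQTICIGMAASMGSFLLAAGAKGKRFALPNAEVMIHQPLGGAQGQATEIEIAANHILK"
            "TREKLNRILSERTGQSIEKIQKDTDRDNFLTAEEAKEYGLIDEVMVPETKHHHHHH")
v = parikh_vector(seq_9irp)
print(v["W"], v["C"], v["H"], sum(v.values()))  # 0 1 9 201, matching the 9IRP_1 column
```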

>6RWI_1|Chain A|14-3-3 protein sigma|Homo sapiens (9606)
>4FDL_1|Chains A, B|Caspase-7|Homo sapiens (9606)
>9IRP_1|Chains A, B, C, D, E, F, G, H, I, J, K, L, M, N|ATP-dependent Clp protease proteolytic subunit|Staphylococcus aureus (1280)
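The headers above appear to follow the RCSB FASTA layout, with `|`-separated fields: entry and entity ID, chains, molecule name, and organism with its taxonomy ID. The sketch below builds a lookup from the first field to the sequence, which is one way to realize the map \(f\) defined below. The name `parse_fasta` is hypothetical, and the sequences are cut down to their first residues purely for illustration.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map the first |-separated header field (e.g. '4FDL_1') to the joined sequence lines."""
    records: dict[str, str] = {}
    key = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            key = line[1:].split("|")[0]
            records[key] = ""
        elif key is not None:
            records[key] += line  # sequences may wrap over several lines
    return records

# Truncated sequences, for illustration only.
fasta_text = """>6RWI_1|Chain A|14-3-3 protein sigma|Homo sapiens (9606)
GAMGSMERASLIQKAKLAEQ
AERYEDMAAF
>4FDL_1|Chains A, B|Caspase-7|Homo sapiens (9606)
MLEADDQGCIEEQGVEDSAN
"""
f = parse_fasta(fasta_text)
print(f["6RWI_1"])  # GAMGSMERASLIQKAKLAEQAERYEDMAAF
```

The keys keep the entity suffix `_1` as in the headers. The document's \(f(c)\) is indexed by the 4-character code alone.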
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6RWI 114 253 0.83 40 170 244
GAMGSMERASLIQKAKLAEQAERYEDMAAFMKGAVEKGEELSCEERNLLSVAYKNVVGGQRAAWRVLSSIEQKSNEEGSEEKGPEVREYREKVETELQGVCDTVLGLLDSHLIKEAGDAESRVFYLKMKGDYYRYLAEVATGDDKKRIIDSARSAYQEAMDISKKEMPPTNPIRLGLALNFSVFHYEIANSPEEAISLAKTTFDEAMADLHTLSEDSYKDSTLIMQLLRDNLTLWTADNAGEEGGEAPQEPQS
4FDL 136 305 0.85 40 193 296
MLEADDQGCIEEQGVEDSANEDSVDAKPDRSSFVPSLFSKKKKNVTMRSIKTTRDRVPTYQYNMNFEKLGKCIIINNKNFDKVTGMGVRNGTDKDAEALFKCFRSLGFDVIVYNDCSCAKMQDLLKKASEEDHTNAACFACILLSHGEENVIYGKDGVTPIKDLTAHFRGDRCKTLLEKPKLFFIQACRGTELDDGIQADSGPINDTDANPRYKIPVEADFLFAYSTVPGYYSWRSPGRGSWFVQALCSILEEHGKDLEIMQILTRVNDRVARHFESQSDDPHFHEKKQIPCVVSMLTKELYFSQ
9IRP 94 201 0.82 38 145 194
MNLIPTVIETTNRGERAYDIYSRLLKDRIIMLGSQIDDNVANSIVSQLLFLQAQDSEKDIYLYINSPGGSVTAGFAIYDTIQHIKPDVQTICIGMAASMGSFLLAAGAKGKRFALPNAEVMIHQPLGGAQGQATEIEIAANHILKTREKLNRILSERTGQSIEKIQKDTDRDNFLTAEEAKEYGLIDEVMVPETKHHHHHH
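The page does not say which LZ76 implementation produced the \(\mathrm{LZ}(w)\) column. The sketch below assumes the common Kaspar–Schuster algorithm for Lempel–Ziv (1976) complexity, so its counts on the full sequences need not match the table. It also computes the normalized ratio \(\mathrm{LZ}(w)/(n/\log_{20} n)\). The ratio column appears to be truncated rather than rounded: for 9IRP, \(94/(201/\log_{20}201)\approx 0.828\), yet the table shows 0.82.

```python
import math

def lz76_complexity(s: str) -> int:
    """Number of components in the LZ76 exhaustive parsing (Kaspar-Schuster algorithm)."""
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:          # copying reached the end: count the final component
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:             # no earlier start reproduces the next symbol: new component
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_alphabet(n)), the ratio column of the table."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76_complexity("0001101001000101"))  # 6: 0|001|10|100|1000|101
print(int(normalized_lz(94, 201) * 100) / 100)  # 0.82 (truncated; the exact value is about 0.828)
```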

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-character Protein Data Bank code \(c\).
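These definitions translate directly into a set comprehension over sliding windows. The helper names below are hypothetical.

```python
def subword_set(w: str, n: int) -> set[str]:
    """P_w(n): the distinct length-n contiguous subwords of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_count(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subword_set(w, n))

print(sorted(subword_set("ABAB", 2)))  # ['AB', 'BA']
print(subword_count("GAMGSM", 1))      # 4 (G, A, M, S)
```

A word \(w\) has only \(|w|-n+1\) windows of length \(n\), so \(p_w(n)\le |w|-n+1\). For 6RWI, \(p_w(3)=244\le 251\), which means almost every 3-letter window is distinct.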

\(|P_{f(6RWI_1)}(2) \setminus P_{f(4FDL_1)}(2)|=74\) and \(|P_{f(4FDL_1)}(2) \setminus P_{f(6RWI_1)}(2)|=97\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(6RWI_1),f(4FDL_1))=74+97=171\).
Hydrophobic-polar version of Sequence 1 (6RWI_1):
1111010010110010110010000011111011100100100000011011000111100111011001000000010000110100000010001011000111110001100110100011010101000001101101000001100100100011010000111001101111101011000110010011011000100111010010000000001110110001011010011001101100100
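The page does not say which residues count as hydrophobic. The set {A, F, G, I, L, M, P, V, W} mapped to 1 below is an assumption. It was chosen because it reproduces the first 100 characters of the 6RWI_1 string above. H does not occur in that stretch, so treating it as polar here is a guess.

```python
# ASSUMPTION: hydrophobic letters inferred from the displayed string; every other letter maps to 0.
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(seq: str) -> str:
    """Map each residue to '1' (assumed hydrophobic) or '0' (polar/other)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

print(hp_string("GAMGSMERASLIQKAKLAEQ"))  # 11110100101100101100
```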
Pair \(Z_2\) Length of longest common subword (substring)
6RWI_1,4FDL_1 171 3
6RWI_1,9IRP_1 145 3
4FDL_1,9IRP_1 164 3
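\(Z_k\) is the size of the symmetric difference of the two subword sets. Dividing it by the size of their union gives the Jaccard distance. The last column must count contiguous subwords. A longest common *subsequence* could not be 3, because 6RWI_1 has 27 alanines and 4FDL_1 has 18, so they share a subsequence of at least 18 A's. The sketch below therefore computes the longest common substring with the standard dynamic program. All function names are hypothetical.

```python
def subword_set(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subword_set(x, k) ^ subword_set(y, k))

def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by |P_x(k) union P_y(k)|."""
    return z_k(x, y, k) / len(subword_set(x, k) | subword_set(y, k))

def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y (O(|x||y|) DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the common run ending at x[i-1], y[j-1]
                best = max(best, cur[j])
        prev = cur
    return best

print(z_k("ABCD", "BCDA", 2))                    # 2: AB only in x, DA only in y
print(longest_common_substring("ABCD", "BCDA"))  # 3: BCD
```

For 6RWI_1 and 4FDL_1, the union of the two 2-subword sets has \(170+97=267\) elements. The Jaccard distance is therefore \(171/267\approx 0.64\).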

Newick tree

 
(4FDL_1:87.20,(6RWI_1:72.5,9IRP_1:72.5):14.70);
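The tree is ultrametric: every tip is 87.2 from the root. The sketch below parses the tree written in standard Newick syntax and computes tip-to-tip path lengths. The path between 6RWI_1 and 9IRP_1 is \(2\times 72.5=145\), which equals their \(Z_2\). 4FDL_1 is 174.4 from both other tips. The page does not state which distance produced that branch. The helpers `parse_newick`, `leaf_paths` and `patristic_distance` are hypothetical names for a minimal parser, not a library API.

```python
import re

def parse_newick(text: str):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    tokens = re.findall(r"[(),:;]|[^(),:;\s]+", text)
    pos = 0

    def node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1
            children.append(node())
            while tokens[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip the closing ")"
        name = ""
        if tokens[pos] not in {"(", ")", ",", ":", ";"}:
            name = tokens[pos]
            pos += 1
        length = 0.0
        if tokens[pos] == ":":
            length = float(tokens[pos + 1])
            pos += 2
        return (name, length, children)

    return node()

def leaf_paths(tree):
    """For each leaf, list (node, depth from root) pairs from the root down to the leaf."""
    paths = {}

    def walk(node, depth, ancestors):
        name, length, children = node
        depth += length
        ancestors = ancestors + [(node, depth)]
        if not children:
            paths[name] = ancestors
        for child in children:
            walk(child, depth, ancestors)

    walk(tree, 0.0, [])
    return paths

def patristic_distance(paths, a, b):
    """depth(a) + depth(b) - 2 * depth(lowest common ancestor)."""
    lca_depth = 0.0
    for (node_a, depth_a), (node_b, _) in zip(paths[a], paths[b]):
        if node_a is not node_b:
            break
        lca_depth = depth_a
    return paths[a][-1][1] + paths[b][-1][1] - 2 * lca_depth

paths = leaf_paths(parse_newick("(4FDL_1:87.20,(6RWI_1:72.5,9IRP_1:72.5):14.70);"))
print(round(patristic_distance(paths, "6RWI_1", "9IRP_1"), 6))  # 145.0
print(round(patristic_distance(paths, "4FDL_1", "6RWI_1"), 6))  # 174.4
```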

Let \(d\) and \(d_1\) denote the Otu–Sayood distances of the same names. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{558}{\log_{20} 558}-\frac{253}{\log_{20} 253}\right)=86.5\), where \(558=253+305\) is the combined length of 6RWI_1 and 4FDL_1.
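The estimate can be checked numerically. The constants 0.85 and 0.8 are taken as given. 0.85 matches the normalized LZ ratio of 4FDL in the table above, but the origin of 0.8 is not stated. The sketch evaluates to about 86.59. That agrees with the 86.5 shown if the page truncates, as it appears to do in the ratio column.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """n / log_alphabet(n): the normalizer from the table's ratio column."""
    return n / math.log(n, alphabet_size)

n1, n2 = 253, 305  # lengths of 6RWI_1 and 4FDL_1
estimate = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n1))
print(f"{estimate:.2f}")  # 86.59
```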
Status Protein 1 Protein 2 \(d\) \(d_1/2\)
Query variables 6RWI_1 4FDL_1 112 101.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
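The min and max range over the two one-sided increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). This follows Otu and Sayood's definitions \(d=\max\) of these two and \(d_1=\) their sum, so that \(\alpha=0\) gives \(d\) and \(\alpha=1/2\) gives \(d_1/2\). The sketch below is a minimal illustration under that reading. It reuses the Kaspar–Schuster LZ76 count, which is an assumption about the variant the site uses, and all function names are hypothetical.

```python
def lz76_complexity(s: str) -> int:
    """LZ76 complexity via the Kaspar-Schuster algorithm (assumed variant)."""
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def increments(x: str, y: str) -> tuple[int, int]:
    """The two one-sided increments LZ(xy) - LZ(x) and LZ(yx) - LZ(y)."""
    return (lz76_complexity(x + y) - lz76_complexity(x),
            lz76_complexity(y + x) - lz76_complexity(y))

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def otu_sayood_d(x: str, y: str) -> int:
    return max(increments(x, y))

def otu_sayood_d1(x: str, y: str) -> int:
    return sum(increments(x, y))

print(otu_sayood_d("A", "B"), otu_sayood_d1("A", "B"))  # 1 2
```

Under these definitions, the table's \(d=112\) and \(d_1/2=101.5\) for 6RWI_1 and 4FDL_1 give \(d_1=203\). The two increments would therefore be 112 and 91.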