CoV2D Browser™

Parikh vectors
3SXR_1 7XBA_1 4NBJ_1 Letter Amino acid
10 7 20 I Isoleucine
13 7 7 F Phenylalanine
23 17 3 S Serine
16 12 6 Y Tyrosine
21 10 11 E Glutamic acid
4 8 20 N Asparagine
13 13 13 D Aspartic acid
21 15 8 V Valine
10 10 5 R Arginine
17 25 9 G Glycine
10 9 3 H Histidine
11 6 2 M Methionine
5 2 3 W Tryptophan
8 16 4 A Alanine
11 15 3 Q Glutamine
28 33 15 L Leucine
22 12 20 K Lysine
10 12 3 P Proline
8 10 7 T Threonine
7 4 2 C Cysteine
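
A Parikh vector just counts how often each letter occurs. The following is a minimal Python sketch, assuming sequences are plain uppercase one-letter strings. The row order is copied from the table above, and the names `AMINO_ACIDS` and `parikh_vector` are hypothetical.

```python
from collections import Counter

# Row order of the Parikh-vector table above (one-letter amino-acid codes).
AMINO_ACIDS = "IFSYENDVRGHMWAQLKPTC"


def parikh_vector(seq):
    """Count occurrences of each amino-acid letter in seq, in table row order."""
    counts = Counter(seq)
    return [counts[a] for a in AMINO_ACIDS]


v = parikh_vector("GHMELKREEITLL")  # first 13 residues of 3SXR_1
print(dict(zip(AMINO_ACIDS, v)))
```

As a sanity check, each column of the table sums to the corresponding sequence length (268, 243 and 164).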

>3SXR_1|Chains A, B|Cytoplasmic tyrosine-protein kinase BMX|Homo sapiens (9606)
>7XBA_1|Chains A, B|Glutathione S-transferase P|Homo sapiens (9606)
>4NBJ_1|Chains A, B, C, D, E, F, G, H|D-tyrosyl-tRNA(Tyr) deacylase|Plasmodium falciparum (36329)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3SXR, Knot 121 268 0.84 40 186 260
GHMELKREEITLLKELGSGQFGVVKLGKWKGQYDVAVKMIKEGSMSEDEFFQEAQTMMKLSHPKLVKFYGVCSKEYPIYIVTEYISNGCLLNYLRSHGKGLEPSQLLEMCYDVCEGMAFLESHQFIHRDLAARNCLVDRDLCVKVSDFGMTRYVLDDQYVSSVGTKFPVKWSAPEVFHYFKYSSKSDVWAFGILMWEVFSLGKMPYDLYTNSEVVLKVSQGHRLYRPHLASDTIYQIMYSCWHELPEKRPTFQQLLSSIEPLREKDKH
7XBA, Knot 111 243 0.83 40 151 230
MGSSHHHHHHSSGLVPRGSHMASMTGGQQMGRGSPPYTVVYFPVRGRCAALRMLLADQGQSWKEEVVTVETWQEGSLKASCLYGQLPKFQDGDLTLYQSNTILRHLGRTLGLYGKDQQEAALVDMVNDGVEDLRCKYISLIYTNYEAGKDDYVKALPGQLKPFETLLSQNQGGKTFIVGDQISFADYNLLDLLLIHEVLAPGCLDAFPLLSAYVGRLSARPKLKAFLASPEYVNLPINGNGKQ
4NBJ, Knot 79 164 0.82 40 119 161
MRVVIQRVKGAILSVRKENIGENEKELEIISEIKNGLICFLGIHKNDTWEDALYIIRKCLNLRLWNNDNKTWDKNVKDLNYELLIVSQFTLFGNTKKGNKPDFHLAKEPNEALIFYNKIIDEFKKQYNDDKIKIGKFGNYMNIDVTNDGPVTIYIDTHDINLNK
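
The LZ column is a Lempel–Ziv complexity, and the ratio column divides it by \(n/\log_{20} n\). The sketch below counts phrases in the Lempel–Ziv (1976) exhaustive-history parsing. This parsing convention is an assumption: the site may differ in details, so the code is not guaranteed to reproduce the LZ values in the table. The names `lz76` and `normalized_lz` are hypothetical.

```python
import math


def lz76(w):
    """Count phrases in the Lempel-Ziv (1976) exhaustive-history parsing of w."""
    i, c, n = 0, 0, len(w)
    while i < n:
        l = 1
        # Extend the phrase while w[i:i+l] can still be copied from earlier text
        # (overlap allowed, i.e. it occurs in w[:i+l-1]).
        while i + l <= n and w[i:i + l] in w[:i + l - 1]:
            l += 1
        c += 1  # the phrase ends at its first novel symbol, or at the end of w
        i += l
    return c


def normalized_lz(c, n, alphabet_size=20):
    """LZ(w) / (n / log_k n), with k = 20 for amino-acid sequences."""
    return c * math.log(n, alphabet_size) / n


print(lz76("0001101001000101"))        # classic example: 0.001.10.100.1000.101
print(round(normalized_lz(121, 268), 2))  # 3SXR row of the table
```

The classic binary example parses into 6 phrases. Feeding the table's LZ values and lengths into `normalized_lz` reproduces the 0.84 (3SXR) and 0.82 (4NBJ) ratios.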

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
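
Concretely, \(P_w(n)\) is the set of length-\(n\) windows of \(w\). The following is a minimal Python sketch, with hypothetical function names `subwords` and `p`.

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of w of length n."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def p(w, n):
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


print(sorted(subwords("ABAB", 2)))  # the two distinct length-2 windows
```

A word of length \(m\) has \(m-n+1\) windows of length \(n\), so \(p_w(n)\le m-n+1\). Repeated windows are counted only once.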

\(|P_{f(3SXR_1)}(2) \setminus P_{f(7XBA_1)}(2)|=95\) and \(|P_{f(7XBA_1)}(2) \setminus P_{f(3SXR_1)}(2)|=60\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(3SXR_1),f(7XBA_1))=95+60=155\).

Hydrophobic-polar version of Sequence 1 (3SXR_1): 1010100001011001101011110110101000111011001010000110010011010010110101100000110110001001011001000101101001101000100111110000110001110001100010101001110001100001001100111010110110010000000111111111011011011001000001110100100100101100010011000100110001010011001011000000
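
The source does not state which residues count as hydrophobic. The sketch below assumes the set {A, F, G, I, L, M, P, V, W}, with every other residue (including C and Y) treated as polar. This assumption appears consistent with the first 40 and last 32 symbols of the string above. The names `HYDROPHOBIC` and `hp` are hypothetical.

```python
# Assumption: hydrophobic set inferred from the binary string above, not given by the source.
HYDROPHOBIC = frozenset("AFGILMPVW")


def hp(seq):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in seq)


print(hp("GHMELKREEITLL"))  # start of 3SXR_1
```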
Pair \(Z_2\) Length of longest common substring
3SXR_1,7XBA_1 155 3
3SXR_1,4NBJ_1 183 3
7XBA_1,4NBJ_1 168 3
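
The two columns can be computed as follows. The sketch below is a minimal Python version: \(Z_k\) is the size of the symmetric difference of the two subword sets, and the longest common contiguous subword is found with the standard \(O(|x||y|)\) dynamic program. All function names are hypothetical, and `jaccard_distance` simply divides \(Z_k\) by the size of the union.

```python
def subwords(w, k):
    """P_w(k): distinct contiguous subwords of w of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def Z(x, y, k):
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x, y, k):
    """Z_k(x, y) divided by the size of the union of P_x(k) and P_y(k)."""
    union = subwords(x, k) | subwords(y, k)
    return Z(x, y, k) / len(union) if union else 0.0


def longest_common_substring(x, y):
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the match ending one step earlier
                best = max(best, cur[j])
        prev = cur
    return best


print(Z("ABCD", "BCDE", 2), longest_common_substring("ABCD", "BCDE"))
```

For 3SXR_1 and 7XBA_1 the counts above give \(|P\cap P| = 186-95 = 91\), so the union has \(95+60+91=246\) elements and the Jaccard distance is \(155/246\approx 0.63\).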

Newick tree

(4NBJ_1:91.01,(3SXR_1:77.5,7XBA_1:77.5):13.51);
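
In a tree with branch lengths, the distance between two leaves is the sum of the branch lengths on the path between them. The following is a minimal Python sketch with a small hand-written parser, assuming the tree above is written in standard parenthesized Newick. The helpers `parse_newick`, `leaf_paths` and `patristic` are hypothetical and handle only simple inputs like this one (no quoted labels or comments).

```python
def parse_newick(s):
    """Parse a simple Newick string with branch lengths into nested dict nodes."""
    s = s.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1  # skip "("
            children.append(node())
            while s[pos] == ",":
                pos += 1  # skip ","
                children.append(node())
            pos += 1  # skip ")"
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        name = s[start:pos]
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1  # skip ":"
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return {"name": name, "length": length, "children": children}

    return node()


def leaf_paths(tree):
    """Map each leaf name to its root path as a list of (node id, depth from root)."""
    paths = {}

    def walk(n, path, depth):
        depth += n["length"]
        path = path + [(id(n), depth)]
        if not n["children"]:
            paths[n["name"]] = path
        for child in n["children"]:
            walk(child, path, depth)

    walk(tree, [], 0.0)
    return paths


def patristic(paths, a, b):
    """Sum of branch lengths on the path between leaves a and b."""
    pa, pb = paths[a], paths[b]
    lca_depth = 0.0
    for (ia, da), (ib, _) in zip(pa, pb):
        if ia != ib:
            break
        lca_depth = da  # depth of the deepest shared ancestor so far
    return pa[-1][1] + pb[-1][1] - 2 * lca_depth


tree = parse_newick("(4NBJ_1:91.01,(3SXR_1:77.5,7XBA_1:77.5):13.51);")
paths = leaf_paths(tree)
print(patristic(paths, "3SXR_1", "7XBA_1"))
```

Here the 3SXR_1 to 7XBA_1 path is \(77.5+77.5=155\), which equals \(Z_2\) for that pair in the table above. Both 3SXR_1 and 7XBA_1 are \(91.01+13.51+77.5=182.02\) away from 4NBJ_1.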

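Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\,\mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)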
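
The following is a minimal Python sketch of the two Otu–Sayood distances, \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\,\mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\) and \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). It assumes that LZ is the LZ76 exhaustive-history phrase count (an assumption about the site's convention), so the printed values need not match the table below. The function names are hypothetical.

```python
def lz76(w):
    """Number of phrases in the LZ76 exhaustive-history parsing of w (assumed convention)."""
    i, c, n = 0, 0, len(w)
    while i < n:
        l = 1
        # Grow the phrase while it can be copied from the text before its last symbol.
        while i + l <= n and w[i:i + l] in w[:i + l - 1]:
            l += 1
        c += 1
        i += l
    return c


def otu_sayood_d(x, y):
    """d(x, y) = max{LZ(xy) - LZ(x), LZ(yx) - LZ(y)}."""
    return max(lz76(x + y) - lz76(x), lz76(y + x) - lz76(y))


def otu_sayood_d1(x, y):
    """d1(x, y) = LZ(xy) - LZ(x) + LZ(yx) - LZ(y)."""
    return (lz76(x + y) - lz76(x)) + (lz76(y + x) - lz76(y))


print(otu_sayood_d("AAAAAA", "AAAAAA"), otu_sayood_d1("AB", "CD"))
```

Since \(d\) is the larger of the two one-sided increments and \(d_1\) is their sum, \(d\le d_1\) always holds, and \(d_1/2\le d\).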
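A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{511}{\log_{20} 511}-\frac{243}{\log_{20} 243}\right)\approx 76.8\), where \(511=268+243\) is the length of the concatenated sequence and \(243\) is the length of 7XBA_1.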
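
The estimate is plain arithmetic. The sketch below reproduces it, taking the constants 0.85 and 0.8 directly from the formula. The function name `rough_expected_distance` and its parameters are hypothetical. It generalizes the formula to other lengths only by assumption.

```python
import math


def rough_expected_distance(n_x, n_y, a=0.85, b=0.8):
    """a * b * ((n_x + n_y) / log20(n_x + n_y) - n_y / log20(n_y))."""
    n = n_x + n_y  # length of the concatenated sequence
    return a * b * (n / math.log(n, 20) - n_y / math.log(n_y, 20))


print(round(rough_expected_distance(268, 243), 1))  # 3SXR_1 (268) and 7XBA_1 (243)
```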
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 3SXR_1 7XBA_1 96 89

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)], with \(a=\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(b=\mathrm{LZ}(yx)-\mathrm{LZ}(y)\),
\[ \delta= \alpha \min\{a,b\} + (1-\alpha) \max\{a,b\}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
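
The interpolation needs only the two one-sided increments \(a=\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(b=\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). The following is a minimal Python sketch, with the function name `delta` hypothetical.

```python
def delta(a, b, alpha):
    """alpha * min{a, b} + (1 - alpha) * max{a, b} for the one-sided LZ increments a, b."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# The table row gives d = 96 and d1/2 = 89, so the increments are 96 and 2*89 - 96 = 82.
print(delta(96, 82, 0), delta(96, 82, 0.5))
```

At \(\alpha=0\) this returns the maximum, which is \(d\). At \(\alpha=1/2\) it returns the average \((a+b)/2\), which is \(d_1/2\).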