CoV2D Browser™

Parikh vectors
4PMM_1 5XXP_1 3EAY_1 Letter Amino acid
4 2 19 N Asparagine
15 9 9 H Histidine
32 11 33 L Leucine
6 0 6 W Tryptophan
22 4 23 V Valine
23 18 8 A Alanine
13 3 18 D Aspartic acid
16 5 18 Q Glutamine
12 4 18 I Isoleucine
12 4 14 F Phenylalanine
10 2 18 T Threonine
7 0 7 C Cysteine
24 7 7 G Glycine
9 3 4 M Methionine
13 2 16 P Proline
11 4 27 S Serine
23 11 16 R Arginine
19 8 28 E Glutamic acid
11 2 24 K Lysine
9 2 10 Y Tyrosine
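As a minimal sketch (Python, not the site's own code), a Parikh vector is a per-letter count. The hypothetical helper `parikh_vector` below returns the counts in the row order of the table, so each column above should equal `parikh_vector(f(c))` for its code \(c\), and each column should sum to the sequence length (291, 101 and 323).

```python
from collections import Counter

# The 20 standard amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "NHLWVADQIFTCGMPSREKY"

def parikh_vector(seq: str) -> list[int]:
    """Count how often each amino-acid letter occurs in seq (hypothetical helper)."""
    counts = Counter(seq)
    return [counts[a] for a in AMINO_ACIDS]

print(parikh_vector("MEFRQLKY"))
```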

>4PMM_1|Chain A|High affinity nerve growth factor receptor|Homo sapiens (9606)
>5XXP_1|Chains A, B|LysR-type regulatory protein|Cupriavidus necator (106590)
>3EAY_1|Chain A|Sentrin-specific protease 7|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4PMM 127 291 0.82 40 191 281
CVHHIKRRDIVLKWELGEGAFGKVFLAECHNLLPEQDKMLVAVKALKEASESARQDFQREAELLTMLQHQHIVRFFGVCTEGRPLLMVFEYMRHGDLNRFLRSHGPDAKLLAGGEDVAPGPLGLGQLLAVASQVAAGMVYLAGLHFVHRDLATRNCLVGQGLVVKIGDFGMSRDIYSTDYYRVGGRTMLPIRWMPPESILYRKFTTESDVWSFGVVLWEIFTYGKQPWYQLSNTEAIDCITQGRELERPRACPPEVYAIMRGCWQREPQQRHSIKDVHARLQALAQAHHHH
5XXP 52 101 0.79 36 79 94
MEFRQLKYFIAVAEAGNMAAAAKRLHVSQPPITRQMQALEADLGVVLLERSHRGIELTAAGHAFLEDARRILELAGRSGDRSRAAARENLYFQGAHHHHHH
3EAY 141 323 0.84 40 189 311
ITSNPDEEWREVRHTGLVQKLIVYPPPPTKGGLGVTNEDLECLEEGEFLNDVIIDFYLKYLILEKASDELVERSHIFSSFFYKCLTRKENNLTEDNPNLSMAQRRHKRVRTWTRHINIFNKDYIFVPVNESSHWYLAVICFPWLEEAVYEDFPQTVSQQSQAQQSQNDNKTIDNDLRTTSTLSLSAEDSQSTESNMSVPKKMCKRPCILILDSLKAASVQNTVQNLREYLEVEWEVKLKTHRQFSKTNMVDLCPKVPKQDNSSDCGVYLLQYVESFFKDPIVNFELPIHLEKWFPRHVIKTKREDIRELILKLHLQQQKGSSS
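The LZ-complexity column can be approximated with the Kaspar–Schuster algorithm for Lempel–Ziv (1976) complexity. This is a sketch under the assumption that the column uses that parsing; phrase-counting conventions vary between implementations, so it may not reproduce the table exactly. The names `lz76` and `normalized_lz` are hypothetical.

```python
import math

def lz76(s: str) -> int:
    """Lempel-Ziv (1976) complexity of s via the Kaspar-Schuster algorithm."""
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:  # the current phrase runs to the end of s
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:  # no earlier start extends further: close the phrase
                c += 1
                l += k_max
                if l + 1 > n:  # nothing left to parse
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_20 n), the ratio in the table's fourth numeric column."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76("abab"), round(normalized_lz(127, 291), 4))
```

For 4PMM, 127/(291/log_20 291) ≈ 0.8265, so the tabulated 0.82 appears to be truncated, rather than rounded, to two decimals.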

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
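The definitions of \(P_w(n)\) and \(p_w(n)\) translate directly into a set of sliding windows. A minimal sketch, with `subwords` and `p` as illustrative names rather than anything from the site:

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct contiguous subwords of w of length n."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))

print(sorted(subwords("abab", 2)), p("abab", 2))
```

Since every element of \(P_w(1)\) is a single amino-acid letter, \(p_w(1)\) is at most 20 for these sequences.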

\(|P_{f(4PMM_1)}(2) \setminus P_{f(5XXP_1)}(2)|=135\) and \(|P_{f(5XXP_1)}(2) \setminus P_{f(4PMM_1)}(2)|=23\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P_x(k)\) and \(P_y(k)\), so that \(Z_2(f(4PMM_1),f(5XXP_1))=135+23=158\).

Hydrophobic-polar version of Sequence 1 (4PMM_1): 010010000111010110111101111000011100001111101100100010001000101101100001101111000101111110010010100110001101011111001111111110111110011111101111011000110000111011110110111000100000001110011110111100110001000001101111110110010011001000011001001001001010110101110101000100000100101010111010000
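The hydrophobic-polar string writes 1 for a hydrophobic residue and 0 for a polar one. The source does not list which residues count as hydrophobic; the set {A, F, G, I, L, M, P, V, W} below is an assumption, chosen because it reproduces the first 100 characters of the 0/1 string from the 4PMM_1 sequence (note that C and Y map to 0 there). `hp_string` is a hypothetical name.

```python
# ASSUMPTION: hydrophobic set inferred from the 4PMM_1 string above,
# not stated in the source.
HYDROPHOBIC = frozenset("AFGILMPVW")

def hp_string(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in seq)

print(hp_string("CVHHIKRRDI"))
```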
Pair \(Z_2\) Length of longest common substring
4PMM_1,5XXP_1 158 5
4PMM_1,3EAY_1 178 3
5XXP_1,3EAY_1 190 4
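Both columns of this table can be sketched with set operations and a standard dynamic program. Here `z` implements \(Z_k\) as defined above, and `longest_common_substring` assumes the third column measures a contiguous common subword. Both 4PMM_1 and 5XXP_1 end in A followed by a run of H, which gives the common subword AHHHH of length 5, in line with the table. The helper names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct contiguous subwords of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common-suffix length of x[:i-1], y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(longest_common_substring("LAQAHHHH", "QGAHHHHHH"))
```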

Newick tree

(
	3EAY_1:96.00,
	(
		4PMM_1:79,5XXP_1:79
	):17.00
);
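To check the branch lengths, the tree can be encoded as nested (content, branch length) pairs, a hypothetical representation chosen only for this sketch. Summing branch lengths from the root puts every leaf at depth 96 (3EAY_1: 96; 4PMM_1 and 5XXP_1: 17 + 79), so the tree is ultrametric; `to_newick` prints it in standard parenthesized Newick syntax.

```python
# Hypothetical encoding: (label, length) for a leaf,
# (list_of_children, length) for an internal node.
TREE = ([("3EAY_1", 96.0), ([("4PMM_1", 79.0), ("5XXP_1", 79.0)], 17.0)], 0.0)

def leaf_depths(node, depth=0.0, out=None):
    """Return {leaf label: sum of branch lengths from the root}."""
    if out is None:
        out = {}
    content, length = node
    depth += length
    if isinstance(content, str):
        out[content] = depth
    else:
        for child in content:
            leaf_depths(child, depth, out)
    return out

def _subtree(node):
    content, length = node
    if isinstance(content, str):
        body = content
    else:
        body = "(" + ",".join(_subtree(c) for c in content) + ")"
    return f"{body}:{length:g}"

def to_newick(root):
    """Serialize the root's children in Newick syntax, ending with ';'."""
    children, _ = root
    return "(" + ",".join(_subtree(c) for c in children) + ");"

print(to_newick(TREE))
print(leaf_depths(TREE))
```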

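Let \(d\) be the unnormalized Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the unnormalized Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)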
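Assuming the unnormalized Otu–Sayood forms, in which \(d\) is the larger and \(d_1\) the sum of the two gains \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\) (consistent with the \(\delta\) formula at the end, where \(\alpha=0\) gives \(d\) and \(\alpha=1/2\) gives \(d_1/2\)), both distances follow from LZ76 phrase counts of the two concatenations. A sketch with hypothetical names, reusing a Kaspar–Schuster LZ76 counter:

```python
def lz76(s: str) -> int:
    """Lempel-Ziv (1976) complexity via the Kaspar-Schuster algorithm."""
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1), unnormalized, under the assumption stated above."""
    gain_xy = lz76(x + y) - lz76(x)  # extra phrases when y follows x
    gain_yx = lz76(y + x) - lz76(y)  # extra phrases when x follows y
    return max(gain_xy, gain_yx), gain_xy + gain_yx

print(otu_sayood("aaaa", "abab"))
```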
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{392}{\log_{20} 392}-\frac{101}{\log_{20}101}\right)\approx 89.1\).
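The estimate is plain arithmetic, and the sketch below re-evaluates it. Here 392 = 291 + 101 is the combined length of 4PMM_1 and 5XXP_1; the factors 0.85 and 0.8 are taken from the formula as given, and `n_over_log20` is a hypothetical helper name.

```python
import math

def n_over_log20(n: int) -> float:
    """n / log_20(n), the same normalizer as in the LZ table."""
    return n / math.log(n, 20)

# 392 = 291 + 101, the combined length of 4PMM_1 and 5XXP_1.
estimate = 0.85 * 0.8 * (n_over_log20(392) - n_over_log20(101))
print(round(estimate, 1))
```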
Status Protein1 Protein2 d d1/2
Query variables 4PMM_1 5XXP_1 113 74.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d & \text{if } \alpha=0,\\ d_1/2 & \text{if } \alpha=1/2. \end{cases} \]
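The interpolation \(\delta\) is a one-liner once the two gains are known. With the table's \(d=113\) and \(d_1/2=74.5\), the gains must be 113 and \(2(74.5)-113=36\) (derived here from the table, not reported by the site), and the sketch below recovers both columns; `delta` is a hypothetical name.

```python
def delta(alpha: float, gain_xy: float, gain_yx: float) -> float:
    """alpha * min + (1 - alpha) * max of the two LZ gains."""
    return alpha * min(gain_xy, gain_yx) + (1 - alpha) * max(gain_xy, gain_yx)

# Gains implied by the table row for 4PMM_1 and 5XXP_1.
print(delta(0.0, 36, 113))  # corresponds to d
print(delta(0.5, 36, 113))  # corresponds to d1/2
```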