CoV2D Browser™

Parikh vectors
2SPZ_1 8ERZ_1 5HJY_1 Letter Amino acid
5 5 21 E Glutamic acid
3 0 30 F Phenylalanine
1 3 17 Y Tyrosine
1 0 24 R Arginine
8 0 20 N Asparagine
1 0 19 H Histidine
2 0 22 I Isoleucine
6 7 23 K Lysine
3 0 22 S Serine
0 0 5 W Tryptophan
1 0 23 V Valine
4 0 31 D Aspartic acid
0 2 49 G Glycine
6 0 16 Q Glutamine
0 0 17 M Methionine
3 5 23 P Proline
7 8 59 A Alanine
0 0 3 C Cysteine
7 3 31 L Leucine
0 1 26 T Threonine
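The Parikh vector of a word lists how many times each letter occurs in it. Below is a minimal Python sketch of that count. The helper name `parikh_vector` is hypothetical. Like the table, it reports only the 20 standard amino-acid letters, so the two `X` residues of 8ERZ_1 are counted by `Counter` but left out of the result.

```python
from collections import Counter

# The 20 standard amino-acid letters, in the row order of the table above.
TABLE_ORDER = "EFYRNHIKSWVDGQMPACLT"

def parikh_vector(w: str) -> dict:
    """Return {letter: count} for the 20 standard letters (hypothetical helper)."""
    counts = Counter(w)
    # Non-standard symbols such as X are counted by Counter but not reported.
    return {a: counts[a] for a in TABLE_ORDER}

seq_8erz = "XPPKKPKKPGAGATPEKLAAYEKELAAYEKELAAYX"
v = parikh_vector(seq_8erz)
print(v["K"], v["A"], sum(v.values()))  # 7 8 34
```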

>2SPZ_1|Chain A|IMMUNOGLOBULIN G BINDING PROTEIN A|Staphylococcus aureus (1280)
>8ERZ_1|Chain A|Designed miniprotein oPPalpha: Aib10Gly11 turn|Streptococcus mutans (1309)
>5HJY_1|Chains A, B, C, D, E, F|Ribulose bisphosphate carboxylase|Rhodopseudomonas palustris (1076)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2SPZ , Knot 31 58 0.72 30 50 55
VDNKFNKEQQNAFYEILHLPNLNEEQRNAFIQSLKDDPSQSANLLAEAKKLNDAQAPK
8ERZ , Knot 13 36 0.43 18 20 23
XPPKKPKKPGAGATPEKLAAYEKELAAYEKELAAYX
5HJY , Knot 195 481 0.83 40 245 452
MGSSHHHHHHSSGLVPRGSHMDQSNRYANLNLKESELIAGGRHVLCAYIMKPKAGFGNFIQTAAHFAAESSTGTNVEVSTTDDFTRGVDALVYEVDEANSLMKIAYPIELFDRNVIDGRAMIASFLTLTIGNNQGMGDVEYAKMYDFYVPPAYLKLFDGPSTTIKDLWRVLGRPVINGGFIVGTTIKPKLGLRPQPFANACYDFWLGGDFIKNDEPQGNQVFAPFKDTVRAVADAMRRAQDKTGEAKLFSFNITADDHYEMLARGEFILETFADNADHIAFLVDGYVAGPAAVTTARRAFPKQYLHYHRAGHGAVTSPQSKRGYTAFVLSKMARLQGASGIHTGTMGFGKMEGEAADRAIAYMITEDAADGPYFHQEWLGMNPTTPIISGGMNALRMPGFFDNLGHSNLIMTAGGGAFGHVDGGAAGAKSLRQAEQCWKQGADPVEFAKDHREFARAFESFPQDADKLYPNWRAKLKPQAA
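The LZ-complexity column counts the phrases of a Lempel-Ziv (1976) parsing. The Python sketch below assumes the common Kaspar-Schuster formulation. In that formulation, each new phrase is the shortest block that cannot be copied from earlier text, and the copy source may overlap the block. The function names are hypothetical, and the site's own implementation may differ in details, although this version does give 13 for 8ERZ_1. The second function forms the normalized ratio \(\mathrm{LZ}(w)/(n/\log_{20} n)\) from the table header.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of phrases in an LZ76 (Kaspar-Schuster style) parsing of w."""
    n, i, phrases = len(w), 0, 0
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text;
        # the source may overlap the phrase itself, hence the i + length - 1 bound.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1  # copied part plus one new symbol (or the tail of w)
        i += length
    return phrases

def normalized_lz(w: str) -> float:
    """LZ(w) / (n / log_20 n), the ratio column of the table (needs len(w) > 1)."""
    n = len(w)
    return lz76_complexity(w) / (n / math.log(n, 20))

seq_8erz = "XPPKKPKKPGAGATPEKLAAYEKELAAYEKELAAYX"
print(lz76_complexity(seq_8erz), round(normalized_lz(seq_8erz), 2))  # 13 0.43
```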

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence whose 4-character Protein Data Bank code is \(c\).
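These definitions translate directly into set comprehensions. In the Python sketch below, the function names are ours rather than the site's. `subwords(w, k)` builds \(P_w(k)\), and `p(w, k)` returns \(p_w(k)\). For 8ERZ_1, it reproduces the \(p_w(2)=20\) and \(p_w(3)=23\) entries of the table.

```python
def subwords(w: str, k: int) -> set:
    """P_w(k): the distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def p(w: str, k: int) -> int:
    """p_w(k) = |P_w(k)|."""
    return len(subwords(w, k))

seq_8erz = "XPPKKPKKPGAGATPEKLAAYEKELAAYEKELAAYX"
print([p(seq_8erz, k) for k in (2, 3)])  # [20, 23]
```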

\(|P_{f(2SPZ_1)}(2) \setminus P_{f(8ERZ_1)}(2)|=44\) and \(|P_{f(8ERZ_1)}(2) \setminus P_{f(2SPZ_1)}(2)|=14\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(2SPZ_1),f(8ERZ_1))=44+14=58\).

Hydrophobic-polar version of sequence 1 (2SPZ_1), with 1 for hydrophobic and 0 for polar residues:
1000100000011001101101000000111001000100010111010010010110
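A hydrophobic-polar string replaces each residue by one bit. The page does not list its hydrophobic alphabet, so the set `HYDROPHOBIC` in the Python sketch below is an assumption. A, F, I, L, P and V are the residues that map to 1 in the 2SPZ_1 string above. Adding M and W, and treating G and C as polar, is a guess this sequence cannot confirm, since it contains none of them.

```python
# ASSUMPTION: hydrophobic alphabet. A, F, I, L, P, V are consistent with the
# 2SPZ_1 string above; M and W are added by guess, G and C are treated as polar.
HYDROPHOBIC = frozenset("AFILMPVW")

def hp_string(w: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)

seq_2spz = "VDNKFNKEQQNAFYEILHLPNLNEEQRNAFIQSLKDDPSQSANLLAEAKKLNDAQAPK"
print(hp_string(seq_2spz))
```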
Pair \(Z_2\) Length of longest common substring (contiguous subword)
2SPZ_1,8ERZ_1 58 2
2SPZ_1,5HJY_1 223 3
8ERZ_1,5HJY_1 237 3
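Both columns of this table can be recomputed from the sequences. The Python sketch below uses hypothetical function names. It obtains \(Z_k\) as the size of the symmetric difference of the two subword sets. It also computes the length of the longest common contiguous subword with the standard dynamic-programming recurrence. For 2SPZ_1 and 8ERZ_1, it gives 58 and 2, matching the first row.

```python
def subwords(w: str, k: int) -> set:
    """Distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) - P_y(k)| + |P_y(k) - P_x(k)| (symmetric difference size)."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y, O(|x||y|) time."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

seq_2spz = "VDNKFNKEQQNAFYEILHLPNLNEEQRNAFIQSLKDDPSQSANLLAEAKKLNDAQAPK"
seq_8erz = "XPPKKPKKPGAGATPEKLAAYEKELAAYEKELAAYX"
print(z(seq_2spz, seq_8erz, 2), longest_common_substring(seq_2spz, seq_8erz))  # 58 2
```

The one-sided counts 44 and 14 quoted earlier are `len(subwords(seq_2spz, 2) - subwords(seq_8erz, 2))` and the same expression with the arguments swapped.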

Newick tree

(
	5HJY_1:13.79,
	(
		2SPZ_1:29,8ERZ_1:29
	):10.79
);

Let \(d\) and \(d_1\) denote the Otu--Sayood LZ-complexity distances of those names. (The distance \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{94}{\log_{20} 94}-\frac{36}{\log_{20} 36}\right)\approx 21.7\), where \(94=58+36\) is the length of the concatenation of 2SPZ_1 and 8ERZ_1.
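The arithmetic can be checked in a few lines of Python. The constants 0.85 and 0.8 are copied from the expression as given, since their derivation is not stated on this page. The sketch assumes that 94 is the concatenated length 58 + 36 and that 36 is the length of 8ERZ_1.

```python
import math

def n_over_log20(n: int) -> float:
    """n / log_20 n, the normalizing term used in the LZ ratio column."""
    return n / math.log(n, 20)

# 0.85 and 0.8 are copied from the expression above; 94 = 58 + 36.
expected = 0.85 * 0.8 * (n_over_log20(58 + 36) - n_over_log20(36))
print(round(expected, 2))  # 21.68
```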
Status Protein1 Protein2 d d1/2
Query variables 2SPZ_1 8ERZ_1 27 18.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
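The formula for \(\delta\) can be sketched in Python by reading min and max over the two LZ increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). This sketch treats \(d\) as the larger increment and \(d_1\) as their sum. That reading of Otu and Sayood agrees with the two cases above.

The LZ76 parser is the Kaspar-Schuster variant, which is an assumption about what the site uses. As a result, the printed values are not guaranteed to equal the 27 and 18.5 reported in the table. Function names are hypothetical.

```python
def lz76_complexity(w: str) -> int:
    """Number of phrases in an LZ76 (Kaspar-Schuster style) parsing of w."""
    n, i, phrases = len(w), 0, 0
    while i < n:
        length = 1
        # Extend while w[i:i+length] can be copied from earlier text (overlap allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two LZ increments."""
    gains = (lz76_complexity(x + y) - lz76_complexity(x),
             lz76_complexity(y + x) - lz76_complexity(y))
    return alpha * min(gains) + (1 - alpha) * max(gains)

seq_2spz = "VDNKFNKEQQNAFYEILHLPNLNEEQRNAFIQSLKDDPSQSANLLAEAKKLNDAQAPK"
seq_8erz = "XPPKKPKKPGAGATPEKLAAYEKELAAYEKELAAYX"
d = delta(seq_2spz, seq_8erz, 0.0)        # alpha = 0: the larger increment, d
d1_half = delta(seq_2spz, seq_8erz, 0.5)  # alpha = 1/2: half the sum, d_1 / 2
print(d, d1_half)
```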
