CoV2D Browser™

Parikh vectors
7OMJ_1 4HLS_1 8VJP_1 Letter Amino acid
9 8 14 R Arginine
18 5 11 D Aspartic acid
13 10 6 Q Glutamine
23 7 11 E Glutamic acid
2 10 5 H Histidine
18 9 11 S Serine
38 6 8 A Alanine
0 2 1 C Cysteine
25 5 9 I Isoleucine
15 3 10 K Lysine
9 3 7 F Phenylalanine
7 4 2 P Proline
37 10 12 G Glycine
19 4 13 L Leucine
17 9 10 T Threonine
0 11 2 Y Tyrosine
16 8 4 N Asparagine
13 7 4 M Methionine
0 0 3 W Tryptophan
30 11 13 V Valine
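
A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch of that count, not the code behind this page; `AMINO_ACIDS` and `parikh_vector` are illustrative names, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# Letter order taken from the rows of the table above.
AMINO_ACIDS = "RDQEHSACIKFPGLTYNMWV"

def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in w."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# Example on the first 16 residues of the 7OMJ_1 sequence.
prefix = "GSHMATLKVIGVGGGG"
print(dict(zip(AMINO_ACIDS, parikh_vector(prefix))))
```

Applied to a full sequence, the entries of the vector sum to the sequence length.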

>7OMJ_1|Chain A|Cell division protein FtsZ|Staphylococcus aureus (1280)
>4HLS_1|Chains A, B|Major prion protein|Oryctolagus cuniculus (9986)
>8VJP_1|Chain A|Induced myeloid leukemia cell differentiation protein Mcl-1|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7OMJ , Knot 134 309 0.82 34 174 292
GSHMATLKVIGVGGGGNNAVNRMIDHGMNNVEFIAINTDGQALNLSKAESKIQIGEKLTRGLGAGANPEIGKKAAEESREQIEDAIQGADMVFVTSGMGGGTGTGAAPVVAKIAKEMGALTVGVVTRPFSFEGRKRQTQAAAGVEAMKAAVDTLIVIPNDRLLDIVDKSTPMMEAFKEADNVLRQGVQGISDLIAVSGEVNLDFADVKTIMSNQGSALMGIGVSSGENRAVEAAKKAISSPLLETSIVGAQGVLMNITGGESLSLFEAQEAADIVQDAADEDVNMIFGTVINPELQDEIVVTVIATGFD
4HLS , Knot 64 132 0.79 38 105 125
MGSSHHHHHHSSGLVPRGSHMAVVGGLGGYMLGSAMSRPLIHFGNDYEDRYYRENMYRYPNQVYYRPVDQYNNQNSFVHDCVNITVKQHTVTTTTKGENFTETDIKIMERVVEQMCITQYQQESQAAYQRAA
8VJP , Knot 80 156 0.86 40 124 153
GSHMDELYRQSLEIISRYLREQATGAKDTKPMGRSGATSRKALETLRRVGDGVQRNHETAFQGMLRKLDIKNEDDVKSLSRVMIHVFSDGVTNWGRIVTLISFGAFVAKHLKTINQESCIEPLAESITDVLVRTKRDWLVKQRGWDGFVEFFHVED
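
The fourth column of the table divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). As a rough check, the sketch below recomputes that ratio from the tabulated LZ-complexities and lengths; `normalized_lz` is an illustrative name, and the LZ values are taken from the table rather than recomputed.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n) for a word of length n over b letters."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ-complexity, length) as listed in the table above.
rows = [("7OMJ", 134, 309), ("4HLS", 64, 132), ("8VJP", 80, 156)]
for code, lz, n in rows:
    print(code, round(normalized_lz(lz, n), 3))
```

The results agree with the tabulated 0.82, 0.79 and 0.86 up to the last displayed digit.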

For a word \(w\) and an integer \(k \ge 1\), let \(P_w(k)\) be the set of distinct subwords (intervals, i.e. contiguous blocks) of length \(k\) in \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
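
The subword sets and their cardinalities can be computed directly by sliding a window over the word. This is a minimal sketch under that reading of the definition; the function names `subwords` and `subword_complexity` are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct contiguous subwords of length k in w."""
    return len(subwords(w, k))

# Small example: "abab" has two distinct subwords of each length 1, 2, 3.
for k in (1, 2, 3):
    print(k, sorted(subwords("abab", k)), subword_complexity("abab", k))
```

For a word of length \(n\) the count of length-\(k\) subwords is at most \(n-k+1\), which is why the values in the table stay below the sequence lengths.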

\(|P_{f(7OMJ_1)}(2) \setminus P_{f(4HLS_1)}(2)|=117\), \(|P_{f(4HLS_1)}(2) \setminus P_{f(7OMJ_1)}(2)|=48\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the numerator of an LZ76-style Jaccard distance on the subword sets \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). For this pair, \(Z_2 = 117 + 48 = 165\).

Hydrophobic-polar version of Sequence 1:
100110101111111100110011001100101111000101101001000101100100111111010110011000000100110110111100111110101111111011001111011110011010100000011111011011100111110001101100001110110010011001101100111101010101101001100010111111100100011011001100111000111101111010110010110100110110011000101111011010100011101110110
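
The hydrophobic-polar string maps each residue to 1 (hydrophobic) or 0 (polar). The sketch below is one way to produce such a string. The default hydrophobic set `AFGILMPV` is an assumption read off the leading portion of the string above; C, W and Y do not occur in Sequence 1, so their class cannot be inferred from it and they are treated as polar here unless a different set is passed.

```python
# Assumed hydrophobic letters, inferred from the start of the string above.
# C, W and Y are absent from Sequence 1, so their class is a guess.
HYDROPHOBIC = frozenset("AFGILMPV")

def hp_encode(w, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' if hydrophobic, else '0'."""
    return "".join("1" if a in hydrophobic else "0" for a in w)

# First 16 residues of the 7OMJ_1 sequence.
print(hp_encode("GSHMATLKVIGVGGGG"))
```

Passing another set, for example `hp_encode(w, frozenset("ACFGILMPVWY"))`, switches to a different classification without changing the function.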
Pair \(Z_2\) Length of longest common subword (contiguous)
7OMJ_1,4HLS_1 165 5
7OMJ_1,8VJP_1 148 4
4HLS_1,8VJP_1 153 4
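
Both columns of this table can be sketched in a few lines. The code below assumes that the last column measures the longest contiguous match, which is consistent with the values shown (for instance, GSHMA occurs in both the 7OMJ_1 and 4HLS_1 sequences). The names `z_distance` and `longest_common_subword` are illustrative, not taken from the page.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x, y):
    """Length of the longest contiguous block shared by x and y (O(|x||y|))."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the match ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Short fragments of the 7OMJ_1 and 4HLS_1 sequences.
print(longest_common_subword("GSHMATLK", "RGSHMAVVGG"))
print(z_distance("abab", "abba", 2))
```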

Newick tree

 
[
	4HLS_1:81.32,
	[
		7OMJ_1:74,8VJP_1:74
	]:7.32
]
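
In standard Newick notation, which uses parentheses and a terminating semicolon, the bracketed tree above would be written on one line. The sketch below serializes a nested structure in that way; the `(label_or_children, branch_length)` tuple layout and the name `to_newick` are assumptions made for illustration.

```python
def to_newick(node):
    """Serialize a (label_or_children, branch_length) node to Newick text."""
    content, length = node
    if isinstance(content, str):
        label = content  # leaf: content is the taxon name
    else:
        label = "(" + ",".join(to_newick(child) for child in content) + ")"
    return label if length is None else f"{label}:{length:g}"

# The tree shown above; the root has no branch length.
tree = ([("4HLS_1", 81.32),
         ([("7OMJ_1", 74), ("8VJP_1", 74)], 7.32)], None)
print(to_newick(tree) + ";")
```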

Let \(d\) and \(d_1\) denote the Otu--Sayood distances of the same names. (The latter makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{441}{\log_{20} 441}-\frac{132}{\log_{20}132}\right)\approx 92.4\), where \(441 = 309 + 132\) is the combined length of the 7OMJ_1 and 4HLS_1 sequences.
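
The arithmetic in this estimate is easy to reproduce. The sketch below evaluates the expression as written, with the factors 0.85 and 0.8 and the lengths 441 and 132 copied from the text; `length_over_log` is an illustrative name.

```python
import math

def length_over_log(n, alphabet_size=20):
    """Return n / log_b n, the normalizer used for LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)

estimate = 0.85 * 0.8 * (length_over_log(441) - length_over_log(132))
print(round(estimate, 2))
```

The value lands within a tenth of the quoted 92.4.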
Status Protein1 Protein2 d d1/2
Query variables 7OMJ_1 4HLS_1 115 80.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
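
The interpolation can be illustrated numerically. This sketch reads min and max as the smaller and larger of two directed quantities, which is an assumption, since the page does not state what they range over. Under that reading, the tabulated \(d = 115\) and \(d_1/2 = 80.5\) for 7OMJ_1 and 4HLS_1 give max \(= 115\) and min \(= 2(80.5) - 115 = 46\).

```python
def delta(alpha, a, b):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Values implied by the table row for 7OMJ_1 and 4HLS_1 (see lead-in).
smaller, larger = 46, 115
print(delta(0, smaller, larger))    # alpha = 0 recovers d
print(delta(0.5, smaller, larger))  # alpha = 1/2 recovers d_1 / 2
```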