CoV2D BrowserTM

Parikh vectors
8GFJ_1 5QFO_1 5DPW_1 Letter Amino acid
14 11 5 M Methionine
3 6 0 W Tryptophan
35 9 6 Y Tyrosine
21 14 5 Q Glutamine
28 30 9 E Glutamic acid
15 14 2 G Glycine
56 23 8 K Lysine
42 18 6 I Isoleucine
42 25 8 S Serine
17 19 10 R Arginine
50 11 3 N Asparagine
14 11 0 H Histidine
33 14 8 F Phenylalanine
22 22 8 P Proline
21 13 8 T Threonine
13 17 10 V Valine
33 13 8 A Alanine
28 18 4 D Aspartic acid
2 4 0 C Cysteine
55 29 10 L Leucine
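
The Parikh vector of a sequence is its vector of letter counts, one entry per amino acid. A minimal sketch of how a column of the table above could be recomputed; the function name `parikh_vector` and the constant `ALPHABET` are illustrative and not taken from the CoV2D code, and the letter order is simply copied from the table rows.

```python
from collections import Counter

# One-letter amino acid codes in the row order of the table above.
ALPHABET = "MWYQEGKISRNHFPTVADCL"

def parikh_vector(sequence: str, alphabet: str = ALPHABET) -> dict[str, int]:
    """Count how often each letter of `alphabet` occurs in `sequence`."""
    counts = Counter(sequence)
    # Letters that never occur are reported with count 0.
    return {letter: counts.get(letter, 0) for letter in alphabet}

# Toy input, not a real protein sequence.
toy = parikh_vector("MKKLM")
print(toy["M"], toy["K"], toy["L"], toy["W"])
```

The entries of a Parikh vector sum to the sequence length, which gives a quick consistency check against the length column reported further down.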

>8GFJ_1|Chain A|Lytic transglycosylase domain-containing protein|Campylobacter jejuni (197)
>5QFO_1|Chain A|Tyrosine-protein phosphatase non-receptor type 1|Homo sapiens (9606)
>5DPW_1|Chains A, C, E, G, I, K, M, O|Microtubule-associated proteins 1A/1B light chain 3C|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8GFJ , Knot 210 544 0.81 40 235 497
MGSSHHHHHHSSGLVPRGSHMQYSIEKLKKEENSLAKDYYIYRLLEKNKISKKDAQDLNSHIFRYIGKIKSELEKIIPLKPYINPKYAKCYTYTANTILDANLTCQSVRLNSLVFIASLNSKDRTTLAQTFKNQRPDLTNLLLAFNTSDPMSYIVQKEDINGFFKLYNYSKKYDLDLNTSLVNKLPNHIGFKDFAQNIIIKKENPKFRHSMLEINPENVSEDSAFYLGVNALTYDKTELAYDFFKKAAQSFKSQSNKDNAIFWMWLIKNNEEDLKTLSQSSSLNIYSLYAKELTNTPFPKIESLNPSKKKNNFNMQDPFAWQKINKQIRDANASQLDVLAKEFDTQETLPIYAYILERKNNFKKHYFIMPYYDNIKDYNKTRQALILAIARQESRFIPTAISVSYALGMMQFMPFLANHIGEKELKIPNFDQDFMFKPEIAYYFGNYHLNYLESRLKSPLFVAYAYNGGIGFTNRMLARNDMFKTGKFEPFLSMELVPYQESRIYGKKVLANYIVYRHLLNDSIKISDIFENLIQNKANDLNKS
5QFO , Knot 141 321 0.84 40 205 309
MEMEKEFEQIDKSGSWAAIYQDIRHEASDFPSRVAKLPKNKNRNRYRDVSPFDHSRIKLHQEDNDYINASLIKMEEAQRSYILTQGPLPNTVGHFWEMVWEQKSRGVVMLNRVMEKGSLKCAQYWPQKEEKEMIFEDTNLKLTLISEDIKSYYTVRQLELENLTTQETREILHFHYTTWPDFGVPESPASFLNFLFKVRESGSLSPEHGPVVVHCSAGIGRSGTFCLADTCLLLMDKRKDPSSVDIKKVLLEMRKFRMGLIQTADQLRFSYLAVIEGAKFIMGDSSVQDQWKELSHEDLEPPPEHIPPPPRPPKRILEPHN
5DPW , Knot 59 118 0.79 34 98 114
PSVRPFKQRKSLAIRQEEVAGIRAKFPNKIPVVVERYPRETFLPPLDKTKFLVPQELTMTQFLSIIRSRMVLRATEAFYLLVNNKSLVSMSATMAEIYRDYKDEDGFVYMTYASQETF
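
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns; the tabulated values agree with the result truncated (not rounded) to two decimals. The sketch below also includes a straightforward LZ76 exhaustive-history parse. Whether this is exactly the LZ variant used for the table is an assumption, and the function names `lz76` and `normalized_lz` are illustrative.

```python
import math

def lz76(w: str) -> int:
    """Number of components in the LZ76 exhaustive-history parse of w."""
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the component while it can be copied from earlier text
        # (the copy may overlap the component, but not its last symbol).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example: 0|001|10|100|1000|101 has six components.
print(lz76("0001101001000101"))
# (LZ, n) pairs from the table: 8GFJ, 5QFO, 5DPW.
for lz, n in [(210, 544), (141, 321), (59, 118)]:
    print(int(100 * normalized_lz(lz, n)) / 100)
```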

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
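
As a minimal sketch of these definitions, the set of distinct subwords of a given length and its cardinality can be computed with a sliding window. The names `subwords` and `subword_complexity` are illustrative, not part of the CoV2D code.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy word: the length-2 subwords of ABABB are AB, BA and BB.
print(sorted(subwords("ABABB", 2)))
print(subword_complexity("ABABB", 2))
```

A word of length \(|w|\) has at most \(|w|-k+1\) distinct subwords of length \(k\), which bounds the values this function can return.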

\(|P_{f(8GFJ_1)}(2) \setminus P_{f(5QFO_1)}(2)|=93\), \(|P_{f(5QFO_1)}(2) \setminus P_{f(8GFJ_1)}(2)|=63\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of the two subword sets.

Hydrophobic-polar version of Sequence 1:
1100000000001111010010001001000000110000100110000100001001000110011010001001111010101001000000100110101000010100111110100000001100100001010011111000011001100001011101000000001010001100110011100110011100001010001101010010000110111011000000110011001100100000000111111110000001001000001010010100100011101001010000001010011110010001001010010111001000001110101100000100001111000010000000011111110000011101101001111101111110011000101101000111010110011000100100010011111010011111000111000110010101110101110000010100111001100011000101001100110001001000

Pair \(Z_2\) Length of longest common subsequence
8GFJ_1,5QFO_1 156 4
8GFJ_1,5DPW_1 185 4
5QFO_1,5DPW_1 179 4
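
The \(Z_2\) column is the sum of the two one-sided set differences; for the first pair this is \(93+63=156\), matching the values stated above. A minimal sketch of \(Z_k\), with the illustrative helper names `subwords` and `z_distance`:

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """|P_x(k) - P_y(k)| + |P_y(k) - P_x(k)|: symmetric difference size."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy words: {AA, AB} versus {AB, BB} differ in two subwords.
print(z_distance("AAB", "ABB", 2))
```

Dividing `z_distance` by the size of the union of the two subword sets would give the Jaccard distance itself; the table reports only the numerator.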

Newick tree

 
[
	5DPW_1:94.95,
	[
		8GFJ_1:78,5QFO_1:78
	]:16.95
]
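
The bracketed listing above can be serialized into a standard Newick string. The sketch below assumes a hypothetical nested representation in which each node is a pair of a label (or a list of child nodes) and a branch length; this layout is chosen for illustration and is not the CoV2D data format.

```python
def to_newick(node) -> str:
    """Serialize a (label_or_children, branch_length) node to Newick."""
    label, length = node
    if isinstance(label, str):
        body = label  # leaf: just the taxon name
    else:
        body = "(" + ",".join(to_newick(child) for child in label) + ")"
    # The root carries no branch length.
    return body if length is None else f"{body}:{length:g}"

# Tree from the listing above.
tree = ([("5DPW_1", 94.95),
         ([("8GFJ_1", 78), ("5QFO_1", 78)], 16.95)], None)
newick = to_newick(tree) + ";"
print(newick)
```

In this tree every leaf lies at the same distance from the root, since \(78+16.95=94.95\).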

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{865 }{\log_{20} 865}-\frac{321}{\log_{20}321})\approx 147.\)
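
The arithmetic in this estimate can be checked directly. In the sketch below the constants 0.85 and 0.8 are taken from the formula as given, and \(865 = 544 + 321\) is the combined length of the two query sequences; the helper name `scaled_length` is illustrative.

```python
import math

def scaled_length(n: int, alphabet_size: int = 20) -> float:
    """n / log_{alphabet_size}(n), the normalizer used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

# Constants 0.85 and 0.8 are copied from the formula in the text.
expected = 0.85 * 0.8 * (scaled_length(865) - scaled_length(321))
print(round(expected))
```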
Status Protein1 Protein2 d d1/2
Query variables 8GFJ_1 5QFO_1 183 146
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
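
A minimal sketch of this interpolation, assuming that min and max range over the two one-sided complexity increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\) from the Otu–Sayood definitions, so that \(d\) is their maximum and \(d_1\) their sum. Under that reading, the query row \(d=183\), \(d_1/2=146\) corresponds to increments 183 and 109; the value 109 is inferred here, not reported by the page. The function name `delta` is illustrative.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Increments inferred from the query row (assumption: d = max, d1 = sum).
a, b = 109, 183
print(delta(a, b, 0))    # alpha = 0 recovers d
print(delta(a, b, 0.5))  # alpha = 1/2 recovers d1 / 2
```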