CoV2D Browser™

Parikh vectors
4GMK_1 6KCX_1 7JVF_1 Letter Amino acid
9 25 2 A Alanine
21 29 0 D Aspartic acid
26 27 5 G Glycine
20 33 1 I Isoleucine
4 16 1 M Methionine
10 19 0 S Serine
2 12 1 W Tryptophan
5 24 0 Y Tyrosine
10 21 0 N Asparagine
21 39 3 L Leucine
11 17 2 T Threonine
23 32 1 V Valine
7 15 0 Q Glutamine
19 46 0 E Glutamic acid
4 18 0 H Histidine
6 17 0 F Phenylalanine
7 24 1 P Proline
6 31 1 R Arginine
0 3 2 C Cysteine
17 35 0 K Lysine
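The counts in the table above can be reproduced with a short sketch. It assumes a Parikh vector here means the number of occurrences of each of the 20 standard amino-acid letters, listed in the table's row order; the names `LETTERS` and `parikh_vector` are illustrative and do not come from the site.

```python
from collections import Counter

# Row order copied from the table above (an assumption about how the site orders letters).
LETTERS = "ADGIMSWYNLTVQEHFPRCK"

def parikh_vector(sequence, letters=LETTERS):
    """Return the number of occurrences of each letter, in table order."""
    counts = Counter(sequence)
    # Symbols outside `letters` (such as X) are simply not reported.
    return [counts[a] for a in letters]

seq_7jvf = "AGGTIPALMXGCGWLTGLCVR"
vector = parikh_vector(seq_7jvf)
print(dict(zip(LETTERS, vector)))
```

For 7JVF_1 the entries sum to 20 rather than 21, because the unknown residue X is not one of the 20 tabulated letters.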

>4GMK_1|Chains A, B|Ribose-5-phosphate isomerase A|Lactobacillus salivarius (362948)
>6KCX_1|Chain A|Alpha-glucosidase, putative|Thermotoga maritima (strain ATCC 43589 / MSB8 / DSM 3109 / JCM 10099) (243274)
>7JVF_1|Chain A|Prochlorosin 2.10|Prochlorococcus marinus str. MIT 9313 (74547)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4GMK , Knot 99 228 0.78 38 149 217
GPNQDELKQLVGTKAVEWIKDGMIVGLGTGSTVKYMVDALGKRVNEEGLDIVGVTTSIRTAEQAKSLGIVIKDIDEVDHIDLTIDGADEISSDFQGIKGGGAALLYEKIVATKSNKNMWIVDESKMVDDLGQFPLPVEVIPYGSGTVFKRFEEKGLNPEFRKNEDGSLLHTDSDNYIIDLHLGKIENPKELGDYLINQVGVVEHGLFLDIVNTVIVGRQDGPEVLEAR
6KCX , Knot 197 483 0.84 40 260 458
MGSDKIHHHHHHMKISIIGAGSVRFALQLVGDIAQTEELSREDTHIYMMDVHERRLNASYILARKYVEELNSPVKIVKTSSLDEAIDGADFIINTAYPYDPRYHDSGSQRWDEVTKVGEKHGYYRGIDSQELNMVSTYTYVLSSYPDMKLALEIAEKMKKMAPKAYLMQTANPVFEITQAVRRWTGANIVGFCHGVAGVYEVFEKLDLDPEEVDWQVAGVNHGIWLNRFRYRGEDAYPLLDEWIEKKLPEWEPKNPWDTQMSPAAMDMYKFYGMLPIGDTVRNGSWKYHYNLETKKKWFGKFGGIDNEVERPKFHEQLRRARERLIKLAEEVQQNPGMKLTEEHPEIFPKGKLSGEQHIPFINAIANNKRVRLFLNVENQGTLKDFPDDVVMELPVWVDCCGIHREKVEPDLTHRIKIFYLWPRILRMEWNLEAYISRDRKVLEEILIRDPRTKSYEQIVQVLDEIFNLPFNEELRRYYKEKL
7JVF , Knot 14 21 0.67 24 20 19
AGGTIPALMXGCGWLTGLCVR
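The following is a minimal sketch of how the LZ-complexity and normalized columns might be computed. It assumes the site uses the Lempel-Ziv 1976 exhaustive-history parsing, which the source does not state explicitly; `lz76` and `normalized_lz` are illustrative names.

```python
import math

def lz76(s):
    """Number of phrases in the LZ76 exhaustive-history parsing of s."""
    count, i, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from text that
        # starts before position i (overlap with the phrase is allowed).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_20(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

seq_7jvf = "AGGTIPALMXGCGWLTGLCVR"
c = lz76(seq_7jvf)
print(c, normalized_lz(c, len(seq_7jvf)))
```

Under this assumption the 7JVF sequence parses into 14 phrases, in agreement with the table, and the normalized value is about 0.677, which the table appears to list truncated as 0.67.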

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
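As a sketch of these definitions, reading the argument of \(P_w\) as the subword length, the set of distinct subwords and its cardinality can be computed as follows. The function names are illustrative only.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_count(w, k):
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))

seq_7jvf = "AGGTIPALMXGCGWLTGLCVR"
print(subword_count(seq_7jvf, 2), subword_count(seq_7jvf, 3))
```

For the 7JVF sequence this gives 20 and 19 for lengths 2 and 3, matching the corresponding table entries.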

\(|P_{f(4GMK_1)}(2) \setminus P_{f(6KCX_1)}(2)|=32\), \(|P_{f(6KCX_1)}(2) \setminus P_{f(4GMK_1)}(2)|=143\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the numerator of an LZ76-style (set of subwords) Jaccard distance for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (4GMK_1):
110000100111001101100111111101001001101110010001101111000100100100111110010010010101011001000101101111111000111000000111100001100110111110111010101100100011010100000101100000001101011010010011001100111100111101100111100011011010
Pair \(Z_2\) Length of longest common subword
4GMK_1,6KCX_1 175 5
4GMK_1,7JVF_1 153 2
6KCX_1,7JVF_1 256 2
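Both columns of this table can be sketched in a few lines. The sketch assumes that \(Z_k\) is the size of the symmetric difference of the length-\(k\) subword sets, and that the last column counts the longest contiguous common subword. The second assumption is suggested by the data: IDGAD occurs in both 4GMK_1 and 6KCX_1, consistent with the value 5. Function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x, y):
    """Length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Short fragments of 4GMK_1 and 6KCX_1 around the shared block IDGAD.
frag_4gmk = "DLTIDGADEISS"
frag_6kcx = "LDEAIDGADFIIN"
print(z("ABAB", "ABBA", 2), longest_common_subword(frag_4gmk, frag_6kcx))
```

Running `z` on the full sequences with `k=2` should give the \(Z_2\) column if the assumption holds, for example 32 + 143 = 175 for the first pair.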

Newick tree

 
[
	6KCX_1:11.64,
	[
		4GMK_1:76.5,7JVF_1:76.5
	]:42.14
]
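The bracketed display above can be written in standard Newick syntax, which uses parentheses and a terminating semicolon. The sketch below assumes the nesting and branch lengths shown; the tuple encoding and the name `newick` are illustrative choices, not the site's data model.

```python
def newick(node):
    """Serialize a (label_or_children, branch_length) pair to Newick text."""
    label, length = node
    if isinstance(label, list):
        # Internal node: a parenthesized, comma-separated list of children.
        text = "(" + ",".join(newick(child) for child in label) + ")"
    else:
        text = label
    return text if length is None else f"{text}:{length}"

tree = (
    [
        ("6KCX_1", 11.64),
        ([("4GMK_1", 76.5), ("7JVF_1", 76.5)], 42.14),
    ],
    None,  # the root carries no branch length
)
newick_string = newick(tree) + ";"
print(newick_string)
```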

Let \(d\) be the Otu--Sayood distance \(d\).
Let \(d_1\) be the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{711 }{\log_{20} 711}-\frac{228}{\log_{20}228})\approx 135.\)
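The arithmetic in this estimate can be checked directly. The factors 0.85 and 0.8 are taken from the formula as given; note that 711 = 228 + 483, presumably the length of the two query sequences concatenated. The helper name `n_over_log` is illustrative.

```python
import math

def n_over_log(n, base=20):
    """n / log_base(n), the normalizing term used in the estimate."""
    return n / math.log(n, base)

# Constants 0.85 and 0.8 are copied from the source formula.
expected = 0.85 * 0.8 * (n_over_log(711) - n_over_log(228))
print(round(expected))
```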
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 4GMK_1 6KCX_1 172 124

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
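The interpolation can be illustrated numerically. The sketch assumes that min and max are taken over two directed quantities `a` and `b` (in the Otu--Sayood setting these would be the complexity increases from appending each sequence to the other), which the formula above does not spell out. With the query values \(d=172\) and \(d_1/2=124\), the implied smaller quantity is \(2\cdot 124-172=76\).

```python
def delta(alpha, a, b):
    """Convex combination of the smaller and larger of two directed quantities."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Values derived from the query row under the assumption above: max = 172, min = 76.
d = delta(0, 76, 172)
half_d1 = delta(0.5, 76, 172)
print(d, half_d1)
```

Here `alpha = 0` recovers the larger quantity and `alpha = 1/2` recovers the average of the two, matching the two cases in the displayed formula.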