CoV2D Browser™

Parikh vectors
4UXV_1 5MXD_1 7HWG_1 Letter Amino acid
26 21 9 R Arginine
1 6 6 C Cysteine
86 27 7 E Glutamic acid
9 20 3 F Phenylalanine
17 27 6 T Threonine
19 19 8 Y Tyrosine
41 24 5 I Isoleucine
51 15 2 K Lysine
8 24 4 P Proline
35 31 14 S Serine
4 7 3 W Tryptophan
16 14 6 H Histidine
61 35 12 L Leucine
42 25 10 A Alanine
19 15 5 N Asparagine
28 22 9 D Aspartic acid
28 17 5 Q Glutamine
19 40 14 G Glycine
9 9 2 M Methionine
26 34 14 V Valine
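
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch of that count follows; the function name `parikh_vector` and the alphabet ordering are illustrative assumptions, not taken from the page.

```python
from collections import Counter

# The 20 standard amino acid letters (the ordering here is arbitrary).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of the alphabet to its count."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}


# Short toy input: the first 23 residues of the 4UXV_1 sequence.
example = parikh_vector("GSHMRKKIYAEIDRLESWKIEIL")
print(example["K"], example["I"], example["C"])
```

The entries of a Parikh vector sum to the sequence length, which gives a quick sanity check on each column of the table.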

>4UXV_1|Chain A|SEPTATION RING FORMATION REGULATOR EZRA|BACILLUS SUBTILIS SUBSP. SUBTILIS STR. 168 (224308)
>5MXD_1|Chains A, B, C|Beta-secretase 1|Homo sapiens (9606)
>7HWG_1|Chains A, B[auth C]|Protease 2A|Coxsackievirus A16 (31704)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4UXV , Knot 208 545 0.80 40 204 484
GSHMRKKIYAEIDRLESWKIEILNRSIVEEMSKIKHLKMTGQTEEFFEKWREEWDEIVTAHMPKVEELLYDAEENADKYRFKKANQVLVHIDDLLTAAESSIEKILREISDLVTSEEKSREEIEQVRERYSKSRKNLLAYSHLYGELYDSLEKDLDEIWSGIKQFEEETEGGNYITARKVLLEQDRNLERLQSYIDDVPKLLADCKQTVPGQIAKLKDGYGEMKEKGYKLEHIQLDKELENLSNQLKRAEHVLMTELDIDEASAILQLIDENIQSVYQQLEGEVEAGQSVLSKMPELIIAYDKLKEEKEHTKAETELVKESYRLTAGELGKQQAFEKRLDEIGKLLSSVKDKLDAEHVAYSLLVEEVASIEKQIEEVKKEHAEYRENLQALRKEELQARETLSNLKKTISETARLLKTSNIPGIPSHIQEMLENAHHHIQETVNQLNELPLNMEEAGAHLKQAEDIVNRASRESEELVEQVILIEKIIQFGNRFRSQNHILSEQLKEAERRFYAFDYDDSYEIAAAAVEKAAPGAVEKIKADISA
5MXD , Knot 182 432 0.85 40 236 406
MHHHHHHTQHGIRLPLRSGLGGAPLGLRLPRETDEEPEEPGTTGSFVEMVDNLRGKSGQGYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQRQLSSTYRDLRKGVYVPYTQGKWEGELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNLFSLQLCGAGFPLNQSEVLASVGGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIVRVEINGQDLKMDCKEYNYDKSIVDSGTTNLRLPKKVFEAAVKSIKAASSTEKFPDGFWLGEQLVCWQAGTTPWNIFPVISLYLMGEVTNQSFRITILPQQYLRPVEDVATSQDDCYKFAISQSSTGTVMGAVIMEGFYVVFDRARKRIGFAVSACHVHDEFRTAAVEGPFVTLDMEDCGYN
7HWG , Knot 70 144 0.80 40 112 139
SGAIYVGNYRVVNRHLATHNDWANLVWEDSSRDLLVSSTTAQGCDTIARCDCQTGVYYCSSRRKHYPVSFSKPSLIFVEASEYYPARYQSHLMLAVGHSEPGDCGGILRCQHGVVGIVSTGGNGLVGFADVRDLLWLDEEAMEQ
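
The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below shows one way to compute both quantities. It assumes the LZ76 (exhaustive history) parsing; the page does not state which Lempel-Ziv variant it uses, so the phrase counts need not reproduce the table exactly. The function names are illustrative.

```python
import math


def lz76_complexity(w):
    """Number of phrases in the LZ76 exhaustive-history parsing of w."""
    n = len(w)
    start = 0
    phrases = 0
    while start < n:
        length = 1
        # Extend the phrase while it already occurs in the text seen so far
        # (the earlier copy may overlap the phrase, but must start before it).
        while start + length <= n and w[start:start + length] in w[:start + length - 1]:
            length += 1
        phrases += 1
        start += length
    return phrases


def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_20(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))


print(lz76_complexity("0001101001000101"))
print(normalized_lz(208, 545))
```

The second call uses the LZ-complexity and length listed for 4UXV and gives a value close to the tabulated 0.80.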

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
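
As a small illustration of these definitions, the following sketch collects the distinct contiguous subwords of a given length and counts them. The names `subwords` and `subword_complexity` are hypothetical helpers, not part of the site.

```python
def subwords(w, n):
    """Set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w, n):
    """Number of distinct subwords of length n, i.e. the size of subwords(w, n)."""
    return len(subwords(w, n))


print(sorted(subwords("ABAB", 2)))
print(subword_complexity("ABCA", 1))
```

A word of length m has at most m - n + 1 distinct subwords of length n, so the subword complexity is bounded by that number and by the number of possible words of length n over the alphabet.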

\(|P_{f(4UXV_1)}(2) \setminus P_{f(5MXD_1)}(2)|=64\), \(|P_{f(5MXD_1)}(2) \setminus P_{f(4UXV_1)}(2)|=96\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\).

Hydrophobic-polar version of Sequence 1:
10010001010100100101011000110010010010101000011001000100110101101001100100010000100100111010011011000100110010011000000000100100000000001110001010100010001001101100100000110010100111000001001000100110111000001110110100101010001001001010001001000100100111001010010111011000100100010101011001100110111100010000000010001100000101101100011000100110110010001010011001110011010001001000010000010110000101000100100010001011000011111001001100100010001001001110100111010010011001000000110011110011011001000001100010010001011000000011111100111111001010101
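
A hydrophobic-polar encoding replaces each residue by 1 (hydrophobic) or 0 (polar). The sketch below assumes the hydrophobic set A, C, F, G, I, L, M, P, V, W, a common choice for HP models. The page does not list its classification; the opening of the binary string above is consistent with this set for G, M, I, A, L, W, V and F, while the treatment of C and P is an unverified assumption.

```python
# Assumed hydrophobic residues (C and P are unverified guesses; see the note above).
HYDROPHOBIC = set("ACFGILMPVW")


def hydrophobic_polar(sequence, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' if hydrophobic, otherwise '0'."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)


# First 30 residues of the 4UXV_1 sequence.
prefix = "GSHMRKKIYAEIDRLESWKIEILNRSIVEE"
print(hydrophobic_polar(prefix))
```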
Pair \(Z_2\) Length of longest common subword
4UXV_1,5MXD_1 160 4
4UXV_1,7HWG_1 194 3
5MXD_1,7HWG_1 192 3
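
The quantity \(Z_k\) is the size of the symmetric difference of two subword sets, so the first table row is 64 + 96 = 160. A minimal sketch, with hypothetical function names, is:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# Toy inputs: "AAB" has subwords {AA, AB}; "ABB" has {AB, BB}.
print(z_distance("AAB", "ABB", 2))
```

Dividing this value by the size of the union of the two sets would give the usual Jaccard distance.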

Newick tree

 
(
	7HWG_1:10.40,
	(
		4UXV_1:80,5MXD_1:80
	):21.40
);

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{977 }{\log_{20} 977}-\frac{432}{\log_{20}432})\approx 144.\)
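
The arithmetic in this estimate can be checked directly. In the sketch below, 977 = 545 + 432 is the length of the concatenation of the two sequences, and 0.85 and 0.8 are the normalized ratios from the table; the helper name `lz_scale` is an illustrative assumption.

```python
import math


def lz_scale(n, alphabet_size=20):
    """The scale n / log_20(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)


n1, n2 = 545, 432  # lengths of the 4UXV_1 and 5MXD_1 sequences
estimate = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n2))
print(round(estimate))
```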
Status Protein1 Protein2 d d1/2
Query variables 4UXV_1 5MXD_1 179 164
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
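
The interpolation can be sketched as follows. This assumes that min and max range over the two increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), as in Otu and Sayood's definitions of \(d\) (the maximum) and \(d_1\) (the sum), and it uses an LZ76 parsing as the complexity measure. Neither choice is spelled out on the page, so the values need not match the table.

```python
def lz76_complexity(w):
    """Number of phrases in the LZ76 exhaustive-history parsing of w."""
    n, start, phrases = len(w), 0, 0
    while start < n:
        length = 1
        # Grow the phrase while it already occurs earlier (overlap allowed).
        while start + length <= n and w[start:start + length] in w[:start + length - 1]:
            length += 1
        phrases += 1
        start += length
    return phrases


def increments(x, y, c=lz76_complexity):
    """Complexity added by appending y to x, and by appending x to y."""
    return c(x + y) - c(x), c(y + x) - c(y)


def delta(x, y, alpha, c=lz76_complexity):
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y, c)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


x, y = "ACGTACGT", "TTGACA"  # toy inputs, not protein data
print(delta(x, y, 0), delta(x, y, 0.5))
```

With alpha = 0 the function returns the larger increment (the distance d), and with alpha = 1/2 it returns the average of the two increments (d1/2).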