CoV2D BrowserTM

Parikh vectors
1LEN_1 2HCS_1 4BWG_1 Letter Amino acid
2 15 4 H Histidine
24 39 21 T Threonine
4 49 19 R Arginine
0 9 2 C Cysteine
12 46 29 G Glycine
10 23 26 I Isoleucine
13 17 12 F Phenylalanine
3 24 6 W Tryptophan
13 41 36 A Alanine
11 34 15 D Aspartic acid
6 15 9 Q Glutamine
0 21 8 M Methionine
14 28 33 S Serine
7 13 10 Y Tyrosine
5 51 16 E Glutamic acid
8 46 26 L Leucine
11 34 19 K Lysine
15 23 13 N Asparagine
7 21 16 P Proline
16 46 27 V Valine
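
Each column of the table above is the Parikh vector of one sequence: the number of occurrences of each amino-acid letter. A minimal sketch of that count follows; the names `AMINO_ACIDS` and `parikh_vector` are hypothetical and not taken from the CoV2D code, and the input is a short prefix of the 1LEN_1 sequence rather than the full sequence.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes (illustrative constant).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Map each letter of the alphabet to its number of occurrences."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# First 16 residues of the 1LEN_1 sequence, used only as a toy input.
toy = parikh_vector("TETTSFSITKFSPDQQ")
print(toy["T"], toy["S"], toy["C"])  # prints "4 3 0"
```

Letters that never occur, such as C and M in 1LEN_1, get an explicit 0, which matches the zero entries in the table.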

>1LEN_1|Chains A, C|LECTIN|Lens culinaris (3864)
>2HCS_1|Chain A|RNA-directed RNA polymerase (NS5)|Kunjin virus (11077)
>4BWG_1|Chains A, G|SUBA|ESCHERICHIA COLI (670902)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1LEN , Knot 81 181 0.77 36 124 174
TETTSFSITKFSPDQQNLIFQGDGYTTKGKLTLTKAVKSTVGRALYSTPIHIWDRDTGNVANFVTSFTFVIDAPSSYNVADGFTFFIAPVDTKPQTGGGYLGVFNSKEYDKTSQTVAVEFDTFYNAAWDPSNKERHIGIDVNSIKSVNTKSWNLQNGERANVVIAFNAATNVLTVTLTYPN
2HCS , Knot 239 595 0.85 40 279 551
HHHHHHKSASSLVNGVVRLLSKPWDTITNVTTMAMTDTTPFGQQRVFKEKVDTKAPEPPEGVKYVLNETTNWLWAFLAREKRPRMCSREEFIRKVNSNAALGAMFEEQNQWRSAREAVEDPKFWEMVDEEREAHLRGECHTCIYNMMGKREKKPGEFGKAKGSRAIWFMWLGARFLEFEALGFLNEDHWLGRKNSGGGVEGLGLQKLGYILREVGTRPGGRIYADDTAGWDTRITRADLENEAKVLELLDGEHRRLARAIIELTYRHKVVKVMRPAADGRTVMDVISREDQRGSGQVVTYALNTFTNLAVQLVRMMEGEGVIGPDDVEKLTKGKGPKVRTWLSENGEERLSRMAVSGDDCVVKPLDDRFATSLHFLNAMSKVRKDIQEWKPSTGWYDWQQVPFCSNHFTELIMKDGRTLVTPCRGQDELVGRARISPGAGWNVRDTACLAKSYAQMWLLLYFHRRDLRLMANAICSAVPVNWVPTGRTTWSIHAGGEWMTTEDMLEVWNRVWIEENEWMEDKTPVEKWSDVPYSGKREDIWCGSLIGTRARATWAENIQVAINQVRSIIGDEKYVDYMSSLKRYEDTTLVEDTVL
4BWG , Knot 146 347 0.82 40 196 333
MLKILWTYILFLLFISASARAEKPWYFDAIGLTETTMSLTDKNTPVVVSVVDSGVAFIGGLSDSEFAKFSFTQDGSPFPVKKSEALYIHGTAMASLIASRYGIYGVYPHALISSRRVIPDGVQDSWIRAIESIMSNVFLAPGEEKIINISGGQKGVASASVWTELLSRMGRNNDRLIVAAVGNDGADIRKLSAQQRIWPAAYHPVSSVNKKQDPVIRVAALAQYRKGETPVLHGGGITGSRFGNNWVDIAAPGQNITFLRPDAKTGTGSGTSEATAIVSGVLAAMTSCNPRATATELKRTLLESADKYPSLVDKVTEGRVLNAEKAISMFCKKNYIPVRQGRMSEEL
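
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the tabulated \(\mathrm{LZ}(w)\) and \(n\). The sketch below also includes one common reading of Lempel-Ziv (1976) complexity, the number of phrases in an exhaustive-history parse. Whether this page uses exactly that variant is an assumption, so the sketch is not claimed to reproduce the values 81, 239 and 146. The function names are hypothetical.

```python
import math

def lz76_complexity(word):
    """Count phrases in a Lempel-Ziv (1976) exhaustive-history parse.

    Each phrase is the shortest block starting at `start` that does not
    already occur in the text preceding its own last symbol.
    """
    count, start, n = 0, 0, len(word)
    while start < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text.
        while (start + length <= n
               and word[start:start + length] in word[:start + length - 1]):
            length += 1
        count += 1
        start += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_k(n), with k = 20 amino-acid letters."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ(w), n) pairs copied from the table above.
for code, lz, n in [("1LEN", 81, 181), ("2HCS", 239, 595), ("4BWG", 146, 347)]:
    print(code, normalized_lz(lz, n))
```

The three ratios come out at roughly 0.777, 0.857 and 0.822, which agrees with the tabulated 0.77, 0.85 and 0.82 if that column is truncated rather than rounded.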

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(1LEN_1)}(2) \setminus P_{f(2HCS_1)}(2)|=31\), \(|P_{f(2HCS_1)}(2) \setminus P_{f(1LEN_1)}(2)|=186\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for this pair, \(Z_2=31+186=217\).

Hydrophobic-polar version of Sequence 1:
0000010100101000011101010000101010011000110110001101100001011011001011101100001101101111110001001110111100000000000111010010011101000000111010010010000101001001011111011001101010010
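
The 0/1 string above is consistent with writing 1 for the residues A, F, G, I, L, P, V, W and 0 for the other residues that occur in 1LEN_1. A minimal sketch of such an encoding follows. Since C and M do not occur in 1LEN_1, their classification cannot be read off this page; treating them as hydrophobic here is an assumption. The names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Residues encoded as 1. A, F, G, I, L, P, V, W are inferred from the
# string above; C and M are an ASSUMPTION (absent from 1LEN_1).
HYDROPHOBIC = frozenset("AFGILPVW" + "CM")

def hp_encode(sequence, hydrophobic=HYDROPHOBIC):
    """Return the hydrophobic-polar (1/0) version of a protein sequence."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)

# First 20 residues of the 1LEN_1 sequence.
print(hp_encode("TETTSFSITKFSPDQQNLIF"))  # prints "00000101001010000111"
```

The printed prefix matches the first 20 symbols of the hydrophobic-polar string shown above.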
Pair \(Z_2\) Length of longest common subword
1LEN_1,2HCS_1 217 4
1LEN_1,4BWG_1 168 4
2HCS_1,4BWG_1 165 4
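
The quantities \(P_w(k)\), \(p_w(k)\) and \(Z_k(x,y)\) defined above translate directly into set operations. The following is a sketch with hypothetical function names, shown on toy words rather than the protein sequences; it has not been checked against the tabulated values.

```python
def subwords(word, k):
    """P_w(k): the set of distinct length-k subwords (intervals) of word."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def subword_complexity(word, k):
    """p_w(k): the number of distinct length-k subwords."""
    return len(subwords(word, k))

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

print(subword_complexity("abab", 2))  # prints 2: the subwords are "ab" and "ba"
print(z_distance("abab", "abba", 2))  # prints 1: only "bb" is not shared
```

\(Z_k\) is the size of the symmetric difference of the two subword sets, so `len(px ^ py)` would give the same number.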

Newick tree

 
[
	1LEN_1:10.40,
	[
		4BWG_1:82.5,2HCS_1:82.5
	]:18.90
]
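
Standard Newick notation uses parentheses for grouping and ends with a semicolon, while square brackets are reserved for comments. To pass the tree above to a phylogenetics tool, the bracketed display can be converted as in this sketch, which assumes the brackets on this page stand in for parentheses. The function name is hypothetical.

```python
import re

def brackets_to_newick(display):
    """Turn the indented, square-bracket display into a one-line Newick string."""
    compact = re.sub(r"\s+", "", display)  # drop tabs, spaces and newlines
    return compact.replace("[", "(").replace("]", ")") + ";"

displayed = "[\n\t1LEN_1:10.40,\n\t[\n\t\t4BWG_1:82.5,2HCS_1:82.5\n\t]:18.90\n]"
print(brackets_to_newick(displayed))
# prints (1LEN_1:10.40,(4BWG_1:82.5,2HCS_1:82.5):18.90);
```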

Let \(d\) denote the Otu--Sayood distance of the same name.
Let \(d_1\) denote the Otu--Sayood distance of the same name. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{776}{\log_{20} 776}-\frac{181}{\log_{20}181}\right)\approx 166\), where \(776=181+595\) is the length of the two sequences concatenated.
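
The arithmetic in this estimate can be checked with a few lines. The factors 0.85 and 0.8 are taken from the formula as given; the helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_k(n): the normalizing scale for LZ-complexity at length n."""
    return n / math.log(n, alphabet_size)

n1, n2 = 181, 595  # lengths of 1LEN_1 and 2HCS_1 from the table above
estimate = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n1))
print(estimate)
```

The result is about 166.6, in line with the value of 166 quoted above.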
Status Protein1 Protein2 d d1/2
Query variables 1LEN_1 2HCS_1 214 138

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
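
A small numerical sketch of this interpolation follows. It assumes that min and max range over the two complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), as in Otu and Sayood's definitions of \(d\) (their maximum) and \(d_1\) (their sum). The increment 62 used below is not stated on this page; it is derived from the tabulated \(d=214\) and \(d_1/2=138\) as \(2\cdot 138-214\). The function name is hypothetical.

```python
def delta(increment_xy, increment_yx, alpha):
    """alpha * min + (1 - alpha) * max of the two complexity increments."""
    smaller, larger = sorted((increment_xy, increment_yx))
    return alpha * smaller + (1 - alpha) * larger

# 214 is the tabulated d; 62 is DERIVED from d = 214 and d_1/2 = 138.
print(delta(62, 214, 0))    # prints 214, i.e. d
print(delta(62, 214, 0.5))  # prints 138.0, i.e. d_1 / 2
```

Under this reading, \(\alpha=0\) picks out the larger increment and \(\alpha=1/2\) averages the two, which is why the second case equals \(d_1/2\).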