CoV2D BrowserTM

Parikh vectors
3HBR_1 6IRJ_1 4OGM_1 Letter Amino acid
13 11 5 R Arginine
14 7 35 D Aspartic acid
19 6 33 I Isoleucine
9 2 9 M Methionine
7 2 24 P Proline
16 10 25 S Serine
21 12 63 A Alanine
0 8 0 C Cysteine
14 3 13 Q Glutamine
7 1 9 H Histidine
12 3 16 F Phenylalanine
6 3 22 Y Tyrosine
16 12 42 G Glycine
20 8 44 L Leucine
12 7 31 T Threonine
10 6 8 W Tryptophan
20 6 30 V Valine
16 14 29 N Asparagine
15 2 33 E Glutamic acid
18 6 49 K Lysine
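
Each column of the table above is a Parikh vector: the number of occurrences of each letter in the sequence. A minimal Python sketch, assuming the 20 standard one-letter amino acid codes as the alphabet (the name `parikh_vector` and the alphabet ordering are illustrative and not taken from the CoV2D code):

```python
from collections import Counter

# Assumed alphabet: the 20 standard one-letter amino acid codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of `alphabet` to its count in `seq`."""
    counts = Counter(seq)
    # Letters that never occur get an explicit 0, as in the table.
    return {letter: counts.get(letter, 0) for letter in alphabet}
```

Applied to a FASTA sequence string, the returned dictionary gives one table column, indexed by letter.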

>3HBR_1|Chains A, B, C, D|OXA-48|Klebsiella pneumoniae (573)
>6IRJ_1|Chain A|Lysozyme C|Gallus gallus (9031)
>4OGM_1|Chain A|Maltose ABC transporter periplasmic protein, pilin protein chimera|Escherichia coli (562)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3HBR , Knot 121 265 0.85 38 175 261
MRVLALSAVFLVASIIGMPAVAKEWQENKSWNAHFTEHKSQGVVVLWNENKQQGFTNNLKRANQAFLPASTFKIPNSLIALDLGVVKDEHQVFKWDGQTRDIATWNRDHNLITAMKYSVVPVYQEFARQIGEARMSKMLHAFDYGNEDISGNVDSFWLDGGIRISATEQISFLRKLYHNKLHVSERSQRIVKQAMLTEANGDYIIRAKTGYSTRIEPKIGWWVGWVELDDNVWFFAMNMDMPTSDGLGLRQAITKEVLKQEKIIP
6IRJ , Knot 66 129 0.82 40 104 127
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL
4OGM , Knot 201 520 0.80 38 222 465
MKIEEGKLVIWINGDKGYNGLAEVGKKFEKDTGIKVTVEHPDKLEEKFPQVAATGDGPDIIFWAHDRFGGYAQSGLLAEITPAAAFQDKLYPFTWDAVRYNGKLIAYPIAVEALSLIYNKDLLPNPPKTWEEIPALDKELKAKGKSALMFNLQEPYFTWPLIAADGGYAFKYAAGKYDIKDVGVDNAGAKAGLTFLVDLIKNKHMNADTDYSIAEAAFNKGETAMTINGPWAWSNIDTSAVNYGVTVLPTFKGQPSKPFVGVLSAGINAASPNKELAKEFLENYLLTDEGLEAVNKDKPLGAVALKSYEEELAKDPRIAATMENAQKGEIMPNIPQMSAFWYAVRTAVINAASGRQTVDAALAAAQTNAAASNINKAKVASVESDYSSIKSAALSYYSDTNKIPVTPDGQTGLNVLETYMESLPDKADIGGEYKLIKVGNKLVLQIGKDGEGVTLTEAQSAKLLSDIGKDKIYTGVTGDNFGEQLKDTTKIDNKALYIVLIDNTVMDSTKGSLEHHHHHH
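
The normalized column can be reproduced from the LZ-complexity and the length. The following is a minimal Python sketch, assuming the LZ76 exhaustive-history parsing (each new component is the shortest block that has not occurred earlier, and a trailing incomplete block counts as one component). The page may use a different variant; the names `lz76` and `normalized_lz` are illustrative.

```python
import math

def lz76(s):
    """Number of components in an LZ76-style parsing of s (assumed variant)."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the block while it already occurs starting before position i
        # (overlap with the block itself is allowed, as in LZ76).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))
```

With the table values for 3HBR, `normalized_lz(121, 265)` is approximately 0.85.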

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
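
A minimal Python sketch of these definitions, assuming the argument of \(P_w\) and \(p_w\) is the subword length (the function names are illustrative):

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))
```

For example, the word `abab` has the two length-2 subwords `ab` and `ba`, so `subword_complexity("abab", 2)` is 2.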

\(|P_{f(3HBR_1)}(2) \setminus P_{f(6IRJ_1)}(2)|=127\), \(|P_{f(6IRJ_1)}(2) \setminus P_{f(3HBR_1)}(2)|=56\).

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \), the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For the pair above, \(Z_2=127+56=183\).

Hydrophobic-polar version of Sequence 1:
1011110111111011111111001000001010100000011111100000011000100100111110010110011110111100000110101000011010000011011000111100011001101010011011001000101010011101110101000101100100001010000001100111001010011010010000101011111111010001111110101100011110011000110000111
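
The hydrophobic-polar string maps each residue to 1 (hydrophobic) or 0 (polar). The page does not state which classification it uses. The set below (A, F, G, I, L, M, P, V, W) is inferred from the 3HBR string shown above and is an assumption, as is the treatment of cysteine, which does not occur in 3HBR.

```python
# Assumed hydrophobic residues, inferred from the 3HBR string on this page.
# Cysteine (C) is absent from 3HBR, so treating it as polar here is a guess.
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(seq):
    """Map each residue to '1' if assumed hydrophobic, else '0'."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

# First 10 residues of 3HBR.
print(hp_string("MRVLALSAVF"))  # prints 1011110111
```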
Pair \(Z_2\) Length of longest common subsequence
3HBR_1,6IRJ_1 183 3
3HBR_1,4OGM_1 183 4
6IRJ_1,4OGM_1 196 4
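
The \(Z_k\) column can be sketched in Python as below. The function names are illustrative, and the denominator \(|P_x(k)\cup P_y(k)|\) used in `jaccard_distance` is the standard Jaccard definition rather than something stated on this page.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """z_distance divided by the size of the union (assumed denominator)."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0
```

For instance, `abab` and `abc` share only the subword `ab` at length 2, so `z_distance("abab", "abc", 2)` is 2.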

Newick tree

 
(4OGM_1:95.88,(3HBR_1:91.5,6IRJ_1:91.5):4.38);

Let \(d\) be the Otu-Sayood distance \(d\), and let \(d_1\) be the Otu-Sayood distance \(d_1\). (The latter makes the 4TYN sequence AAAAAA a close match...)
A rough estimate of the expected distance is \((0.85)(0.8)(\frac{394 }{\log_{20} 394}-\frac{129}{\log_{20}129})=80.2\), where \(394=265+129\) is the combined length of the two query sequences.
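
The arithmetic in this estimate can be checked with a few lines of Python. The helper name `lz_scale` is illustrative; the constants are copied from the formula above.

```python
import math

def lz_scale(n, base=20):
    """n / log_base(n), the normalizing term used for LZ-complexity."""
    return n / math.log(n, base)

# Constants 0.85, 0.8, 394 and 129 are taken from the estimate in the text.
estimate = 0.85 * 0.8 * (lz_scale(394) - lz_scale(129))
print(round(estimate, 1))  # prints 80.2
```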
Status Protein1 Protein2 d d1/2
Query variables 3HBR_1 6IRJ_1 107 79.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
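
A minimal Python sketch of \(\delta\), under two assumptions that are not stated on this page: that min and max range over the two quantities \(c(xy)-c(x)\) and \(c(yx)-c(y)\), following Otu and Sayood's definitions of \(d\) (their maximum) and \(d_1\) (their sum), and that \(c\) is an LZ76-style complexity. The names `lz76` and `delta` are illustrative.

```python
def lz76(s):
    """Number of components in an LZ76-style parsing of s (assumed variant)."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the block while it already occurs starting before position i.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def delta(x, y, alpha, c=lz76):
    """alpha * min + (1 - alpha) * max of the two conditional complexities."""
    a = c(x + y) - c(x)  # extra components needed for y after x
    b = c(y + x) - c(y)  # extra components needed for x after y
    return alpha * min(a, b) + (1 - alpha) * max(a, b)
```

Under these assumptions, `delta(x, y, 0)` corresponds to \(d\) and `delta(x, y, 0.5)` to \(d_1/2\), matching the two cases in the displayed formula.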