CoV2D Browser™

Parikh vectors
6LIZ_1 4IVO_1 6END_1 Letter Amino acid
24 49 32 G Glycine
9 20 16 H Histidine
9 12 27 F Phenylalanine
36 48 38 A Alanine
6 28 41 E Glutamic acid
12 34 36 P Proline
12 45 45 S Serine
16 17 40 T Threonine
4 7 13 W Tryptophan
17 11 34 D Aspartic acid
12 25 26 Q Glutamine
17 70 69 L Leucine
8 5 10 M Methionine
15 37 38 V Valine
1 8 11 C Cysteine
13 13 31 I Isoleucine
8 8 40 K Lysine
4 6 22 Y Tyrosine
9 33 23 R Arginine
10 7 21 N Asparagine
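
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch of that count follows; the function name `parikh_vector` and the alphabetical ordering of the alphabet are illustrative choices, not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard one-letter amino acid codes (ordering is an arbitrary choice here).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(seq: str) -> dict[str, int]:
    """Return the number of occurrences of each amino acid letter in seq."""
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}


# First ten residues of the 6LIZ_1 sequence shown on this page.
prefix = "GEIRPTIGQQ"
vec = parikh_vector(prefix)
print(vec["G"], vec["Q"], sum(vec.values()))
```

The entries of a Parikh vector always sum to the sequence length, which gives a quick consistency check against the Length column further down.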

>6LIZ_1|Chain A|Metallo-beta-lactamase type 2|Klebsiella pneumoniae (573)
>4IVO_1|Chain A[auth B]|Protoporphyrinogen oxidase|Homo sapiens (9606)
>6END_1|Chain A|Leukotriene A-4 hydrolase|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6LIZ , Knot 111 242 0.84 40 165 234
GEIRPTIGQQMETGDQRFGDLVFRQLAPNVWQHTSYLDMPGFGAVASNGLIVRDGGRVLVVDTAWTDDQTAQILNWIKQEINLPVALAVVTHAHQDKMGGMDALHAAGIATYANALSNQLAPQEGMVAAQHSLTFAANGWVEPATAPNFGPLKVFYPGPGHTSDNITVGIDGTDIAFGGCLIKDSKAKSLGNLGDADTEHYAASARAFGAAFPKASMIVMSHSAPDSRAAITHTARMADKLR
4IVO , Knot 187 483 0.79 40 212 432
HHHHHHMGRTVVVLGGGISGLAASYHLSRAPCPPKVVLVESSERLGGWIRSVRGPNGAIFELGPQGIRPAGALGARTLLLVSELGLDSEVLPVRGDHPAAQNRFLYVGGALHALPTGLRGLLRPSPPFSKPLFWAGLRELTKPRGKEPDETVHSFAQRRLGPEVASLAMDSLCRGVFAGNSRELSIRSCFPSLFQAEQTHRSILLGLLLGAGRTPQPDSALIRQALAERWSQWSLRGGLEMLPQALETHLTSRGVSVLRGQPVCGLSLQAEGRWKVSLRDSSLEADHVISAIPASVLSELLPAEAAPLARALSAITAVSVAVVNLQYQGAHLPVQGFGHLVPSSEDPGVLGIVYDSVAFPEQDGSPPGLRVTVMLGGSWLQTLEASGCVLSQELFQQRAQEAAATQLGLKEMPSHCLVHLHKNCIPQYTLGHWQKLESARQFLTAHRLPLTLAGASYEGVAVNDCIESGRQAAVSVLGTEPNS
6END , Knot 246 613 0.85 40 285 572
GPGPEIVDTCSLASPASVCRTKHLHLRCSVDFTRRTLTGTAALTVQSQEDNLRSLVLDTKDLTIEKVVINGQEVKYALGERQSYKGSPMEISLPIALSKNQEIVIEISFETSPKSSALQWLTPEQTSGKEHPYLFSQCQAIHCRAILPCQDTPSVKLTYTAEVSVPKELVALMSAIRDGETPDPEDPSRKIYKFIQKVPIPCYLIALVVGALESRQIGPRTLVWSEKEQVEKSAYEFSETESMLKIAEDLGGPYVWGQYDLLVLPPSFPYGGMENPCLTFVTPTLLAGDKSLSNVIAHEISHSWTGNLVTNKTWDHFWLNEGHTVYLERHICGRLFGEKFRHFNALGGWGELQNSVKTFGETHPFTKLVVDLTDIDPDVAYSSVPYEKGFALLFYLEQLLGGPEIFLGFLKAYVEKFSYKSITTDDWKDFLYSYFKDKVDVLNQVDWNAWLYSPGLPPIKPNYDMTLTNACIALSQRWITAKEDDLNSFNATDLKDLSSHQLNEFLAQTLQRAPLPLGHIKRMQEVYNFNAINNSEIRFRWLRLCIQSKWEDAIPLALKMATEQGRMKFTRPLFKDLAAFDKSHDQAVRTYQEHKASMHPVTAMLVGKDLKVD
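
The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below assumes that \(\mathrm{LZ}\) is the Lempel-Ziv (1976) exhaustive-history complexity; the page does not say which variant it uses, so `lz76_complexity` is an assumption and may not reproduce the tabulated values exactly. The normalization itself follows the column header.

```python
import math


def lz76_complexity(w: str) -> int:
    """Number of components in an LZ76-style parsing of w (assumed variant).

    Each component is the shortest prefix of the remaining text that does
    not already occur starting at an earlier position (overlap allowed).
    """
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while the candidate still occurs earlier in the text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_alphabet_size(n)), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))


# Using the tabulated values for 6LIZ: LZ(w) = 111 and n = 242.
print(f"{normalized_lz(111, 242):.2f}")
```

A quadratic-time search like this is adequate for sequences of a few hundred residues; a suffix-structure based parser would be the usual choice for longer inputs.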

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
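
As a small illustration of the subword sets and their cardinalities defined above, the following sketch enumerates the distinct subwords of a fixed length. The names `subwords` and `subword_complexity` are hypothetical helpers introduced here.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w: str, k: int) -> int:
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))


# First ten residues of the 6LIZ_1 sequence shown on this page.
prefix = "GEIRPTIGQQ"
print(subword_complexity(prefix, 1), subword_complexity(prefix, 2))
```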

\(|P_{f(6LIZ_1)}(2) \setminus P_{f(4IVO_1)}(2)|=61\), \(|P_{f(4IVO_1)}(2) \setminus P_{f(6LIZ_1)}(2)|=108\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
10101011001001000110111001110110000010111111110011110011011110011000001011011000101111111100100001111011011111001011000111001111100010111011101101101111011011110000010111010011111011000010011011010000011010111111101011110001100011100010110010
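
A hydrophobic-polar string like the one above can be produced by mapping each residue to 1 (hydrophobic) or 0 (polar). The page does not state its classification. The set below is an assumption inferred from comparing the start of the 6LIZ_1 sequence with the start of the binary string; the placement of cysteine (C) in particular is a guess, since it was not checked against the string.

```python
# Assumed hydrophobic residues; C is a guess, the others agree with the
# first 80 positions of the binary string shown on this page.
HYDROPHOBIC = set("ACFGILMPVW")


def hp_encode(seq: str) -> str:
    """Map each residue to '1' if assumed hydrophobic, else '0'."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)


# First 40 residues of the 6LIZ_1 sequence.
prefix = "GEIRPTIGQQMETGDQRFGDLVFRQLAPNVWQHTSYLDMP"
print(hp_encode(prefix))
```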
Pair \(Z_2\) Length of longest common subword
6LIZ_1,4IVO_1 169 4
6LIZ_1,6END_1 176 4
4IVO_1,6END_1 159 4
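
The quantity \(Z_k\) is the size of the symmetric difference of two subword sets. A minimal sketch with hypothetical helper names:

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# Toy example: "abba" has the extra 2-subword "bb" compared with "abab".
print(z_distance("abab", "abba", 2))
```

For the pair 6LIZ_1, 4IVO_1 the two one-sided differences quoted on this page, 61 and 108, add up to the tabulated \(Z_2=169\).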

Newick tree

 
[
	6LIZ_1:88.40,
	[
		4IVO_1:79.5,6END_1:79.5
	]:8.90
]

Let d be the Otu-Sayood distance d.
Let d1 be the Otu-Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{725}{\log_{20} 725}-\frac{242}{\log_{20}242}\right)\approx 134\), where \(725=242+483\) is the combined length of the two sequences.
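
The arithmetic in the expected-distance estimate can be checked directly. The sketch below only re-evaluates the expression as written; the factors 0.85 and 0.8 are taken from the page without interpretation, and `lz_scale` is a hypothetical helper name.

```python
import math


def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """n / log_alphabet_size(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)


# 725 = 242 + 483 is the combined length of 6LIZ_1 and 4IVO_1.
expected = 0.85 * 0.8 * (lz_scale(725) - lz_scale(242))
print(round(expected))
```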
Status Protein1 Protein2 d d1/2
Query variables 6LIZ_1 4IVO_1 164 123.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
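
The interpolation above can be sketched as follows. This assumes the usual Otu-Sayood setup, in which min and max are taken over the two terms \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), so that \(d\) is their maximum and \(d_1\) their sum. The page does not spell this out, and the function names are hypothetical.

```python
def otu_sayood_terms(lz_x: int, lz_y: int, lz_xy: int, lz_yx: int) -> tuple[int, int]:
    """The two one-sided terms LZ(xy) - LZ(x) and LZ(yx) - LZ(y) (assumed definition)."""
    return lz_xy - lz_x, lz_yx - lz_y


def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# From the query row on this page: d = 164 and d1/2 = 123.5, so under the
# assumption above the two terms are 164 and 2 * 123.5 - 164 = 83.
print(delta(83, 164, 0), delta(83, 164, 0.5))
```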