CoV2D Browser™

Parikh vectors
1OUW_1 8PCF_1 8AGP_1 Letter Amino acid
2 2 15 H Histidine
6 10 24 K Lysine
7 4 16 F Phenylalanine
15 29 10 T Threonine
26 22 14 G Glycine
13 11 22 I Isoleucine
12 13 12 S Serine
2 3 8 W Tryptophan
4 15 12 R Arginine
7 13 14 D Aspartic acid
0 1 1 C Cysteine
3 11 22 E Glutamic acid
11 15 24 V Valine
6 35 18 A Alanine
2 18 8 Q Glutamine
6 27 33 L Leucine
7 13 14 P Proline
14 11 14 N Asparagine
2 5 6 M Methionine
7 5 12 Y Tyrosine
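
Each column above counts how often each amino-acid letter occurs in the corresponding sequence. A minimal Python sketch of that count follows; the function name `parikh_vector` and the alphabet ordering are assumptions made for illustration, not part of the CoV2D code.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes; the ordering is an arbitrary
# choice for this sketch and differs from the row order of the table.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(seq: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter in seq."""
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}

# Sequence 1OUW_1 as listed on this page.
seq_1ouw = (
    "AVPMDTISGPWGNNGGNFWSFRPVNKINQIVISYGGGGNNPIALTFSSTKADGSKDTITVGGGG"
    "PDSITGTEMVNIGTDEYLTGISGTFGIYLDNNVLRSITFTTNLKAHGPYGQKVGTPFSSANVVG"
    "NEIVGFLGRSGYYVDAIGTYNRHK"
)

pv = parikh_vector(seq_1ouw)
# Compare these entries with the 1OUW_1 column of the table above.
print(pv["H"], pv["W"], pv["C"])
```

The entries of a Parikh vector always sum to the length of the sequence, which gives a quick consistency check on any column.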

>1OUW_1|Chains A, B, C, D|lectin|Calystegia sepium (47519)
>8PCF_1|Chain A|Beta-lactamase|Klebsiella pneumoniae (573)
>8AGP_1|Chains A[auth AAA], B[auth BBB]|Alpha/beta epoxide hydrolase|metagenome (256318)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1OUW , Knot 68 152 0.75 38 107 140
AVPMDTISGPWGNNGGNFWSFRPVNKINQIVISYGGGGNNPIALTFSSTKADGSKDTITVGGGGPDSITGTEMVNIGTDEYLTGISGTFGIYLDNNVLRSITFTTNLKAHGPYGQKVGTPFSSANVVGNEIVGFLGRSGYYVDAIGTYNRHK
8PCF , Knot 117 263 0.82 40 163 250
QTSAVQQKLAALEKSSGGRLGVALIDTADNTQVLYRGDERFPMCSTSKVMAAAAVLKQSETQKQLLNQPVEIKPADLVNYNPIAEKHVNGTMTLAELSAAALQYSDNTAMNKLIAQLGGPGGVTAFARAIGDETFRLDRTEPTLNTAIPGDPRDTTTPRAMAQTLRQLTLGHALGETQRAQLVTWLKGNTTGAASIRAGLPTSWTVGDKTGSGDYGTTNDIAVIWPQGRAPLVLVTYFTQPQQNAESRRDVLASAARIIAEGL
8AGP , Knot 132 299 0.84 40 188 287
MNEMLKHEYVKVNGIKMHYVTQGKGKLLLLLHGFPDFWYVWRFQIPALAKHFRVVAPDLRGYNETDKPEGVENYRLDLLAKDILGLIKALGEEHAVVVGHDWGGIISWTLTAFNPQAVEKLVILNAPHPKAYMTRTKNSLRQLQKSWYVFFFQVANIPEKILSRNEFAFLKNMLIQSFVRRDLLTEEDLRIYVDAWSKSGALTSALNYYRANLNPDIIFSEKTVVFPKIKVPTLVIWGEKDVAISKDLIVNMEDFIEAPYSIKYFPECGHWVQLEEPELVRKHIEEFILKSDIHHHHHH
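
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the tabulated LZ-complexity and length. The sketch below does only that normalization; it does not compute \(\mathrm{LZ}(w)\) itself, and the helper name `normalized_lz` is hypothetical.

```python
import math

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_k n) for a word of length n over k letters."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"1OUW": (68, 152), "8PCF": (117, 263), "8AGP": (132, 299)}
for code, (lz, n) in rows.items():
    print(code, normalized_lz(lz, n))
```

The printed values should agree with the tabulated column to within about 0.01.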

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
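
As an illustration of these definitions, here is a short Python sketch that reads \(P_w(n)\) as the set of distinct contiguous substrings of length \(n\). The function names `subwords` and `subword_complexity` are hypothetical.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct length-n subwords (contiguous substrings) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w: str, n: int) -> int:
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subwords(w, n))

# Toy example: the length-2 subwords of "ABAB" are "AB" and "BA".
print(sorted(subwords("ABAB", 2)))
print(subword_complexity("ABAB", 2))
```

When \(n\) exceeds the length of \(w\), the range is empty and the sketch returns the empty set.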

\(|P_{f(1OUW_1)}(2) \setminus P_{f(8PCF_1)}(2)|=55\), \(|P_{f(8PCF_1)}(2) \setminus P_{f(1OUW_1)}(2)|=111\).

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11110010111100110110101100100111001111001111010000101000010111111001010011011000010110101110100011001010001010110100110110010111001111110010010111000000

Pair \(Z_2\) Length of longest common subsequence
1OUW_1,8PCF_1 166 4
1OUW_1,8AGP_1 181 4
8PCF_1,8AGP_1 157 4
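
The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets; for the first pair, \(55+111=166\). A minimal sketch under that reading follows. The names `subwords`, `z_distance` and `jaccard_distance` are hypothetical, and the Jaccard denominator (the size of the union) is the standard one rather than something stated on this page.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the distinct length-k subwords (contiguous substrings) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k divided by |P_x(k) union P_y(k)| (standard Jaccard distance)."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0

# Toy example: P("ABAB", 2) = {AB, BA}, P("ABBA", 2) = {AB, BB, BA}.
print(z_distance("ABAB", "ABBA", 2))
print(jaccard_distance("ABAB", "ABBA", 2))
```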

Newick tree

 
(
	1OUW_1:89.43,
	(
		8PCF_1:78.5,8AGP_1:78.5
	):10.93
);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{415}{\log_{20} 415}-\frac{152}{\log_{20} 152}\right)\approx 78.6\)
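
The arithmetic in this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are taken verbatim from the formula; the observation that 415 equals the sum of the two sequence lengths, 152 + 263, is an inference from the table, not something the page states.

```python
import math

def log20(x: float) -> float:
    """Logarithm to base 20 (the size of the amino-acid alphabet)."""
    return math.log(x, 20)

n_total = 415  # appears to be 152 + 263, the two lengths combined (assumption)
n_short = 152  # length of 1OUW_1

expected = 0.85 * 0.8 * (n_total / log20(n_total) - n_short / log20(n_short))
print(round(expected, 1))  # prints 78.6
```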
Status Protein1 Protein2 d d1/2
Query variables 1OUW_1 8PCF_1 102 78.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]