CoV2D Browser™

Parikh vectors
5TCV_1 4HPW_1 2LCM_1 Letter Amino acid
22 16 1 D Aspartic acid
6 8 0 H Histidine
15 9 0 F Phenylalanine
10 7 2 T Threonine
29 31 6 L Leucine
15 11 0 M Methionine
3 0 0 W Tryptophan
24 19 2 V Valine
30 33 6 K Lysine
13 13 1 S Serine
11 13 0 Y Tyrosine
11 14 4 R Arginine
13 14 1 N Asparagine
4 2 0 C Cysteine
29 34 0 E Glutamic acid
20 21 0 G Glycine
23 21 0 A Alanine
9 5 0 Q Glutamine
15 27 3 I Isoleucine
17 12 2 P Proline
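A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal Python sketch that reproduces the 2LCM_1 column of the table from the 2LCM_1 sequence listed further down this page. The function name `parikh_vector` is hypothetical and is not part of the CoV2D Browser.

```python
from collections import Counter

# The 20 standard amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "DHFTLMWVKSYRNCEGAQIP"


def parikh_vector(seq: str) -> dict[str, int]:
    """Return the number of occurrences of each standard amino-acid letter.

    Letters outside AMINO_ACIDS (e.g. X for unknown residues) are ignored.
    """
    counts = Counter(seq)
    return {aa: counts[aa] for aa in AMINO_ACIDS}


seq_2lcm = "KDINTIKSLRVLRVLRPLKTIKRLPKLK"  # 2LCM_1, copied from this page
vec = parikh_vector(seq_2lcm)
print([vec[aa] for aa in AMINO_ACIDS])  # [1, 0, 0, 2, 6, 0, 0, 2, 6, 1, 0, 4, 1, 0, 0, 0, 0, 0, 3, 2]
print(sum(vec.values()))                # 28, the sequence length
```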

>5TCV_1|Chain A|1-aminocyclopropane-1-carboxylate oxidase 1|Petunia hybrida (4102)
>4HPW_1|Chain A|Tyrosine--tRNA ligase|Methanocaldococcus jannaschii (243232)
>2LCM_1|Chain A|Voltage-dependent N-type calcium channel subunit alpha-1B|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5TCV , Knot 141 319 0.85 40 201 305
MENFPIISLDKVNGVERAATMEMIKDACENWGFFELVNHGIPREVMDTVEKMTKGHYKKCMEQRFKELVASKALEGVQAEVTDMDWESTFFLKHLPISNISEVPDLDEEYREVMRDFAKRLEKLAEELLDLLCENLGLEKGYLKNAFYGSKGPNFGTKVSNYPPCPKPDLIKGLRAHTDAGGIILLFQDDKVSGLQLLKDGQWIDVPPMRHSIVVNLGDQLEVITNGKYKSVMHRVIAQKDGARMSLASFYNPGSDAVIYPAPALVEKEAEENKQVYPKFVFDDYMKLYAGLKFQAKEPRFEAMKAMETDVKMDPIATV
4HPW , Knot 132 310 0.81 38 174 291
MDEFEMIKRNTSEIISEEELREVLKKDEKSAEIGFEPSGKIHLGHYLQIKKMIDLQNAGFDIIISLADLGAYLNQKGELDEIRKIGDYNKKVFEAMGLKAKYVYGSEFGLDKDYTLNVYRLALKTTLKRARRSMELIAREDENPKVAEVIYPIMQVNNIHYVGVDVAVGGMEQRKIHMLARELLPKKVVCIHNPVLTGLDGEGKMSSSKGNFIAVDDSPEEIRAKIKKAYCPAGVVEGNPIMEIAKYFLEYPLTIKRPEKFGGDLTVNSYEELESLFKNKELHPMDLKNAVAEELIKILEPIRKRLAAHH
2LCM , Knot 16 28 0.63 20 20 22
KDINTIKSLRVLRVLRPLKTIKRLPKLK
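The LZ-complexity column can be sketched with the exhaustive Lempel-Ziv (1976) parsing. In this parsing each phrase grows for as long as it can still be copied from an earlier start position, and the copy is allowed to overlap the phrase itself. The Browser's own implementation is not shown on this page, so the code below is an assumption. It does, however, reproduce LZ = 16 for 2LCM_1. The function names are hypothetical.

```python
import math


def lz76_complexity(s: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) parsing of s.

    Each phrase is extended while it can still be copied from a start
    position earlier in s (the copy may overlap the phrase); the first
    character that breaks this ends the phrase.
    """
    n = len(s)
    i = 0
    phrases = 0
    while i < n:
        length = 1
        # s[:i + length - 1] is everything before the phrase's last character,
        # so any match found there starts strictly before position i.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_lz(s: str) -> float:
    """LZ(w) / (n / log_20 n), the ratio column of the table (needs n >= 2)."""
    n = len(s)
    return lz76_complexity(s) / (n / math.log(n, 20))


seq_2lcm = "KDINTIKSLRVLRVLRPLKTIKRLPKLK"
print(lz76_complexity(seq_2lcm))         # 16
print(f"{normalized_lz(seq_2lcm):.4f}")  # 0.6356
```

The table lists 0.63 for this ratio, so its values appear to be truncated rather than rounded.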

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-character Protein Data Bank code \(c\).
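The definitions of \(P_w(n)\) and \(p_w(n)\) translate directly into a set comprehension over all windows of length \(n\). The following is a minimal sketch with hypothetical function names. It is checked against the 2LCM_1 row of the table.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


seq_2lcm = "KDINTIKSLRVLRVLRPLKTIKRLPKLK"
print(p(seq_2lcm, 2), p(seq_2lcm, 3))  # 20 22, as in the 2LCM row
```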

\(|P_{f(5TCV_1)}(2) \setminus P_{f(4HPW_1)}(2)|=90\) and \(|P_{f(4HPW_1)}(2) \setminus P_{f(5TCV_1)}(2)|=63\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for example, \(Z_2(f(5TCV_1),f(4HPW_1))=90+63=153\).

Hydrophobic-polar version of Sequence 1 (5TCV_1):
1001111010010110011010110010001111011001110011001001001000001000100111001101101010010100011100111001001101000000110011001001100110110001110010100110100110110010001101010110110100011111111000010110110010110111100011101100101100100001100111000110101101001100111011111100010000010101110001010111010100101011011000101011101
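The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets. The minimal sketch below computes it, together with the full Jaccard distance obtained by dividing by the size of the union. The union is the usual Jaccard denominator, but this page only names the numerator, so the denominator is an assumption. All function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z(x: str, y: str, k: int) -> int:
    r"""Z_k(x, y) = |P_x(k) \ P_y(k)| + |P_y(k) \ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by the size of the union of P_x(k) and P_y(k).

    Returns 0.0 when both subword sets are empty.
    """
    union = subwords(x, k) | subwords(y, k)
    return z(x, y, k) / len(union) if union else 0.0


print(z("ABAB", "ABBA", 2))                 # 1: only "BB" is unshared
print(jaccard_distance("ABAB", "ABBA", 2))  # 0.333...
```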
Pair \(Z_2\) Length of longest common subword
5TCV_1,4HPW_1 153 4
5TCV_1,2LCM_1 197 3
4HPW_1,2LCM_1 160 4
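The last column of the table above appears to report the longest common *contiguous* subword. A non-contiguous common subsequence of sequences this long would be far longer than 3 or 4. Under that reading, the length can be computed with the standard longest-common-substring dynamic program. The following is a minimal sketch; the function name is hypothetical.

```python
def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest string occurring contiguously in both x and y.

    cur[j] holds the length of the longest common suffix of x[:i] and y[:j];
    only the previous row (prev) is kept, so memory is O(len(y)).
    """
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


print(longest_common_subword("KDINTIK", "SLKDINV"))  # 4 ("KDIN")
```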

Newick tree

(
	2LCM_1:93.72,
	(
		5TCV_1:76.5,4HPW_1:76.5
	):17.22
);
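Reading the tree mechanically shows that every leaf sits at the same distance from the root: 93.72 for 2LCM_1, and 76.5 + 17.22 = 93.72 for the other two. The tree is therefore ultrametric. The minimal parser sketch below handles only the small Newick subset used here, and its name is hypothetical. In standard Newick, square brackets delimit comments, so treating them as grouping is an assumption made in order to read the tree as printed.

```python
def leaf_depths(tree: str) -> dict[str, float]:
    """Map each leaf name to its root-to-leaf path length.

    Handles only nested groups, leaf names and ':length' suffixes.
    Assumption: '[' and ']' are read as grouping, like '(' and ')'.
    """
    s = "".join(tree.split()).rstrip(";")
    s = s.replace("[", "(").replace("]", ")")
    pos = 0

    def branch_length() -> float:
        nonlocal pos
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            return float(s[start:pos])
        return 0.0

    def read_label() -> str:
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        return s[start:pos]

    def subtree() -> dict[str, float]:
        # Returns {leaf: distance from this subtree's root}.
        nonlocal pos
        if s[pos] != "(":
            return {read_label(): 0.0}
        pos += 1
        leaves: dict[str, float] = {}
        while True:
            child = subtree()
            length = branch_length()
            for name, depth in child.items():
                leaves[name] = depth + length
            if pos >= len(s):
                raise ValueError("unbalanced brackets")
            if s[pos] == ",":
                pos += 1
            elif s[pos] == ")":
                pos += 1
                read_label()  # optional internal-node label, ignored
                return leaves
            else:
                raise ValueError(f"unexpected {s[pos]!r} at {pos}")

    return subtree()


tree = """
[
    2LCM_1:93.72,
    [
        5TCV_1:76.5,4HPW_1:76.5
    ]:17.22
]
"""
print({k: round(v, 2) for k, v in leaf_depths(tree).items()})
# {'2LCM_1': 93.72, '5TCV_1': 93.72, '4HPW_1': 93.72}
```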

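Let d be the Otu--Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let d1 be the Otu--Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)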
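Both Otu--Sayood distances are built from the two increments LZ(xy) − LZ(x) and LZ(yx) − LZ(y): d takes the larger one and d1 their sum. The following is a minimal sketch that reuses an LZ76 parser. Whether the Browser parses concatenations exactly this way is an assumption, and `otu_sayood` is a hypothetical name.

```python
def lz76_complexity(s: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) parsing of s."""
    n, i, phrases = len(s), 0, 0
    while i < n:
        length = 1
        # Grow the phrase while it can be copied from an earlier start.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1) for sequences x and y.

    a and b measure how much new LZ complexity each sequence adds
    when appended to the other; d is the larger, d1 the sum.
    """
    a = lz76_complexity(x + y) - lz76_complexity(x)
    b = lz76_complexity(y + x) - lz76_complexity(y)
    return max(a, b), a + b


print(otu_sayood("AB", "AB"))          # (1, 2)
print(otu_sayood("AAAAAA", "AAAAAA"))  # (0, 0)
```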
A rough estimate of the expected distance is \((0.85)(0.8)(\frac{629}{\log_{20} 629}-\frac{310}{\log_{20}310})=88.7\)
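The arithmetic in this estimate can be checked directly. Here 629 = 319 + 310 is the length of 5TCV_1 and 4HPW_1 concatenated. The factors 0.85 and 0.8 are copied from the formula as written, and `lz_scale` is a hypothetical helper name.

```python
import math


def lz_scale(n: int) -> float:
    """n / log_20 n, the normalizer used in the LZ-complexity table."""
    return n / math.log(n, 20)


# 629 = 319 + 310: the length of 5TCV_1 and 4HPW_1 concatenated.
estimate = 0.85 * 0.8 * (lz_scale(629) - lz_scale(310))
print(f"{estimate:.2f}")  # 88.75
```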
Status Protein1 Protein2 d d1/2
Query variables 5TCV_1 4HPW_1 111 108

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
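The interpolation \(\delta\) can be checked against the 5TCV_1/4HPW_1 row of the table above. The sketch reads min and max as the smaller and larger of the two increments LZ(xy) − LZ(x) and LZ(yx) − LZ(y), which is an assumption consistent with d and \(d_1/2\). Under that reading, the table gives max = d = 111 and sum = \(d_1\) = 2 · 108 = 216, so min = 105. The name `delta` is hypothetical.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# Increments for the 5TCV_1 / 4HPW_1 pair, derived from the table:
# max = d = 111 and sum = d1 = 2 * 108 = 216, so min = 216 - 111 = 105.
a, b = 105, 111
print(delta(a, b, 0))    # 111   (= d)
print(delta(a, b, 0.5))  # 108.0 (= d1/2)
```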