CoV2D Browser™

Parikh vectors
5MHA_1 4TZA_1 3HLU_1 Letter Amino acid
20 9 4 R Arginine
8 14 3 Y Tyrosine
10 13 6 I Isoleucine
3 5 5 Q Glutamine
29 22 8 E Glutamic acid
26 27 5 G Glycine
6 8 6 M Methionine
13 11 3 S Serine
19 14 6 T Threonine
3 2 0 W Tryptophan
37 14 8 A Alanine
5 6 7 N Asparagine
2 3 1 C Cysteine
2 19 7 K Lysine
12 11 3 F Phenylalanine
16 16 1 P Proline
36 11 11 V Valine
27 17 5 D Aspartic acid
10 13 0 H Histidine
24 16 7 L Leucine
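Each column of the table above is a Parikh vector, that is, the number of occurrences of each letter in a sequence. A minimal sketch of how such a column can be recomputed from a sequence listed on this page; the names `ALPHABET`, `SEQ_3HLU` and `parikh_vector` are illustrative and not part of the site.

```python
from collections import Counter

# One-letter amino acid codes, in the row order used by the table above.
ALPHABET = "RYIQEGMSTWANCKFPVDHL"

# Sequence of 3HLU_1 as listed on this page.
SEQ_3HLU = (
    "SNANGDQQTMVYIVSAKRKIIADRMLQELDLGVTMLQAVGAYKNNETEVIMCVMRKATLVKVRNLLKEVDPDAFMIVSTANEVFGEGFKNQYETEI"
)

def parikh_vector(seq, alphabet=ALPHABET):
    """Return a dict mapping each letter of the alphabet to its count in seq."""
    counts = Counter(seq)  # a Counter yields 0 for letters that never occur
    return {letter: counts[letter] for letter in alphabet}

vec = parikh_vector(SEQ_3HLU)
print(vec["V"], vec["W"], sum(vec.values()))  # 11 0 96
```

The three printed values agree with the 3HLU_1 column (11 valines, no tryptophan) and with the sequence length 96.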

>5MHA_1|Chains A, B|D-2-hydroxyacid dehydrogenase|Haloferax mediterranei ATCC 33500 (523841)
>4TZA_1|Chains A, B, C, D|Fluorescent Protein|synthetic construct (32630)
>3HLU_1|Chains A, B|uncharacterized protein DUF2179|Eubacterium ventriosum (411463)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5MHA , Knot 128 308 0.79 40 167 288
MHIERLAVDESVGRAMPPQRFIEALSDLGVPVEFAGEDEQFGPGDAVASFGHRDAFLDADWVHCIRAGYDEFPVGVYEEAGTYLTNSTGIHGTTVGETVAGYMLTFARRLHAYRDAQHDHAWDLPRYEEPFTLAGERVCVVGLGTLGRGVVDRAAALGMEVVGVRRSGDPVDNVSTVYTPDRLHEAIADARFVVLATPLTDETEGMVAAPEFETMREDASLVNVARGPVVVESDLVAALDSGDIAGAALDVFSEEPLPEDSPLWDFEDVLITPHVSAATSKYHEDVAALIRENIEKIATGDELTNRVV
4TZA , Knot 113 251 0.83 40 173 239
MGAHASVIKPEMKIKLRMEGAVNGHKFVIEGEGIGKPYEGTQTLDLTVEEGAPLPFSYDILTPAFQYGNRAFTKYPEDIPDYFKQAFPEGYSWERSMTYEDQGICIATSDITMEGDCFFYEIRFDGTNFPPNGPVMQKKTLKWEPSTEKMYVEDGVLKGDVEMALLLEGGGHYRCDFKTTYKAKKDVRLPDAHEVDHRIEILSHDKDYNKVRLYEHAEARYSGGGSGGGASGKPIPNPLLGLDSTHHHHHH
3HLU , Knot 50 96 0.79 36 82 91
SNANGDQQTMVYIVSAKRKIIADRMLQELDLGVTMLQAVGAYKNNETEVIMCVMRKATLVKVRNLLKEVDPDAFMIVSTANEVFGEGFKNQYETEI
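The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed directly from the LZ and length columns. The sketch below does that, and also includes one common definition of LZ complexity (the LZ76 exhaustive-history parsing). It is an assumption that the page uses this variant; the function `lz76_complexity` has not been checked against the LZ values in the table.

```python
import math

def lz76_complexity(s):
    """Number of components in the LZ76 exhaustive-history parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the candidate while it already occurs starting before position i.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"5MHA": (128, 308), "4TZA": (113, 251), "3HLU": (50, 96)}
ratios = {code: round(normalized_lz(lz, n), 2) for code, (lz, n) in rows.items()}
print(ratios)  # {'5MHA': 0.79, '4TZA': 0.83, '3HLU': 0.79}

# Textbook binary example: parsed as 0|001|10|100|1000|101.
print(lz76_complexity("0001101001000101"))  # 6
```

The recomputed ratios match the fourth column of the table for all three proteins.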

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
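As a small illustration of these definitions, reading \(n\) as the subword length, the set \(P_w(n)\) and its size \(p_w(n)\) can be sketched as follows. The function names are illustrative only.

```python
def subwords(w, n):
    """Set of distinct contiguous subwords of length n in w, i.e. P_w(n)."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """Number of distinct subwords of length n in w, i.e. p_w(n)."""
    return len(subwords(w, n))

print(sorted(subwords("ABAB", 2)))        # ['AB', 'BA']
print(subword_complexity("HHHHHH", 2))    # 1
```

A word of length \(m\) has at most \(m-n+1\) distinct subwords of length \(n\), so repeated stretches such as the HHHHHH tail of 4TZA_1 lower \(p_w(n)\).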

\(|P_{f(5MHA_1)}(2) \setminus P_{f(4TZA_1)}(2)|=81\), \(|P_{f(4TZA_1)}(2) \setminus P_{f(5MHA_1)}(2)|=87\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\).

Hydrophobic-polar version of Sequence 1:
10100111000110111100110110011111011100001111011101100011101011001011000111110001100100001101001100111011011001010001000011011000011011100101111101101110011111101111000101100100100100100111010111110110000011111101001000101101101111100011111001011111101100011100011101001110101011000000011111000100110100100011
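The 0/1 string above can be reproduced by mapping each residue to 1 (hydrophobic) or 0 (polar). The page does not state which residues it treats as hydrophobic; the set below is an assumption inferred by comparing the start of the 5MHA_1 sequence with the start of the displayed string, and `hp_encode` is an illustrative name.

```python
# Assumed hydrophobic set, inferred from the displayed string; not documented on the page.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq):
    """Map each residue to '1' if it is in the assumed hydrophobic set, else '0'."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

prefix = "MHIERLAVDESVGRAMPPQR"  # first 20 residues of 5MHA_1
print(hp_encode(prefix))  # 10100111000110111100
```

The output agrees with the first 20 symbols of the hydrophobic-polar string shown above.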
Pair \(Z_2\) Length of longest common subword
5MHA_1,4TZA_1 168 3
5MHA_1,3HLU_1 169 4
4TZA_1,3HLU_1 177 3
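The \(Z_2\) column can be sketched in a few lines as the size of the symmetric difference of two subword sets. The helper names are illustrative, and the toy inputs are not taken from the page.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: P_x(2) = {AB, BA}, P_y(2) = {BA, AB, BB}.
print(z_distance("ABAB", "BABB", 2))  # 1
```

For the first pair in the table, \(81+87=168\), and the values \(p_w(2)=167\) and \(173\) listed earlier give \(167-81=173-87=86\) shared subwords of length 2, so the page's figures are mutually consistent.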

Newick tree

 
[
	3HLU_1:87.34,
	[
		5MHA_1:84,4TZA_1:84
	]:3.34
]
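The tree above is displayed with square brackets, whereas the standard Newick format uses parentheses and a terminating semicolon. A minimal sketch of serializing the same tree; the nested-tuple representation and the name `to_newick` are choices made for this example only.

```python
def to_newick(node):
    """Serialize a (head, branch_length) node; head is a leaf name or a list of child nodes."""
    head, length = node
    if isinstance(head, str):
        text = head
    else:
        text = "(" + ",".join(to_newick(child) for child in head) + ")"
    return text if length is None else f"{text}:{length}"

# Same topology and branch lengths as the tree shown above; the root has no length.
tree = ([("3HLU_1", 87.34), ([("5MHA_1", 84), ("4TZA_1", 84)], 3.34)], None)
newick = to_newick(tree) + ";"
print(newick)  # (3HLU_1:87.34,(5MHA_1:84,4TZA_1:84):3.34);
```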

Let \(d\) be the Otu--Sayood distance \(d\).
Let \(d_1\) be the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{559}{\log_{20} 559}-\frac{251}{\log_{20} 251}\right)\approx 87.4\), where \(559=308+251\) is the combined length of the two sequences.
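The arithmetic in this estimate can be checked with a short script. The factors 0.85 and 0.8 are taken from the formula as given; their origin is not explained on the page, and the variable names below are illustrative.

```python
import math

def log20(x):
    """Logarithm to base 20, the size of the amino acid alphabet."""
    return math.log(x, 20)

n_concat, n_single = 559, 251  # 559 = 308 + 251, the two sequence lengths combined
estimate = 0.85 * 0.8 * (n_concat / log20(n_concat) - n_single / log20(n_single))
print(round(estimate, 2))
```

The result lands between 87.4 and 87.5, so the last digit quoted depends on whether the value is rounded or truncated.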
Status Protein1 Protein2 d d1/2
Query variables 5MHA_1 4TZA_1 107 99

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
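To make the interpolation concrete, the sketch below reads min and max as the smaller and larger of two directional quantities \(a\) and \(b\), so that \(d=\max(a,b)\) and \(d_1=a+b\). That reading is an assumption. The value \(a=91\) is not shown on the page; it is derived under this assumption from the tabulated \(d=107\) and \(d_1/2=99\).

```python
def delta(alpha, a, b):
    """Interpolate between min(a, b) and max(a, b) with weight alpha on the minimum."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Hypothetical directional values, chosen so that max = 107 and the mean = 99.
a, b = 91, 107
print(delta(0, a, b), delta(0.5, a, b))  # 107 99.0
```

With \(\alpha=0\) the formula returns the maximum, and with \(\alpha=1/2\) it returns the average of the two quantities, matching the two cases in the display above.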