CoV2D BrowserTM

Parikh vectors
8HYK_1 4DAB_1 6ZEC_1 Letter Amino acid
15 16 10 D Aspartic acid
3 9 15 Q Glutamine
16 18 17 L Leucine
6 8 1 M Methionine
27 18 19 T Threonine
18 25 16 V Valine
6 20 12 A Alanine
9 9 7 R Arginine
42 7 9 N Asparagine
14 8 7 F Phenylalanine
2 3 5 C Cysteine
21 14 13 K Lysine
6 6 11 P Proline
31 21 30 S Serine
19 14 11 E Glutamic acid
12 22 12 G Glycine
10 12 3 H Histidine
24 13 6 I Isoleucine
6 0 3 W Tryptophan
20 10 13 Y Tyrosine
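
The counts in the table above are letter frequencies of each sequence. A minimal sketch of how such a Parikh vector could be computed follows; the names `AMINO_ACIDS` and `parikh_vector` are hypothetical and not taken from the CoV2D code, and the letter order simply follows the rows of the table.

```python
from collections import Counter

# Letter order follows the rows of the table above (an assumption made
# for illustration; any fixed ordering of the 20 letters would do).
AMINO_ACIDS = "DQLMTVARNFCKPSEGHIWY"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each letter of `alphabet` in `sequence`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# First 18 residues of the 8HYK_1 sequence, used only as a short sample.
sample = "MSKFSESTLSGWTKPASV"
for letter, count in zip(AMINO_ACIDS, parikh_vector(sample)):
    if count:
        print(letter, count)
```

The entries of a Parikh vector sum to the length of the sequence, which gives a quick consistency check against the Length column.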

>8HYK_1|Chain A|EfCdnE|Enterococcus faecalis EnGen0062 (1151187)
>4DAB_1|Chain A|Purine nucleoside phosphorylase deoD-type|Bacillus subtilis (1423)
>6ZEC_1|Chain A[auth L]|Fab fragment light chain|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8HYK , Knot 130 307 0.80 40 178 290
MSKFSESTLSGWTKPASVTEEDRIENTISMIKSAIKNDNNFDNLVYEVFVQGSYGNNTNVRTNSDIDVNIMLTSTFYSKYPEGKTNSDYGFTDGTITYNEYKNLILTALTNKFGTGNVTVGNKSIKITSNSYRVEADCIPSLLYRNYEYENSSSPNNYIEGIKYFASDNTSVVNYPKVHINNGIEKNNQTHKNYKRLVRVIKRLRNKMTAENHFTNENITSFLIECLIWNVPNNYINDYDTWDETIKQTLIFIKSSINDNSYKNWTEVSGMFYLFHNNRKWTSDDVSSFVNSLWSFMEYLEHHHHHH
4DAB , Knot 115 253 0.83 38 165 241
MGSSHHHHHHSSGLVPRGSHMSVHIGAEKGQIADTVLLPGDPLRAKFIAETYLENVECYNEVRGMYGFTGTYKGKKISVQGTGMGVPSISIYVNELIQSYDVQNLIRVGSCGAIRKDVKVRDVILAMTSSTDSQMNRVAFGSVDFAPCADFELLKNAYDAAKDKGVPVTVGSVFTADQFYNDDSQIEKLAKYGVLGVEMETTALYTLAAKHGRKALSILTVSDHVLTGEETTAEERQTTFHDMIDVALHSVSQ
6ZEC , Knot 103 220 0.84 40 158 211
DIVMTQSPDSLSVSLGERATINCKSSQSVLYSSHNKNYLAWYQQKPGQPPRLLIYWASTRESGVPDRFSGSGSGTDFTLTINTLQAEDVAVYYCQQYYTTPYTFGQGTKLEIKRTVAAPSVFIFPPSDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTLSKADYEKHKVYACEVTHQGLSSPVTKSFNRGEC
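
The normalized column \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) can be recomputed from the tabulated \(\mathrm{LZ}(w)\) and \(n\). The sketch below does only that arithmetic; the function name `normalized_lz` is hypothetical, and the truncation to two decimals is an assumption that happens to agree with the three tabulated values, not a documented rule of the page.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), with b the alphabet size (20 amino acids)."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"8HYK": (130, 307), "4DAB": (115, 253), "6ZEC": (103, 220)}

for code, (lz, n) in rows.items():
    ratio = normalized_lz(lz, n)
    # Assumption: the page truncates rather than rounds to two decimals.
    print(code, math.floor(100 * ratio) / 100)
```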

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
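
As an illustration of these definitions, here is a minimal sketch that collects the subwords of a given length and counts them. The function names `subwords` and `subword_complexity` are hypothetical, and the example words are toy strings rather than protein sequences.

```python
def subwords(w, n):
    """Set of distinct contiguous subwords of length n in the word w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """Number of distinct subwords of length n in w, i.e. the size of subwords(w, n)."""
    return len(subwords(w, n))

print(sorted(subwords("banana", 3)))
print(subword_complexity("abab", 2))
```

A word of length \(m\) has at most \(m-n+1\) distinct subwords of length \(n\), so the set stays small for sequences of a few hundred residues.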

\(|P_{f(8HYK_1)}(2) \setminus P_{f(4DAB_1)}(2)|=90\), \(|P_{f(4DAB_1)}(2) \setminus P_{f(8HYK_1)}(2)|=77\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1001000010110011010000010001011001100000100110011101001000010000010101110001000010100000011001010000000111011000110101011000101000000101001101100000000000100010110011000001100101010011000000000000110110010001010001000010011100111011000100000100010001111000100000001001011101100000100001001100110110010000000
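
The hydrophobic-polar string can be produced by mapping each residue to 1 or 0. The page does not state which residues count as hydrophobic, so the set used below is an assumption inferred by comparing the start and end of the 8HYK_1 sequence with the binary string; the status of C in particular was not checked. The names `HYDROPHOBIC` and `hp_string` are hypothetical.

```python
# Assumed hydrophobic residues (mapped to 1); every other letter maps to 0.
# Inferred from the 8HYK_1 example, not from a stated definition.
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(sequence):
    """Map a protein sequence to its hydrophobic-polar (1/0) string."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 18 residues of 8HYK_1.
print(hp_string("MSKFSESTLSGWTKPASV"))  # → 100100001011001101
```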
Pair \(Z_2\) Length of longest common subword
8HYK_1,4DAB_1 167 6
8HYK_1,6ZEC_1 170 4
4DAB_1,6ZEC_1 175 3
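
The \(Z_2\) column is the size of the symmetric difference of the two sets of length-2 subwords; for the first pair this is \(90+77=167\). A minimal sketch of the computation follows, with hypothetical function names and toy input words.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in the word w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|: size of the symmetric difference."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: "abab" has subwords {ab, ba}; "abba" has {ab, bb, ba}.
print(z_distance("abab", "abba", 2))  # → 1
```

Dividing \(Z_k(x,y)\) by \(|P_x(k)\cup P_y(k)|\) would give the usual Jaccard distance, which is why \(Z_k\) is described as its numerator.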

Newick tree

 
(6ZEC_1:87.15,(8HYK_1:83.5,4DAB_1:83.5):3.65);

Let d be the Otu--Sayood distance d, and let d1 be the Otu--Sayood distance d1. (The latter makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{560 }{\log_{20} 560}-\frac{253}{\log_{20}253})=87.1\)
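
The arithmetic of this estimate can be checked directly. In the sketch below, 560 is the combined length \(307+253\) of the two sequences, and the factors 0.85 and 0.8 are taken as given from the formula; the helper name `n_over_log` is hypothetical.

```python
import math

def n_over_log(n, alphabet_size=20):
    """n / log_b(n), the normalizing term used for LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)

n_concat = 307 + 253  # length of 8HYK_1 followed by 4DAB_1
expected = 0.85 * 0.8 * (n_over_log(n_concat) - n_over_log(253))
print(round(expected, 1))  # → 87.1
```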
Status Protein1 Protein2 d d1/2
Query variables 8HYK_1 4DAB_1 109 98.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
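
A worked instance of this interpolation, using the query values \(d=109\) and \(d_1/2=98.5\) for the pair 8HYK_1, 4DAB_1. The case distinction suggests \(d=\max\) and \(d_1=\min+\max\); under that assumption the smaller term is \(2(98.5)-109=88\). The value 88 is inferred here and does not appear on the page, and the function name `delta` is hypothetical.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

larger = 109   # d for the pair 8HYK_1, 4DAB_1
smaller = 88   # inferred from d1/2 = 98.5, assuming d1 = min + max

print(delta(0, smaller, larger))    # → 109
print(delta(0.5, smaller, larger))  # → 98.5
```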