CoV2D Browser™

Parikh vectors
4IQK_1 3QIW_1 7WBF_1 Letter Amino acid
22 17 8 L Leucine
2 11 6 K Lysine
28 14 6 V Valine
6 2 3 Q Glutamine
11 10 6 I Isoleucine
4 16 3 F Phenylalanine
11 23 2 E Glutamic acid
10 6 1 H Histidine
21 8 10 S Serine
18 11 7 T Threonine
22 10 12 A Alanine
23 9 11 R Arginine
16 10 14 N Asparagine
15 15 7 D Aspartic acid
6 4 6 W Tryptophan
17 3 3 Y Tyrosine
8 2 8 C Cysteine
37 7 12 G Glycine
8 2 2 M Methionine
14 12 2 P Proline
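A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch, not the page's own implementation; the function name `parikh_vector` and the constant `ALPHABET` are illustrative, and the letter order is copied from the table above.

```python
from collections import Counter

# Letter order as listed in the table above (an illustrative choice).
ALPHABET = "LKVQIFEHSTARNDWYCGMP"

def parikh_vector(seq, alphabet=ALPHABET):
    """Return the number of occurrences of each letter of `alphabet` in `seq`."""
    counts = Counter(seq)
    return [counts[letter] for letter in alphabet]

# Toy example: two L, one K, one V, nothing else.
print(parikh_vector("LLKV")[:4])  # [2, 1, 1, 0]
```

The entries of each vector sum to the length of the sequence, which gives a quick consistency check against the Length column further down.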

>4IQK_1|Chain A|Kelch-like ECH-associated protein 1|Homo sapiens (9606)
>3QIW_1|Chain A|H-2 CLASS II HISTOCOMPATIBILITY ANTIGEN, E-K alpha chain|Mus musculus (10090)
>7WBF_1|Chain A|Lysozyme C|Gallus gallus (9031)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4IQK , Knot 121 299 0.77 40 161 253
SGLVPRGSHMAPKVGRLIYTAGGYFRQSLSYLEAYNPSNGTWLRLADLQVPRSGLAGCVVGGLLYAVGGRNNSPDGNTDSSALDCYNPMTNQWSPCAPMSVPRNRIGVGVIDGHIYAVGGSHGCIHHNSVERYEPERDEWHLVAPMLTRRIGVGVAVLNRLLYAVGGFDGTNRLNSAECYYPERNEWRMITAMNTIRSGAGVCVLHNCIYAAGGYDGQDQLNSVERYDVATATWTFVAPMKHRRSALGITVHQGRIYVLGGYDGHTFLDSVECYDPDTDTWSEVTRMTSGRSGVGVAVT
3QIW , Knot 89 192 0.81 40 136 186
IKEEHTIIQAEFYLLPDKRGEFMFDFDGDEIFHVDIEKSETIWRLEEFAKFASFEAQGALANIAVDKANLDVMKERSNNTPDANVAPEVTVLSRSPVNLGEPNILICFIDKFSPPVVNVTWLRNGRPVTEGVSETVFLPRDDHLFRKFHYLTFLPSTDDFYDCEVDHWGLEEPLRKHWEFEEKTLLPETKEN
7WBF , Knot 66 129 0.82 40 104 127
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL
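The page does not say which Lempel-Ziv variant produces the LZ-complexity column. As a sketch only, the code below assumes the exhaustive-history parsing of Lempel and Ziv (1976), so its phrase counts may differ from the table. The normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) is taken directly from the table header. The names `lz76` and `normalized_lz` are illustrative.

```python
import math

def lz76(s):
    """Number of phrases in the LZ76 exhaustive-history parsing of `s`.

    Each phrase is the shortest prefix of the remaining text that does not
    already occur (overlaps allowed) in the text read so far.
    """
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet=20):
    """LZ(w) / (n / log_alphabet(n)), the ratio shown in the table header."""
    return lz / (n / math.log(n, alphabet))

print(lz76("0001101001000101"))             # 6 phrases: 0|001|10|100|1000|101
print(round(normalized_lz(121, 299), 2))    # 0.77, as in the 4IQK row
```

The second call only reuses the tabulated values 121 and 299; it does not recompute the complexity of the sequence itself.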

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
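The definition above can be sketched in a few lines. This is an illustration under the stated definition only; the names `subwords` and `subword_complexity` are hypothetical, and the page's own counting convention may differ.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy example: "ABAB" contains only the 2-letter subwords AB and BA.
print(sorted(subwords("ABAB", 2)))    # ['AB', 'BA']
print(subword_complexity("ABAB", 2))  # 2
```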

\(|P_{f(4IQK_1)}(2) \setminus P_{f(3QIW_1)}(2)|=99\), \(|P_{f(3QIW_1)}(2) \setminus P_{f(4IQK_1)}(2)|=74\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
01111010011101101100111010001001010010010110110101100111101111110111100001010000011000011000101011101100011111101010111100101000010000100001011111100011111111001101111101000100100001000010110110010011110110001011110010001001000011010101111100000111101001010111100100110010000100001001001001001111110
Pair \(Z_2\) Length of longest common subsequence
4IQK_1,3QIW_1 173 4
4IQK_1,7WBF_1 171 3
3QIW_1,7WBF_1 166 3
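The \(Z_k\) column is the size of the symmetric difference of the two sets of length-\(k\) subwords; for the first pair, \(99+74=173\). A minimal sketch of that computation follows, with hypothetical function names and a toy input rather than the protein sequences.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: only "BB" occurs in one word and not in the other.
print(z_distance("ABAB", "ABBA", 2))  # 1
```

Dividing by the size of the union \(|P_x(k)\cup P_y(k)|\) would turn this numerator into the usual Jaccard distance.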

Newick tree

 
[
	4IQK_1:86.97,
	[
		7WBF_1:83,3QIW_1:83
	]:3.97
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{491 }{\log_{20} 491}-\frac{192}{\log_{20}192})=87.0\), where \(491=299+192\) is the combined length of the two sequences.
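The arithmetic in this estimate can be checked directly. The snippet below only re-evaluates the displayed formula; the factors 0.85 and 0.8 are taken from the page as given, and `length_scale` is an illustrative helper name.

```python
import math

def length_scale(n, alphabet=20):
    """n / log_alphabet(n), the growth scale used in the formula above."""
    return n / math.log(n, alphabet)

expected = 0.85 * 0.8 * (length_scale(491) - length_scale(192))
print(round(expected, 1))  # 87.0
```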
Status Protein1 Protein2 d d1/2
Query variables 4IQK_1 3QIW_1 103 88
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
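As a worked check of this interpolation against the table above: if \(d=103\) is the maximum and \(d_1/2=88\) is the average of the two one-sided quantities, then the minimum must be \(2\cdot 88-103=73\). The sketch below assumes exactly that reading; `delta` is an illustrative name.

```python
def delta(lo, hi, alpha):
    """alpha * min + (1 - alpha) * max for the two one-sided quantities."""
    return alpha * min(lo, hi) + (1 - alpha) * max(lo, hi)

# Values derived from the table row for 4IQK_1 and 3QIW_1 (min inferred as 73).
print(delta(73, 103, 0))    # 103.0, i.e. d
print(delta(73, 103, 0.5))  # 88.0, i.e. d1/2
```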