CoV2D Browser™

Parikh vectors
2RUE_1 8AAZ_1 8YBN_1 Letter Amino acid
12 6 16 V Valine
8 7 24 D Aspartic acid
2 8 2 C Cysteine
10 12 9 G Glycine
1 6 1 W Tryptophan
4 3 12 F Phenylalanine
4 10 14 S Serine
6 7 13 T Threonine
14 12 5 A Alanine
2 3 8 Q Glutamine
9 2 12 E Glutamic acid
8 8 16 L Leucine
12 6 33 K Lysine
0 2 9 M Methionine
8 2 6 P Proline
7 3 21 Y Tyrosine
2 11 5 R Arginine
4 14 20 N Asparagine
1 1 5 H Histidine
7 6 9 I Isoleucine
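As a minimal sketch of how one column of the table above can be obtained, the snippet below counts each amino-acid letter in a sequence. The names `AMINO_ACIDS` and `parikh_vector` are illustrative and not taken from the CoV2D code; the letter order simply follows the table rows.

```python
from collections import Counter

# Letter order copied from the rows of the table above (V, D, C, ..., I).
AMINO_ACIDS = "VDCGWFSTAQELKMPYRNHI"


def parikh_vector(sequence: str) -> list[int]:
    """Return the number of occurrences of each letter, in table order."""
    counts = Counter(sequence)
    return [counts[letter] for letter in AMINO_ACIDS]


# First 30 residues of the 2RUE_1 sequence, used here only as a short example.
prefix = "GPLGSEGPVTVVVAKNYNEIVLDDTKDVLI"
print(dict(zip(AMINO_ACIDS, parikh_vector(prefix))))
```

Passing a full FASTA sequence instead of the prefix gives one column of the table; the entries of a Parikh vector always sum to the sequence length.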

>2RUE_1|Chain A|Protein disulfide-isomerase|Humicola insolens (85995)
>8AAZ_1|Chain A|Lysozyme|Gallus gallus (9031)
>8YBN_1|Chain A|Enterotoxin type B|Staphylococcus aureus (1280)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2RUE, Knot 59 121 0.78 38 93 118
GPLGSEGPVTVVVAKNYNEIVLDDTKDVLIEFYAPWCGHCKALAPKYEELGALYAKSEFKDRVVIAKVDATANDVPDEIQGFPTIKLYPAGAKGQPVTYSGSRTVEDLIKFIAENGKYKAA
8AAZ, Knot 66 129 0.82 40 104 127
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL
8YBN, Knot 110 240 0.83 40 158 231
MESQPDPKPDELHKSSKFTGLMENMKVLYDDNHVSAINVKSIDQFLYFDLIYSIKDTKLGNYDNVRVEFKNKDLADKYKDKYVDVFGANYYYQCYFSKKTNDINSHQTDKRKTCMYGGVTEHNGNQLDKYRSITVRVFEDGKNLLSFDVQTNKKKVTAQELDYLTRHYLVKNKKLYEFNNSPYETGYIKFIENENSFWYDMMPAPGDKFDQSKYLMMYNDNKMVDSKDVKIEVYLTTKKK
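The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below does only that arithmetic; the function name `normalized_lz` is illustrative, and the LZ values are taken from the table rather than recomputed.

```python
import math


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Compute LZ(w) / (n / log_b n) for alphabet size b (20 amino acids)."""
    return lz / (n / math.log(n, alphabet_size))


# (code, LZ-complexity, length) triples copied from the table above.
for code, lz, n in [("2RUE", 59, 121), ("8AAZ", 66, 129), ("8YBN", 110, 240)]:
    print(code, normalized_lz(lz, n))
```

The results should agree with the tabulated values to within roughly 0.01; how the page rounds the displayed figures is not stated.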

Let \(P_w(k)\) be the set of distinct subwords (contiguous intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
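The definition of the subword set and its cardinality can be sketched directly with a sliding window. The function names below are illustrative, and the sketch follows the definition as stated; it is not claimed to reproduce every tabulated value.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w: str, k: int) -> int:
    """Number of distinct contiguous subwords of length k in w."""
    return len(subwords(w, k))


# Toy example: "abab" contains the length-2 subwords "ab" and "ba".
print(subwords("abab", 2), subword_complexity("abab", 2))
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), which bounds the values that this count can take.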

\(|P_{f(2RUE_1)}(2) \setminus P_{f(8AAZ_1)}(2)|=69\), \(|P_{f(8AAZ_1)}(2) \setminus P_{f(2RUE_1)}(2)|=80\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\).

Hydrophobic-polar version of Sequence 1:
1111001110111100000111000001110101110100011110000111101000100011110101010011001011101010111101011000100010011011100100011
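A hydrophobic-polar string like the one above can be produced by a per-residue mapping. The sketch below is an assumption: the hydrophobic set `AFGILPVW` was inferred by comparing the binary string with the 2RUE_1 sequence and is not stated on the page. Methionine does not occur in 2RUE_1, so its class cannot be inferred; the sketch arbitrarily leaves it polar.

```python
# Assumed hydrophobic residues, inferred from the 2RUE_1 example only.
# M is absent from that sequence, so treating it as polar is a guess.
HYDROPHOBIC = set("AFGILPVW")


def hp_string(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in sequence)


# First 30 residues of 2RUE_1.
print(hp_string("GPLGSEGPVTVVVAKNYNEIVLDDTKDVLI"))
```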
Pair \(Z_2\) Length of longest common subword
2RUE_1,8AAZ_1 149 3
2RUE_1,8YBN_1 153 4
8AAZ_1,8YBN_1 174 3
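The quantity \(Z_k\) is the size of the symmetric difference of two subword sets, so it can be sketched with ordinary set operations. The function names are illustrative; `jaccard_distance` is included only to show how \(Z_k\) would serve as a numerator and is an assumption about the intended normalization.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k divided by the size of the union (assumed normalization)."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0


# Toy example: only "bb" separates the two sets of length-2 subwords.
print(z_distance("abab", "abba", 2), jaccard_distance("abab", "abba", 2))
```

For the pair 2RUE_1, 8AAZ_1 the two one-sided counts reported above are 69 and 80, which sum to the tabulated \(Z_2 = 149\).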

Newick tree

 
(
	8YBN_1:84.24,
	(
		2RUE_1:74.5,8AAZ_1:74.5
	):9.74
);

Let \(d\) denote the Otu-Sayood distance \(d\).
Let \(d_1\) denote the Otu-Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
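For orientation, here is a minimal sketch of the two distances, assuming the usual definitions from Otu and Sayood: with \(c\) the LZ76 complexity, \(d\) is the larger and \(d_1\) the sum of \(c(SQ)-c(S)\) and \(c(QS)-c(Q)\). The LZ76 parsing below is a straightforward quadratic implementation and may differ from the variant this page uses; all function names are illustrative.

```python
def lz76_complexity(s: str) -> int:
    """Number of components in the LZ76 exhaustive-history parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the component while it already occurs in the text
        # preceding its own last symbol.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def otu_sayood(s: str, q: str) -> tuple[int, int]:
    """Return (d, d1) under the assumed definitions."""
    extra_sq = lz76_complexity(s + q) - lz76_complexity(s)
    extra_qs = lz76_complexity(q + s) - lz76_complexity(q)
    return max(extra_sq, extra_qs), extra_sq + extra_qs


# Classic example: 0|001|10|100|1000|101 has six components.
print(lz76_complexity("0001101001000101"))
print(otu_sayood("aaaa", "bbbb"))
```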
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{250}{\log_{20} 250}-\frac{121}{\log_{20}121}\right)\approx 40.8\), where \(250 = 121 + 129\) is the combined length of the two query sequences.
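The arithmetic in this estimate can be checked with a few lines. The helper name `scaled_length` is illustrative; the factors 0.85 and 0.8 are copied from the formula and their origin is not explained here.

```python
import math


def scaled_length(n: int, alphabet_size: int = 20) -> float:
    """Compute n / log_b n for alphabet size b."""
    return n / math.log(n, alphabet_size)


# 250 and 121 are the lengths appearing in the formula above.
estimate = 0.85 * 0.8 * (scaled_length(250) - scaled_length(121))
print(round(estimate, 1))
```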
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 2RUE_1 8AAZ_1 55 53
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
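The interpolation can be sketched as follows, assuming that min and max are taken over two one-sided quantities. The arguments `a` and `b` are hypothetical names for those quantities, which the formula does not spell out.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """Interpolate between max (alpha = 0) and min (alpha = 1) of a and b."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# alpha = 0 gives the maximum; alpha = 1/2 gives half the sum.
print(delta(0, 3, 5), delta(0.5, 3, 5))
```

With \(\alpha=0\) this returns the larger quantity, matching \(d\); with \(\alpha=1/2\) it returns the mean of the two, matching \(d_1/2\) when \(d_1\) is their sum.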