CoV2D Browser™

Parikh vectors
3IQE_1 3PGD_1 1RYK_1 Letter Amino acid
14 8 4 R Arginine
3 3 6 Q Glutamine
17 4 2 M Methionine
6 14 3 T Threonine
35 9 2 A Alanine
3 2 0 C Cysteine
22 16 2 L Leucine
0 4 4 W Tryptophan
26 17 4 V Valine
8 10 3 N Asparagine
37 21 8 E Glutamic acid
14 7 7 G Glycine
2 6 0 H Histidine
10 15 1 F Phenylalanine
24 12 7 D Aspartic acid
13 9 3 I Isoleucine
18 9 10 K Lysine
16 16 0 P Proline
9 7 0 S Serine
6 4 3 Y Tyrosine
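
The Parikh vector of a word records how often each letter of the alphabet occurs in it. A minimal Python sketch, assuming the 20-letter amino-acid alphabet in the row order of the table above; the names `ALPHABET` and `parikh_vector` are illustrative and not taken from the CoV2D site:

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
ALPHABET = "RQMTACLWVNEGHFDIKPSY"

def parikh_vector(w, alphabet=ALPHABET):
    """Return the letter counts of w, one entry per letter of alphabet."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# Sequence 1RYK_1 as listed on this page.
w = "MNKDEAGGNWKQFKGKVKEQWGKLTDDDMTIIEGKRDQLVGKIQERYGYQKDQAEKEVVDWETRNEYRW"
vector = parikh_vector(w)
print(dict(zip(ALPHABET, vector)))
```

The entries of a Parikh vector sum to the length of the word, which gives a quick consistency check against the length column reported for each sequence.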

>3IQE_1|Chains A, B, C, D, E, F|F420-dependent methylenetetrahydromethanopterin dehydrogenase|Methanopyrus kandleri (2320)
>3PGD_1|Chains A, D|HLA class II histocompatibility antigen, DR alpha chain|Homo sapiens (9606)
>1RYK_1|Chain A|Protein yjbJ|Escherichia coli (562)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3IQE , Knot 121 283 0.80 38 164 270
MTVAKAIFIKCGNLGTSMMMDMLLDERADREDVEFRVVGTSVKMDPECVEAAVEMALDIAEDFEPDFIVYGGPNPAAPGPSKAREMLADSEYPAVIIGDAPGLKVKDEMEEQGLGYILVKPDAMLGARREFLDPVEMAIYNADLMKVLAATGVFRVVQEAFDELIEKAKEDEISENDLPKLVIDRNTLLEREEFENPYAMVKAMAALEIAENVADVSVEGCFVEQDKERYVPIVASAHEMMRKAAELADEARELEKSNDAVLRTPHAPDGKVLSKRKFMEDPE
3PGD , Knot 91 193 0.82 40 144 189
MIKEEHVIIQAEFYLNPDQSGEFMFDFDGDEIFHVDMAKKETVWRLEEFGRFASFEAQGALANIAVDKANLEIMTKRSNYTPITNVPPEVTVLTNSPVELREPNVLICFIDKFTPPVVNVTWLRNGKPVTTGVSETVFLPREDHLFRKFHYLPFLPSTEDVYDCRVEHWGLDEPLLKHWEFDAPSPLPETTEN
1RYK , Knot 38 69 0.77 32 61 67
MNKDEAGGNWKQFKGKVKEQWGKLTDDDMTIIEGKRDQLVGKIQERYGYQKDQAEKEVVDWETRNEYRW
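
The LZ-complexity and normalized-ratio columns can be approximated with a short sketch. It assumes a Lempel–Ziv (1976) style parsing in which each phrase is the shortest prefix of the remaining text that has no copy starting at an earlier position; whether the site uses exactly this variant is not confirmed here, and the function names are illustrative.

```python
import math

def lz76_complexity(w):
    """Number of phrases in a Lempel-Ziv (1976) style parsing of w.

    Each phrase is the shortest prefix of the unparsed suffix that does not
    occur starting at an earlier position (the copy may overlap the phrase).
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the candidate while a copy of it starts somewhere before i.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_b n), with b the alphabet size, as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76_complexity("0001101001000101"))  # parses as 0|001|10|100|1000|101
print(normalized_lz(38, 69))                # LZ and length taken from the 1RYK row
```

With the table's values for 1RYK (LZ = 38, n = 69) the ratio comes out just under 0.78, which the table lists as 0.77.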

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
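
As a sketch of this definition, the set of distinct subwords of a given length can be collected with a set comprehension. The helper names `subwords` and `subword_complexity` are illustrative, and `k` stands for the subword length passed as the argument of \(p_w\).

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Sequence 1RYK_1 as listed on this page.
w = "MNKDEAGGNWKQFKGKVKEQWGKLTDDDMTIIEGKRDQLVGKIQERYGYQKDQAEKEVVDWETRNEYRW"
print(subword_complexity(w, 2), subword_complexity(w, 3))  # 61 67
```

A word of length \(|w|\) has \(|w|-k+1\) windows of length \(k\), so the complexity can never exceed that number; for 1RYK all 67 windows of length 3 are distinct.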

\(|P_{f(3IQE_1)}(2) \setminus P_{f(3PGD_1)}(2)|=84\), \(|P_{f(3PGD_1)}(2) \setminus P_{f(3IQE_1)}(2)|=64\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (3IQE_1):
1011011110010110011101110001000010101110010101001011101110110010101110111011111100100111000011111101111010001000111011101011111000110110111001011011110111011001100110010000100001101110000110000100101110111110110011010101011000000011111010011001101100100100000111001011010110000110010
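
A hydrophobic-polar string of this kind can be produced by mapping each residue to one bit. The sketch below is an assumption-laden reconstruction: the hydrophobic set was inferred by comparing the opening residues of 3IQE_1 with the string above, the page does not state the classification it uses, and W does not occur in 3IQE_1, so its class here is a guess.

```python
# ASSUMPTION: hydrophobic residues inferred from the start of the string above.
# W never occurs in 3IQE_1, so placing it in this set is a guess.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(w, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in hydrophobic else "0" for a in w)

# First 20 residues of 3IQE_1.
print(hp_encode("MTVAKAIFIKCGNLGTSMMM"))  # 10110111100101100111
```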
Pair \(Z_2\) Length of longest common subword
3IQE_1,3PGD_1 148 3
3IQE_1,1RYK_1 167 3
3PGD_1,1RYK_1 165 3
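
Both columns of this table can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two subword sets. The last column is read here as the length of the longest contiguous block shared by the two sequences; that reading is an assumption suggested by the small values, and the function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block that ended just before both letters.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("AAB", "ABB", 2))               # {AA, AB} vs {AB, BB}: 2
print(longest_common_subword("ABCDE", "XBCDY"))  # shared block BCD: 3
```

For the first pair in the table, \(Z_2 = 84 + 64 = 148\), the sum of the two one-sided set differences.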

Newick tree

 
[
	1RYK_1:85.79,
	[
		3IQE_1:74,3PGD_1:74
	]:11.79
]

Let \(d\) be the Otu–Sayood distance denoted \(d\).
Let \(d_1\) be the Otu–Sayood distance denoted \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{476 }{\log_{20} 476}-\frac{193}{\log_{20}193})\approx 82.5\)
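
The arithmetic behind this estimate can be reproduced directly. The sketch assumes that 476 = 283 + 193 is the combined length of the two query sequences; the factors 0.85 and 0.8 are taken from the expression as given, and the helper name `lz_scale` is illustrative.

```python
import math

def lz_scale(n, b=20):
    """n / log_b(n), the normalizing quantity used for LZ-complexity on this page."""
    return n / math.log(n, b)

# 476 = 283 + 193 (assumed: combined length of 3IQE_1 and 3PGD_1).
expected = 0.85 * 0.8 * (lz_scale(476) - lz_scale(193))
print(expected)
```

The result lies between 82 and 83, in line with the figure quoted above.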
Status Protein1 Protein2 d d1/2
Query variables 3IQE_1 3PGD_1 102 86

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
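
A sketch of how \(d\), \(d_1\) and \(\delta\) fit together. It assumes the usual statement of the Otu–Sayood distances, \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\), with \(c\) an LZ-complexity. The parsing variant and all function names below are illustrative assumptions, not the site's implementation.

```python
def lz76_complexity(w):
    """Phrase count of a Lempel-Ziv (1976) style parsing (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the candidate while a copy of it starts somewhere before i.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases

def increments(s, q, c=lz76_complexity):
    """The two one-sided terms c(sq) - c(s) and c(qs) - c(q)."""
    return c(s + q) - c(s), c(q + s) - c(q)

def delta(s, q, alpha, c=lz76_complexity):
    """alpha * min + (1 - alpha) * max of the two one-sided terms."""
    a, b = increments(s, q, c)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def d(s, q):
    """Otu-Sayood d: the larger of the two terms (alpha = 0)."""
    return max(increments(s, q))

def d1(s, q):
    """Otu-Sayood d1: the sum of the two terms (twice delta at alpha = 1/2)."""
    return sum(increments(s, q))

s, q = "0001101001", "1110001"
print(d(s, q), d1(s, q) / 2, delta(s, q, 0), delta(s, q, 0.5))
```

Since \(\min + \max\) equals the sum of the two terms, \(\alpha = 1/2\) gives \(d_1/2\), and \(\alpha = 0\) keeps only the maximum, which is \(d\).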