CoV2D Browser™

Parikh vectors
6MLK_1 8OKQ_1 2FJR_1 Letter Amino acid
30 22 17 G Glycine
8 2 1 M Methionine
40 13 15 A Alanine
8 11 6 Q Glutamine
28 26 19 L Leucine
19 7 9 R Arginine
3 24 12 K Lysine
8 12 9 F Phenylalanine
21 17 3 P Proline
18 18 18 S Serine
17 12 8 T Threonine
6 7 3 W Tryptophan
21 17 12 V Valine
2 10 7 N Asparagine
25 19 14 D Aspartic acid
2 1 3 C Cysteine
16 13 12 E Glutamic acid
15 12 3 H Histidine
7 9 13 I Isoleucine
6 8 5 Y Tyrosine
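Each column above is a Parikh vector: the number of occurrences of each letter in one sequence. The sketch below assumes the sequence is a plain string of one-letter amino-acid codes. The helper name `parikh_vector` is hypothetical. The 2FJR_1 sequence is copied from the listing further down.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes, in the row order of the table above.
AMINO_ACIDS = "GMAQLRKFPSTWVNDCEHIY"

def parikh_vector(seq: str) -> dict:
    """Count occurrences of each standard amino acid in seq (absent letters get 0)."""
    counts = Counter(seq)
    return {aa: counts[aa] for aa in AMINO_ACIDS}

# Sequence of 2FJR_1 as listed on this page.
SEQ_2FJR = ("DSLGWSNVDVLDRICEAYGFSQKIQLANHFDIASSSLSNRYTRGAISYDFAAHCALETGANLQWLLTGEG"
            "EAFVNNRESSDAKRIEGFTLSEEILKSDKQLSVDAQFFTKPLTDGMAIRSEGKIYFVDKQASLSDGLWLV"
            "DIKGAISIRELTKLPGRKLHVAGGKVPFECGIDDIKTLGRVVGVYSEVN")
v = parikh_vector(SEQ_2FJR)
print(v["G"], v["M"], sum(v.values()))  # prints: 17 1 189
```

The components sum to the sequence length, 189, which matches the 2FJR_1 column.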

>6MLK_1|Chain A|6-deoxyerythronolide-B synthase EryA3, modules 5 and 6|Saccharopolyspora erythraea (1836)
>8OKQ_1|Chain A|Carbonic anhydrase 2|Homo sapiens (9606)
>2FJR_1|Chains A, B|Repressor protein CI|Enterobacteria phage 186 (29252)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6MLK , Knot 126 300 0.79 40 168 280
MASQLDSGTPAREASSALRDGYRQAGVSGRVRSYLDLLAGLSDFREHFDGSDGFSLDLVDMADGPGEVTVICCAGTAAISGPHEFTRLAGALRGIAPVRAVPQPGYEEGEPLPSSMAAVAAVQADAVIRTQGDKPFVVAGHSAGALMAYALATELLDRGHPPRGVVLIDVYPPGHQDAMNAWLEELTATLFDRETVRMDDTRLTALGAYDRLTGQWRPRETGLPTLLVSAGEPMGPWPDDSWKPTWPFEHDTVAVPGDHFTMVQEHADAIARHIDAWLGGGNSSSVDKLAAALEHHHHHH
8OKQ , Knot 112 260 0.79 40 176 249
MSHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLKPLSVSYDQATSLRILNNGHAFNVEFDDSQDKAVLKGGPLDGTYRLIQFHFHWGSLDGQGSEHTVDKKKYAAELHLVHWNTKYGDFGKAVQQPDGLAVLGIFLKVGSAKPGLQKVVDVLDSIKTKGKSADFTNFDPRGLLPESLDYWTYPGSLTTPPLLECVTWIVLKEPISVSSEQVLKFRKLNFNGEGEPEELMVDNWRPAQPLKNRQIKASFK
2FJR , Knot 88 189 0.81 40 135 182
DSLGWSNVDVLDRICEAYGFSQKIQLANHFDIASSSLSNRYTRGAISYDFAAHCALETGANLQWLLTGEGEAFVNNRESSDAKRIEGFTLSEEILKSDKQLSVDAQFFTKPLTDGMAIRSEGKIYFVDKQASLSDGLWLVDIKGAISIRELTKLPGRKLHVAGGKVPFECGIDDIKTLGRVVGVYSEVN
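The LZ column counts Lempel–Ziv (1976) phrases, and the next column divides that count by \(n/\log_{20} n\). The sketch below assumes the usual LZ76 exhaustive-history parsing. Each new phrase is the shortest prefix of the remaining text that does not occur earlier, with overlap allowed, and a final, possibly repeated, phrase is also counted. The page does not state its exact convention, so counts may differ slightly, and the function names are hypothetical.

As a check on the ratio column: for 6MLK, \(126/(300/\log_{20}300)\approx 0.7997\), yet the table lists 0.79. The listed ratios therefore appear to be truncated, not rounded, to two decimals.

```python
import math

def lz76_complexity(s: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of s (assumed convention)."""
    n, i, c = len(s), 0, 0
    while i < n:
        length = 1
        # Grow the phrase while it still occurs in the text before its last symbol.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        c += 1          # count the phrase (the last one may be a repeat)
        i += length
    return c

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_k n) with k = alphabet_size (20 for proteins)."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76_complexity("0001101001000101"))  # phrases 0|001|10|100|1000|101, prints 6
for code, lz, n in [("6MLK", 126, 300), ("8OKQ", 112, 260), ("2FJR", 88, 189)]:
    ratio = normalized_lz(lz, n)
    # Truncating to two decimals reproduces the table's 0.79, 0.79, 0.81.
    print(code, round(ratio, 4), math.floor(ratio * 100) / 100)
```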

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-character Protein Data Bank code \(c\).
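The definitions of \(P_w(n)\) and \(p_w(n)\) translate directly into code. The sketch below uses hypothetical helper names and treats a word as a Python string.

```python
def subwords(w: str, n: int) -> set:
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))

w = "MASQLDSGTPAREASS"  # first 16 residues of 6MLK_1
print(subword_complexity(w, 1), subword_complexity(w, 2))  # prints: 11 14
```

In this prefix the 2-subword AS occurs twice, so 15 positions yield only 14 distinct subwords of length 2.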

\(|P_{f(6MLK_1)}(2) \setminus P_{f(8OKQ_1)}(2)|=78\) and \(|P_{f(8OKQ_1)}(2) \setminus P_{f(6MLK_1)}(2)|=86\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(6MLK_1),f(8OKQ_1))=78+86=164\).

Hydrophobic-polar version of Sequence 1 (6MLK_1): 110010010110010011001000111010100010111110010001010011010110110111010110011011101100100111110111110111011000101110011111110101110001001111110011111101110011001011011111010111000110111001010110000101000010111100010101010001110111011011111100010101110000111110010110001011100101111110000100111110000000
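The hydrophobic-polar string maps each residue to 1 (hydrophobic) or 0 (polar). The page does not list its hydrophobic set. The set {A, F, G, I, L, M, P, V, W} used below is an assumption, inferred by matching the string above residue by residue against 6MLK_1. Under it, C, D, E, H, K, N, Q, R, S, T and Y map to 0. The helper name is hypothetical.

```python
# Assumed hydrophobic residues (inferred from the string above, not stated on the page).
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(seq: str) -> str:
    """Map each residue to '1' if it is in HYDROPHOBIC, else '0'."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

print(hp_string("MASQLDSGTPAREASS"))  # prints: 1100100101100100
```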
Pair \(Z_2\) Length of longest common substring
6MLK_1,8OKQ_1 164 4
6MLK_1,2FJR_1 145 3
8OKQ_1,2FJR_1 163 5
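Both columns of the pair table can be computed directly. The sketch below uses hypothetical helper names.

`z_k` takes the size of the symmetric difference of the two subword sets. This matches the numbers above: with \(p(2)\) values of 168 and 176, the two proteins share \(168-78=176-86=90\) distinct 2-subwords, so \(Z_2=78+86=164\).

The second column is computed as the longest common substring, i.e. a contiguous run, using the standard dynamic program. A longest common subsequence of two proteins of these lengths would be far longer than 3 to 5 residues. For instance, 8OKQ_1 and 2FJR_1 share the contiguous run VDVLD.

```python
def subwords(w: str, k: int) -> set:
    """Distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """Size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous string occurring in both x and y (O(|x||y|) DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common run ending at x[i-1] and y[j-1].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Fragments of 8OKQ_1 and 2FJR_1 that share the run VDVLD.
print(longest_common_substring("KVVDVLDSIK", "SNVDVLDRIC"))  # prints: 5
```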

Newick tree

(8OKQ_1:84.60,(6MLK_1:72.5,2FJR_1:72.5):12.10);
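The tree above can be checked with a small sketch. It stores the tree in a hypothetical nested-tuple representation, not the page's own data format, and serializes it in standard Newick syntax. It also confirms that both root-to-leaf paths have length 84.60 (= 12.10 + 72.5), i.e. the tree is ultrametric. Note that 72.5 is half of \(Z_2(6MLK_1, 2FJR_1)=145\) from the pair table.

```python
import math

# Hypothetical representation: (name or list of children, branch length); the root has length None.
TREE = ([("8OKQ_1", 84.60), ([("6MLK_1", 72.5), ("2FJR_1", 72.5)], 12.10)], None)

def newick(node) -> str:
    """Serialize a node and its subtree in Newick syntax, without the final ';'."""
    content, length = node
    if isinstance(content, str):
        body = content
    else:
        body = "(" + ",".join(newick(child) for child in content) + ")"
    return body if length is None else f"{body}:{length}"

def leaf_depths(node, depth=0.0):
    """Yield (leaf name, distance from the root) for every leaf."""
    content, length = node
    depth += length or 0.0
    if isinstance(content, str):
        yield content, depth
    else:
        for child in content:
            yield from leaf_depths(child, depth)

print(newick(TREE) + ";")  # prints: (8OKQ_1:84.6,(6MLK_1:72.5,2FJR_1:72.5):12.1);
depths = dict(leaf_depths(TREE))
print(all(math.isclose(d, 84.6) for d in depths.values()))  # prints: True
```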

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{560}{\log_{20} 560}-\frac{260}{\log_{20}260}\right)\approx 85.0\), where \(560=300+260\) is the combined length of 6MLK_1 and 8OKQ_1.
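The estimate can be checked numerically. The sketch below only reproduces the arithmetic. The factors 0.85 and 0.8 are taken from the text as given, and `n_over_log` is a hypothetical helper.

```python
import math

def n_over_log(n: int, base: int = 20) -> float:
    """n / log_base(n), the normalizer also used in the LZ table."""
    return n / math.log(n, base)

estimate = 0.85 * 0.8 * (n_over_log(560) - n_over_log(260))
print(round(estimate, 1))  # prints: 85.0
```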
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 6MLK_1 8OKQ_1 106 100
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
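The sketch below computes \(\delta\) under one assumption. Following Otu and Sayood's definitions, which this page does not restate, it takes the min and max over the two increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), where \(c\) is the LZ76 phrase count. Then \(\alpha=0\) gives \(d\), the larger increment, and \(\alpha=1/2\) gives \(d_1/2\), with \(d_1\) the sum of both increments. The LZ76 parsing convention is also an assumption and the function names are hypothetical, so this need not reproduce the values 106 and 100 exactly.

```python
def lz76_complexity(s: str) -> int:
    """Phrase count of the LZ76 exhaustive-history parsing (assumed convention)."""
    n, i, c = len(s), 0, 0
    while i < n:
        length = 1
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        c += 1
        i += length
    return c

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max over the two LZ76 complexity increments."""
    a = lz76_complexity(x + y) - lz76_complexity(x)  # extra phrases needed for y after x
    b = lz76_complexity(y + x) - lz76_complexity(y)  # extra phrases needed for x after y
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "ab", "abab"
print(delta(x, y, 0.0), delta(x, y, 0.5))  # prints: 1.0 0.5  (d and d_1/2)
```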