CoV2D Browser™

Parikh vectors
1TLF_1 3VJQ_1 8GAD_1 Letter Amino acid
38 9 14 L Leucine
16 20 4 T Threonine
35 16 12 A Alanine
16 13 4 D Aspartic acid
11 6 20 E Glutamic acid
2 3 0 W Tryptophan
4 8 1 Y Tyrosine
24 4 1 Q Glutamine
8 1 2 M Methionine
12 12 2 P Proline
6 0 0 H Histidine
7 11 9 K Lysine
28 14 5 S Serine
25 10 5 V Valine
16 12 6 R Arginine
9 9 3 N Asparagine
20 24 5 G Glycine
3 16 0 C Cysteine
17 8 11 I Isoleucine
4 11 1 F Phenylalanine
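
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch of that count is given here, assuming the letter order of the table above and using the 8GAD_1 sequence listed on this page; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not part of the site.

```python
from collections import Counter

# Letter order as in the table above (an assumption made for display only).
AMINO_ACIDS = "LTADEWYQMPHKSVRNGCIF"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in sequence."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# 8GAD_1 sequence as listed on this page.
seq_8gad = "SMEEEIEEAYDLVEEAEKTGDTSLLKKAKELLDKVAEEATKSGNPILLIRVIIILIKIVRNSGDPSVAALARELLEKLEEIAEKEGNRFIEAMGEALRTQIERAL"

vector = parikh_vector(seq_8gad)
for letter, count in zip(AMINO_ACIDS, vector):
    print(letter, count)
```

The entries of a Parikh vector always sum to the sequence length, which gives a quick consistency check against the Length column further down.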

>1TLF_1|Chains A, B, C, D|LAC REPRESSOR|Escherichia coli (562)
>3VJQ_1|Chain A|Thaumatin I|Thaumatococcus daniellii (4621)
>8GAD_1|Chains A, B|PD-L1 binder|synthetic construct (32630)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1TLF , Knot 127 301 0.80 20 174 288
QSLLIGVATSSLALHAPSQIVAAIKSRADQLGASVVVSMVERSGVEACKAAVHNLLAQRVSGLIINYPLDDQDAIAVEAACTNVPALFLDVSDQTPINSIIFSHEDGTRLGVEHLVALGHQQIALLAGPLSSVSARLRLAGWHKYLTRNQIQPIAEREGDWSAMSGFQQTMQMLNEGIVPTAMLVANDQMALGAMRAITESGLRVGADISVVGYDDTEDSSCYIPPLTTIKQDFRLLGQTSVDRLLQLSQGQAVKGNQLLPVSLVKRKTTLAPNTQTASPRALADSLMQLARQVSRLESGQ
3VJQ , Knot 99 207 0.85 19 150 203
ATFEIVNRCSYTVWAAASKGDAALDAGGRQLNSGESWTINVEPGTNGGKIWARTDCYFDDSGSGICKTGDCGGLLRCKRFGRPPTTLAEFSLNQYGKDYIDISNIKGFNVPMDFSPTTRGCRGVRCAADIVGQCPAKLKAPGGGCNDACTVFQTSEYCCTTGKCGPTEYSRFFKRLCPDAFSYVLDKPTTVTCPGSSNYRVTFCPTA
8GAD , Knot 51 105 0.75 17 73 98
SMEEEIEEAYDLVEEAEKTGDTSLLKKAKELLDKVAEEATKSGNPILLIRVIIILIKIVRNSGDPSVAALARELLEKLEEIAEKEGNRFIEAMGEALRTQIERAL
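
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below does this for the three rows above; the function name `normalized_lz` is hypothetical, and the base 20 is taken from the column header.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"1TLF": (127, 301), "3VJQ": (99, 207), "8GAD": (51, 105)}

for code, (lz, n) in rows.items():
    print(code, f"{normalized_lz(lz, n):.2f}")
```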

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
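
As an illustration of these definitions, the following sketch collects the distinct subwords of a given length with a sliding window and counts them. The helper names `subwords` and `subword_complexity` are assumptions introduced here, and the toy word is not taken from the database.

```python
def subwords(w, k):
    """Return the set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

toy = "abab"
print(sorted(subwords(toy, 2)))    # the distinct subwords of length 2
print(subword_complexity(toy, 2))  # how many there are
```

For a word of length \(|w|\) there are at most \(|w|-k+1\) windows of length \(k\), so the count can never exceed that number or the number of possible words of length \(k\) over the alphabet.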

\(|P_{f(1TLF_1)}(2) \setminus P_{f(3VJQ_1)}(2)|=102\), \(|P_{f(3VJQ_1)}(2) \setminus P_{f(1TLF_1)}(2)|=78\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
0011111100011101100111110001001110111011000110100111001110010111100110000111101100011111101000011001110000100111001111100011111111001010101111000100001011100010101101100010110011110111110001111110110001101110101110000000000111100100010111000100110100101101001111011000001110000101011100110110010010010
Pair \(Z_2\) Length of longest common subword
1TLF_1,3VJQ_1 180 4
1TLF_1,8GAD_1 161 3
3VJQ_1,8GAD_1 159 4
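
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets, so for the first pair it is \(102+78=180\). A small sketch of the computation follows, on toy words rather than the database sequences; `subwords` and `z_distance` are hypothetical helper names.

```python
def subwords(w, k):
    """Return the set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy words: "abab" has subwords {ab, ba}, "abba" has {ab, bb, ba}.
print(z_distance("abab", "abba", 2))
```

Dividing by the size of the union \(|P_x(k)\cup P_y(k)|\) would give the usual Jaccard distance; the table reports only the numerator.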

Newick tree

 
[
	1TLF_1:87.25,
	[
		8GAD_1:79.5,3VJQ_1:79.5
	]:7.75
]

Let \(d\) denote the Otu–Sayood distance \(d\).
Let \(d_1\) denote the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{508}{\log_{20} 508}-\frac{207}{\log_{20} 207}\right)=87.0\), where \(508=301+207\) is the combined length of the two sequences.
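
The arithmetic in this estimate can be checked directly. The snippet below only re-evaluates the expression as written, with the factors 0.85 and 0.8 and the lengths 301 and 207 copied from this page; the variable names are illustrative.

```python
import math

def log20(x):
    """Logarithm to base 20, the amino-acid alphabet size."""
    return math.log(x, 20)

n_concat = 301 + 207  # combined length of 1TLF_1 and 3VJQ_1
expected = 0.85 * 0.8 * (n_concat / log20(n_concat) - 207 / log20(207))
print(f"{expected:.1f}")  # prints 87.0
```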
Status Protein1 Protein2 d d1/2
Query variables 1TLF_1 3VJQ_1 109 92
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
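
To make the interpolation concrete, here is a hedged sketch under two assumptions that this page does not spell out: that \(\mathrm{LZ}\) is the LZ76 exhaustive-history complexity, and that min and max range over the two one-directional quantities \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), as in the usual statement of the Otu–Sayood distances. The function names are hypothetical, and the quadratic-time parser is chosen for readability, not speed.

```python
def lz76(s):
    """Count the components of the LZ76 exhaustive history of s (naive parser)."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the component while it can be copied from the text before its last letter.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def one_sided(x, y):
    """Return the pair (LZ(xy) - LZ(x), LZ(yx) - LZ(y))."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def delta(x, y, alpha):
    """Return alpha * min + (1 - alpha) * max of the two one-sided quantities."""
    a, b = one_sided(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "aaaa", "abab"
d = max(one_sided(x, y))   # assumed form of the Otu-Sayood distance d
d1 = sum(one_sided(x, y))  # assumed form of the Otu-Sayood distance d1
print(delta(x, y, 0) == d, delta(x, y, 0.5) == d1 / 2)
```

With \(\alpha=0\) only the maximum survives, and with \(\alpha=1/2\) the result is the average of the two quantities, which is half their sum.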