CoV2D Browser™

Parikh vectors
5XPD_1 6GXG_1 7SMD_1 Letter Amino acid
21 14 11 G Glycine
10 3 12 H Histidine
8 2 11 M Methionine
14 5 14 Y Tyrosine
19 11 18 A Alanine
13 5 17 N Asparagine
12 9 15 P Proline
28 10 26 V Valine
6 4 22 R Arginine
32 13 42 L Leucine
16 7 22 K Lysine
13 12 27 S Serine
17 10 15 T Threonine
4 4 4 W Tryptophan
10 7 16 D Aspartic acid
6 2 18 Q Glutamine
11 9 28 E Glutamic acid
21 7 25 I Isoleucine
24 9 19 F Phenylalanine
8 2 9 C Cysteine
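
A Parikh vector records how often each letter occurs in a word, so each column of the table above can be recomputed from the corresponding FASTA sequence. The following is a minimal Python sketch; the function name `parikh_vector` and the alphabet ordering are illustrative choices and are not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard amino-acid letters; the ordering is an arbitrary choice here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(word):
    """Return a dict mapping each amino-acid letter to its count in `word`."""
    counts = Counter(word)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

if __name__ == "__main__":
    # First six residues of 6GXG_1, used only as a short illustration.
    vector = parikh_vector("GAMGSG")
    print({k: v for k, v in vector.items() if v})
```

Summing the entries of a Parikh vector gives the length of the word, which is a quick sanity check against the Length column further down.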

>5XPD_1|Chain A|sugar transporter|Arabidopsis thaliana (3702)
>6GXG_1|Chain A|Tryparedoxin|Trypanosoma brucei brucei (5702)
>7SMD_1|Chain A|Retinoblastoma-like protein 1|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5XPD 124 293 0.80 40 176 272
MALTNNLWAFVFGILGNIISFVLFLAPVPTFVRICKKKSTEGFQSLPYVSALFNAMLWIYYAMQKDGTAFLLITINAFGCVIETIYIVLFVSYANKKTRISTLKVLGLLNFLGFAAIVLVCELLTKGSTREKVLGGICVGFSVSMFAAPLSIMRVVVRTRSVEFMPFSLSLFLTINAVTWLFYGLAIKDFYVALPNVLGAFLGAVQMILYIIFKYYKTPVAQMKKYTCTVCGYIYNPEDGDPDNGVNPGTDFKDIPDDWVCPLCGVGKDQFEEVEEENLYFQGHHHHHHHHHH
6GXG 71 145 0.81 40 121 142
GAMGSGLAKYLPGATNLLSKSGEVSLGSLVGKTVFLYFSASWCPPCRGFTPVLAEFYEKHHVAKNFEVVLISWDENESDFHDYYGKMPWLALPFDQRSTVSELGKTFGVESIPTLITINADTGAIIGTQARTRVIEDPDGANFPW
7SMD 160 371 0.85 40 220 355
GEFTQSVSRLQSIVAGLKNAPSDQLINIFESCVRNPVENIMKILKGIGETFCQHYTQSTDEQPGSHIDFAVNRLKLAEILYYKILETVMVQETRRLHGMDMSVLLEQDIFHRSLMACCLEIVLFAYSSPRTFPWIIEVLNLQPFYFYKVIEVVIRSEEGLSRDMVKHLNSIEEQILESLAWSHDSALWEALQVSANKVPTCEEVIFPNNFETGNNRPKRTGSLALFYRKVYHLASVRLRDLCLKLDVSNELRRKIWTCFEFTLVHCPDLMKDRHLDQLLLCAFYIMAKVTKEERTFQEIMKSYRNQPQANSHVYRSVLLKSIKEERGDLIKFYNTIYVGRVKSFALKYDLANQDHMMDAPPLSPFPHIKQQ

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
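
The subword counts and the normalized LZ column of the table can be sketched in a few lines of Python. The function names are hypothetical, and the sketch takes the LZ-complexity value as an input rather than computing it, since the page does not say which LZ variant it uses.

```python
import math

def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_20(n), as in the table's fourth column."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    print(subword_complexity("abab", 2))        # subwords are "ab" and "ba"
    print(round(normalized_lz(124, 293), 2))    # LZ and length of 5XPD from the table
```

Dividing by \(n/\log_{20} n\) puts sequences of different lengths on a comparable scale, which is why all three proteins land close to each other in that column despite their different lengths.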

\(|P_{f(5XPD_1)}(2) \setminus P_{f(6GXG_1)}(2)|=96\), \(|P_{f(6GXG_1)}(2) \setminus P_{f(5XPD_1)}(2)|=41\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), that is, an LZ76-style (set-of-subwords) numerator of the Jaccard distance for \(P(k)\). For example, \(Z_2(f(5XPD_1),f(6GXG_1))=96+41=137\).

Hydrophobic-polar version of Sequence 1 (5XPD_1):
11100011111111110110111111111011010000000110011010111011111001100010111110101110110010111110010000010010111110111111111100110010000011111011101011111101101110000101111010111010110111011110010111101111111110111011100000111010000001010100100101001101100100110011011011100010010000101010000000000
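
The hydrophobic-polar string above can be approximated by mapping each residue to 1 (hydrophobic) or 0 (polar). The page does not state which residues it treats as hydrophobic; the set used below (A, F, G, I, L, M, P, V, W) is an assumption inferred from the start of the binary string, and `hp_encode` is a hypothetical name.

```python
# Assumed hydrophobic class, inferred from the beginning of the binary string
# shown above; the page itself does not list the classification it uses.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Map each residue to '1' if assumed hydrophobic, else '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

if __name__ == "__main__":
    # First 20 residues of 5XPD_1.
    print(hp_encode("MALTNNLWAFVFGILGNIIS"))  # 11100011111111110110
```

Other hydrophobicity scales classify some residues (for example G, P, C, or Y) differently, so the set should be adjusted if the encoding is meant to match another convention.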
Pair \(Z_2\) Length of longest common subword
5XPD_1,6GXG_1 137 6
5XPD_1,7SMD_1 182 4
6GXG_1,7SMD_1 189 3
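
Both columns of this table can be sketched in Python. The last column is read here as the length of the longest common contiguous subword, an assumption that is consistent with the value 6 for the first pair, since both 5XPD_1 and 6GXG_1 contain LITINA. The function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x, y):
    """Length of the longest contiguous word occurring in both x and y.

    Standard dynamic programme: cur[j] is the length of the longest common
    suffix of the prefixes of x and y ending at the current positions.
    """
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

if __name__ == "__main__":
    # Fragments of 5XPD_1 and 6GXG_1 around their shared word LITINA.
    print(longest_common_subword("FLLITINAFG", "PTLITINADT"))  # 6
    print(z_distance("abab", "abba", 2))                       # 1
```

The dynamic programme takes time proportional to \(|x|\cdot|y|\), which is ample for sequences of a few hundred residues; a suffix automaton would be the usual choice for much longer inputs.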

Newick tree

 
(7SMD_1:99.54,(5XPD_1:68.5,6GXG_1:68.5):31.04);

Let \(d\) be the Otu--Sayood distance \(d\).
Let \(d_1\) be the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{438}{\log_{20} 438}-\frac{145}{\log_{20}145}\right)\approx 87.3\).
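
The arithmetic behind this estimate can be checked directly. The sketch below only re-evaluates the expression as written; the helper name `lz_scale` is hypothetical, and 438 is the combined length 293 + 145 of the two sequences.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The normalizing quantity n / log_20(n) used in the table above."""
    return n / math.log(n, alphabet_size)

# 438 = 293 + 145 is the length of the concatenation of 5XPD_1 and 6GXG_1.
estimate = 0.85 * 0.8 * (lz_scale(438) - lz_scale(145))

if __name__ == "__main__":
    print(f"{estimate:.2f}")
```

The result agrees with the quoted 87.3 to within rounding.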
Status Protein1 Protein2 d d1/2
Query variables 5XPD_1 6GXG_1 101 76.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
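
The interpolation can be illustrated with a small Python sketch. It rests on two assumptions that the page does not confirm: that LZ-complexity means the number of components in the LZ76 exhaustive history, and that min and max range over the two increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), which is how Otu and Sayood define \(d\) (their maximum) and \(d_1\) (their sum). All function names are illustrative.

```python
def lz76(s):
    """Number of components in the LZ76 exhaustive history of s.

    Each component is the shortest word starting at position i that does not
    occur starting at an earlier position (overlaps allowed); the final
    component may be cut short by the end of the string.
    """
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # s[i:i+length] occurs inside s[:i+length-1] exactly when it has an
        # occurrence that starts before position i.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def increments(x, y):
    """The pair c(xy) - c(x), c(yx) - c(y)."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

if __name__ == "__main__":
    x, y = "abcabcabc", "bbbbacac"   # made-up words, not protein data
    print(delta(x, y, 0))            # plays the role of d
    print(delta(x, y, 0.5))          # plays the role of d_1 / 2
```

The substring search makes this implementation quadratic or worse in the length of the input, which is acceptable for single proteins but would need a suffix-based structure for large-scale comparisons.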