CoV2D Browser™

Parikh vectors
5TUP_1 1NTM_1 6PIP_1 Letter Amino acid
10 22 0 Q Glutamine
8 14 1 F Phenylalanine
0 5 1 W Tryptophan
4 15 0 H Histidine
32 45 0 L Leucine
13 14 2 K Lysine
6 8 0 M Methionine
24 33 0 S Serine
10 17 0 N Asparagine
21 30 0 E Glutamic acid
12 32 2 G Glycine
6 17 2 Y Tyrosine
15 17 1 I Isoleucine
8 18 0 P Proline
23 44 0 A Alanine
10 25 4 R Arginine
19 24 0 D Aspartic acid
7 12 4 C Cysteine
11 24 0 T Threonine
20 30 1 V Valine
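
A Parikh vector counts how often each letter occurs in a word. As a minimal sketch (the names `parikh_vector` and `TABLE_ORDER` are illustrative and not taken from the site), the 6PIP_1 column can be recomputed from the Tachyplesin-3 sequence listed further down this page:

```python
from collections import Counter

# Letter order used by the rows of the table above.
TABLE_ORDER = "QFWHLKMSNEGYIPARDCTV"

def parikh_vector(seq, alphabet=TABLE_ORDER):
    """Return the number of occurrences of each alphabet letter in seq."""
    counts = Counter(seq)
    return [counts[letter] for letter in alphabet]

tachyplesin = "KWCFRVCYRGICYRKCRG"  # 6PIP_1 sequence from this page
vector = parikh_vector(tachyplesin)
print(dict(zip(TABLE_ORDER, vector)))
```

The entries sum to the sequence length (18 for 6PIP_1), which is a quick sanity check for any column.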

>5TUP_1|Chains A, B, C|Proliferating cell nuclear antigen|Aspergillus fumigatus Z5 (1437362)
>1NTM_1|Chain A|Ubiquinol-cytochrome C reductase complex core protein I, mitochondrial|Bos taurus (9913)
>6PIP_1|Chain A|Tachyplesin-3|Tachypleus gigas (6852)
Protein code \(c\), LZ-complexity \(\mathrm{LZ}(w)\), Length \(n=|w|\), \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\), \(p_w(1)\), \(p_w(2)\), \(p_w(3)\), Sequence \(w=f(c)\) (on the line after each row)
5TUP , Knot 116 259 0.83 38 166 246
GMLEARLEQASLLKRVVDAIKDLVQDCNFDCNDSGIALQAMDNSHVALVSMLLKAEGFSPYRCDRNIALGINLVSLTKVLRAAQNEDILTLKADDSPDAVNLMFESAETDRISEYDIKLMDIDQEHLAIPETEYAATVEMPSAEFQRICRDLNALSESVVIEATKEGVKFSCQGDIGSGSVTIRQHTSVDKPEQNVSIALSEPVALTFSLKYLVNFCKATSLSSKVTLCLSQEVPLLVEYGLGSGHLRFYLAPKIGDEE
1NTM , Knot 185 446 0.84 40 239 424
TATYAQALQSVPETQVSQLDNGLRVASEQSSQPTCTVGVWIDAGSRYESEKNNGAGYFVEHLAFKGTKNRPGNALEKEVESMGAHLNAYSTREHTAYYIKALSKDLPKAVELLADIVQNCSLEDSQIEKERDVILQELQENDTSMRDVVFNYLHATAFQGTPLAQSVEGPSENVRKLSRADLTEYLSRHYKAPRMVLAAAGGLEHRQLLDLAQKHFSGLSGTYDEDAVPTLSPCRFTGSQICHREDGLPLAHVAIAVEGPGWAHPDNVALQVANAIIGHYDCTYGGGAHLSSPLASIAATNKLCQSFQTFNICYADTGLLGAHFVCDHMSIDDMMFVLQGQWMRLCTSATESEVLRGKNLLRNALVSHLDGTTPVCEDIGRSLLTYGRRIPLAEWESRIAEVDARVVREVCSKYFYDQCPAVAGFGPIEQLPDYNRIRSGMFWLRF
6PIP , Knot 12 18 0.64 18 14 15
KWCFRVCYRGICYRKCRG
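
The LZ-complexity and normalized columns can be illustrated with a short sketch. It assumes that \(\mathrm{LZ}(w)\) is the number of components in the Lempel-Ziv (1976) exhaustive history of \(w\), with overlapping copies allowed; the page does not state which variant it uses, and the function names `lz76` and `normalized_lz` are hypothetical. Under that assumption the sketch reproduces the 6PIP row.

```python
import math

def lz76(s):
    """Count components of the LZ76 exhaustive history of s (assumed variant)."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it can be copied from earlier text
        # (the source of the copy may overlap the phrase itself).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(s, alphabet_size=20):
    """LZ(w) divided by n / log_A(n), as in the table's fourth column."""
    n = len(s)
    return lz76(s) / (n / math.log(n, alphabet_size))

w = "KWCFRVCYRGICYRKCRG"  # 6PIP
print(lz76(w), round(normalized_lz(w), 2))
```

For 6PIP the parse is K, W, C, F, R, V, CY, RG, I, CYRK, CR, G, giving 12 components and a normalized value of 0.64.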

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
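
As a small illustration of these definitions (the helper names `subwords` and `subword_complexity` are made up for this sketch), the subword sets can be built with a sliding window:

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

w = "KWCFRVCYRGICYRKCRG"  # 6PIP
print(subword_complexity(w, 2), subword_complexity(w, 3))
```

For the 6PIP sequence this gives 14 distinct subwords of length 2 and 15 of length 3, in agreement with the table.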

\(|P_{f(5TUP_1)}(2) \setminus P_{f(1NTM_1)}(2)|=43\), \(|P_{f(1NTM_1)}(2) \setminus P_{f(5TUP_1)}(2)|=116\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1110101001011001101100110000100000111101100001111011101011010000001111101101001101100001101010001011011100100001000010110100001111000011010110101001000101100011101000110100010110101010000010010001011100111101010011010010010001010100011111001110101010111011000
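
The hydrophobic-polar string maps each residue to 1 (hydrophobic) or 0 (polar). The page does not list its classification, so the set below is only read off by aligning the start of the 5TUP sequence with the start of the binary string; the name `HYDROPHOBIC` and the function `hp_encode` are hypothetical. Tryptophan (W) does not occur in 5TUP, so treating it as hydrophobic is purely an assumption.

```python
# Residues that map to 1 in the string above (inferred by alignment).
# W is absent from 5TUP; its membership here is an assumption.
HYDROPHOBIC = set("AFGILMPV") | {"W"}

def hp_encode(seq, hydrophobic=HYDROPHOBIC):
    """Encode a protein sequence as a string of 1 (hydrophobic) and 0 (polar)."""
    return "".join("1" if residue in hydrophobic else "0" for residue in seq)

prefix = "GMLEARLEQASLLKRVVDAIKDLVQDCNFDCNDSGIALQA"  # first 40 residues of 5TUP
print(hp_encode(prefix))
```

On this 40-residue prefix the output agrees with the first 40 symbols of the string above.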
Pair \(Z_2\) Length of longest common subword (substring)
5TUP_1,1NTM_1 159 4
5TUP_1,6PIP_1 170 2
1NTM_1,6PIP_1 239 3
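
Both columns can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two subword sets. For the last column, the sketch assumes the value is the length of the longest common contiguous block, since values as small as 2 to 4 would be implausibly low for non-contiguous subsequences of sequences this long; that reading, and the function names, are assumptions. The inputs are toy strings rather than the full sequences.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_substring_length(x, y):
    """Length of the longest contiguous block shared by x and y (row-wise DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block that ended one position earlier.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

print(z_distance("KWCF", "WCRG", 2))                    # {KW, CF, CR, RG}
print(longest_common_substring_length("KWCF", "WCRG"))  # shared block "WC"
```

Here the two toy words share only the subword WC, so \(Z_2=4\) and the longest common block has length 2.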

Newick tree

 
(
	6PIP_1:11.58,
	(
		5TUP_1:79.5,1NTM_1:79.5
	):31.08
);

Let \(d\) be the Otu-Sayood distance \(d\).
Let \(d_1\) be the Otu-Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{705}{\log_{20} 705}-\frac{259}{\log_{20}259}\right)\approx 124\), where \(705=259+446\) is the combined length of the two sequences.
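
The arithmetic behind this estimate can be checked directly. The factors 0.85 and 0.8 are taken from the text as given; the function name `expected_distance` and its parameter names are illustrative only.

```python
import math

def expected_distance(len_a, len_b, c1=0.85, c2=0.8, alphabet_size=20):
    """Evaluate c1 * c2 * (N / log_A N - n / log_A n) with N = len_a + len_b, n = len_a."""
    total = len_a + len_b
    scaled_total = total / math.log(total, alphabet_size)
    scaled_first = len_a / math.log(len_a, alphabet_size)
    return c1 * c2 * (scaled_total - scaled_first)

estimate = expected_distance(259, 446)  # lengths of 5TUP_1 and 1NTM_1
print(round(estimate))
```

With the lengths 259 and 446 the estimate rounds to 124.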
Status Protein1 Protein2 d d1/2
Query variables 5TUP_1 1NTM_1 161 125
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
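
The interpolation can be sketched as follows, assuming the Otu-Sayood definitions \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\), with \(c\) the LZ76 exhaustive-history complexity. The exact complexity variant used by the page is not stated, and the function names here are hypothetical.

```python
def lz76(s):
    """Count components of the LZ76 exhaustive history of s (assumed variant)."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it can be copied from earlier text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def increments(s, q):
    """Return (c(SQ) - c(S), c(QS) - c(Q))."""
    return lz76(s + q) - lz76(s), lz76(q + s) - lz76(q)

def delta(s, q, alpha):
    """alpha * min + (1 - alpha) * max of the two complexity increments."""
    a, b = increments(s, q)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

s, q = "KWCFRVCYRGICYRKCRG", "AAAAAA"
print(delta(s, q, 0), delta(s, q, 0.5))  # d and d_1 / 2
```

Setting \(\alpha=0\) keeps only the larger increment, which is \(d\); setting \(\alpha=1/2\) averages the two increments, which is \(d_1/2\).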