CoV2D Browser™

Parikh vectors
1NTM_1 2MUA_1 1FJM_1 Letter Amino acid
30 1 21 E Glutamic acid
17 3 20 I Isoleucine
33 1 18 S Serine
30 0 13 V Valine
25 1 18 R Arginine
12 0 13 C Cysteine
15 0 6 H Histidine
8 2 6 M Methionine
18 0 17 P Proline
17 1 13 Y Tyrosine
17 0 15 N Asparagine
22 1 13 Q Glutamine
14 2 22 K Lysine
24 1 9 T Threonine
5 1 3 W Tryptophan
44 0 15 A Alanine
24 2 23 D Aspartic acid
32 3 27 G Glycine
45 2 39 L Leucine
14 0 19 F Phenylalanine
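A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch of that count, not the site's own code; the names `AMINO_ACIDS` and `parikh_vector` are illustrative, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# Letter order taken from the rows of the table above (an arbitrary but fixed order).
AMINO_ACIDS = "EISVRCHMPYNQKTWADGLF"

def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in w."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# The 2MUA_1 sequence as listed on this page.
seq_2mua = "YLGRSGGDIIKKMQTLWDEIM"
for letter, count in zip(AMINO_ACIDS, parikh_vector(seq_2mua)):
    print(letter, count)
```

For the 2MUA_1 sequence the counts agree with the 2MUA_1 column of the table and sum to the sequence length 21.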

>1NTM_1|Chain A|Ubiquinol-cytochrome C reductase complex core protein I, mitochondrial|Bos taurus (9913)
>2MUA_1|Chain A|Ring-infected erythrocyte surface antigen|Plasmodium falciparum (5837)
>1FJM_1|Chains A, B|PROTEIN SERINE/THREONINE PHOSPHATASE-1 (ALPHA ISOFORM, TYPE 1)|Oryctolagus cuniculus (9986)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1NTM , Knot 185 446 0.84 40 239 424
TATYAQALQSVPETQVSQLDNGLRVASEQSSQPTCTVGVWIDAGSRYESEKNNGAGYFVEHLAFKGTKNRPGNALEKEVESMGAHLNAYSTREHTAYYIKALSKDLPKAVELLADIVQNCSLEDSQIEKERDVILQELQENDTSMRDVVFNYLHATAFQGTPLAQSVEGPSENVRKLSRADLTEYLSRHYKAPRMVLAAAGGLEHRQLLDLAQKHFSGLSGTYDEDAVPTLSPCRFTGSQICHREDGLPLAHVAIAVEGPGWAHPDNVALQVANAIIGHYDCTYGGGAHLSSPLASIAATNKLCQSFQTFNICYADTGLLGAHFVCDHMSIDDMMFVLQGQWMRLCTSATESEVLRGKNLLRNALVSHLDGTTPVCEDIGRSLLTYGRRIPLAEWESRIAEVDARVVREVCSKYFYDQCPAVAGFGPIEQLPDYNRIRSGMFWLRF
2MUA , Knot 15 21 0.72 26 20 19
YLGRSGGDIIKKMQTLWDEIM
1FJM , Knot 148 330 0.86 40 214 322
MSDSEKLNLDSIIGRLLEVQGSRPGKNVQLTENEIRGLCLKSREIFLSQPILLELEAPLKICGDIHGQYYDLLRLFEYGGFPPESNYLFLGDYVDRGKQSLETICLLLAYKIKYPENFFLLRGNHECASINRIYGFYDECKRRYNIKLWKTFTDCFNCLPIAAIVDEKIFCCHGGLSPDLQSMEQIRRIMRPTDVPDQGLLCDLLWSDPDKDVQGWGENDRGVSFTFGAEVVAKFLHKHDLDLICRAHQVVEDGYEFFAKRQLVTLFSAPNYCGEFDNAGAMMSVDETLMCSFQILKPADKNKGKYGQFSGLNPGGRPITPPRNSAKAKK
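The LZ-complexity and normalized ratio columns can be sketched as follows. This is one common formulation of the Lempel-Ziv (1976) parsing, in which each new phrase is the shortest prefix of the remaining text that has no occurrence starting earlier; whether the page uses exactly this variant is an assumption, and the function names are illustrative.

```python
import math

def lz76_complexity(w):
    """Number of phrases in an LZ76-style parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        k = 1
        # Grow the phrase while w[i:i+k] also occurs starting at some position < i
        # (the earlier occurrence may overlap the current phrase).
        while i + k <= n and w.find(w[i:i + k], 0, i + k - 1) != -1:
            k += 1
        phrases += 1
        i += k
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_20(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

seq_2mua = "YLGRSGGDIIKKMQTLWDEIM"
c = lz76_complexity(seq_2mua)
print(c, normalized_lz(c, len(seq_2mua)))
```

On the 2MUA_1 sequence this parsing yields 15 phrases, matching the table. The ratios computed from the listed complexities and lengths agree with the table's values when cut off after two decimals.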

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
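As a small illustration of these definitions, the set of distinct subwords of a given length and its cardinality can be computed directly. This is a sketch with illustrative names (`subwords`, `subword_complexity`), not code from the site.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))

seq_2mua = "YLGRSGGDIIKKMQTLWDEIM"
for k in (2, 3):
    print(k, subword_complexity(seq_2mua, k))
```

For the 2MUA_1 sequence the values for lengths 2 and 3 are 20 and 19, as listed in the table.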

\(|P_{f(1NTM_1)}(2) \setminus P_{f(2MUA_1)}(2)|=227\), \(|P_{f(2MUA_1)}(2) \setminus P_{f(1NTM_1)}(2)|=8\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
01001011001100010010011011000000100011111011000000000111011001110100001101100010011101010000000100101100011011011101100001000010000011100100000010011100101011010111001011000100100101000100000110111111111000011011000101101000001110101001010010000011111011111011111010011101101111000000111101001110111000100010010100100111110110001010011111010110100010000110100110011100101001100011001100100111101000110101011001000010000111111111001100001001111101
Pair \(Z_2\) Length of longest common subword
1NTM_1,2MUA_1 235 3
1NTM_1,1FJM_1 165 4
2MUA_1,1FJM_1 204 3
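The two columns of this table can be sketched as below. \(Z_k\) is the size of the symmetric difference of the two subword sets. The second column is read here as the length of the longest common contiguous subword, which is an assumption based on the small values shown; the function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y): subwords of length k found in exactly one of x and y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block shared by x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("AAB", "ABB", 2))                               # AA and BB are unshared
print(longest_common_subword_length("XGRSY", "AGRSB"))  # shared block GRS
```

Since \(Z_2\) adds the two set differences, the value 235 for the pair 1NTM_1, 2MUA_1 is the sum 227 + 8 of the counts stated above.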

Newick tree

 
(2MUA_1:11.77,(1NTM_1:82.5,1FJM_1:82.5):35.27);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{467}{\log_{20} 467}-\frac{21}{\log_{20}21}\right)\approx 140\), where \(467=446+21\) is the combined length of the two sequences.
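The arithmetic behind this estimate can be checked with a few lines. The factors 0.85 and 0.8 are taken from the formula as given, without further interpretation; the function and parameter names are illustrative.

```python
import math

def expected_distance(n_concat, n_short, c1=0.85, c2=0.8, alphabet_size=20):
    """c1 * c2 * (n_concat / log_q(n_concat) - n_short / log_q(n_short)), with q = 20."""
    def scale(n):
        return n / math.log(n, alphabet_size)
    return c1 * c2 * (scale(n_concat) - scale(n_short))

# 467 = 446 + 21 (the two sequence lengths), 21 = length of the shorter sequence.
print(expected_distance(467, 21))
```

The result is slightly above 140, in line with the rough value quoted.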
Status Protein1 Protein2 d d1/2
Query variables 1NTM_1 2MUA_1 181 94.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
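The interpolation can be sketched in code under two assumptions: that \(\min\) and \(\max\) refer to the two terms \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\) from the usual Otu–Sayood definitions (\(d\) is their maximum, \(d_1\) their sum), and that LZ is the LZ76-style phrase count used here. All function names are illustrative.

```python
def lz76_complexity(w):
    """Number of phrases in an LZ76-style parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        k = 1
        # Extend while the candidate phrase also occurs starting before position i.
        while i + k <= n and w.find(w[i:i + k], 0, i + k - 1) != -1:
            k += 1
        phrases += 1
        i += k
    return phrases

def otu_sayood_terms(x, y, c=lz76_complexity):
    """Return (min, max) of c(xy) - c(x) and c(yx) - c(y)."""
    a = c(x + y) - c(x)
    b = c(y + x) - c(y)
    return min(a, b), max(a, b)

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max; alpha = 0 gives d, alpha = 1/2 gives d_1 / 2."""
    lo, hi = otu_sayood_terms(x, y)
    return alpha * lo + (1 - alpha) * hi

x, y = "YLGRSGGDIIKKMQTLWDEIM", "AAAAAA"
print(delta(x, y, 0), delta(x, y, 0.5))
```

Under the same reading, the table's values \(d=181\) and \(d_1/2=94.5\) for the pair 1NTM_1, 2MUA_1 would correspond to the two terms 181 and 8.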