CoV2D Browser™

Parikh vectors
8PNU_1 6OUH_1 5TQB_1 Letter Amino acid
3 7 4 W Tryptophan
11 17 26 V Valine
6 10 5 N Asparagine
4 19 8 D Aspartic acid
8 1 6 M Methionine
15 12 9 F Phenylalanine
20 26 17 L Leucine
13 17 15 S Serine
4 8 4 Y Tyrosine
20 13 37 A Alanine
5 13 9 E Glutamic acid
11 11 9 H Histidine
11 17 17 P Proline
11 9 10 I Isoleucine
4 24 25 K Lysine
4 12 14 T Threonine
6 7 21 R Arginine
1 1 2 C Cysteine
3 11 9 Q Glutamine
23 22 30 G Glycine
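
A Parikh vector simply counts how often each letter occurs in a word. As a minimal sketch (the function name `parikh_vector` and the alphabet ordering are illustrative choices, not taken from the page), the counts in the table above could be reproduced along these lines:

```python
from collections import Counter

# The 20 standard one-letter amino acid codes (ordering is arbitrary here).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(seq: str) -> dict[str, int]:
    """Return the number of occurrences of each amino acid letter in seq."""
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}

if __name__ == "__main__":
    # First 15 residues of the 8PNU_1 sequence, used only as a small demo.
    v = parikh_vector("MGSSHHHHHHSQDPM")
    print(v["H"], v["S"], v["M"])  # 6 3 2
```

Applying the same function to a full FASTA sequence should give one column of the table.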

>8PNU_1|Chains A, B, C, G, H, I|Styrene oxide isomerase|Pseudomonas sp. VLB120 (69328)
>6OUH_1|Chain A|Carbonic anhydrase 2|Homo sapiens (9606)
>5TQB_1|Chain A|60S ribosomal protein L4-like protein|Chaetomium thermophilum (759272)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8PNU , Knot 85 183 0.80 40 127 173
MGSSHHHHHHSQDPMLHAFERKMAGHGILMIFCTLLFGVGLWMNLVGGFEIIPGYIIEFHVPGSPEGWARAHSGPALNGMMVIAVAFVLPSLGFADKTARLLGSIIVLDGWSNVGFYLFSNFSPNRGLTFGPNQFGPGDIFSFLALAPAYLFGVLAMGALAVIGYQALKSTRSRKAVPHAAAE
6OUH , Knot 110 257 0.79 40 173 246
HWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLKPLSVSYDQATSLRILNNGHAFNVEFDDSQDKAVLKGGPLDGTYRLIQFHFHWGSLDGQGSEHTVDKKKYAAELHLVHWNTKYGDFGKAVQQPDGLAVLGIFLKVGSAKPGLQKVVDVLDSIKTKGKSADFTNFDPRGLLPESLDYWTYPGSLTTPPLLECVTWIVLKEPISVSSEQVLKFRKLNFNGEGEPEELMVDNWRPAQPLKNRQIKASFK
5TQB , Knot 119 277 0.80 40 164 263
MASRPTVTVFGADGKPTGATEVLPKVFSAPIRPDIVKHVHTGMAKNKRQPYAVSEKAGHQTSAESWGTGRAVARIPRVSGGGTHRAGQGAFGNMCRSGRMFAPTKIWRKWHVKINQGQKRFATASALAASAVAPLLMARGHQVSTVPEVPLVVDSAAVAGDAVAKTAAAYKLLKAIGAGPDVEKVKKSKKLRAGKGKMRGRRHRQRRGPLIVYSPEHDGKELVKGFRNIPGVETCPVDALNLLQLAPGGHLGRFIVWTSAAIKQLDAVYESKKGFFL
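
The page does not say which Lempel-Ziv variant produces \(\mathrm{LZ}(w)\). The sketch below assumes the LZ76 exhaustive-history parsing (each new component is the shortest word that has not yet occurred, overlaps allowed) and shows how the normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) is computed from a count and a length. The names `lz76` and `normalized_lz` are hypothetical.

```python
import math

def lz76(s: str) -> int:
    """Number of components in the LZ76 exhaustive-history parsing of s.
    Assumption: this is the variant behind the LZ(w) column."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the candidate while it already occurs (possibly overlapping)
        # in the text that ends just before the candidate's last symbol.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz: int, n: int, alphabet: int = 20) -> float:
    """LZ(w) divided by n / log_alphabet(n)."""
    return lz / (n / math.log(n, alphabet))

if __name__ == "__main__":
    print(lz76("0001101001000101"))  # 6  (0|001|10|100|1000|101)
    print(normalized_lz(85, 183))    # compare with the 8PNU row
```

With the table's values for 8PNU (85 and 183) the ratio comes out just under 0.81, which suggests the displayed 0.80 is truncated rather than rounded.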

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
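
As an illustration of these definitions, here is a short sketch, reading \(P_w(n)\) as the set of subwords of length \(n\); the function names `subwords` and `subword_complexity` are made up for this example.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w: str, n: int) -> int:
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))

if __name__ == "__main__":
    print(sorted(subwords("ABAB", 2)))    # ['AB', 'BA']
    print(subword_complexity("ABAB", 1))  # 2
```

A word of length \(L\) has at most \(L-n+1\) distinct subwords of length \(n\), which bounds \(p_w(n)\) from above.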

\(|P_{f(8PNU_1)}(2) \setminus P_{f(6OUH_1)}(2)|=61\), \(|P_{f(6OUH_1)}(2) \setminus P_{f(8PNU_1)}(2)|=107\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
110000000000011101100011101111110011111111101111101111011010111010111010011110111111111111011110001011101111011001110110010100110111001111011011111110111111111111110011000000011101110
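
The page does not state which residues count as hydrophobic. Comparing the binary string with the start of the 8PNU_1 sequence suggests that G, A, V, L, I, P, F, M and W map to 1 and all other residues to 0; the sketch below uses that inferred set as an assumption, and `hp_encode` is a hypothetical name.

```python
# Assumption: hydrophobic set inferred from the page's binary string,
# not stated explicitly in the source.
HYDROPHOBIC = set("GAVLIPFMW")

def hp_encode(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

if __name__ == "__main__":
    # First 22 residues of 8PNU_1.
    print(hp_encode("MGSSHHHHHHSQDPMLHAFERK"))  # 1100000000000111011000
```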
Pair \(Z_2\) Length of longest common subword
8PNU_1,6OUH_1 168 3
8PNU_1,5TQB_1 175 3
6OUH_1,5TQB_1 161 3
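
The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets, so for the first pair \(Z_2 = 61 + 107 = 168\), matching the table. A minimal sketch follows (function names are illustrative):

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

if __name__ == "__main__":
    # P_x(2) = {AB, BA}, P_y(2) = {AB, BB, BA}: only BB is unshared.
    print(z_distance("ABAB", "ABBA", 2))  # 1
```

Dividing \(Z_k\) by the size of the union \(|P_x(k)\cup P_y(k)|\) would give the usual Jaccard distance; the page reports only the numerator.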

Newick tree

 
(8PNU_1:87.45,(6OUH_1:80.5,5TQB_1:80.5):6.95);

Let d and d1 denote the Otu–Sayood distances of the same names. (This makes the 4TYN sequence AAAAAA a close match...)
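
For orientation, the Otu–Sayood distances are usually written in terms of an LZ complexity \(c\): \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=c(xy)-c(x)+c(yx)-c(y)\). The sketch below assumes \(c\) is the LZ76 component count; the page does not confirm which parsing it uses, so the numbers it produces need not match the table.

```python
def lz76(s: str) -> int:
    """LZ76 exhaustive-history component count (assumed complexity measure)."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1) for the pair x, y."""
    forward = lz76(x + y) - lz76(x)   # new components needed for y given x
    backward = lz76(y + x) - lz76(y)  # new components needed for x given y
    return max(forward, backward), forward + backward

if __name__ == "__main__":
    print(otu_sayood("AB", "CD"))          # (2, 4)
    print(otu_sayood("AAAAAA", "AAAAAA"))  # (0, 0)
```

The second call hints at why a low-complexity sequence such as AAAAAA can look close to many others: appending it to a word that already contains the letter A adds at most one component.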
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{440}{\log_{20} 440}-\frac{183}{\log_{20}183})\approx 75.6\), where \(440=183+257\) is the combined length of 8PNU_1 and 6OUH_1.
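
The arithmetic in this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are copied from the formula as given, and the helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n: int, alphabet: int = 20) -> float:
    """n / log_alphabet(n), the normalizer used for LZ complexity."""
    return n / math.log(n, alphabet)

n_x, n_y = 183, 257  # lengths of 8PNU_1 and 6OUH_1 from the table
expected = 0.85 * 0.8 * (lz_scale(n_x + n_y) - lz_scale(n_x))

if __name__ == "__main__":
    print(round(expected, 1))
```

The result agrees with the quoted 75.6 to within about a tenth, the small gap presumably coming from intermediate rounding.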
Status Protein1 Protein2 d d1/2
Query variables 8PNU_1 6OUH_1 96 81

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
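
A small numeric check of this interpolation, under the assumption that min and max refer to the smaller and larger of the two directed quantities behind d and d1. For the query pair, d = 96 and d1/2 = 81 imply that the smaller quantity is 2·81 − 96 = 66; that value is inferred here, not shown on the page.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

if __name__ == "__main__":
    lo, hi = 66, 96  # lo is inferred from d = 96 and d1/2 = 81
    print(delta(0, lo, hi))    # 96   -> d
    print(delta(0.5, lo, hi))  # 81.0 -> d1/2
```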