CoV2D Browser™

Parikh vectors
6WBK_1 3MCO_1 5LJD_1 Letter Amino acid
4 17 2 H Histidine
27 17 13 V Valine
23 26 3 A Alanine
14 27 5 N Asparagine
28 52 7 I Isoleucine
5 5 6 M Methionine
16 12 2 P Proline
12 15 7 T Threonine
5 5 4 W Tryptophan
10 2 3 C Cysteine
24 41 11 K Lysine
17 11 3 Y Tyrosine
34 29 2 S Serine
9 20 7 R Arginine
15 29 12 D Aspartic acid
14 16 7 Q Glutamine
19 31 13 E Glutamic acid
13 20 10 G Glycine
58 56 12 L Leucine
23 11 6 F Phenylalanine
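
A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch of that count, not the page's own code; the name `parikh_vector` and the constant `ALPHABET` (the row order of the table above) are hypothetical.

```python
from collections import Counter

# Row order of the table above; used only to fix the order of the output.
ALPHABET = "HVANIMPTWCKYSRDQEGLF"

def parikh_vector(sequence: str, alphabet: str = ALPHABET) -> list[int]:
    """Return the number of occurrences of each alphabet letter in sequence."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# First ten residues of the 5LJD_1 sequence, used as a small example.
prefix = "MPVDFTGYWK"
for letter, count in zip(ALPHABET, parikh_vector(prefix)):
    print(letter, count)
```

Applied to a full FASTA sequence, the entries of the vector sum to the sequence length, which gives a quick consistency check on a column of the table.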

>6WBK_1|Chains A, B, C, D, E, F, G|Pannexin-1|Homo sapiens (9606)
>3MCO_1|Chains A, B|2-amino-4-hydroxy-6-hydroxymethyldihydropteridine pyrophosphokinase/dihydropteroate synthase|Francisella tularensis subsp. holarctica (376619)
>5LJD_1|Chain A|Retinol-binding protein 1|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6WBK , Knot 150 370 0.80 40 199 347
MAIAQLATEYVFSDFLLKEPTEPKFKGLRLELAVDKMVTCIAVGLPLLLISLAFAQEISIGTQISCFSPSSFSWRQAAFVDSYCWAAVQQKNSLQSESGNLPLWLHKFFPYILLLFAILLYLPPLFWRFAAAPHICSDLKFIMEELDKVYNRAIKAAKSARDLDMRDGACSVPGVTENLGQSLWEVSESHFKYPIVEQYLKTKKNSNNLIIKYISCRLLTLIIILLACIYLGYYFSLSSLSDEFVCSIKSGILRNDSTVPDQFQCKLIAVGIFQLLSVINLVVYVLLAPVVVYTLFVPFRQKTDVLKVYEILPTFDVLHFKSEGYNDLSLYNLFLEENISEVKSYKCLKVLENIKSSGQGIDPMLLLTNL
3MCO , Knot 176 442 0.80 40 205 404
MGSSHHHHHHSSGLVPRGSHMVQYIIGIGTNSGFTIENIHLAITALESQQNIRIIRKASLYSSKAVLKEDAPKEWDIRFLNTAVKISSSLKPDELLVLLKDIELKIGRDLNAPAWSPRVIDLDILAAEDLILETDKLTIPHKELINRSFALAPLLELSKGWHHPKYVEWDLNIRLKELGEIVKLKQTLANTIRMGIVNLSNQSFSDGNFDDNQRKLNLDELIQSGAEIIDIGAESTKPDAKPISIEEEFNKLNEFLEYFKSQLANLIYKPLVSIDTRKLEVMQKILAKHHDIIWMINDVECNNIEQKAQLIAKYNKKYVIIHNLGITDRNQYLDKENAIDNVCDYIEQKKQILLKHGIAQQNIYFDIGFGFGKKSDTARYLLENIIEIKRRLELKALVGHSRKPSVLGLTKDSNLATLDRATRELSRKLEKLDIDIIRVHKI
5LJD , Knot 69 135 0.83 40 115 131
MPVDFTGYWKMLVNENFEEYLRALDVNVALRKIANLLKPDLEIVQDGDHMIIRTLSTFRNYIMDFQVGKEFEEDLTGIDDRKCMTTVSWDGDKLQCVQKGEKEGRGWTQWIEGDELHLEMRVEGVVCKQVFKKVQ
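
The table does not say which Lempel-Ziv parsing produces \(\mathrm{LZ}(w)\). The sketch below assumes the LZ76 exhaustive-history parsing and may therefore not reproduce the tabulated values exactly; the function names `lz76` and `normalized_lz` are hypothetical. The normalization is the one in the column header, \(\mathrm{LZ}(w)/(n/\log_{20} n)\).

```python
import math

def lz76(s: str) -> int:
    """Number of components in the LZ76 exhaustive-history parsing of s."""
    i, components, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the component while it can be copied from earlier text,
        # i.e. while s[i:i+length] occurs starting before position i.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ complexity divided by n / log_alphabet_size(n)."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76("0001101001000101"))   # parsing 0|001|10|100|1000|101
print(normalized_lz(150, 370))    # tabulated values for 6WBK
```

With the tabulated values for 6WBK (LZ-complexity 150, length 370), `normalized_lz` gives about 0.80, in agreement with the table.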

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
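
As an illustration of these definitions, the set of distinct subwords of a fixed length and its cardinality can be computed directly. This is a minimal sketch; the names `subwords` and `subword_complexity` are hypothetical, and the code reads "subword" as a contiguous interval of the word.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))

print(sorted(subwords("abab", 2)))      # the two subwords 'ab' and 'ba'
print(subword_complexity("abab", 2))
```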

\(|P_{f(6WBK_1)}(2) \setminus P_{f(3MCO_1)}(2)|=83\), \(|P_{f(3MCO_1)}(2) \setminus P_{f(6WBK_1)}(2)|=89\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1111011000110011100100101011010111001100111111111101111001011001001010010100111100001111000001000010111110011101111111110111111011111010001011100100100011011001001010011001111000110011010000100111000100000000111001000110111111101011001010010001100100111000001100100011111110110110111011111111001111100000110100111010110100010001010011100010010000010110010001011011111001
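
The page does not state which residues count as hydrophobic. Comparing the first 120 residues of the 6WBK sequence with the binary string above suggests that A, F, G, I, L, M, P, V and W map to 1 and every other residue maps to 0. The sketch below uses that inferred set, which is an assumption rather than a documented definition; the names `HYDROPHOBIC` and `hydrophobic_polar` are hypothetical.

```python
# Inferred from the first 120 residues of 6WBK and the binary string above;
# this is an assumption, not a documented classification.
HYDROPHOBIC = set("AFGILMPVW")

def hydrophobic_polar(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First twenty residues of the 6WBK sequence.
print(hydrophobic_polar("MAIAQLATEYVFSDFLLKEP"))
```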
Pair \(Z_2\) Length of longest common subsequence
6WBK_1,3MCO_1 172 4
6WBK_1,5LJD_1 180 3
3MCO_1,5LJD_1 182 4
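
The \(Z_2\) column is the size of the symmetric difference of the two sets of length-2 subwords; for the first pair, \(83+89=172\). A minimal sketch of that computation follows, with hypothetical names `subwords` and `z_distance`.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: 'abba' contains 'bb', which 'abab' does not.
print(z_distance("abab", "abba", 2))
```

Dividing by \(|P_x(k)\cup P_y(k)|\) would turn this numerator into the usual Jaccard distance.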

Newick tree

 
[
	5LJD_1:91.95,
	[
		6WBK_1:86,3MCO_1:86
	]:5.95
]

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{812 }{\log_{20} 812}-\frac{370}{\log_{20}370})\approx 119.\)
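
The arithmetic in this estimate can be checked directly, noting that \(812=370+442\) is the combined length of the two sequences. The sketch below takes the factors 0.85 and 0.8 as given by the page; the function name `expected_distance` and its parameter names are hypothetical.

```python
import math

def expected_distance(n_x: int, n_y: int, a: float = 0.85, b: float = 0.8,
                      alphabet_size: int = 20) -> float:
    """a * b * ((n_x + n_y)/log(n_x + n_y) - n_x/log(n_x)), logs in base alphabet_size."""
    def scale(n: int) -> float:
        return n / math.log(n, alphabet_size)
    return a * b * (scale(n_x + n_y) - scale(n_x))

# Lengths of 6WBK_1 (370) and 3MCO_1 (442).
print(round(expected_distance(370, 442)))
```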
Status Protein1 Protein2 d d1/2
Query variables 6WBK_1 3MCO_1 147 138
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
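
The interpolation can be sketched as follows, assuming that min and max range over the two quantities \(c(xy)-c(x)\) and \(c(yx)-c(y)\) from the Otu–Sayood definitions, and that \(c\) is an LZ76 exhaustive-history complexity. Both are assumptions about the setup, and the names `lz76` and `delta` are hypothetical.

```python
def lz76(s: str) -> int:
    """Number of components in the LZ76 exhaustive-history parsing of s."""
    i, components, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend while the component occurs starting before position i.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two conditional complexities."""
    a = lz76(x + y) - lz76(x)   # new components needed for y after x
    b = lz76(y + x) - lz76(y)   # new components needed for x after y
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "MAIAQLATEY", "MGSSHHHHHH"   # first ten residues of 6WBK and 3MCO
print(delta(x, y, 0))      # corresponds to d
print(delta(x, y, 0.5))    # corresponds to d1/2
```

Under this reading, \(\alpha=0\) returns the larger of the two quantities and \(\alpha=1/2\) returns half their sum, matching the two cases in the displayed formula.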