CoV2D Browser™

Parikh vectors
5ULL_1 8EJD_1 5EZT_1 Letter Amino acid
19 10 11 E Glutamic acid
3 13 8 Y Tyrosine
3 5 19 P Proline
8 21 15 S Serine
3 10 0 C Cysteine
2 10 12 Q Glutamine
14 19 20 G Glycine
10 7 18 K Lysine
5 11 3 M Methionine
5 10 11 F Phenylalanine
10 14 20 V Valine
8 19 13 N Asparagine
8 30 26 L Leucine
3 3 7 W Tryptophan
6 12 17 A Alanine
2 9 9 R Arginine
9 8 19 D Aspartic acid
0 10 10 H Histidine
15 18 5 I Isoleucine
5 20 14 T Threonine
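Each column of the table above is a Parikh vector: the number of times each amino-acid letter occurs in the sequence. A minimal Python sketch (standard library only) that recomputes the 5ULL_1 column in the table's row order; the names `TABLE_ORDER`, `SEQ_5ULL_1` and `parikh_vector` are illustrative, not taken from the site.

```python
from collections import Counter

# One-letter codes in the row order of the table above.
TABLE_ORDER = "EYPSCQGKMFVNLWARDHIT"

# FASTA sequence of 5ULL_1, copied from this page.
SEQ_5ULL_1 = "MKIVYWSGTGNTEKMAELIAKGIIESGKDVNTINVSDVNIDELLNEDILILGCSAMGDEVLEESEFEPFIEEISTKISGKKVALFGSYGWGDGKWMRDFEERMNGYGCVVVETPLIVQNEPDEAEQDCIEFGKKIANI"


def parikh_vector(seq: str, alphabet: str = TABLE_ORDER) -> list[int]:
    """Count each letter of `alphabet` in `seq`; letters outside `alphabet` are ignored."""
    counts = Counter(seq)
    return [counts[a] for a in alphabet]  # Counter returns 0 for absent letters


if __name__ == "__main__":
    for letter, count in zip(TABLE_ORDER, parikh_vector(SEQ_5ULL_1)):
        print(letter, count)
```

The entries of a Parikh vector always sum to the sequence length, here 138.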

>5ULL_1|Chain A|FLAVODOXIN|Clostridium beijerinckii (1520)
>8EJD_1|Chains A, C[auth B], E[auth C]|Glycoprotein G1|Lassa mammarenavirus (3052310)
>5EZT_1|Chain A[auth X]|Carbonic anhydrase 2|Bos taurus (9913)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5ULL , Knot 68 138 0.81 38 101 133
MKIVYWSGTGNTEKMAELIAKGIIESGKDVNTINVSDVNIDELLNEDILILGCSAMGDEVLEESEFEPFIEEISTKISGKKVALFGSYGWGDGKWMRDFEERMNGYGCVVVETPLIVQNEPDEAEQDCIEFGKKIANI
8EJD , Knot 116 259 0.83 40 170 247
MGQIVTFFQEVPHVIEEVMNIVLIALSVLAVLKGLYNFATCGLVGLVTFLLLCGRSCTTSLYKGVYELQTLELNMETLNMTMPLSCTKNNSHHYIMVGNETGLELTLTNTSIINHKFCNLSDAHKKNLYDHALMSIISTFHLSIPNFNQYEAMSCDFNGGKISVQYNLSHSYAGDAANHCGTVANGVLQTFMRMAWGGSYIALDSGCGNWDCIMTSYQYLIIQNTTWEDHCQFSRPSPIGYLGLLSQRTRDIYISRRRR
5EZT , Knot 115 257 0.82 38 167 246
HWGYGKHNGPEHWHKDFPIANGERQSPVDIDTKAVVQDPALKPLALVYGEATSRRMVNNGHSFNVEYDDSQDKAVLKDGPLTGTYRLVQFHFHWGSSDDQGSEHTVDRKKYAAELHLVHWNTKYGDFGTAAQQPDGLAVVGVFLKVGDANPALQKVLDALDSIKTKGKSTDFPNFDPGSLLPNVLDYWTYPGSLTTPPLLESVTWIVLKEPISVSSQQMLKFRTLNFNAEGEPELLMLANWRPAQPLKNRQVRGFPK
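A minimal sketch of the LZ-complexity and ratio columns, assuming \(\mathrm{LZ}(w)\) is the phrase count of the Lempel–Ziv (1976) parsing (the page refers to LZ76 below), computed with the Kaspar–Schuster scan. The site's exact convention is not stated here, so `lz76_complexity` is a hypothetical reimplementation and need not reproduce the table's LZ values exactly; `normalized_lz` only restates the ratio \(\mathrm{LZ}(w)/(n/\log_{20} n)\).

```python
import math


def lz76_complexity(s: str) -> int:
    """Phrase count of the Lempel-Ziv (1976) parsing of s (Kaspar-Schuster scan).

    Each phrase is the longest prefix of the unparsed suffix that also starts at
    an earlier position (overlap allowed), extended by one new symbol; the last
    phrase may end at the end of s without a new symbol.
    """
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1  # the first symbol is always its own phrase
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1                      # extend the copy that starts at position i
            if l + k > n:               # copy reaches the end of s: final phrase
                c += 1
                break
        else:
            k_max = max(k_max, k)       # longest phrase length found so far
            i += 1
            if i == l:                  # every earlier start position was tried
                c += 1
                l += k_max              # phrase = longest copy plus one new symbol
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_{alphabet_size} n), the ratio column of the table."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    print(lz76_complexity("0001101001000101"))  # 6 phrases: 0|001|10|100|1000|101
    print(round(normalized_lz(68, 138), 2))     # 0.81, as in the 5ULL row
```

The binary test string is the classic Lempel–Ziv example, whose parsing into six phrases is shown in the comment.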

Let \(P_w(k)\) be the set of distinct subwords (contiguous intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence whose four-character Protein Data Bank code is \(c\).
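These definitions translate directly into Python sets. A minimal sketch; `subwords` and `p` are illustrative helper names, not part of the site.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}  # empty if k > len(w)


def p(w: str, k: int) -> int:
    """p_w(k) = |P_w(k)|."""
    return len(subwords(w, k))


if __name__ == "__main__":
    print(sorted(subwords("MKIVM", 2)))  # ['IV', 'KI', 'MK', 'VM']
    print(p("abab", 2))                  # 2: only 'ab' and 'ba' occur
```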

\(|P_{f(\mathrm{5ULL\_1})}(2) \setminus P_{f(\mathrm{8EJD\_1})}(2)|=44\) and \(|P_{f(\mathrm{8EJD\_1})}(2) \setminus P_{f(\mathrm{5ULL\_1})}(2)|=113\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\); for instance, \(Z_2(f(\mathrm{5ULL\_1}), f(\mathrm{8EJD\_1}))=44+113=157\).
Hydrophobic-polar version of Sequence 1 (5ULL_1; 1 = hydrophobic, 0 = polar): 101101010100001101110111001001001010010100110001111100111001100001011100100010100111110011101011001000101010111001111000100100001011001101
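A sketch of \(Z_k\), the corresponding Jaccard distance \(Z_k(x,y)/|P_x(k)\cup P_y(k)|\), and a hydrophobic-polar encoding. The set `HYDROPHOBIC` = {A, F, G, I, L, M, P, V, W} is an assumption: it reproduces the 0/1 string shown for 5ULL_1, but this page does not determine how the site classifies H (absent from 5ULL_1), so here H is simply treated as polar. All function names are illustrative.

```python
# Assumed hydrophobic residues (1); every other letter is encoded as polar (0).
HYDROPHOBIC = set("AFGILMPVW")


def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct contiguous subwords of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by |P_x(k) union P_y(k)| (0.0 if both sets are empty)."""
    union = subwords(x, k) | subwords(y, k)
    return z(x, y, k) / len(union) if union else 0.0


def hydrophobic_polar(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in seq)


if __name__ == "__main__":
    print(z("abab", "abba", 2))                 # 1: only 'bb' is unshared
    print(hydrophobic_polar("MKIVYWSGTG"))      # 1011010101
```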
Pair \(Z_2\) Length of longest common substring
5ULL_1,8EJD_1 157 3
5ULL_1,5EZT_1 166 3
8EJD_1,5EZT_1 181 4
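Assuming the last column is the length of the longest common contiguous subword, it is the largest \(k\) with \(P_x(k)\cap P_y(k)\neq\emptyset\). Every common subword of length \(k+1\) contains one of length \(k\), so a simple upward search over \(k\) suffices. A short sketch with illustrative names:

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct contiguous subwords of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def longest_common_substring_length(x: str, y: str) -> int:
    """Largest k such that P_x(k) and P_y(k) intersect (0 if no letter is shared)."""
    k = 0
    # Stops once k + 1 exceeds the shorter length, because the sets become empty.
    while subwords(x, k + 1) & subwords(y, k + 1):
        k += 1
    return k


if __name__ == "__main__":
    print(longest_common_substring_length("abcde", "xbcdy"))  # 3 ('bcd')
```

For sequences of a few hundred residues, this set-based search is fast enough; a dynamic-programming table would avoid rebuilding the sets for each \(k\).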

Newick tree

 
(5EZT_1:89.43,(5ULL_1:78.5,8EJD_1:78.5):10.93);
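As a quick consistency check on the tree, both root-to-leaf path lengths agree (so the tree is ultrametric), and the height of the 5ULL_1/8EJD_1 cherry is exactly half of their \(Z_2\) value in the table above:
\[ 78.5 + 10.93 = 89.43, \qquad 2 \times 78.5 = 157 = Z_2(f(\mathrm{5ULL\_1}), f(\mathrm{8EJD\_1})). \]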

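Let \(d\) be the Otu–Sayood distance \(d(S,Q)=\max\{\mathrm{LZ}(SQ)-\mathrm{LZ}(S),\ \mathrm{LZ}(QS)-\mathrm{LZ}(Q)\}\), where \(SQ\) denotes the concatenation of the sequences \(S\) and \(Q\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(S,Q)=\mathrm{LZ}(SQ)-\mathrm{LZ}(S)+\mathrm{LZ}(QS)-\mathrm{LZ}(Q)\). (This makes the 4TYN sequence AAAAAA a close match...)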
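A sketch of both distances, taking \(d=\max\{a,b\}\) and \(d_1=a+b\) with \(a=\mathrm{LZ}(SQ)-\mathrm{LZ}(S)\) and \(b=\mathrm{LZ}(QS)-\mathrm{LZ}(Q)\), as in Otu and Sayood's definitions. The LZ76 phrase counter (Kaspar–Schuster scan) is included so the block runs on its own; it is an assumed stand-in for the site's \(\mathrm{LZ}\), and `otu_sayood` is an illustrative name.

```python
def lz76_complexity(s: str) -> int:
    """Phrase count of the Lempel-Ziv (1976) parsing of s (Kaspar-Schuster scan)."""
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1                      # extend the copy from start position i
            if l + k > n:               # copy reaches the end: final phrase
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:                  # all earlier start positions tried
                c += 1
                l += k_max              # close the phrase and move past it
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def otu_sayood(s: str, q: str) -> tuple[int, int]:
    """Return (d, d1): the max and the sum of the two one-sided LZ increments."""
    a = lz76_complexity(s + q) - lz76_complexity(s)
    b = lz76_complexity(q + s) - lz76_complexity(q)
    return max(a, b), a + b


if __name__ == "__main__":
    print(otu_sayood("ab", "ab"))  # (1, 2): LZ("abab") = 3 and LZ("ab") = 2
```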
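A rough estimate of the expected distance, where \(397=138+259\) is the combined length of 5ULL_1 and 8EJD_1 and \(138\) is the length of the shorter one, is \((0.85)(0.8)\left(\frac{397}{\log_{20} 397}-\frac{138}{\log_{20}138}\right)\approx 78.1\).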
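The estimate is plain arithmetic; this sketch evaluates it, assuming (as the numbers suggest) that 397 is the combined length 138 + 259 and 138 is the shorter length. The constants 0.85 and 0.8 are copied from the formula; `lz_scale` and `rough_expected_distance` are illustrative names.

```python
import math


def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """n / log_{alphabet_size}(n), the normalizer from the ratio column."""
    return n / math.log(n, alphabet_size)


def rough_expected_distance(n1: int, n2: int) -> float:
    """0.85 * 0.8 * (scale(n1 + n2) - scale(min(n1, n2)))."""
    return 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(min(n1, n2)))


print(round(rough_expected_distance(138, 259), 2))
```

For the lengths 138 and 259 it evaluates to roughly 78.1.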
Status Protein 1 Protein 2 \(d\) \(d_1/2\)
Query variables 5ULL_1 8EJD_1 101 75.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
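\[ \delta= \alpha \min\{a,b\} + (1-\alpha) \max\{a,b\}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2, \end{cases} \]
where \(a=\mathrm{LZ}(SQ)-\mathrm{LZ}(S)\) and \(b=\mathrm{LZ}(QS)-\mathrm{LZ}(Q)\) for sequences \(S\) and \(Q\).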
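A sketch of the interpolation \(\delta\) between the two one-sided increments \(a\) and \(b\). The values \(a=101\), \(b=50\) below are not reported directly. They are inferred from the table by assuming \(d=\max\{a,b\}\) and \(d_1=a+b\), so \(d_1/2=75.5\) gives \(a+b=151\). The name `delta` is illustrative.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# Increments inferred (not reported) from d = 101 and d1/2 = 75.5 for 5ULL_1, 8EJD_1.
a, b = 101, 50
print(delta(a, b, 0.0), delta(a, b, 0.5))  # 101.0 75.5
```

Setting \(\alpha=1\) would instead return the smaller increment.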