CoV2D Browser™

Parikh vectors
7AEX_1 5DQZ_1 5HQM_1 Letter Amino acid
22 26 24 V Valine
12 23 25 R Arginine
11 18 32 D Aspartic acid
12 14 22 K Lysine
10 4 17 M Methionine
13 25 50 G Glycine
17 22 23 I Isoleucine
12 22 23 P Proline
17 13 26 T Threonine
22 35 59 A Alanine
7 4 20 N Asparagine
3 4 3 C Cysteine
14 10 15 Q Glutamine
10 9 17 Y Tyrosine
16 15 23 S Serine
2 4 5 W Tryptophan
18 15 22 E Glutamic acid
15 4 19 H Histidine
31 29 32 L Leucine
11 9 30 F Phenylalanine
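The Parikh vector of a sequence records how many times each letter occurs in it. A minimal sketch of how one column of the table above could be recomputed from a FASTA sequence follows; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not taken from the CoV2D code.

```python
from collections import Counter

# Letters in the row order used by the table above (an assumption made
# only so that the output lines up with the table rows).
AMINO_ACIDS = "VRDKMGIPTANCQYSWEHLF"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return the list of letter counts of seq, one entry per alphabet letter."""
    counts = Counter(seq)
    return [counts[a] for a in alphabet]

if __name__ == "__main__":
    # First eight residues of 7AEX_1, used here as a small example.
    print(parikh_vector("MHHHHHHS"))
```

The entries of a Parikh vector sum to the sequence length when every residue is in the alphabet, which gives a quick sanity check: the 7AEX_1 column sums to 275.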

>7AEX_1|Chain A|mRNA endoribonuclease toxin LS|Escherichia coli (strain K12) (83333)
>5DQZ_1|Chains A, B, C, D|CRISPR-associated endonuclease Cas1|Escherichia coli K12 (83333)
>5HQM_1|Chains A, B|Ribulose bisphosphate carboxylase (R. palustris/R. rubrum chimera),Ribulose bisphosphate carboxylase|Rhodopseudomonas palustris (1076)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7AEX , Knot 123 275 0.83 40 183 262
MHHHHHHSEINPAEFEQVNMVLQGFVETSVLPVLELSADESHIEFREHSRNAHTVVWKIISTSYQDELTVSLHITTGKLQIQGRPLSCYRVFTFNLAALLDLQGLEKVLIRQEDGKANIVQQEVARTYLQTVMADAYPHLHVTAEKLLVSGLCVKLAAPDLPDYCMLLYPELRTIEGVLKSKMSGLGMPVQQPAGFGTYFDKPAAHYILKPQFAATLRPEQINIISTAYTFFNVERHSLFHMETVVDASRMISDMARLMGKATRAWGIIKDLYIV
5DQZ , Knot 131 305 0.82 40 178 285
MTWLPLNPIPLKDRVSMIFLQYGQIDVIDGAFVLIDKTGIRTHIPVGSVACIMLEPGTRVSHAAVRLAAQVGTLLVWVGEAGVRVYASGQPGGARSDKLLYQAKLALDEDLRLKVVRKMFELRFGEPAPARRSVEQLRGIEGSRVRATYALLAKQYGVTWNGRRYDPKDWEKGDTINQCISAATSCLYGVTEAAILAAGYAPAIGFVHTGKPLSFVYDIADIIKFDTVVPKAFEIARRNPGEPDREVRLACRDIFRSSKTLAKLIPLIEDVLAAGEIQPPAPPEDAQPVAIPLPVSLGDAGHRSS
5HQM , Knot 197 487 0.83 40 248 459
MGSSHHHHHHSSGLVPRGSHMDQSNRYANLNLKESELIAGGRHVLCAYIMKPKAGFGNFIQTAAHFAAESSTGTNVEVSTTDDFTRGVDALVYEVDEANSLMKIAYPIELFDRNVIDGRAMIASFLTLTIGNNQGMGDVEYAKMYDFYVPPAYLKLFDGPSTTIKDLWRVLGRPVINGGFIVGTIIKPKLGLRPQPFANACYDFWLGGDFIKNDEPQGNQVFAPFKDTVRAVADAMRRAQDKTGEAKLFSFNITADDHYEMLARGEFILETFADNADHIAFLVDGYVAGPAAVTTARRAFPKQYLHYHRAGHGAVTSPQSKRGYTAFVLSKMARLQGASGIHTGTMGFGKMEGEAADRAIAYMITEDAADGPYFHQEWLGMNPTTPIISGGMNALRMPGFFDNLGHSNLIMTAGGGAFGHVDGGAAGAKSLRQAEQCWKQGADPVEFAKDHREFARAFESFPQDADKLYPNWRAKLGVEDTRSALPA
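The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below counts phrases in a Lempel-Ziv (1976) style parsing and applies that normalization. It is an illustration under the assumption that each phrase is the shortest prefix of the remaining text that has not occurred starting at an earlier position; the exact variant used by this page is not stated, so the counts it produces need not match the table.

```python
import math

def lz76_complexity(w):
    """Number of phrases in an LZ76-style exhaustive parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it still occurs starting before position i
        # (overlapping occurrences are allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_alphabet_size(n)."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    print(lz76_complexity("0001101001000101"))  # parsed as 0|001|10|100|1000|101
    print(round(normalized_lz(123, 275), 2))    # values from the 7AEX row
```

The implementation is quadratic or worse in the sequence length, which is acceptable for sequences of a few hundred residues.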

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
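As a small illustration of these definitions, the set of subwords of a given length and its cardinality can be computed with a sliding window. The function names are illustrative only.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct contiguous subwords of length k in w."""
    return len(subwords(w, k))

if __name__ == "__main__":
    print(sorted(subwords("abab", 2)))
    print(subword_complexity("abab", 1))
```

For a word of length \(|w|\) there are at most \(|w|-k+1\) subwords of length \(k\), so for example a sequence of length 275 has at most 274 distinct subwords of length 2.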

\(|P_{f(7AEX_1)}(2) \setminus P_{f(5DQZ_1)}(2)|=82\), \(|P_{f(5DQZ_1)}(2) \setminus P_{f(7AEX_1)}(2)|=77\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
10000000010110100101110111000111110101000010100000010011101100000001010101001010101011000011010111110101100111000010101100011000100111010101010100111011010111101100011101010010111000101111110011111001001110011010111010100101100100110100001101001101001100110111010011111001011
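A hydrophobic-polar string like the one above can be produced by mapping each residue to 1 or 0. The page does not state its classification; the set `HYDROPHOBIC` below is an assumption inferred by comparing the opening residues of 7AEX_1 with the opening digits of the displayed string, and `hp_encode` is an illustrative name.

```python
# Assumed hydrophobic residues (inferred from the displayed string, not
# documented by the page): A, F, G, I, L, M, P, V, W map to 1, all others to 0.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq):
    """Map each residue to '1' if assumed hydrophobic, else '0'."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in seq)

if __name__ == "__main__":
    # First 15 residues of 7AEX_1.
    print(hp_encode("MHHHHHHSEINPAEF"))
```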
Pair \(Z_2\) Length of longest common subword
7AEX_1,5DQZ_1 159 3
7AEX_1,5HQM_1 153 7
5DQZ_1,5HQM_1 160 4
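Both columns of this table can be sketched in a few lines. The first function follows the definition of \(Z_k\) given earlier. The second reads the last column as the length of the longest common contiguous subword, which is an assumption suggested by the small values (for instance, 7AEX_1 and 5HQM_1 both contain HHHHHHS, of length 7). Function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x, y):
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length for x[:i-1], y[:j]
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

if __name__ == "__main__":
    print(z("abab", "abba", 2))
    print(longest_common_subword("MHHHHHHS", "SSHHHHHHSS"))
```

With the values reported above, \(Z_2\) for the first pair is 82 + 77 = 159, which agrees with the first row of the table.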

Newick tree

 
(5DQZ_1:80.80,(7AEX_1:76.5,5HQM_1:76.5):4.30);

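Let d be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let d1 be the Otu–Sayood distance \(d_1(x,y)=(\mathrm{LZ}(xy)-\mathrm{LZ}(x))+(\mathrm{LZ}(yx)-\mathrm{LZ}(y))\). (This makes the 4TYN sequence AAAAAA a close match...)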
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{580}{\log_{20} 580}-\frac{275}{\log_{20}275}\right)=85.9\), where \(580=275+305\) is the combined length of the two query sequences.
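The arithmetic in this estimate can be checked directly. In the sketch below, the factors 0.85 and 0.8 are taken from the formula as given, without any claim about their origin, and the function names are illustrative.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The quantity n / log_alphabet_size(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

def expected_distance(n1, n2, c1=0.85, c2=0.8):
    """c1 * c2 * (scale of the concatenation minus scale of the first word)."""
    return c1 * c2 * (lz_scale(n1 + n2) - lz_scale(n1))

if __name__ == "__main__":
    # Lengths of 7AEX_1 and 5DQZ_1; the result is about 85.9.
    print(expected_distance(275, 305))
```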
Status Protein1 Protein2 d d1/2
Query variables 7AEX_1 5DQZ_1 105 102

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
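The interpolation can be sketched as a function of two directed quantities. Here `a` and `b` stand for the two terms over which the minimum and maximum are taken; reading them as \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\) is an assumption based on the usual form of the Otu-Sayood distances. The example values 99 and 105 are not reported by the page; they are the pair implied by the row d = 105, d1/2 = 102 under that reading.

```python
def delta(a, b, alpha):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

if __name__ == "__main__":
    a, b = 99, 105            # hypothetical directed terms, see lead-in
    print(delta(a, b, 0))     # alpha = 0 gives d, the maximum
    print(delta(a, b, 0.5))   # alpha = 1/2 gives d1/2, the average
```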