CoV2D Browser™

Parikh vectors
5SUE_1 8BVG_1 4POE_1 Letter Amino acid
11 6 12 R Arginine
14 13 14 N Asparagine
15 16 20 E Glutamic acid
5 9 11 H Histidine
10 11 10 F Phenylalanine
18 10 23 S Serine
7 11 10 Y Tyrosine
9 8 12 Q Glutamine
7 23 15 G Glycine
23 12 18 I Isoleucine
8 6 4 M Methionine
21 15 26 T Threonine
18 8 19 A Alanine
1 2 3 C Cysteine
29 21 20 L Leucine
10 10 9 P Proline
14 18 26 V Valine
14 18 17 D Aspartic acid
19 20 25 K Lysine
5 1 7 W Tryptophan
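A Parikh vector records how many times each letter occurs in a word, ignoring order. The minimal Python sketch below counts over the 20 standard one-letter amino-acid codes. It uses alphabetical order, not the row order of the table above. Symbols outside that alphabet are skipped. The leading X of 4POE_1 is one such symbol, which is consistent with the 4POE_1 column summing to 301 while the sequence has length 302.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes (alphabetical order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(seq: str, alphabet: str = AMINO_ACIDS) -> dict[str, int]:
    """Count each letter of `alphabet` in `seq`; other symbols (e.g. X) are ignored."""
    counts = Counter(seq)
    return {a: counts[a] for a in alphabet}


v = parikh_vector("XAAC")
print(v["A"], v["C"], sum(v.values()))  # 2 1 3  (the X is not counted)
```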

>5SUE_1|Chain A|Pre-mRNA-splicing factor 8|Saccharomyces cerevisiae S288C (559292)
>8BVG_1|Chains A, B, C, D, E, F|BrUSSLEE|Aequorea victoria (6100)
>4POE_1|Chain A|Uricase|Aspergillus flavus (5059)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5SUE, Knot 117 258 0.84 40 175 251
GAMNSSNYAELFNNDIKLFVDDTNVYRVTVHKTFEGNVATKAINGCIFTLNPKTGHLFLKIIHTSVWAGQKRLSQLAKWKTAEEVSALVRSLPKEEQPKQIIVTRKAMLDPLEVHMLDFPNIAIRPTELRLPFSAAMSIDKLSDVVMKATEPQMVLFNIYDDWLDRISSYTAFSRLTLLLRALKTNEESAKMILLSDPTITIKSYHLWPSFTDEQWITIESQMRDLILTEYGRKYNVNISALTQTEIKDIILGQNIKA
8BVG, Knot 110 238 0.84 40 158 224
VSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLGYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNMNSHNVYIMADKQKNGIKVNYKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYK
4POE, Knot 136 302 0.85 42 204 291
XSAVKAARYGKDNVRVYKVHKDEKTGVQTVYEMTVCVLLEGEIETSYTKADNSVIVATDSIKNTIYITAKQNPVTPPELFGSILGTHFIEKYNHIHAAHVNIVCHRWTRMDIDGKPHPHSFIRDSEEKRNVQVDVVEGKGIDIKSSLSGLTVLKSTNSQFWGFLRDEYTTLKETWDRILSTDVDATWQWKNFSGLQEVRSHVPKFDATWATAREVTLKTFAEDNSASVQATMYKMAEQILARQQLIETVEYSLPNKHYFEIDLSWHKGLQNTGKNAEVFAPQSDPNGLIKCTVGRSSLKSKL
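The table reports an LZ-complexity \(\mathrm{LZ}(w)\) and the ratio \(\mathrm{LZ}(w)/(n/\log_{20} n)\). One common way to compute LZ76 complexity is the Kaspar–Schuster scan, sketched below in Python. The source does not confirm that the LZ-complexity column uses exactly this convention. Treat the function as an assumption: its values may differ slightly from the site's.

```python
import math


def lz76(s: str) -> int:
    """LZ76 complexity via the Kaspar-Schuster scan (assumed convention).

    Counts components of the exhaustive history of s: each component is
    the longest prefix of the unread text that also occurs starting at an
    earlier position of s, extended by one new symbol.
    """
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1  # s[0] is the first component
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:  # the match runs to the end: count the last component
                c += 1
                break
        else:
            k_max = max(k, k_max)
            i += 1
            if i == l:  # every earlier start position has been tried
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def normalized_lz(lz: int, n: int) -> float:
    """LZ(w) / (n / log_20 n), the ratio column of the table."""
    return lz / (n / math.log(n, 20))


print(lz76("0001101001000101"))  # 6: 0 . 001 . 10 . 100 . 1000 . 101
```

For 4POE_1, \(136/(302/\log_{20} 302)\approx 0.858\). The table's 0.85 therefore appears to be truncated rather than rounded, and the 0.84 entries are consistent with that reading.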

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
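The definitions of \(P_w(n)\) and \(p_w(n)\) translate directly into a set comprehension over all length-\(n\) windows of \(w\). Here is a minimal Python sketch; the function names are illustrative only.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


print(sorted(subwords("ABAB", 2)))  # ['AB', 'BA']
```

A word of length \(m\) has only \(m-n+1\) windows of length \(n\), so \(p_w(n)\le m-n+1\). For 5SUE_1 (\(m=258\)), the tabulated \(p_w(3)=251\) is close to that bound of 256.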

\(|P_{f(5SUE_1)}(2) \setminus P_{f(8BVG_1)}(2)|=88\) and \(|P_{f(8BVG_1)}(2) \setminus P_{f(5SUE_1)}(2)|=71\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \), the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). It is an LZ76-style (set-of-subwords) numerator of a Jaccard distance on \(P(k)\). For example, \(Z_2(f(5SUE_1),f(8BVG_1))=88+71=159\).

Hydrophobic-polar version of Sequence 1 (5SUE_1): 111000001011000101110000100101000101011001101011010100101110110001111000100110100100101110011000010011100011101101011011011101001011101110100100111010010111101000110010000110010111011000000101111001010100001110100001101000100111000100001010110000100111100101
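The source does not state which residues count as hydrophobic. Comparing the binary string with the first 90 residues of 5SUE_1 suggests the following convention:

- 1 marks A, F, G, I, L, M, P, V and W.
- 0 marks every other residue, including C and Y.

The Python sketch below hard-codes that inferred set, which is an assumption rather than the site's documented rule. It also treats any non-standard symbol, such as X, as polar.

```python
# Residues mapped to 1; inferred from the displayed string (assumption).
HYDROPHOBIC = frozenset("AFGILMPVW")


def hp_string(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar or unknown)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)


print(hp_string("GAMNSSNYAELFNNDIKLFV"))  # 11100000101100010111
```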
Pair \(Z_2\) Length of longest common subword (contiguous)
5SUE_1,8BVG_1 159 3
5SUE_1,4POE_1 173 3
8BVG_1,4POE_1 164 4
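Both columns can be computed from the sequences alone. This Python sketch reads the last column as the longest common contiguous subword. A longest common subsequence of two proteins this long would be far larger than 3 or 4: 5SUE_1 and 8BVG_1 contain L 29 and 21 times, so their longest common subsequence has length at least 21. The function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct contiguous subwords of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y (O(|x||y|) DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # extend the common run ending at x[i-1] and y[j-1]
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


print(z_distance("ABCA", "ABD", 2))  # 3: {BC, CA} vs {BD}
```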

Newick tree

 
(4POE_1:85.81,(5SUE_1:79.5,8BVG_1:79.5):6.31);
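The tree is ultrametric: each leaf lies 85.81 below the root, since \(6.31+79.5=85.81\). The path between 5SUE_1 and 8BVG_1 has length \(2\times 79.5=159\), which equals their \(Z_2\) value. The source does not say which clustering method produced the tree, so this Python sketch only reads path lengths off the branch lengths. It does not reproduce the construction. The node names `inner` and `root` are hypothetical labels for the unnamed internal nodes.

```python
# Hand-encoded copy of the tree: node -> (parent, branch length).
# "inner" and "root" are hypothetical names for the unlabeled internal nodes.
TREE = {
    "4POE_1": ("root", 85.81),
    "inner": ("root", 6.31),
    "5SUE_1": ("inner", 79.5),
    "8BVG_1": ("inner", 79.5),
}


def ancestors(node: str) -> dict[str, float]:
    """Map node and each of its ancestors to the path length from node."""
    dist = {node: 0.0}
    total = 0.0
    while node in TREE:
        parent, length = TREE[node]
        total += length
        dist[parent] = total
        node = parent
    return dist


def patristic(a: str, b: str) -> float:
    """Leaf-to-leaf path length through the lowest common ancestor."""
    da, db = ancestors(a), ancestors(b)
    return min(da[v] + db[v] for v in da if v in db)


print(patristic("5SUE_1", "8BVG_1"))  # 159.0
```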

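For sequences \(S\) and \(Q\), write \(SQ\) for their concatenation. Let \(d(S,Q)=\max\{\mathrm{LZ}(SQ)-\mathrm{LZ}(S),\ \mathrm{LZ}(QS)-\mathrm{LZ}(Q)\}\) be the Otu–Sayood distance \(d\).
Let \(d_1(S,Q)=\mathrm{LZ}(SQ)-\mathrm{LZ}(S)+\mathrm{LZ}(QS)-\mathrm{LZ}(Q)\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)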
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{496}{\log_{20} 496}-\frac{238}{\log_{20}238}\right)\approx 74.2\), where \(496=258+238\) is the combined length of 5SUE_1 and 8BVG_1.
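The estimate is plain arithmetic, checked below in Python. The constants 0.85 and 0.8 are copied from the formula and treated as opaque, because the source does not explain them. The helper names are hypothetical.

```python
import math


def size_over_log20(n: int) -> float:
    """n / log_20 n, the normalizing scale used in the LZ ratio column."""
    return n / math.log(n, 20)


def expected_distance(n_s: int, n_q: int, a: float = 0.85, b: float = 0.8) -> float:
    """a * b * (N/log_20 N - n_q/log_20 n_q) with N = n_s + n_q.

    a and b default to the constants 0.85 and 0.8 taken from the text.
    """
    total = n_s + n_q
    return a * b * (size_over_log20(total) - size_over_log20(n_q))


estimate = expected_distance(258, 238)  # lengths of 5SUE_1 and 8BVG_1
print(f"{estimate:.2f}")  # 74.20
```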
Status Protein1 Protein2 d d1/2
Query variables 5SUE_1 8BVG_1 90 87

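In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)], with \(x=\mathrm{LZ}(SQ)-\mathrm{LZ}(S)\) and \(y=\mathrm{LZ}(QS)-\mathrm{LZ}(Q)\) for sequences \(S\) and \(Q\),
\[ \delta= \alpha \min\{x,y\} + (1-\alpha) \max\{x,y\}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]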
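Because \(\min\{x,y\}+\max\{x,y\}=x+y\), setting \(\alpha=1/2\) gives half the sum, which is \(d_1/2\). Setting \(\alpha=0\) keeps only the maximum, which is \(d\).

The minimal Python sketch below computes the two increments and \(\delta\). It reuses a Kaspar–Schuster LZ76 routine, and it is an assumption that this matches the site's LZ-complexity convention. The function names are illustrative.

```python
def lz76(s: str) -> int:
    """LZ76 complexity via the Kaspar-Schuster scan (assumed convention)."""
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:  # match reaches the end: count the last component
                c += 1
                break
        else:
            k_max = max(k, k_max)
            i += 1
            if i == l:  # all earlier start positions tried: component done
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def increments(s: str, q: str) -> tuple[int, int]:
    """x = LZ(SQ) - LZ(S) and y = LZ(QS) - LZ(Q)."""
    return lz76(s + q) - lz76(s), lz76(q + s) - lz76(q)


def delta(x: float, y: float, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max; alpha=0 gives d, alpha=1/2 gives d_1/2."""
    return alpha * min(x, y) + (1 - alpha) * max(x, y)


print(increments("AAAA", "AAAA"))  # (0, 0)
```

For 5SUE_1 and 8BVG_1 the table gives \(d=90\) and \(d_1/2=87\), so \(d_1=174\). The smaller increment must then be \(174-90=84\), and `delta(90, 84, 0.5)` returns 87.0 as expected.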