CoV2D Browser™

Parikh vectors
8ERM_1 6WKR_1 6THY_1 Letter Amino acid
2 3 9 P Proline
2 1 29 Y Tyrosine
12 2 51 N Asparagine
28 5 25 G Glycine
0 1 8 H Histidine
10 7 47 I Isoleucine
38 2 11 A Alanine
10 7 38 K Lysine
0 0 8 W Tryptophan
1 1 12 M Methionine
27 7 14 T Threonine
25 4 27 V Valine
9 5 28 D Aspartic acid
0 1 4 C Cysteine
10 6 13 Q Glutamine
3 6 15 E Glutamic acid
2 4 17 R Arginine
9 9 31 L Leucine
5 2 15 F Phenylalanine
25 3 31 S Serine
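
A Parikh vector simply records how many times each letter occurs in a word. The following is a minimal sketch, not the site's own code; the names `AMINO_ACIDS` and `parikh_vector` are illustrative, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# Letter order taken from the rows of the table above (P, Y, N, G, ...).
AMINO_ACIDS = "PYNGHIAKWMTVDCQERLFS"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each letter of `alphabet` in `seq`."""
    counts = Counter(seq)
    return [counts[a] for a in alphabet]

# The 6WKR_1 (ubiquitin) sequence listed on this page.
ubiquitin = ("MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYN"
             "IQKESTLHLVLRLRGC")
vec = parikh_vector(ubiquitin)
print(dict(zip(AMINO_ACIDS, vec)))
```

The entries of the vector sum to the length of the sequence, which gives a quick consistency check against the Length column further down.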

>8ERM_1|Chains A, B|B-type flagellin|Pseudomonas aeruginosa PAO1 (208964)
>6WKR_1|Chains A[auth F], Q[auth T]|Ubiquitin|Homo sapiens (9606)
>6THY_1|Chain A[auth AAA]|BoNT/A3|Clostridium botulinum (1491)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8ERM , Knot 88 218 0.72 34 110 194
GAGTVASVAGTATASGIASGTVNLVGGGQVKNIAIAAGDSAKAIAEKMDGAIPNLSARARTVFTADVSGVTGGSLNFDVTVGSNTVSLAGVTSTQDLADQLNSNSSKLGITASINDKGVLTITSATGENVKFGAQTGTATAGQVAVKVQGSDGKFEAAAKNVVAAGTAATTTIVTGYVQLNSPTAYSVSGTGTQASQVFGNASAAQKSSVASVDISTA
6WKR , Knot 40 76 0.76 38 61 74
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGC
6THY , Knot 172 433 0.80 40 222 401
MHHHHHHKNIVNTSILSIVYKKDDLIDLSRYGAKINIGDRVYYDSIDKNQIKLINLESSTIEVILKNAIVYNSMYENFSTSFWIKIPKYFSKINLNNEYTIINCIENNSGWKVSLNYGEIIWTLQDNKQNIQRVVFKYSQMVNISDYINRWMFVTITNNRLTKSKIYINGRLIDQKPISNLGNIHASNKIMFKLDGCRDPRRYIMIKYFNLFDKELNEKEIKDLYDSQSNPGILKDFWGNYLQYDKPYYMLNLFDPNKYVDVNNIGIRGYMYLKGPRGSVMTTNIYLNSTLYMGTKFIIKKYASGNEDNIVRNNDRVYINVVVKNKEYRLATNASQAGVEKILSALEIPDVGNLSQVVVMKSKDDQGIRNKCKMNLQDNNGNDIGFVGFHLYDNIAKLVASNWYNRQVGKASRTFGCSWEFIPVDDGWGESSL
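
The normalised column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below assumes that \(\mathrm{LZ}(w)\) is the number of components in the LZ76 exhaustive-history parsing; that is an assumption about how the table was produced, and the function names `lz76` and `normalized_lz` are illustrative.

```python
import math

def lz76(w):
    """Number of components in the LZ76 exhaustive-history parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the component while it can still be copied from an
        # occurrence that starts before position i (overlap is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_b n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ-complexity, length) as listed in the table above.
for code, lz, n in [("8ERM", 88, 218), ("6WKR", 40, 76), ("6THY", 172, 433)]:
    print(code, round(normalized_lz(lz, n), 3))
```

With the tabulated values this gives ratios that agree with the 0.72, 0.76 and 0.80 shown above up to rounding in the last digit.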

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
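
As an illustration of the definition, the set of distinct subwords of a fixed length can be collected with a sliding window. This is a minimal sketch; the names `subwords` and `subword_complexity` are illustrative and not taken from the site.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# "abab" has the length-2 subwords "ab" and "ba".
print(sorted(subwords("abab", 2)), subword_complexity("abab", 2))
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), so the count is 0 whenever \(k\) exceeds the length of the word.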

\(|P_{f(8ERM_1)}(2) \setminus P_{f(6WKR_1)}(2)|=84\), \(|P_{f(6WKR_1)}(2) \setminus P_{f(8ERM_1)}(2)|=35\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the numerator of an LZ76-style (set-of-subwords) Jaccard distance for \(P(k)\); for this pair, \(Z_2=84+35=119\).

Hydrophobic-polar version of Sequence 1:
11101101110101011101010111110100111111001011100101111010101001101010110110101010110001011110000011001000000111010100011101001010010111001010110111010100101011100111110110001101010100101001010100100111010110000110101001
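
The hydrophobic-polar string above can be reproduced by mapping each residue to 1 (hydrophobic) or 0 (polar). The sketch below uses an assumed hydrophobic set: the classes of G, A, V, I, L, M, P and F as 1, and of the polar residues as 0, are read off from the string itself, but H, W and C do not occur in 8ERM_1, so their placement here is a guess. The names `HYDROPHOBIC` and `hp_encode` are illustrative.

```python
# Assumed hydrophobic residues. W and C are included by convention only;
# the page's own choice for H, W and C cannot be inferred from 8ERM_1.
HYDROPHOBIC = set("GAVILMPFWC")

def hp_encode(seq):
    """Map each residue to '1' if hydrophobic, '0' otherwise."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in seq)

# First 50 residues of the 8ERM_1 sequence listed on this page.
prefix = "GAGTVASVAGTATASGIASGTVNLVGGGQVKNIAIAAGDSAKAIAEKMDG"
print(hp_encode(prefix))
```

The output should match the first 50 symbols of the binary string above.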
Pair \(Z_2\) Length of longest common subword
8ERM_1,6WKR_1 119 4
8ERM_1,6THY_1 160 3
6WKR_1,6THY_1 191 3
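
Both columns of this table can be computed from sets of subwords. The sketch below is one straightforward way to do so under the definition of \(Z_k\) given above; it reads the last column as the longest contiguous block shared by the two sequences, which is an interpretation suggested by the small values. The function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    k = 0
    # A common subword of length k + 1 implies one of length k,
    # so we can stop at the first length with no shared subword.
    while subwords(x, k + 1) & subwords(y, k + 1):
        k += 1
    return k

print(z("abab", "abba", 2))                       # only "bb" is unshared
print(longest_common_subword("abcde", "xbcdy"))   # "bcd"
```

Calling `z(x, y, 2)` on a pair of the sequences listed above would give the \(Z_2\) entry for that pair.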

Newick tree

 
(6THY_1:95.74,(8ERM_1:59.5,6WKR_1:59.5):36.24);

Let \(d\) be the Otu–Sayood distance of the same name.
Let \(d_1\) be the Otu–Sayood distance of the same name. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{294}{\log_{20} 294}-\frac{76}{\log_{20} 76}\right)=69.6\), where \(294=218+76\) is the combined length of the two sequences.
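
The arithmetic in this estimate can be checked directly. The sketch below only evaluates the displayed expression; the function name `expected_distance` and its parameter names are illustrative, and the constants 0.85 and 0.8 are taken from the formula as given, without any claim about where they come from.

```python
import math

def expected_distance(n_concat, n_single, alphabet_size=20, c1=0.85, c2=0.8):
    """Evaluate c1 * c2 * (g(n_concat) - g(n_single)) with g(n) = n / log_b n."""
    def g(n):
        return n / math.log(n, alphabet_size)
    return c1 * c2 * (g(n_concat) - g(n_single))

print(round(expected_distance(294, 76), 1))  # 69.6
```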
Status Protein1 Protein2 d d1/2
Query variables 8ERM_1 6WKR_1 81 55

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
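
The interpolation can be illustrated with the query row above. Taking \(d=81\) as the maximum (\(\alpha=0\)) and \(d_1/2=55\) as the average of minimum and maximum (\(\alpha=1/2\)), the minimum follows by arithmetic. This is a sketch under the assumption that the table entries are exactly these two values of \(\delta\); the names `delta`, `lo` and `hi` are illustrative.

```python
def delta(alpha, lo, hi):
    """alpha * min + (1 - alpha) * max."""
    return alpha * lo + (1 - alpha) * hi

d, half_d1 = 81, 55       # values from the query row above
hi = d                    # alpha = 0 selects the max
lo = 2 * half_d1 - hi     # alpha = 1/2 gives (min + max) / 2
print(lo, delta(0, lo, hi), delta(0.5, lo, hi))
```

Under that reading the two one-directional quantities for the pair 8ERM_1, 6WKR_1 would be 29 and 81.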