CoV2D Browser™

Parikh vectors
6JBW_1 3MXU_1 6FJQ_1 Letter Amino acid
8 4 3 M Methionine
31 15 9 A Alanine
34 11 10 G Glycine
3 0 3 C Cysteine
25 4 6 I Isoleucine
26 2 11 F Phenylalanine
32 7 17 S Serine
20 10 20 T Threonine
24 1 1 R Arginine
16 3 17 N Asparagine
22 4 12 P Proline
42 13 12 V Valine
19 10 6 Q Glutamine
30 15 8 E Glutamic acid
48 14 15 L Leucine
20 7 20 K Lysine
9 3 3 W Tryptophan
17 4 10 Y Tyrosine
23 9 12 D Aspartic acid
16 7 0 H Histidine
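Each column of the table above is a Parikh vector, i.e. the number of occurrences of each letter in a sequence. A minimal sketch of that count follows; the function name `parikh_vector` and the constant `ALPHABET` are illustrative, not taken from the site, and the letter order simply copies the row order of the table.

```python
from collections import Counter

# One-letter amino acid codes, in the same row order as the table above.
ALPHABET = "MAGCIFSTRNPVQELKWYDH"

def parikh_vector(sequence, alphabet=ALPHABET):
    """Return the list of letter counts of `sequence`, one entry per letter of `alphabet`."""
    counts = Counter(sequence)
    return [counts.get(letter, 0) for letter in alphabet]

# Small illustration on the first nine residues of 3MXU_1.
prefix = "MAHHHHHHM"
vector = parikh_vector(prefix)
print(dict(zip(ALPHABET, vector)))
```

Applied to a full sequence, the entries of the vector sum to the sequence length, which gives a quick consistency check against the Length column further down.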

>6JBW_1|Chains A, B|Trehalose-6-phosphate synthase|Pyricularia oryzae 70-15 (242507)
>3MXU_1|Chain A|Glycine cleavage system H protein|Bartonella henselae (38323)
>6FJQ_1|Chains A, B|Fiber|Human adenovirus 48 (39641)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6JBW , Knot 194 465 0.85 40 250 446
RLLLISNRLPITIKRSDDGQYSFSMSSGGLVTGLSGLAKTTSFQWYGWPGLEVPDAEAGPVVQRLKNEYGAHPVFVDDELADRHYNGFANSILWPLFHYHPGEITFDESAWSAYKEVNRLFAQTVVKDVQDGDMIWVHDYHLMLLPEMLREEIGDSKKNVKIGFFLHTPFPSSEIYRILPVRQALLQGVLHCDLLGFHTYDYARHFLSSCSRILSAPTTPNGVQFAGRFVTVGAFPIGIDPEKFVEGLQKPKVQQRIAALTRKFEGVKLIVGVDRLDYIKGVPQKLHALEVFLTEHPEWIGKIVLVQVAVPSRQDVEEYQNLRAVVNELVGRINGKFGTIEFMPIHFLHQSVSFDELAALYAVSDVCLVSSTRDGMNLVSYEYIATQRDRHGVMILSEFTGAAQSLSGSLIVNPWNTEELANAIHDAVTMGPEQREANFKKLERYVFKYTSAWWGSSFVAELNRL
3MXU , Knot 68 143 0.78 38 105 137
MAHHHHHHMGTLEAQTQGPGSMSKTYFTQDHEWLSVEGQVVTVGITDYAQEQLGDLVFIDLPQNGTKLSKGDAAAVVESVKAASDVYAPLDGEVVEINAALAESPELVNQKAETEGWLWKMTVQDETQLERLLDEAAYKELIG
6FJQ , Knot 91 195 0.82 38 141 186
DKLTLWTTPDPSPNCKIDQDKDSKLTFVLTKCGSQILANMSLLVVKGKFSMINNKVNGTDDYKKFTIKLLFDEKGVLLKDSSLDKEYWNYRSNNNNVGSAYEEAVGFMPSTTAYPKPPTPPTNPTTPLEKSQAKNKYVSNVYLGGQAGNPVATTVSFNKETGCTYSITFDFAWNKTYENVQFDSSFLTFSYIAQE
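The normalised column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns alone. The sketch below does only that arithmetic; it does not reimplement the LZ parsing, whose exact variant is not specified here, and the function name `normalized_lz` is an illustrative choice.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n) for a word of length n over b letters."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ-complexity, length) as listed in the table above.
rows = [("6JBW", 194, 465), ("3MXU", 68, 143), ("6FJQ", 91, 195)]
for code, lz, n in rows:
    print(code, round(normalized_lz(lz, n), 3))
```

The recomputed ratios agree with the tabulated 0.85, 0.78 and 0.82 if the table truncates to two decimals rather than rounding, which is an inference from the numbers and not something the page states.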

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
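As a direct reading of these definitions, \(P_w(n)\) can be built by sliding a window of width \(n\) over \(w\). The names `subwords` and `complexity` are illustrative. The counts reported on the page may follow a different convention, so this is a sketch of the definition only, not of the site's code.

```python
def subwords(w, n):
    """Return P_w(n): the set of distinct length-n contiguous subwords of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def complexity(w, n):
    """Return p_w(n), the number of distinct length-n subwords of w."""
    return len(subwords(w, n))

# Small illustration on the first nine residues of 3MXU_1.
w = "MAHHHHHHM"
print(sorted(subwords(w, 2)), complexity(w, 2))
```

For this prefix the length-2 subwords are AH, HH, HM and MA, so \(p_w(2)=4\).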

\(|P_{f(6JBW_1)}(2) \setminus P_{f(3MXU_1)}(2)|=167\), \(|P_{f(3MXU_1)}(2) \setminus P_{f(6JBW_1)}(2)|=22\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
011110001110100000100010100111101101110000101011111011010111110010000110111100011000001110011111100011010100011010001001110011001001011110000111110110001100000101111100111000100111100111011100011110000010011000001101100101101110110111111110100110110010100011110001011011111001001011100101101110001011101111011110000100000101110011101010110101111011000101001111011001011000001101100001100000011111001011100101011101100001101100110111000010100100011000011110011101001
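The hydrophobic-polar string can be reproduced by mapping each residue to 1 (hydrophobic) or 0 (polar). The page does not state which residues it treats as hydrophobic; the set below is inferred from the first residues of Sequence 1 and their bits, and the treatment of cysteine (C) as polar is an unverified assumption. The name `hp_encode` is illustrative.

```python
# Assumed hydrophobic residues, inferred from the start of the string above.
# Cysteine (C) is treated as polar here, which is an unverified assumption.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Map each residue to '1' if assumed hydrophobic, otherwise '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 20 residues of 6JBW_1.
print(hp_encode("RLLLISNRLPITIKRSDDGQ"))
```

On these 20 residues the output is `01111000111010000010`, which matches the first 20 bits of the string above.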
Pair \(Z_2\) Length of longest common subsequence
6JBW_1,3MXU_1 189 4
6JBW_1,6FJQ_1 175 4
3MXU_1,6FJQ_1 156 3
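The \(Z_2\) column is the size of the symmetric difference of the two sets of length-2 subwords; for the first pair, \(167+22=189\). A minimal sketch of \(Z_k\) under the definition above follows, with illustrative function names `subwords` and `z_distance`.

```python
def subwords(w, k):
    """Return the set of distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Return Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: "ABAB" has subwords {AB, BA}; "ABBA" has {AB, BB, BA}.
print(z_distance("ABAB", "ABBA", 2))
```

In the toy example only BB is unshared, so the value is 1. The function is symmetric in its two arguments, as the sum of two set differences must be.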

Newick tree

 
[
	6JBW_1:95.02,
	[
		6FJQ_1:78,3MXU_1:78
	]:17.02
]
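The bracketed structure above can be written as a standard parenthesised Newick string. The sketch below assumes a hypothetical in-memory representation (nested pairs of label or child list and branch length) and a hypothetical helper `to_newick`; neither comes from the site.

```python
def to_newick(node):
    """Serialise a (label_or_children, branch_length) pair to Newick text.

    A leaf is (name, length); an internal node is (list_of_children, length).
    A length of None means the node has no branch length (the root).
    """
    label, length = node
    if isinstance(label, list):
        text = "(" + ",".join(to_newick(child) for child in label) + ")"
    else:
        text = label
    return text if length is None else f"{text}:{length}"

# The tree shown above, with the branch lengths as listed.
tree = ([("6JBW_1", 95.02), ([("6FJQ_1", 78), ("3MXU_1", 78)], 17.02)], None)
newick = to_newick(tree) + ";"
print(newick)
```

This prints `(6JBW_1:95.02,(6FJQ_1:78,3MXU_1:78):17.02);`. Each leaf lies at depth 95.02 from the root, since \(78+17.02=95.02\).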

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{608 }{\log_{20} 608}-\frac{143}{\log_{20}143})\approx 134.\)
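The arithmetic in this estimate can be checked directly. The sketch below evaluates the expression exactly as written, with the constants 0.85, 0.8, 608 and 143 copied from the formula; the helper name `scaled_length` is illustrative.

```python
import math

def scaled_length(n, alphabet_size=20):
    """Return n / log_b(n), the normalising term used in the formula above."""
    return n / math.log(n, alphabet_size)

expected = 0.85 * 0.8 * (scaled_length(608) - scaled_length(143))
print(expected)
```

The result lies between 134 and 135, consistent with the stated value of 134 up to truncation.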
Status Protein1 Protein2 d d1/2
Query variables 6JBW_1 3MXU_1 171 109.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
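As a worked instance of this interpolation, the query row above gives \(d=171\) and \(d_1/2=109.5\). Reading the formula as \(d=\max\) and \(d_1=\min+\max\) (which is what the two cases imply), the smaller term is \(2\cdot 109.5-171=48\). The sketch below encodes that reading; the names `delta`, `larger` and `smaller` are illustrative.

```python
def delta(alpha, smaller, larger):
    """Return alpha * min + (1 - alpha) * max for the two one-sided terms."""
    return alpha * smaller + (1 - alpha) * larger

# Values from the query row: d = 171 and d1/2 = 109.5.
d, half_d1 = 171, 109.5
larger = d                      # alpha = 0 gives the max, i.e. d
smaller = 2 * half_d1 - larger  # since d1 = min + max

print(smaller, delta(0, smaller, larger), delta(0.5, smaller, larger))
```

With these inputs the two cases return 171 and 109.5, matching the d and d1/2 columns of the query row.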