CoV2D Browser™

Parikh vectors
6SNB_1 2VZP_1 7JXN_1 Letter Amino acid
12 0 0 M Methionine
30 18 0 T Threonine
4 3 0 W Tryptophan
14 2 0 Q Glutamine
11 3 1 E Glutamic acid
11 4 2 I Isoleucine
9 4 1 K Lysine
21 11 1 S Serine
3 0 0 C Cysteine
15 8 1 N Asparagine
14 7 1 D Aspartic acid
19 14 3 G Glycine
14 3 2 L Leucine
15 3 1 F Phenylalanine
12 5 0 Y Tyrosine
20 3 0 R Arginine
5 1 0 H Histidine
20 5 0 P Proline
23 18 3 V Valine
26 15 5 A Alanine
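A Parikh vector just counts how often each letter occurs, so each column above should sum to the sequence length (298, 127 and 21). Here is a minimal Python sketch that assumes the row order of the table. The names `ROW_ORDER` and `parikh_vector` are illustrative and are not part of the site:

```python
from collections import Counter

# Amino-acid letters in the same row order as the table above.
ROW_ORDER = "MTWQEIKSCNDGLFYRHPVA"

def parikh_vector(seq: str) -> list[int]:
    """Count occurrences of each amino-acid letter, listed in ROW_ORDER."""
    counts = Counter(seq)  # missing letters count as 0
    return [counts[a] for a in ROW_ORDER]

seq_7jxn = "VALVFAAEDVGSNKGAIIGLA"
print(parikh_vector(seq_7jxn))
```

For 7JXN_1 this reproduces the third column, and its entries sum to 21.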

>6SNB_1|Chain A|Capsid protein VP1|Coxsackievirus A10 (42769)
>2VZP_1|Chains A, B|EXO-BETA-D-GLUCOSAMINIDASE|AMYCOLATOPSIS ORIENTALIS (31958)
>7JXN_1|Chains A, B, C, D|Amyloid-beta 17-36 peptide|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6SNB, Knot 133 298 0.84 20 192 291
GDPVEDIIHDALSSTVRRAITSGQDVNTAAGTAPSSHRLETGRVPALQAAETGATSNATDENMIETRCVMNRNGVLEATISHFFSRSGLVGVVNLTDGGTDTTGYAVWDIDIMGFVQLRRKCEMFTYMRFNAEFTFVTTTENGEARPFMLQYMYVPPGAPKPTGRDAFQWQTATNPSVFVKLTDPPAQVSVPFMSPASAYQWFYDGYPTFGQHPETSNTTYGQCPNNMMGTFAVRVVSRVASQLKLQTRVYMKLKHVRAWIPRPIRSQPYLLKNFPNYDSSKITYSARDRASIKQANM
2VZP, Knot 63 127 0.80 18 89 123
SDPVDYQAEDATIVQGAVESNHAGYTGTGFVNYDNVAGSSVEWTVTVPSAGTYDVVVRYANGTTTSRPLDFSVNGSISASGVAFGSTGTWPAWTTKTVRVTLAAGVNKIKAVATTANGGPNVDKITL
7JXN, Knot 15 21 0.72 11 20 19
VALVFAAEDVGSNKGAIIGLA
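One way to compute \(\mathrm{LZ}(w)\) is the exhaustive-history parsing of Lempel and Ziv (1976). The word is scanned from left to right, and a phrase is closed as soon as it no longer occurs earlier in the word (overlap with the phrase itself is allowed). The sketch below works under that assumption. It reproduces \(\mathrm{LZ}=15\) for 7JXN_1, but it is not confirmed to match the site's implementation on every input. The function names are hypothetical:

```python
import math

def lz76(w: str) -> int:
    """Number of phrases in the exhaustive-history LZ76 parsing of w."""
    n, i, c = len(w), 0, 0
    while i < n:
        l = 1
        # Extend the current phrase while w[i:i+l] already occurs in w[:i+l-1];
        # the earlier occurrence may overlap the phrase itself.
        while i + l <= n and w[i:i + l] in w[:i + l - 1]:
            l += 1
        c += 1  # close a phrase (the last one may run to the end of w)
        i += l
    return c

def normalized_lz(w: str, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_{alphabet_size} n), as in the table's ratio column."""
    n = len(w)
    return lz76(w) / (n / math.log(n, alphabet_size))

w = "VALVFAAEDVGSNKGAIIGLA"  # 7JXN_1
print(lz76(w), normalized_lz(w))
```

For 7JXN_1 the ratio is about 0.726, while the table shows 0.72. Likewise, \(\mathrm{LZ}=133\), \(n=298\) gives about 0.849 for 6SNB_1, displayed as 0.84. The displayed ratios therefore appear to be truncated to two decimals rather than rounded.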

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
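The definitions of \(P_w(n)\) and \(p_w(n)\) translate directly into a set comprehension over all length-\(n\) windows. The following is a short sketch, and the function names are illustrative:

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct length-n subwords (intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))

w = "VALVFAAEDVGSNKGAIIGLA"  # f(7JXN_1)
print([p(w, n) for n in (1, 2, 3)])  # [11, 20, 19]
```

In this 21-letter word every bigram and every trigram is distinct, so \(p_w(2)=n-1\) and \(p_w(3)=n-2\). The value \(p_w(1)\) is simply the number of distinct letters.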

\(|P_{f(6SNB_1)}(2) \setminus P_{f(2VZP_1)}(2)|=131\) and \(|P_{f(2VZP_1)}(2) \setminus P_{f(6SNB_1)}(2)|=28\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(6SNB_1),f(2VZP_1))=131+28=159\).

Hydrophobic-polar version of Sequence 1: 1011001100110001001100100100111011000010010111101100110001000011000011000111010100110001111110100110000101110101111101000001100101010101100000101011110010111111010100110100100101110100111010111101101001100101011001000000010010011101110110011001010001010100101111011000101100110000001000100010100101
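\(Z_k\) is the size of the symmetric difference of two subword sets, which Python's set operators express directly. The same sketch also includes a hydrophobic-polar map. Its hydrophobic letter set (A, F, G, I, L, M, P, V, W map to 1, all other letters to 0) was inferred by matching the HP string above against Sequence 1. It is an assumption, not a definition taken from the site, and `hp` and `HYDROPHOBIC` are hypothetical names:

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)  # equals len(px ^ py)

# Assumed hydrophobic set, inferred from the HP string shown for Sequence 1.
HYDROPHOBIC = set("AFGILMPVW")

def hp(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in seq)

seq_2vzp = ("SDPVDYQAEDATIVQGAVESNHAGYTGTGFVNYDNVAGSSVEWTVTVPSAGTYDVVVRYANG"
            "TTTSRPLDFSVNGSISASGVAFGSTGTWPAWTTKTVRVTLAAGVNKIKAVATTANGGPNVDKITL")
seq_7jxn = "VALVFAAEDVGSNKGAIIGLA"
print(z(seq_7jxn, seq_2vzp, 2))
print(hp("GDPVEDIIHDALSS"))  # 10110011001100
```

For 7JXN_1 and 2VZP_1 the two bigram sets share 10 elements (VA, AA, AE, ED, DV, GS, SN, NK, GA, LA). This gives \(Z_2=(20-10)+(89-10)=89\).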
Pair \(Z_2\) Length of longest common subword (substring)
6SNB_1,2VZP_1 159 3
6SNB_1,7JXN_1 182 3
2VZP_1,7JXN_1 89 3
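The values of 3 in the last column fit the longest common contiguous subword. A longest common non-contiguous subsequence of these proteins would be far longer. The following standard dynamic-programming sketch computes the contiguous version under that reading, and the function name is illustrative:

```python
def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y, O(|x||y|)."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1], y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

seq_7jxn = "VALVFAAEDVGSNKGAIIGLA"
seq_2vzp = ("SDPVDYQAEDATIVQGAVESNHAGYTGTGFVNYDNVAGSSVEWTVTVPSAGTYDVVVRYANG"
            "TTTSRPLDFSVNGSISASGVAFGSTGTWPAWTTKTVRVTLAAGVNKIKAVATTANGGPNVDKITL")
print(longest_common_substring(seq_7jxn, seq_2vzp))  # 3
```

For 7JXN_1 and 2VZP_1 the longest shared subword is the trigram AED.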

Newick tree

(
	6SNB_1:95.25,
	(
		2VZP_1:44.5,7JXN_1:44.5
	):50.75
);

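Let \(d\) and \(d_1\) be the Otu--Sayood distances of those names, which are computed from the LZ-complexities of the two sequences and of their concatenations.
(The distance \(d_1\) makes the 4TYN sequence AAAAAA a close match...)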
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{425}{\log_{20} 425}-\frac{127}{\log_{20} 127}\right)\approx 89.6\).
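The arithmetic in this estimate can be checked directly. The sketch reads the factors 0.85 and 0.8 as the normalized LZ ratios of 6SNB_1 (about 0.849) and 2VZP_1 (about 0.802). That reading is an assumption. The value 425 = 298 + 127 is the length of the concatenated sequence. The helper name `scale` is hypothetical:

```python
import math

def scale(n: int, base: int = 20) -> float:
    """n / log_base(n), the normalizing term from the LZ-complexity table."""
    return n / math.log(n, base)

# Assumed: 0.85 and 0.8 are the (rounded) normalized LZ ratios of
# 6SNB_1 and 2VZP_1; 425 = 298 + 127 is the concatenation length.
estimate = 0.85 * 0.8 * (scale(425) - scale(127))
print(round(estimate, 1))  # 89.6
```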
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 6SNB_1 2VZP_1 117 79
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
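The following is a sketch of \(\delta\) that assumes the usual Otu--Sayood quantities \(a=\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(b=\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), with \(d=\max(a,b)\) and \(d_1=a+b\). Under these forms, \(\alpha=0\) gives \(\max(a,b)=d\), and \(\alpha=1/2\) gives \((a+b)/2=d_1/2\), which matches the two cases above. The code uses the exhaustive-history LZ76 parsing, and whether that matches the site's computation of \(\mathrm{LZ}\) is also an assumption. The function names are hypothetical:

```python
def lz76(w: str) -> int:
    """Number of phrases in the exhaustive-history LZ76 parsing of w."""
    n, i, c = len(w), 0, 0
    while i < n:
        l = 1
        while i + l <= n and w[i:i + l] in w[:i + l - 1]:
            l += 1
        c += 1
        i += l
    return c

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    a = lz76(x + y) - lz76(x)  # extra phrases needed when y follows x
    b = lz76(y + x) - lz76(y)  # extra phrases needed when x follows y
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def d(x: str, y: str) -> float:
    """alpha = 0: max(a, b)."""
    return delta(x, y, 0.0)

def d1(x: str, y: str) -> float:
    """alpha = 1/2 gives d1 / 2, so d1 = a + b."""
    return 2 * delta(x, y, 0.5)

print(d("AAAA", "AB"), d1("AAAA", "AB"), delta("AAAA", "AB", 0.5))
```

On the toy pair x = AAAA, y = AB, we get a = 0 and b = 2, so d = 2, \(d_1\) = 2, and \(\delta\) at \(\alpha=1/2\) is 1.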