CoV2D Browser™

Parikh vectors
6DOS_1 1IWD_1 1GEC_1 Letter Amino acid
10 23 29 G Glycine
13 7 14 K Lysine
1 2 0 M Methionine
5 9 13 Y Tyrosine
7 4 5 D Aspartic acid
5 8 8 P Proline
7 23 18 S Serine
10 10 9 T Threonine
9 18 21 V Valine
2 3 4 H Histidine
6 17 14 N Asparagine
11 13 11 I Isoleucine
10 8 10 L Leucine
6 19 14 A Alanine
0 7 7 C Cysteine
3 17 12 Q Glutamine
16 6 10 E Glutamic acid
2 6 3 F Phenylalanine
6 7 4 W Tryptophan
6 8 10 R Arginine
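
A Parikh vector records how many times each letter occurs in a word. As a minimal sketch (the function name `parikh_vector` is an illustrative choice, not something defined by the page), the 6DOS_1 column of the table above can be recomputed from its sequence:

```python
from collections import Counter

AMINO_ACIDS = "GKMYDPSTVHNILACQEFWR"  # row order of the table above

def parikh_vector(word, alphabet=AMINO_ACIDS):
    """Return {letter: number of occurrences in word} for every alphabet letter."""
    counts = Counter(word)  # missing letters count as 0
    return {letter: counts[letter] for letter in alphabet}

SEQ_6DOS = "EEIIWESLSVDVGSQGNPGIVEYKGVDTKTGEVLFEREPIPIGTNNMGEFLAIVHGLRYLKERNSRKPIYSDSQTAIKWVKDKKAKSTLVRNEETALIWKLVDEAEEWLNTHTYETPILKWQTDKWGEIKADYGR"

vec = parikh_vector(SEQ_6DOS)
print(vec["G"], vec["M"], vec["C"], vec["W"])  # entries for G, M, C, W
print(sum(vec.values()))                       # total equals the sequence length
```

The entries always sum to the sequence length, which is a quick sanity check on each column.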

>6DOS_1|Chain A|Ribonuclease H|Bacillus halodurans (86665)
>1IWD_1|Chain A|ERVATAMIN B|Tabernaemontana divaricata (52861)
>1GEC_1|Chain A[auth E]|GLYCYL ENDOPEPTIDASE|Carica papaya (3649)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6DOS , Knot 65 135 0.78 38 108 133
EEIIWESLSVDVGSQGNPGIVEYKGVDTKTGEVLFEREPIPIGTNNMGEFLAIVHGLRYLKERNSRKPIYSDSQTAIKWVKDKKAKSTLVRNEETALIWKLVDEAEEWLNTHTYETPILKWQTDKWGEIKADYGR
1IWD , Knot 98 215 0.81 40 145 205
LPSFVDWRSKGAVNSIKNQKQCGSCWAFSAVAAVESINKIRTGQLISLSEQELVDCDTASHGCNGGWMNNAFQYIITNGGIDTQQNYPYSAVQGSCKPYRLRVVSINGFQRVTRNNESALQSAVASQPVSVTVEAAGAPFQHYSSGIFTGPCGTAQNHGVVIVGYGTQSGKNYWIVRNSWGQNWGNQGYIWMERNVASSAGLCGIAQLPSYPTKA
1GEC , Knot 102 216 0.84 38 145 209
LPESVDWRAKGAVTPVKHQGYCESCWAFSTVATVEGINKIKTGNLVELSEQELVDCDLQSYGCNRGYQSTSLQYVAQNGIHLRAKYPYIAKQQTCRANQVGGPKVKTNGVGRVQSNNEGSLLNAIAHQPVSVVVESAGRDFQNYKGGIFEGSCGTKVDHAVTAVGYGKSGGKGYILIKNSWGPGWGENGYIRIRRASGNSPGVCGVYRSSYYPIKN
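
The normalized column in the table above divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The following is a minimal sketch, assuming the exhaustive-history parsing of Lempel and Ziv (1976). The page does not state which LZ variant it uses, so `lz76` may not reproduce the tabulated LZ values exactly; both function names are illustrative.

```python
import math

def lz76(word):
    """Number of components in the Lempel-Ziv (1976) exhaustive history of word."""
    i, components, n = 0, 0, len(word)
    while i < n:
        length = 1
        # Extend the component while it can still be copied from the text
        # that precedes its last symbol.
        while i + length <= n and word[i:i + length] in word[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_b n), with b the alphabet size (20 amino acids)."""
    return lz / (n / math.log(n, alphabet_size))

# Textbook binary example, parsed as 0|001|10|100|1000|101.
print(lz76("0001101001000101"))
# Normalized value for the 6DOS row, using the tabulated LZ = 65 and n = 135.
print(normalized_lz(65, 135))
```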

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
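
The subword sets and their cardinalities can be sketched as follows. The helper names `subwords` and `subword_complexity` are illustrative, and a toy word is used instead of a protein sequence.

```python
def subwords(word, k):
    """Set of distinct length-k subwords (contiguous blocks) of word."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def subword_complexity(word, k):
    """Number of distinct length-k subwords of word."""
    return len(subwords(word, k))

w = "abracadabra"
print(sorted(subwords(w, 2)))
print([subword_complexity(w, k) for k in (1, 2, 3)])
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), so values close to that bound indicate few repeated blocks.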

\(|P_{f(6DOS_1)}(2) \setminus P_{f(1IWD_1)}(2)|=66\), \(|P_{f(1IWD_1)}(2) \setminus P_{f(6DOS_1)}(2)|=103\).

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(6DOS_1),f(1IWD_1))=66+103=169\).

Hydrophobic-polar version of Sequence 1:
001110010101100101111000110000101110001111100011011111011001000000011000000110110000100011000001111011001001100000001110100001101010010
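
The quantity \(Z_k\) is the size of the symmetric difference of two subword sets. A minimal sketch with toy words follows. The names `z_distance` and `jaccard_distance` are illustrative, and dividing by the size of the union to obtain a full Jaccard distance is our assumption about the intended denominator.

```python
def subwords(word, k):
    """Set of distinct length-k subwords of word."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """Z_k(x, y) / |P_x(k) | P_y(k)| (assumed denominator); 0.0 if both are empty."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0

# P_abab(2) = {ab, ba} and P_abba(2) = {ab, bb, ba}, so only "bb" is unshared.
print(z_distance("abab", "abba", 2))
print(jaccard_distance("abab", "abba", 2))
```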
Pair \(Z_2\) Length of longest common substring
6DOS_1,1IWD_1 169 3
6DOS_1,1GEC_1 167 3
1IWD_1,1GEC_1 108 10
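
The last column is consistent with the length of the longest contiguous block shared by the two sequences. Under that reading, a standard dynamic-programming sketch is shown below (the function name is illustrative). The two inputs are short fragments of 1IWD_1 and 1GEC_1 around their shared block LSEQELVDCD, not the full sequences.

```python
def longest_common_substring(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    # prev[j] holds the length of the longest common suffix of x[:i-1] and y[:j].
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

frag_1iwd = "QLISLSEQELVDCDTASHG"
frag_1gec = "NLVELSEQELVDCDLQSYG"
print(longest_common_substring(frag_1iwd, frag_1gec))
```

The running time is proportional to the product of the two lengths, which is negligible for sequences of a few hundred residues.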

Newick tree

 
(6DOS_1:91.84,(1GEC_1:54,1IWD_1:54):37.84);

Let d and d1 be the Otu–Sayood distances of the same names. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{350 }{\log_{20} 350}-\frac{135}{\log_{20}135})=65.6\)
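
The arithmetic in this estimate can be checked directly. In the sketch below, the factors 0.85 and 0.8 are taken from the formula as given, and reading 350 as the combined length 135 + 215 of the two sequences is our interpretation; `lz_scale` is an illustrative name.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_b n, the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

n_short = 135      # length of 6DOS_1
n_combined = 350   # assumed to be 135 + 215, the two lengths combined
expected = 0.85 * 0.8 * (lz_scale(n_combined) - lz_scale(n_short))
print(round(expected, 1))
```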
Status Protein1 Protein2 d d1/2
Query variables 6DOS_1 1IWD_1 82 67

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
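
The interpolation can be sketched numerically. We assume here that min and max range over two directed quantities `a` and `b` (in the Otu–Sayood setting, the complexity increments in each direction), so that \(\alpha=0\) gives the maximum and \(\alpha=1/2\) gives half the sum. The values 82 and 52 are hypothetical inputs chosen to reproduce the table row \(d=82\), \(d_1/2=67\); the page does not list them.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Hypothetical directed values (not given on the page).
a, b = 82, 52
print(delta(0, a, b))    # alpha = 0 selects the maximum, i.e. d
print(delta(0.5, a, b))  # alpha = 1/2 gives (a + b) / 2, i.e. d1 / 2
```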