CoV2D Browser™

Parikh vectors
8FTP_1 3GXG_1 5LZH_1 Letter Amino acid
5 2 3 R Arginine
23 10 4 D Aspartic acid
5 4 4 H Histidine
15 6 10 I Isoleucine
20 9 9 K Lysine
11 9 3 P Proline
9 4 10 T Threonine
8 9 7 N Asparagine
24 8 7 E Glutamic acid
13 4 3 F Phenylalanine
14 8 5 S Serine
3 3 1 W Tryptophan
13 16 11 A Alanine
5 5 3 M Methionine
13 7 3 Y Tyrosine
1 1 2 C Cysteine
11 16 5 Q Glutamine
20 9 3 G Glycine
22 16 6 L Leucine
14 11 4 V Valine
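
The table above lists, for each sequence, how often each amino-acid letter occurs. A minimal sketch of that count in Python follows; the function name `parikh_vector` and the alphabet ordering are illustrative assumptions, not taken from the CoV2D code base. The sequence used is the 5LZH_1 sequence listed on this page.

```python
from collections import Counter

# The 20 standard amino-acid letters (ordering is an arbitrary choice here).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of `alphabet` to its count in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}


# 5LZH_1 sequence as listed on this page.
seq_5lzh = (
    "TPQNITDLCAEYHNTQIHTLNDKIFSYTESLAGKREMAIITFKNGATFQVEVPGSQHIDSQKKAIERMKD"
    "TLRIAYLTEAKVEKLCVWNNKTPHAIAAISMAN"
)

vec = parikh_vector(seq_5lzh)
print(vec["C"], vec["W"], vec["R"])  # counts for Cysteine, Tryptophan, Arginine
```

For 5LZH_1 this gives 2, 1 and 3 for C, W and R, in agreement with the third column of the table.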

>8FTP_1|Chains A, B|Alpha/beta fold hydrolase|Staphylococcus aureus USA300-CA-263 (1385529)
>3GXG_1|Chains A, B, C, D|Putative phosphatase (DUF442)|Shewanella putrefaciens (319224)
>5LZH_1|Chains A, B, D|Cholera enterotoxin B subunit|Vibrio cholerae (666)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8FTP 113 249 0.83 40 167 236
GPGMQIKLPKPFFFEEGKRAVLLLHGFTGNSSDVRQLGRFLQKKGYTSYAPQYEGHAAPPDEILKSSPFVWFKDALDGYDYLVEQGYDEIVVAGLSLGGDFALKLSLNRDVKGIVTMCAPMGGKTEGAIYEGFLEYARNFKKYEGKDQETIDNEMDHFKPTETLKELSEALDTIKEQVDEVLDPILVIQAENDNMIDPQSANYIYDHVDSDDKNIKWYSESGHVITIDKEKEQVFEDIYQFLESLDWSE
3GXG 77 157 0.82 40 119 152
GNIESIENLQGIRALQQQAPQLLSSGLPNEQQFSLLKQAGVDVVINLMPDSSKDAHPDEGKLVTQAGMDYVYIPVDWQNPKVEDVEAFFAAMDQHKGKDVLVHCLANYRASAFAYLYQLKQGQNPNMAQTMTPWNDELAIYPKWQALLTEVSAKYGH
5LZH 55 103 0.82 40 89 101
TPQNITDLCAEYHNTQIHTLNDKIFSYTESLAGKREMAIITFKNGATFQVEVPGSQHIDSQKKAIERMKDTLRIAYLTEAKVEKLCVWNNKTPHAIAAISMAN
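
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be checked directly from the tabulated \(\mathrm{LZ}(w)\) and \(n\). The sketch below also includes one common way to compute an LZ76 (exhaustive-history) complexity; whether this page uses exactly this parsing variant is an assumption, and the function names are illustrative.

```python
import math


def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parsing of w.

    Each component is extended while it can still be copied from a start
    position strictly before the component's own start (overlap allowed).
    """
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        # w[i:i+length] must occur inside w[:i+length-1] to be copyable.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components


def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))


# Values for 8FTP taken from the table: LZ(w) = 113, n = 249.
print(round(normalized_lz(113, 249), 3))
```

With the tabulated inputs, the ratios for 8FTP, 3GXG and 5LZH come out between 0.82 and 0.84, matching the table up to its two-digit presentation.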

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
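
As a small illustration of these definitions, the following sketch collects the distinct length-\(k\) subwords of a string and counts them. The names `subwords` and `subword_complexity` are hypothetical helpers introduced here, and the toy string is not from the page.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w (the set P_w(k))."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """Cardinality p_w(k) of the set of distinct length-k subwords."""
    return len(subwords(w, k))


toy = "abab"
print(sorted(subwords(toy, 2)))    # ['ab', 'ba']
print(subword_complexity(toy, 3))  # 2  (the subwords 'aba' and 'bab')
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), which bounds the \(p_w(2)\) and \(p_w(3)\) columns in the table.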

\(|P_{f(8FTP_1)}(2) \setminus P_{f(3GXG_1)}(2)|=97\), \(|P_{f(3GXG_1)}(2) \setminus P_{f(8FTP_1)}(2)|=49\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For the pair above, \(Z_2=97+49=146\).

Hydrophobic-polar version of Sequence 1:
111101011011110010011111011010000100110110001000011000101111001100011111001101000110010001111110111011101010001011101011111000111001110010010000100000100010010100010010011001000100110111110100001101001001000100000010100001011010000001100100110010100
Pair \(Z_2\) Length of longest common subword
8FTP_1,3GXG_1 146 3
8FTP_1,5LZH_1 170 3
3GXG_1,5LZH_1 146 3
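
The \(Z_2\) column is the size of the symmetric difference of the two sets of length-2 subwords. A minimal sketch, with hypothetical function names and toy inputs rather than the page's sequences:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# Toy example: P_x(2) = {ab, ba}, P_y(2) = {ab, bc, cb}.
print(z_distance("abab", "abcb", 2))  # 3
```

For 8FTP_1 and 3GXG_1 the two one-sided counts reported on this page are 97 and 49, whose sum 146 is the first entry of the \(Z_2\) column.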

Newick tree

 
(5LZH_1:81.19,(8FTP_1:73,3GXG_1:73):8.19);

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{406 }{\log_{20} 406}-\frac{157}{\log_{20}157})=74.4\).
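
The arithmetic in this estimate can be reproduced as follows. The factors 0.85 and 0.8 are copied from the formula without interpretation, and the function names are illustrative assumptions. Note that 406 = 249 + 157, the combined length of the 8FTP_1 and 3GXG_1 sequences.

```python
import math


def lz_scale(n, alphabet_size=20):
    """The quantity n / log_b(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)


def expected_distance(n_concat, n_single, factor_a=0.85, factor_b=0.8):
    """Evaluate factor_a * factor_b * (lz_scale(n_concat) - lz_scale(n_single))."""
    return factor_a * factor_b * (lz_scale(n_concat) - lz_scale(n_single))


print(round(expected_distance(406, 157), 1))  # 74.4
```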
Status Protein1 Protein2 d d1/2
Query variables 8FTP_1 3GXG_1 93 75

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
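
A small sketch of this interpolation, under the assumption that \(\min\) and \(\max\) range over the two LZ-complexity increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), as in the usual definition of the Otu--Sayood distances. The function name is hypothetical. The value 57 below is not listed on the page; it is what the smaller increment must be if \(d=93\) and \(d_1/2=75\).

```python
def delta(alpha, inc_xy, inc_yx):
    """alpha * min + (1 - alpha) * max of the two increments (assumed inputs)."""
    smaller, larger = min(inc_xy, inc_yx), max(inc_xy, inc_yx)
    return alpha * smaller + (1 - alpha) * larger


# Increments consistent with the tabulated d = 93 and d1/2 = 75 (57 is inferred).
print(delta(0, 57, 93))    # alpha = 0 gives the max, i.e. d
print(delta(0.5, 57, 93))  # alpha = 1/2 gives the average, i.e. d1/2
```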