CoV2D Browser™

Parikh vectors
5FMV_1 9FII_1 6THC_1 Letter Amino acid
16 8 1 Y Tyrosine
33 7 9 N Asparagine
16 5 1 C Cysteine
22 4 12 H Histidine
15 9 6 F Phenylalanine
17 10 7 P Proline
17 12 23 V Valine
10 5 15 R Arginine
19 7 19 D Aspartic acid
11 5 5 Q Glutamine
5 5 6 M Methionine
20 5 12 S Serine
25 17 9 E Glutamic acid
13 7 23 G Glycine
23 18 19 L Leucine
32 5 9 T Threonine
3 0 1 W Tryptophan
15 7 41 A Alanine
19 10 9 I Isoleucine
30 14 10 K Lysine
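The Parikh vector of a sequence simply records how many times each letter occurs. A minimal sketch in Python follows; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not part of the CoV2D code, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# One-letter amino acid codes, in the row order used by the table above.
AMINO_ACIDS = "YNCHFPVRDQMSEGLTWAIK"


def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the list of letter counts of `sequence`, ordered as `alphabet`."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that never occur.
    return [counts[letter] for letter in alphabet]


if __name__ == "__main__":
    # First ten residues of the 5FMV_1 sequence shown on this page.
    prefix = "ETGKPTCDEK"
    print(dict(zip(AMINO_ACIDS, parikh_vector(prefix))))
```

The entries of a Parikh vector always sum to the sequence length, which gives a quick consistency check against the Length column further down.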

>5FMV_1|Chains A, B|RECEPTOR-TYPE TYROSINE-PROTEIN PHOSPHATASE C|HOMO SAPIENS (9606)
>9FII_1|Chains A, C|NADH-quinone oxidoreductase subunit E|Aquifex aeolicus VF5 (224324)
>6THC_1|Chains A, B, C, D|Coenzyme A biosynthesis bifunctional protein CoaBC|Mycolicibacterium smegmatis (1772)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5FMV , Knot 153 361 0.83 40 209 343
ETGKPTCDEKYANITVDYLYNKETKLFTAKLNVNENVECGNNTCTNNEVHNLTECKNASVSISHNSCTAPDKTLILDVPPGVEKFQLHDCTQVEKADTTICLKWKNIETFTCDTQNITYRFQCGNMIFDNKEIKLENLEPEHEYKCDSEILYNNHKFTNASKIIKTDFGSPGEPQIIFCRSEAAHQGVITWNPPQRSFHNFTLCYIKETEKDCLNLDKNLIKYDLQNLKPYTKYVLSLHAYIIAKVQRNGSAAMCHFTTKSAPPSQVWNMTVSMTSDNSMHVKCRPPRDRNGPHERYHLEVEAGNTLVRNESHKNCDFRVKDLQYSTDYTFKAYFHNGDYPGEPFILHHSTSGTKHHHHHH
9FII , Knot 80 160 0.84 38 127 158
MFKTEFEFPEELKTKLQEHINYFPKKRQAILLCLHEIQNYYGYIPPESLKPLADMLELPLNHVEGVVAFYDMFDREDKAKYRIRVCVSIVCHLMGTNKLLKALENILGIKPGEVTPDGKFKIVPVQCLGACSEAPVFMVNDDEYKFESEVQLNEILSRYT
6THC , Knot 100 237 0.77 40 148 223
MAHHHHHHDMAGVKALVTAGGTREPLDPVRFIGNRSSGKQGYAVARVLAQRGADVTLIAGNTAGLIDPAGVEMVHIGSATQLRDAVSKHAPDANVLVMAAAVADFRPAHVAAAKIKKGASEPSSIDLVRNDDVLAGAVRARADGQLPNMRAIVGFAAETGDANGDVLFHARAKLERKGCDLLVVNAVGENRAFEVDHNDGWLLSADGTESALEHGSKTLMATRIVDSIAAFLKSQDG
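The LZ-complexity and normalized columns can be illustrated with a short Python sketch. The page does not say which Lempel-Ziv variant it uses, so the parsing below (the exhaustive-history parsing of Lempel and Ziv, 1976) is an assumption and may not reproduce the tabulated values exactly; `lz76` and `normalized_lz` are illustrative names.

```python
import math


def lz76(w):
    """Number of phrases in an LZ76-style parsing of w.

    Each phrase is the shortest prefix of the remaining text that does not
    already occur as a subword of everything read so far except the
    phrase's own last letter (overlaps with the phrase are allowed).
    """
    n = len(w)
    i = 0       # start of the current phrase
    count = 0
    while i < n:
        length = 1
        # Extend the phrase while it is still reproducible from earlier text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def normalized_lz(lz, n, alphabet_size=20):
    """Compute LZ(w) / (n / log_20 n); requires n > 1."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # Values for 5FMV taken from the table above.
    print(normalized_lz(153, 361))
```

With the table's values for 5FMV (153 and 361), `normalized_lz` gives about 0.83.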

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
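As a small illustration of these definitions, the sketch below collects the distinct contiguous subwords of a fixed length and counts them. The function names are illustrative, and `k` stands for the length argument of \(P_w\) and \(p_w\).

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """Number of distinct subwords of length k in w."""
    return len(subwords(w, k))


if __name__ == "__main__":
    print(sorted(subwords("ABAB", 2)))
    print(subword_complexity("ABAB", 3))
```

When `k` exceeds the length of `w`, the range is empty and the set of subwords is empty as well.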

\(|P_{f(5FMV_1)}(2) \setminus P_{f(9FII_1)}(2)|=139\), \(|P_{f(9FII_1)}(2) \setminus P_{f(5FMV_1)}(2)|=57\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
0010100000010101001000000110101010001001000000001001000001010100000011000111011111001010000010010001010100100100000010001001011100001010010100000000011000001001001100011011010111000011001110101100010010100100000001010001100010010100001101010111010001011100100001110011010101000001010001100001100000101011001100000000010100100000001010100100110111100000100000000
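The hydrophobic-polar string can be produced by mapping each residue to 1 (hydrophobic) or 0 (polar). The page does not list its classification, so the set `HYDROPHOBIC` below is an assumption: it was inferred by comparing the opening of the binary string with the opening of the 5FMV_1 sequence, and it may differ from the set actually used for letters that were not checked.

```python
# Assumed hydrophobic residues (inferred from the displayed encoding).
HYDROPHOBIC = set("GAVLIPFMW")


def hydrophobic_polar(sequence, hydrophobic=HYDROPHOBIC):
    """Encode a protein sequence as a string of 1 (hydrophobic) and 0 (polar)."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)


if __name__ == "__main__":
    # First twenty residues of the 5FMV_1 sequence.
    print(hydrophobic_polar("ETGKPTCDEKYANITVDYLY"))
```

On the first twenty residues of 5FMV_1 this yields `00101000000101010010`, which agrees with the start of the string shown above.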
Pair \(Z_2\) Length of longest common subword (contiguous)
5FMV_1,9FII_1 196 3
5FMV_1,6THC_1 185 6
9FII_1,6THC_1 167 3
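Both columns of this table can be sketched in a few lines of Python. `z_distance` follows the definition of \(Z_k\) given earlier, as the size of the symmetric difference of the two subword sets. The last column is computed here as the longest common contiguous block, which is an assumption; it is consistent with the value 6 for the pair 5FMV_1, 6THC_1, whose sequences share the run HHHHHH. All function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the k-subword sets."""
    return len(subwords(x, k) ^ subwords(y, k))


def longest_common_substring_length(x, y):
    """Length of the longest contiguous block shared by x and y (row-wise DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending just before both letters.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    # End of the 5FMV_1 sequence and start of the 6THC_1 sequence.
    print(longest_common_substring_length("GTKHHHHHH", "MAHHHHHHDM"))
```

For the first pair in the table, the two one-sided differences reported above are 139 and 57, and their sum 196 is the tabulated \(Z_2\).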

Newick tree

 
[
	5FMV_1:98.90,
	[
		6THC_1:83.5,9FII_1:83.5
	]:15.40
]
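The tree is displayed with square brackets and indentation, whereas most phylogenetics tools expect the usual single-line Newick form with parentheses and a closing semicolon. A small conversion sketch, assuming the bracketed layout shown here is the only difference (`to_newick` is an illustrative name):

```python
def to_newick(bracketed):
    """Convert the bracketed, indented tree text to a one-line Newick string."""
    compact = "".join(bracketed.split())  # drop all whitespace
    return compact.translate(str.maketrans("[]", "()")) + ";"


if __name__ == "__main__":
    tree_text = """
    [
        5FMV_1:98.90,
        [
            6THC_1:83.5,9FII_1:83.5
        ]:15.40
    ]
    """
    print(to_newick(tree_text))
```

Applied to the tree above, this gives `(5FMV_1:98.90,(6THC_1:83.5,9FII_1:83.5):15.40);`.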

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{521}{\log_{20} 521}-\frac{160}{\log_{20}160}\right)\approx 105\), where \(521=361+160\) is the combined length of the two sequences.
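The arithmetic behind this estimate can be checked with a short Python sketch. The factors 0.85 and 0.8 are taken from the formula as given (the text does not explain them), and `expected_distance` is an illustrative name.

```python
import math


def log20(n):
    """Logarithm to base 20, the size of the amino acid alphabet."""
    return math.log(n, 20)


def expected_distance(n1, n2, factors=(0.85, 0.8)):
    """Evaluate factors[0] * factors[1] * (N/log20(N) - n2/log20(n2)), N = n1 + n2."""
    total = n1 + n2
    scale = factors[0] * factors[1]
    return scale * (total / log20(total) - n2 / log20(n2))


if __name__ == "__main__":
    # Lengths of 5FMV_1 (361) and 9FII_1 (160) from the table above.
    print(expected_distance(361, 160))
```

For the lengths 361 and 160 the result is about 105.4, which rounds to the value 105 quoted above.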
Status Protein1 Protein2 d d1/2
Query variables 5FMV_1 9FII_1 132 95

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
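A sketch of how \(\delta\) interpolates between the two distances follows. It assumes the usual Otu–Sayood construction, in which the two quantities being combined are the complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\) for a complexity measure \(c\), so that \(d\) is their maximum and \(d_1\) their sum. The function names, and passing `c` as a parameter, are illustrative choices.

```python
def otu_sayood_increments(x, y, c):
    """Return (c(xy) - c(x), c(yx) - c(y)) for a complexity function c."""
    return c(x + y) - c(x), c(y + x) - c(y)


def delta(a, b, alpha):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    # The query row reports d = 132 and d1/2 = 95, so under the assumed
    # construction the smaller increment is 2 * 95 - 132 = 58.
    print(delta(58, 132, 0))    # the maximum, d
    print(delta(58, 132, 0.5))  # the average, d1 / 2
```

Setting `alpha` to 0 returns the larger increment and setting it to 1/2 returns the average of the two, matching the two cases in the displayed formula.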