CoV2D Browser™

Parikh vectors
6GSG_1 4TZF_1 5MTF_1 Letter Amino acid
16 15 15 V Valine
30 36 19 A Alanine
9 4 11 W Tryptophan
19 4 6 Y Tyrosine
22 12 5 P Proline
26 16 5 D Aspartic acid
39 24 20 G Glycine
8 9 4 H Histidine
31 18 26 L Leucine
12 8 6 K Lysine
13 9 13 F Phenylalanine
32 14 4 T Threonine
15 10 6 Q Glutamine
9 4 5 E Glutamic acid
18 11 13 I Isoleucine
11 6 10 M Methionine
28 12 9 S Serine
18 8 9 R Arginine
21 10 3 N Asparagine
6 1 1 C Cysteine

>6GSG_1|Chain A|Catechol oxidase|Aspergillus oryzae (strain ATCC 42149 / RIB 40) (510516)
>4TZF_1|Chain A|NDM-8 metallo-beta-lactamase|Escherichia coli (562)
>5MTF_1|Chain A|Rhomboid protease GlpG|Escherichia coli O45:K1 (strain S88 / ExPEC) (585035)
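
As a minimal sketch of how a Parikh vector like the ones tabulated above can be computed, the following counts the occurrences of each amino-acid letter in a sequence. The function name `parikh_vector` and the alphabet ordering are assumptions made for illustration; the page itself does not say how its table was produced.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes (ordering is an arbitrary choice here).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of `alphabet` to its count in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# First ten residues of the 4TZF_1 sequence, used only as a short example.
example = parikh_vector("GPGDQRFGDL")
print(example["G"], example["D"], example["C"])
```

Applied to a full FASTA sequence, the sum of the counts equals the sequence length, which gives a quick consistency check against the Length column.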
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6GSG , Knot 162 383 0.83 40 232 367
AATATLPTTASSSTAVASSQLDQLANFAYNVTTDSVAGGSESKRGGCTLQNLRVRRDWRAFSKTQKKDYINSVLCLQKLPSRTPAHLAPGARTRYDDFVATHINQTQIIHYTGTFLAWHRYFIYEFEQALRDECSYTGDYPYWNWGADADNMEKSQVFDGSETSMSGNGEYIPNQGDIKLLLGNYPAIDLPPGSGGGCVTSGPFKDYKLNLGPAALSLPGGNMTAAANPLTYNPRCMKRSLTTEILQRYNTFPKIVELILDSDDIWDFQMTMQGVPGSGSIGVHGGGHYSMGGDPGRDVYVSPGDTAFWLHHGMIDRVWWIWQNLDLRKRQNAISGTGTFMNNPASPNTTLDTVIDLGYANGGPIAMRDLMSTTAGPFCYVYL
4TZF , Knot 105 231 0.82 40 151 222
GPGDQRFGDLVFRQLAPNVWQHTSYLDMPGFGAVASNGLIVRDGGRVLVVDTAWTDDQTAQILNWIKQEINLPVALAVVTHAHQDKMGGMGALHAAGIATYANALSNQLAPQEGLVAAQHSLTFAANGWVEPATAPNFGPLKVFYPGPGHTSDNITVGIDGTDIAFGGCLIKDSKAKSLGNLGDADTEHYAASARAFGAAFPKASMIVMSHSAPDSRAAITHTARMADKLR
5MTF , Knot 87 190 0.80 40 130 183
AALRERAGPVTWVMMIACVVVFIAMQILGDQEVMLWLAWPFDPTLKFEFWRYFTHALMHFSLMHILFNLLWWWYLGGAVEKRLGSGKLIVITLISALLSGYVQQKFSGPWFGGLSGVVYALMGYVWLRGERDPQSGIYLQRGLIIFALIWIVAGWFDLFGMSMANGAHIAGLAVGLAMAFVDSLNARKRK
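
The LZ-complexity and normalized columns can be illustrated with a short sketch. It assumes the Lempel-Ziv (1976) exhaustive-history parsing, in which each new phrase is the shortest prefix of the remaining text that does not already occur earlier (overlaps allowed); whether the page uses exactly this variant is not stated, and the names `lz76_complexity` and `normalized_lz` are hypothetical.

```python
import math

def lz76_complexity(s):
    """Number of phrases in the LZ76 exhaustive-history parsing of s (assumed variant)."""
    n = len(s)
    i = 0       # start of the current phrase
    phrases = 0
    while i < n:
        length = 1
        # Extend the phrase while it can be copied from a start position before i.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76_complexity("0001101001000101"))
print(normalized_lz(162, 383))
```

With the tabulated values for 6GSG (LZ 162, length 383) the ratio comes out at roughly 0.84, compared with the tabulated 0.83; the table may truncate rather than round.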

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
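
A minimal sketch of these definitions, assuming a subword means a contiguous block of the word; the function names `subwords` and `subword_complexity` are illustrative and not taken from the page.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w (empty if k > len(w))."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Short example on the first six residues of the 5MTF_1 sequence.
print(sorted(subwords("AALRER", 2)))
print(subword_complexity("AALRER", 2))
```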

\(|P_{f(6GSG_1)}(2) \setminus P_{f(4TZF_1)}(2)|=126\), \(|P_{f(4TZF_1)}(2) \setminus P_{f(6GSG_1)}(2)|=45\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for this pair, \(Z_2=126+45=171\).

Hydrophobic-polar version of Sequence 1:
11010110010000111000100110110010000111100000110010010100010110000000010011010011000110111110000001110010000110001011110001100100110000000100101011101001000011010000101010011001010111100111011110111010011100001011111101111010111011000100100010001100000110110111000011010101011110101110111000111011001010110011110011100111110010100000110101011001101000100110110101111110011000111100101
Pair \(Z_2\) Length of longest common subword
6GSG_1,4TZF_1 171 4
6GSG_1,5MTF_1 184 3
4TZF_1,5MTF_1 149 3
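
The quantity \(Z_k\) is the size of the symmetric difference of the two length-\(k\) subword sets. A hedged sketch, with hypothetical function names and a toy input rather than the protein sequences:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: "abab" has {ab, ba}; "abba" has {ab, bb, ba}.
print(z_distance("abab", "abba", 2))
```

Dividing \(Z_k\) by the size of the union \(|P_x(k)\cup P_y(k)|\) would give the usual Jaccard distance; the table reports only the numerator.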

Newick tree

 
[
	6GSG_1:93.09,
	[
		4TZF_1:74.5,5MTF_1:74.5
	]:18.59
]
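
For readers who want to load the tree into other software, the sketch below encodes the bracketed tree above as nested tuples, serializes it in standard parenthesized Newick notation, and checks that every leaf lies at the same distance from the root. The tuple layout and function names are assumptions made for this illustration.

```python
# Each node is (name, branch_length, children); internal nodes have an empty name.
TREE = ("", 0.0, [
    ("6GSG_1", 93.09, []),
    ("", 18.59, [("4TZF_1", 74.5, []), ("5MTF_1", 74.5, [])]),
])

def to_newick(node, is_root=True):
    """Serialize a node in standard Newick syntax with branch lengths."""
    name, length, children = node
    if children:
        label = "(" + ",".join(to_newick(c, False) for c in children) + ")"
    else:
        label = name
    return label + ";" if is_root else f"{label}:{length}"

def leaf_depths(node, depth=0.0):
    """Map each leaf name to its total branch length from the root."""
    name, length, children = node
    depth += length
    if not children:
        return {name: depth}
    depths = {}
    for child in children:
        depths.update(leaf_depths(child, depth))
    return depths

print(to_newick(TREE))
print(leaf_depths(TREE))
```

All three leaves sit at depth 93.09 (since 18.59 + 74.5 = 93.09), so the tree is ultrametric.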

Let \(d\) and \(d_1\) denote the Otu--Sayood distances \(d\) and \(d_1\), respectively. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{614}{\log_{20} 614}-\frac{231}{\log_{20}231}\right)\approx 108.\)
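
The arithmetic in this estimate can be reproduced with a few lines. The sketch assumes that 614 = 383 + 231 is the length of the two query sequences concatenated and that 231 is the length of the shorter one; the constants 0.85 and 0.8 are taken from the formula as given, and the function name is hypothetical.

```python
import math

def expected_distance(n_concat, n_single, c1=0.85, c2=0.8, alphabet_size=20):
    """c1 * c2 * (n_concat / log_a(n_concat) - n_single / log_a(n_single))."""
    def scale(n):
        return n / math.log(n, alphabet_size)
    return c1 * c2 * (scale(n_concat) - scale(n_single))

print(round(expected_distance(614, 231)))
```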
Status Protein1 Protein2 d d1/2
Query variables 6GSG_1 4TZF_1 136 107.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
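
To make the interpolation concrete, here is a sketch under two assumptions: that \(\min\) and \(\max\) range over the two complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\) used in the Otu--Sayood distances, and that \(c\) is the LZ76 phrase count. The function names and toy inputs are illustrative only.

```python
def lz76_complexity(s):
    """Phrase count of the LZ76 exhaustive-history parsing (assumed variant)."""
    i, phrases = 0, 0
    while i < len(s):
        length = 1
        # Extend while the phrase can be copied from an earlier start position.
        while i + length <= len(s) and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def increments(x, y, c=lz76_complexity):
    """The two increments c(xy) - c(x) and c(yx) - c(y)."""
    return c(x + y) - c(x), c(y + x) - c(y)

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "AALRERAGPV", "GPGDQRFGDL"
a, b = increments(x, y)
print(delta(x, y, 0) == max(a, b))          # alpha = 0 gives d
print(delta(x, y, 0.5) == (a + b) / 2)      # alpha = 1/2 gives d1 / 2
```

Since \(\min+\max\) equals the sum of the two increments, the case \(\alpha=1/2\) recovers \(d_1/2\) and the case \(\alpha=0\) recovers \(d\).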