CoV2D Browser™

Parikh vectors
4JWH_1 7SHW_1 8XIW_1 Letter Amino acid
36 14 33 K Lysine
6 4 8 M Methionine
22 19 23 S Serine
4 3 21 W Tryptophan
7 13 25 Y Tyrosine
16 28 31 V Valine
15 31 56 A Alanine
26 16 42 L Leucine
15 26 25 T Threonine
11 16 39 G Glycine
8 13 22 F Phenylalanine
16 11 16 Q Glutamine
8 8 19 H Histidine
6 18 22 P Proline
17 10 26 N Asparagine
23 29 29 D Aspartic acid
31 23 39 E Glutamic acid
19 11 20 I Isoleucine
23 17 25 R Arginine
4 0 5 C Cysteine
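
The table above lists, for each protein, how many times each amino-acid letter occurs (its Parikh vector). As a minimal sketch of that count, assuming the 20 standard one-letter codes as the alphabet; the function name `parikh_vector` and the toy input are illustrative and not taken from the site:

```python
from collections import Counter

# Assumed alphabet: the 20 standard amino-acid letters (order is arbitrary).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return {letter: number of occurrences} for every letter of the alphabet."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# Toy input (hypothetical, not one of the three proteins above).
example = parikh_vector("MKKAHHHHHH")
print(example["K"], example["H"], example["C"])  # prints: 2 6 0
```

Letters that never occur get an explicit 0, as in the C entry for 7SHW_1.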

>4JWH_1|Chains A, B|tRNA (guanine(9)-N1)-methyltransferase|Schizosaccharomyces pombe (284812)
>7SHW_1|Chains A, B|LmcA|Mycolicibacterium smegmatis (1772)
>8XIW_1|Chains A, E|Methane monooxygenase|Methylosinus sporium (428)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4JWH , Knot 131 313 0.80 40 186 298
MGHHHHHHMMENKDALDIGKDDTNTSEADVSKNETQEQPVLSKSALKRLKRQQEWDAGREKRAEMRREKKRLRKEERKRKIEAGEVVKSQKKRIRLGKVVPSSIRIVLDCAFDDLMNDKEINSLCQQVTRCHSANRTALHPVELFATNFGGRLKTRQDFVLKGQQNNWKRYNPTTKSYLEEFESQKEKLVYLSADSDNTITELDEDKIYIIGAIVDKNRYKNLCQNKASEQGIKTAKLPIDEYIKITDRKILTVNQVFEILSLWLEYRDWEKAFMEVIPKRKGILLKSDESFDVSEDTRSQSNQSDSELEKEN
7SHW , Knot 135 310 0.83 38 181 293
MPEVVFGSTYTKGKIAKIPLDIDTSLVSDGTATAFDPDSLVAERFKIDRDVPVALQQQMSVEAPSNADVVTFQVGTTLRRTDRQQDAGLLLALVDTVTMNRNTAEAVSSENNPGGAVQKPRAIEDEKPPTNIALPHEGLTYRFPFDTEKKTYPFFDPIAQKAFDANYDGEEDVNGLTTYRFVQNVGYDADGKLADPIKYSSLYEDDADASVTARAEVWGVPGEPDESITMDRFYAASRTFWVDPVSGTIVKSEEHGYQYYAREALKPEVTYVDFKVTTNEESVESQVAAASDERDRIALWTRSRHHHHHH
8XIW , Knot 217 526 0.86 40 268 491
MAISLATKAATDALKVNRAPVGVEPQEVHKWLQSFNWDFKENRTKYATKYHMANQTKEQFKVIAKEYARMEAAKDERQFGTLLDGLTRLGAGNKVHPRWGETMKVISNFLEVGEYNAIAASAMLWDSATAAEQKNGYLAQVLDEIRHTHQCAFINHYYSKHYHDPAGHNDARRTRAIGPLWKGMKRVFADGFISGDAVECSVNLQLVGEACFTNPLIVAVTEWASANGDEITPTVFLSVETDELRHMANGYQTVVSIANDPAAAKYLNTDLNNAFWTQQKYFTPALGYLFEYGSKFKVEPWVKTWNRWVYEDWGGIWIGRLGKYGVESPRSLRDAKTDAYWAHHDLALAAYALWPLGFARLALPDEEDQEWFEANYPGWADHYGKIYNEWKKLGYEDPKSGFIPYAWLLANGHDVYIDRVSQVPFIPSLAKGSGSLRVHEFNGKKHSLTDDWGERMWLSEPERYECHNLFEQYEGRELSEVIAEGHGVRSDGKTLIAQPHVRGDNLWTLEDIKRAGCVFPNPLAKF
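
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. A short sketch, using only the numbers in the table above; the function name `normalized_lz` is illustrative:

```python
import math

def normalized_lz(lz_complexity, length, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), where b is the alphabet size."""
    return lz_complexity / (length / math.log(length, alphabet_size))

# (code, LZ(w), n) as listed in the table above.
for code, lz, n in [("4JWH", 131, 313), ("7SHW", 135, 310), ("8XIW", 217, 526)]:
    print(code, f"{normalized_lz(lz, n):.2f}")
```

This reproduces the tabulated values 0.80, 0.83 and 0.86.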

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
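
A minimal sketch of these definitions, reading a subword as a contiguous block of a fixed length \(k\); the function names and the toy word are illustrative only:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct subwords of length k in w."""
    return len(subwords(w, k))

# Toy word (hypothetical): ABAB has subwords {A, B}, {AB, BA}, {ABA, BAB}.
print([subword_complexity("ABAB", k) for k in (1, 2, 3)])  # prints: [2, 2, 2]
```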

\(|P_{f(4JWH_1)}(2) \setminus P_{f(7SHW_1)}(2)|=79\), \(|P_{f(7SHW_1)}(2) \setminus P_{f(4JWH_1)}(2)|=74\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), that is, the numerator of an LZ76-style (set of subwords) Jaccard distance for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1100000011000011011000000001010000000011100011001000001011000010100000010000000010110110000001011011100101110011001100001001000100000100011011011100111010000011101000010000100000100100000011010100000100100001011111100000001000010001100101110001010000110100110110111000010011101110001111000001010000000000000010000
Pair \(Z_2\) Length of longest common substring
4JWH_1,7SHW_1 153 6
4JWH_1,8XIW_1 156 4
7SHW_1,8XIW_1 167 3
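
The two columns of this table can be sketched as follows. The last column is read here as the length of the longest contiguous common subword, an assumption suggested by the small values (for instance, 4JWH_1 and 7SHW_1 both contain the run HHHHHH). Function names and toy inputs are illustrative:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the k-subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_substring_length(x, y):
    """Length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1], y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy inputs (hypothetical): the 2-subword sets differ only by "BB".
print(z_distance("ABAB", "ABBA", 2))                      # prints: 1
print(longest_common_substring_length("ABAB", "ABBA"))    # prints: 2
```

For the first pair, the two set differences quoted above give \(Z_2 = 79 + 74 = 153\), matching the table.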

Newick tree

 
(8XIW_1:82.17,(4JWH_1:76.5,7SHW_1:76.5):5.67);

Let \(d\) be the Otu--Sayood distance \(d\).
Let \(d_1\) be the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{623 }{\log_{20} 623}-\frac{310}{\log_{20}310})\approx 87.1\)
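
The arithmetic in this estimate can be checked directly. In the sketch below, 623 is taken to be the combined length 313 + 310 of the two sequences (an assumption based on the length column), and the factors 0.85 and 0.8 are used as given; the helper name `capacity` is illustrative:

```python
import math

def capacity(n, alphabet_size=20):
    """n / log_b(n), the scale against which LZ(w) is normalized."""
    return n / math.log(n, alphabet_size)

n_concat = 313 + 310  # assumed: length of the concatenation of the two sequences
estimate = 0.85 * 0.8 * (capacity(n_concat) - capacity(310))
print(f"{estimate:.1f}")
```

The result agrees with the quoted value to within rounding.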
Status Protein1 Protein2 d d1/2
Query variables 4JWH_1 7SHW_1 108 108

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
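
A hedged sketch of how \(\delta\) interpolates between the two distances. It assumes the usual definitions from Otu and Sayood, \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=(c(xy)-c(x))+(c(yx)-c(y))\), with \(c\) the number of components in the Lempel-Ziv (1976) exhaustive history. The LZ variant actually used by the site is not stated, so `lz76` below is an illustrative stand-in and will not necessarily reproduce the tabulated values:

```python
def lz76(s):
    """Number of components in the LZ76 exhaustive history of s (assumed variant)."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the component while it can still be copied from earlier text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood_terms(x, y):
    """The two increments c(xy) - c(x) and c(yx) - c(y)."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def delta(a, b, alpha):
    """alpha * min + (1 - alpha) * max of the two increments."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Toy inputs (hypothetical sequences).
a, b = otu_sayood_terms("AAAA", "ABAB")
print(delta(a, b, 0) == max(a, b))          # alpha = 0 gives d
print(delta(a, b, 0.5) == (a + b) / 2)      # alpha = 1/2 gives d1 / 2
```

When both increments coincide, as in the query row above where \(d\) and \(d_1/2\) are both 108, every choice of \(\alpha\) gives the same value.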