CoV2D Browser™

Parikh vectors
5ISD_1 7EXT_1 1BOT_1 Letter Amino acid
25 19 46 A Alanine
3 3 5 C Cysteine
24 17 14 F Phenylalanine
26 19 36 V Valine
32 18 36 L Leucine
17 6 22 K Lysine
17 13 14 P Proline
27 17 21 S Serine
14 22 32 R Arginine
11 14 20 N Asparagine
19 10 40 E Glutamic acid
19 15 34 I Isoleucine
22 11 37 T Threonine
18 16 27 D Aspartic acid
20 19 20 Q Glutamine
12 1 9 H Histidine
12 3 14 M Methionine
23 11 43 G Glycine
4 2 13 W Tryptophan
16 12 18 Y Tyrosine
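A Parikh vector records how many times each letter occurs in a word, ignoring order, so each column above sums to its sequence length (361, 248 and 501). The following is a minimal Python sketch, not the browser's own code; `parikh_vector` and `AMINO_ORDER` are hypothetical names, and the letter order simply follows the rows of the table.

```python
from collections import Counter

# Letter order of the rows in the table above (hypothetical constant name).
AMINO_ORDER = "ACFVLKPSRNEITDQHMGWY"


def parikh_vector(w: str, alphabet: str = AMINO_ORDER) -> list[int]:
    """Count the occurrences of each letter of `alphabet` in the word `w`."""
    counts = Counter(w)
    # Counter returns 0 for letters that never occur in w.
    return [counts[letter] for letter in alphabet]


# Example: the first ten residues of 7EXT_1.
print(parikh_vector("MTIPLLQYAP"))
```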

>5ISD_1|Chains A, B, C, D|Histone-arginine methyltransferase CARM1|Mus musculus (10090)
>7EXT_1|Chains AD[auth A2], A[auth A1], BA[auth A4], BE[auth A5], CF[auth A6], FH[auth A8]|Phycobilisome rod-core linker polypeptide CpcG|Synechococcus sp. PCC 7002 (32049)
>1BOT_1|Chains A[auth O], B[auth Z]|PROTEIN (GLYCEROL KINASE)|Escherichia coli (562)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5ISD , Knot 157 361 0.85 40 213 356
GHMGHTLERSVFSERTEESSAVQYFQFYGYLSQQQNMMQDYVRTGTYQRAILQNHTDFKDKIVLDVGCGSGILSFFAAQAGARKIYAVEASTMAQHAEVLVKSNNLTDRIVVIPGKVEEVSLPEQVDIIISEPMGYMLFNERMLESYLHAKKYLKPSGNMFPTIGDVHLAPFTDEQLYMEQFTKANFWYQPSFHGVDLSALRGAAVDEYFRQPVVDTFDIRILMAKSVKYTVNFLEAKEGDLHRIEIPFKFHMLHSGLVHGLAFWFDVAFIGSIMTVWLSTAPTEPLTHWYQVRCLFQSPLFAKAGDTLSGTCLLIANKRQSYDISIVAQVDQTGSKSSNLLDLKNPFFRYTGTTPSPPPG
7EXT , Knot 117 248 0.86 40 171 241
MTIPLLQYAPSSQNTRVAGYTVGGDEQPFVFTTDNVISDSDFDVLINAAYRQIFFHAFKCDRQQLLESQLRNGQITVRDFIRGLLLSETFIDSFYNKNSNYRFVEQCIQRVLGRDPFSEQEKIAWSIVICTKGLAAFVDQLLNTDEYMENFGYDTVPYQRRRSLASREQGEIPFNIKSPRYDAYYRSQLGFPQVVWQNAVRRFRTPDRVPQAGDPALFLNMARSAQIPKVNVRVSAADISLAAVPYRN
1BOT , Knot 206 501 0.85 40 255 476
TEKKYIVALDQGTTSSRAVVMDHDANIISVSQREFEQIYPKPGWVEHDPMEIWATQSSTLVEVLAKADISSDQIAAIGITNQRETTIVWEKETGKPIYNAIVWQCRRTAEICEHLKRDGLEDYIRSNTGLVIDPYFSGTKVKWILDHVEGSRERARRGELLFGTVDTWLIWKMTQGRVHVTDYTNASRTMLFNIHTLDWDDKMLEVLDIPREMLPEVRRSSEVYGQTNIGGKGGTRIPISGIAGDQQAALFGQLCVKEGMAKNTYGTGCFMLMNTGEKAVKSENGLLTTIACGPTGEVNYALEGAVFMAGASIQWLRDEMKLINDAYDSEYFATKVQNTNGVYVVPAFTGLGAPYWDPYARGAIFGLTRGVNANHIIRATLESIAYQTRDVLEAMQADSGIRLHALRVDGGAVANNFLMQFQSDILGTRVERPEVREVTALGAAYLAGLAVGFWQNLDELQEKAVIEREFRPGIETTERNYRYAGWKKAVKRAMAWEEHDE
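The LZ-complexity column counts phrases in a Lempel–Ziv (1976) parsing, and the next column divides by \(n/\log_{20} n\), the asymptotic growth rate of LZ complexity for a random word over a 20-letter alphabet. The page does not say which LZ variant it implements, so the Python sketch below assumes the classical exhaustive-history parsing; `lz76_complexity` and `normalized_lz` are illustrative names.

```python
import math


def lz76_complexity(w: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of w.

    A phrase starting at position p keeps growing while w[p:p+k] can be copied
    from a start position earlier than p (overlaps allowed); the first letter
    that cannot be copied closes the phrase.
    """
    n = len(w)
    phrases = 0
    p = 0  # start of the current phrase
    while p < n:
        k = 1
        # w[p:p+k] occurs in w[:p+k-1] exactly when it has a copy starting before p.
        while p + k <= n and w[p:p + k] in w[:p + k - 1]:
            k += 1
        phrases += 1
        p += k
    return phrases


def normalized_lz(c: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_k n) for a word of length n > 1 over k letters."""
    return c / (n / math.log(n, alphabet_size))


# A standard binary example: parses as 0|001|10|100|1000|101.
print(lz76_complexity("0001101001000101"))  # 6
```

With this ratio, the table's entries agree after truncation (rather than rounding) to two decimals: for 7EXT, \(117/(248/\log_{20}248)\approx 0.868\) appears as 0.86.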

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
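For instance, \(w=\texttt{abab}\) has \(P_w(2)=\{\texttt{ab},\texttt{ba}\}\), so \(p_w(2)=2\). A direct Python rendering of these definitions, with hypothetical function names, slides a window of width \(n\) across the word and keeps the distinct windows:

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct length-n subwords (contiguous intervals) of w."""
    # For n > len(w) the range is empty, so P_w(n) is the empty set.
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_count(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


print(sorted(subwords("abab", 2)))  # ['ab', 'ba']
```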

\(|P_{f(5ISD_1)}(2) \setminus P_{f(7EXT_1)}(2)|=105\) and \(|P_{f(7EXT_1)}(2) \setminus P_{f(5ISD_1)}(2)|=63\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (5ISD_1):
1011001000110000000011001010101000001100010010000111000001000111011010111011110111001011010011001011100001000111111010010110010111001110111000110001010001010101110110101111000010100100101100101011010110111100010011100101011110010001011010010100101110101100111011111101111101101110011001100100100110011110110010100111100000001011101000100000110100111000100101111
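The hydrophobic-polar string is consistent with writing 1 for the residues A, F, G, I, L, M, P, V, W and 0 for all others; for example, the opening GHMGHTLERS of 5ISD_1 becomes 1011001000. The page does not state this letter set, so the `HYDROPHOBIC` constant in the Python sketch below is an assumption inferred from the string itself, and `hp_string` is a hypothetical name.

```python
# Assumed hydrophobic set, inferred from the 5ISD_1 string (not stated on the page).
HYDROPHOBIC = frozenset("AFGILMPVW")


def hp_string(w: str) -> str:
    """Binary hydrophobic-polar encoding: '1' for hydrophobic, '0' for polar."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in w)


print(hp_string("GHMGHTLERS"))  # 1011001000
```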
Pair \(Z_2\) Length of longest common substring
5ISD_1,7EXT_1 168 4
5ISD_1,1BOT_1 164 4
7EXT_1,1BOT_1 172 4
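Because \(Z_k\) is the size of a symmetric difference, \(Z_k(x,y)=p_x(k)+p_y(k)-2|P_x(k)\cap P_y(k)|\). With the values above, 5ISD_1 and 7EXT_1 share \(213-105=171-63=108\) length-2 subwords, and \(Z_2=105+63=168\). The last column must count contiguous subwords: both sequences contain at least 19 alanines, so their longest common (scattered) subsequence has length at least 19. The Python sketch below, with hypothetical names, computes both quantities; the dynamic-programming routine for the longest common contiguous subword is one standard choice, not necessarily the browser's.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))


def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y (O(|x||y|) DP)."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: longest common suffix of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


print(z("abc", "bcd", 2), longest_common_subword("abcde", "xbcdy"))  # 2 3
```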

Newick tree

(
	7EXT_1:85.98,
	(
		5ISD_1:82,1BOT_1:82
	):3.98
);

Let \(d\) and \(d_1\) denote the distances of the same names defined by Otu and Sayood. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{609}{\log_{20} 609}-\frac{248}{\log_{20}248}\right)\approx 101.9\), where \(609=361+248\) is the combined length of 5ISD_1 and 7EXT_1.
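The estimate is plain arithmetic, and the Python lines below reproduce it. Reading 0.85 as the normalized LZ ratio from the table is our assumption; the factor 0.8 is taken as given, and `lz_scale` is a hypothetical helper name.

```python
import math


def lz_scale(n: int, k: int = 20) -> float:
    """n / log_k n, the normalizing term used in the LZ table."""
    return n / math.log(n, k)


# 0.85 and 0.8 are the factors in the formula above; 609 = 361 + 248.
estimate = 0.85 * 0.8 * (lz_scale(361 + 248) - lz_scale(248))
print(round(estimate, 2))  # 101.85
```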
Status Protein1 Protein2 d d1/2
Query variables 5ISD_1 7EXT_1 130 108.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
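To make \(\delta\) concrete, assume that the min and max range over the two LZ increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), so that \(d\) is their maximum and \(d_1\) their sum, which is consistent with the two cases above. The Python sketch below encodes that assumption with an LZ76 exhaustive-history parsing; the function names are illustrative, and the browser's own LZ variant may give different numbers.

```python
def lz76_complexity(w: str) -> int:
    """Phrase count of the LZ76 exhaustive-history parsing (assumed variant)."""
    n, phrases, p = len(w), 0, 0
    while p < n:
        k = 1
        # Extend the phrase while it has a copy starting before position p.
        while p + k <= n and w[p:p + k] in w[:p + k - 1]:
            k += 1
        phrases += 1
        p += k
    return phrases


def lz_increments(x: str, y: str) -> tuple[int, int]:
    """Assumed terms c(xy) - c(x) and c(yx) - c(y)."""
    return (lz76_complexity(x + y) - lz76_complexity(x),
            lz76_complexity(y + x) - lz76_complexity(y))


def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = lz_increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


def d(x: str, y: str) -> int:
    return max(lz_increments(x, y))


def d1(x: str, y: str) -> int:
    return sum(lz_increments(x, y))


x, y = "MTIPLLQYAPSSQNTRV", "GHMGHTLERSVFSERTEE"
print(delta(x, y, 0) == d(x, y), delta(x, y, 0.5) == d1(x, y) / 2)  # True True
```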