CoV2D Browser™

Parikh vectors
5QEW_1 8EMW_1 4BRX_1 Letter Amino acid
11 26 11 M Methionine
13 59 14 T Threonine
9 28 10 Y Tyrosine
17 65 16 V Valine
14 71 10 Q Glutamine
11 17 5 H Histidine
23 69 16 K Lysine
18 66 12 D Aspartic acid
30 116 20 E Glutamic acid
18 44 18 I Isoleucine
29 135 27 L Leucine
22 81 14 P Proline
13 87 16 A Alanine
19 94 19 R Arginine
11 47 9 N Asparagine
25 92 19 S Serine
14 72 15 G Glycine
14 42 12 F Phenylalanine
6 7 5 W Tryptophan
4 14 8 C Cysteine
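
As an illustration of how such a table can be produced, the following is a minimal sketch that counts each amino-acid letter in a sequence. The function name `parikh_vector` and the constant `AMINO_ACIDS` are hypothetical names chosen for this example and are not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(seq: str) -> dict[str, int]:
    """Return the number of occurrences of each letter of AMINO_ACIDS in seq."""
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}

# First 20 residues of the 5QEW_1 sequence listed on this page.
prefix = "MEMEKEFEQIDKSGSWAAIY"
vec = parikh_vector(prefix)
print(vec["E"], vec["M"], sum(vec.values()))
```

Applied to a full sequence, the values should sum to the sequence length; the 5QEW_1 column above sums to 321.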

>5QEW_1|Chain A|Tyrosine-protein phosphatase non-receptor type 1|Homo sapiens (9606)
>8EMW_1|Chain A|1-phosphatidylinositol 4,5-bisphosphate phosphodiesterase beta-3|Homo sapiens (9606)
>4BRX_1|Chain A|FOCAL ADHESION KINASE 1|GALLUS GALLUS (9031)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5QEW , Knot 141 321 0.84 40 205 309
MEMEKEFEQIDKSGSWAAIYQDIRHEASDFPSRVAKLPKNKNRNRYRDVSPFDHSRIKLHQEDNDYINASLIKMEEAQRSYILTQGPLPNTVGHFWEMVWEQKSRGVVMLNRVMEKGSLKCAQYWPQKEEKEMIFEDTNLKLTLISEDIKSYYTVRQLELENLTTQETREILHFHYTTWPDFGVPESPASFLNFLFKVRESGSLSPEHGPVVVHCSAGIGRSGTFCLADTCLLLMDKRKDPSSVDIKKVLLEMRKFRMGLIQTADQLRFSYLAVIEGAKFIMGDSSVQDQWKELSHEDLEPPPEHIPPPPRPPKRILEPHN
8EMW , Knot 434 1232 0.83 40 309 1049
GPSRATMALQLEPPTVVETLRRGSKFIKWDEETSSRNLVTLRVDPNGFFLYWTGPNMEVDTLDISSIRDTRTGRYARLPKDPKIREVLGFGGPDARLEEKLMTVVSGPDPVNTVFLNFMAVQDDTAKVWSEELFKLAMNILAQNASRNTFLRKAYTKLKLQVNQDGRIPVKNILKMFSADKKRVETALESCGLKFNRSESIRPDEFSLEIFERFLNKLCLRPDIDKILLEIGAKGKPYLTLEQLMDFINQKQRDPRLNEVLYPPLRPSQARLLIEKYEPNQQFLERDQMSMEGFSRYLGGEENGILPLEALDLSTDMTQPLSAYFINSSHNTYLTAGQLAGTSSVEMYRQALLWGCRCVELDVWKGRPPEEEPFITHGFTMTTEVPLRDVLEAIAETAFKTSPYPVILSFENHVDSAKQQAKMAEYCRSIFGDALLIEPLDKYPLAPGVPLPSPQDLMGRILVKNKKRHRPSAGGPDSAGRKRPLEQSNSALSESSAATEPSSPQLGSPSSDSCPGLSNGEEVGLEKPSLEPQKSLGDEGLNRGPYVLGPADREDEEEDEEEEEQTDPKKPTTDEGTASSEVNATEEMSTLVNYIEPVKFKSFEAARKRNKCFEMSSFVETKAMEQLTKSPMEFVEYNKQQLSRIYPKGTRVDSSNYMPQLFWNVGCQLVALNFQTLDVAMQLNAGVFEYNGRSGYLLKPEFMRRPDKSFDPFTEVIVDGIVANALRVKVISGQFLSDRKVGIYVEVDMFGLPVDTRRKYRTRTSQGNSFNPVWDEEPFDFPKVVLPTLASLRIAAFEEGGKFVGHRILPVSAIRSGYHYVCLRNEANQPLCLPALLIYTEASDYIPDDHQDYAEALINPIKHVSLMDQRARQLAALIGESEAQAGQETCQDTQSQQLGSQPSSNPTPSPLDASPRRPPGPTTSPASTSLSSPGQRDDLIASILSEVAPTPLDELRGHKALVKLRSRQERDLRELRKKHQRKAVTLTRRLLDGLAQAQAEGRCRLRPGALGGAADVEDTKEGEDEAKRYQEFQNRQVQSLLELREAQVDAEAQRRLEHLRQALQRLREVVLDANTTQFKRLKEMNEREKKELQKILDRKRHNSISEAKMRDKHKKEAELTEINRRHITESVNSIRRLEEAQKQRHDRLVAGQQQVLQQLAEEEPKLLAQLAQECQEQRARLPQEIRRSLLGEMPEGLGDGPLVACASNGHAPGSSGHLSGADSESQEENTQL
4BRX , Knot 125 276 0.84 40 185 266
STRDYEIQRERIELGRCIGEGQFGDVHQGIYMSPENPAMAVAIKTCKNCTSDSVREKFLQEALTMRQFDHPHIVKLIGVITENPVWIIMELCTLGELRSFLQVRKFSLDLASLILYAYQLSTALAYLESKRFVHRDIAARNVLVSATDCVKLGDFGLSRYMEDSTYYKASKGKLPIKWMAPESINFRRFTSASDVWMFGVCMWEILMHGVKPFQGVKNNDVIGRIENGERLPMPPNCPPTLYSLMTKCWAYDPSRRPRFTELKAQLSTILEEEKLQ
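
The LZ-complexity and normalized-ratio columns can be sketched as follows. This assumes the exhaustive-history parsing of Lempel and Ziv (1976); whether the page uses exactly this variant is an assumption, so the output need not reproduce the tabulated values. The names `lz76_complexity` and `normalized` are hypothetical.

```python
import math

def lz76_complexity(w: str) -> int:
    """Count phrases in an LZ76-style exhaustive history of w.

    Each phrase is the shortest word starting at position i that does not
    occur as a subword of w[: i + len(phrase) - 1]; a final incomplete
    phrase is counted as one phrase.
    """
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Compute LZ(w) / (n / log_b n) with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example, parsed as 0|001|10|100|1000|101.
print(lz76_complexity("0001101001000101"))
# Ratio recomputed from the tabulated LZ(w) = 141 and n = 321 for 5QEW.
print(normalized(141, 321))
```

The substring test makes this sketch quadratic or worse in the sequence length, which is acceptable for sequences of a few thousand residues.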

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
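
A minimal sketch of these definitions, with the hypothetical helper names `subwords` and `subword_complexity`:

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of w having length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# First six residues of 5QEW_1: the 2-subwords are ME, EM, EK, KE.
print(sorted(subwords("MEMEKE", 2)))
print(subword_complexity("MEMEKE", 2))
```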

\(|P_{f(5QEW_1)}(2) \setminus P_{f(8EMW_1)}(2)|=32\), \(|P_{f(8EMW_1)}(2) \setminus P_{f(5QEW_1)}(2)|=136\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
101000100100010111100010001001100110110000000000101100001010000000101011010010000110011110011011011100000111110011001010010011000000111000010101100010000010010100100000001101000011011110011011011101000101010011111000111100101011000111100000100101001110100101111001001010011110110111100010001001000010111001111101100110100
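
The hydrophobic-polar encoding can be sketched as below. The hydrophobic set `AFGILMPVW` is an assumption: it is consistent with the beginning of the binary string shown here, but the classification actually used by the page (for cysteine in particular) is not stated. The names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Assumed hydrophobic residues; every other letter is treated as polar.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq: str) -> str:
    """Map each residue to '1' if hydrophobic (by assumption), else '0'."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

# First 20 residues of 5QEW_1; compare with the start of the string above.
print(hp_encode("MEMEKEFEQIDKSGSWAAIY"))
```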
Pair \(Z_2\) Length of longest common subword (contiguous)
5QEW_1,8EMW_1 168 4
5QEW_1,4BRX_1 174 4
8EMW_1,4BRX_1 180 4
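
The \(Z_k\) column is the size of the symmetric difference of the two sets of length-\(k\) subwords; for 5QEW_1 and 8EMW_1 this is \(32+136=168\). A minimal sketch, with the hypothetical names `subwords` and `z_k`:

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of w having length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """Return |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: P_x(2) = {ab, ba}, P_y(2) = {ab, bb, ba}, so only bb differs.
print(z_k("abab", "abba", 2))
```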

Newick tree

 
(4BRX_1:89.96,(5QEW_1:84,8EMW_1:84):5.96);

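Let \(d\) be the Otu--Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu--Sayood distance \(d_1(x,y)=(\mathrm{LZ}(xy)-\mathrm{LZ}(x))+(\mathrm{LZ}(yx)-\mathrm{LZ}(y))\). (This makes the 4TYN sequence AAAAAA a close match...)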
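
A minimal sketch of both distances, assuming the definitions of Otu and Sayood (the maximum and the sum of the two complexity increments under concatenation) and an LZ76-style exhaustive-history complexity. The function names are hypothetical, and the complexity variant used by the page may differ.

```python
def lz76_complexity(w: str) -> int:
    """Count phrases in an LZ76-style exhaustive history of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1): the max and the sum of the two complexity increments."""
    c = lz76_complexity
    inc_xy = c(x + y) - c(x)  # new phrases needed for y after seeing x
    inc_yx = c(y + x) - c(y)  # new phrases needed for x after seeing y
    return max(inc_xy, inc_yx), inc_xy + inc_yx

print(otu_sayood("ACGT", "TTTT"))
print(otu_sayood("AAAA", "AAAA"))
```

A highly repetitive word adds few phrases after almost any other word, which is one way to read the remark about AAAAAA.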
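Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1553}{\log_{20} 1553}-\frac{321}{\log_{20}321}\right)\approx 317\), where \(1553=321+1232\) is the length of the concatenation of the two query sequences.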
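
The arithmetic in this estimate can be checked with a short sketch. The constants 0.85 and 0.8 are copied from the formula as given; the function name `expected_distance` and its parameter names are hypothetical.

```python
import math

def expected_distance(n_concat: int, n_single: int,
                      alphabet_size: int = 20,
                      c1: float = 0.85, c2: float = 0.8) -> float:
    """Evaluate c1 * c2 * (g(n_concat) - g(n_single)) with g(n) = n / log_b n."""
    def g(n: int) -> float:
        return n / math.log(n, alphabet_size)
    return c1 * c2 * (g(n_concat) - g(n_single))

# 321 and 1232 are the lengths of 5QEW_1 and 8EMW_1 from the table above.
print(round(expected_distance(321 + 1232, 321)))
```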
Status Protein1 Protein2 d d1/2
Query variables 5QEW_1 8EMW_1 401 251.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
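
As a worked instance with the query values above, \(d=401\) and \(d_1/2=251.5\) give \(\max=401\) and \(\min=d_1-d=503-401=102\), reading \(d\) as the maximum and \(d_1\) as the sum of the same two terms. The sketch below, with the hypothetical function name `delta`, checks both cases of the formula under that reading.

```python
def delta(lo: float, hi: float, alpha: float) -> float:
    """Interpolate between the smaller (lo) and larger (hi) of the two terms."""
    return alpha * lo + (1 - alpha) * hi

lo, hi = 102, 401  # derived from d = 401 and d1 = 2 * 251.5 = 503
print(delta(lo, hi, 0.0))  # the alpha = 0 case, equal to d
print(delta(lo, hi, 0.5))  # the alpha = 1/2 case, equal to d1 / 2
```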