CoV2D Browser™

Parikh vectors
7KPR_1 3NPC_1 4YXT_1 Letter Amino acid
14 16 21 Y Tyrosine
28 21 21 D Aspartic acid
30 26 23 I Isoleucine
39 18 29 S Serine
5 6 2 W Tryptophan
26 30 22 V Valine
13 17 16 Q Glutamine
35 27 31 E Glutamic acid
38 15 38 G Glycine
25 16 22 T Threonine
35 24 35 A Alanine
32 16 23 R Arginine
8 8 10 C Cysteine
14 13 23 F Phenylalanine
8 13 15 M Methionine
27 18 16 P Proline
19 13 12 N Asparagine
15 10 6 H Histidine
51 33 35 L Leucine
24 24 20 K Lysine
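The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch of that count follows; the function name `parikh_vector` and the constant `ALPHABET` are illustrative assumptions, not part of the CoV2D code, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# Row order of the Parikh table above; any fixed order would work (assumption).
ALPHABET = "YDISWVQEGTARCFMPNHLK"


def parikh_vector(w, alphabet=ALPHABET):
    """Return the number of occurrences in w of each letter of alphabet."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]


if __name__ == "__main__":
    toy = "GSHMSDLPLR"  # first ten residues of the 7KPR_1 sequence
    for letter, count in zip(ALPHABET, parikh_vector(toy)):
        if count:
            print(letter, count)
```

Applied to a full FASTA sequence, the returned list should correspond to one column of the table.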

>7KPR_1|Chains A, B|Protein phosphatase 1H|Homo sapiens (9606)
>3NPC_1|Chains A, B|Mitogen-activated protein kinase 9|Homo sapiens (9606)
>4YXT_1|Chains A, B|Polyketide biosynthesis 3-hydroxy-3-methylglutaryl-ACP synthase PksG|Bacillus subtilis (strain 168) (224308)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7KPR , Knot 199 486 0.84 40 246 463
GSHMSDLPLRFPYGRPEFLGLSQDEVECSADHIARPILILKETRRLPWATGYAEVINAGKSTHNEDQASCEVLTVKKKAGAVTSTPNRNSSKRRSSLPNGEGLQLKENSESEGVSCHYWSLFDGHAGSGAAVVASRLLQHHITEQLQDIVDILKNSAVLPPTCLGEEPENTPANSRTLTRAASLRGGVGAPGSPSTPPTRFFTEKKIPHECLVIGALESAFKEMDLQIERERSSYNISGGCTALIVICLLGKLYVANAGASRAIIIRNGEIIPMSSEFTPETERQRLQYLAFMQPHLLGNEFTHLEFPRRVQRKELGKKMLYRDFNMTGWAYKTIEDEDLKFPLIYGEGKKARVMATIGVTRGLGDHDLKVHDSNIYIKPFLSSAPEVRIYDLSKYDHGSDDVLILATDGLWDVLSNEEVAEAITQFLPNCDPDDPHRYTLAAQDLVMRARGVLKDRGWRISNDRLGSGDDISVYVIPLIHGNKLS
3NPC , Knot 157 364 0.84 40 229 354
MSKSKVDNQFYSVEVADSTFTVLKRYQQLKPIGSGAQGIVCAAFDTVLGINVAVKKLSRPFQNQTHAKRAYRELVLLKCVNHKNIISLLNVFTPQKTLEEFQDVYLVMELMDANLCQVIHMELDHERMSYLLYQMLCGIKHLHSAGIIHRDLKPSNIVVKSDCTLKILDFGLARTACTNFMMTPYVVTRYYRAPEVILGMGYAANVDIWSVGCIMGELVKGCVIFQGTDHIDQWNKVIEQLGTPSAEFMAALQPTVRNYVENRPKYPGIKFEELFPDWIFPSESERDKIKTSQARDLLSKMLVIDPDKRISVDEALRHPYITVWYDPAEAEAPPPQIYDAQLEEREHAIEEWKELIYKEVMDWE
4YXT , Knot 175 420 0.84 40 222 389
MVSAGIEAMNVFGGTAYLDVMELAKYRHLDTARFENLLMKEKAVALPYEDPVTFGVNAAKPIIDALSEAEKDRIELLITCSASGIDFGKSLSTYIHEYLGLNRNCRLFEVKQACYSGTAGFQMAVNFILSQTSPGAKALVIASDISRFLIAEGGDALSEDWSYAEPSAGAGAVAVLVGENPEVFQIDPGANGYYGYEVMDTCRPIPDSEAGDSDLSLMSYLDCCEQTFLEYQKRVPGANYQDTFQYLAYHTPFGGMVKGAHRTMMRKVAKVKTSGIETDFLTRVKPGLNYCQRVGNIMGAALFLALASTIDQGRFDTPKRIGCFSYGSGCCSEFYSGITTPQGQERQRTFGIEKHLDRRYQLSMEEYELLFKGSGMVRFGTRNVKLDFEMIPGIMQSTQEKPRLFLEEISEFHRKYRWIS
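The LZ-complexity column and its normalization by \(n/\log_{20} n\) can be illustrated as follows. This is a sketch that assumes the exhaustive-history parsing of Lempel and Ziv (1976); the page does not state which variant it uses, so the function names `lz76_complexity` and `normalized_lz` and the parsing details are assumptions.

```python
import math


def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parsing of w."""
    n = len(w)
    i = 0  # start of the current component
    c = 0  # number of components found so far
    while i < n:
        length = 1
        # Extend the component while it can be copied from an occurrence
        # that starts before position i (overlap with itself is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        c += 1
        i += length
    return c


def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_a n) for an alphabet of size a."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # Classic binary example: 0 . 001 . 10 . 100 . 1000 . 101
    print(lz76_complexity("0001101001000101"))
    # Values from the 7KPR row of the table: LZ(w) = 199, n = 486.
    print(normalized_lz(199, 486))
```

For the 7KPR row the ratio evaluates to about 0.8456, which agrees with the tabulated 0.84 if that column is truncated rather than rounded.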

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
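As a small illustration of these definitions, the set of subwords of a given length and its cardinality can be computed with a sliding window. The names `subwords` and `subword_complexity` are hypothetical and chosen only for this sketch.

```python
def subwords(w, k):
    """Return P_w(k): the set of distinct length-k subwords (intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """Return p_w(k), the cardinality of P_w(k)."""
    return len(subwords(w, k))


if __name__ == "__main__":
    w = "abab"
    print(sorted(subwords(w, 2)))
    print([subword_complexity(w, k) for k in (1, 2, 3)])
```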

\(|P_{f(7KPR_1)}(2) \setminus P_{f(3NPC_1)}(2)|=90\), \(|P_{f(3NPC_1)}(2) \setminus P_{f(7KPR_1)}(2)|=73\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
100100111011010101111000010001001101111100000111101010110110000000010001101000111100010000000000110101101000000011000010110101101111110011000100010011011000111110011001000110000100110101111111010011001100001100011111100110010101000000001011001111101110101101110011110010111100010100000010011110101110010010110010000110011000101011100010000101111010100101110111001110001010000101011100110101001000001000111110011101100001101100111000100100001110011101011100011010000110100101011111010010
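The hydrophobic-polar string can be reproduced by mapping each residue to 1 or 0. The page does not state its classification, so the set used below is an assumption inferred by comparing the start of the 7KPR_1 sequence with the start of the binary string above (A, F, G, I, L, M, P, V, W map to 1, all other residues to 0); the name `hydrophobic_polar` is hypothetical.

```python
# Residues written as 1; inferred from the page's output, not documented there.
HYDROPHOBIC = set("AFGILMPVW")


def hydrophobic_polar(w):
    """Map each residue of w to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)


if __name__ == "__main__":
    # First twenty residues of the 7KPR_1 sequence.
    print(hydrophobic_polar("GSHMSDLPLRFPYGRPEFLG"))
```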
Pair \(Z_2\) Length of longest common subword
7KPR_1,3NPC_1 163 4
7KPR_1,4YXT_1 170 4
3NPC_1,4YXT_1 159 4
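The \(Z_k\) column is the size of the symmetric difference of the two sets of length-\(k\) subwords. A minimal sketch, with the hypothetical function names `subwords` and `z_distance`:

```python
def subwords(w, k):
    """Return the set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Return Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


if __name__ == "__main__":
    # P_x(2) = {ab, ba}, P_y(2) = {ab, bb, ba}: only "bb" is not shared.
    print(z_distance("abab", "abba", 2))
```

For the first pair in the table, the two one-sided differences quoted earlier, 90 and 73, add up to the tabulated \(Z_2 = 163\).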

Newick tree

 
[
	7KPR_1:84.48,
	[
		3NPC_1:79.5,4YXT_1:79.5
	]:4.98
]

Let \(d\) be the Otu--Sayood distance of that name.
Let \(d_1\) be the Otu--Sayood distance of that name. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{850}{\log_{20} 850}-\frac{364}{\log_{20}364}\right)\approx 130\), where \(850=486+364\) is the combined length of the two query sequences.
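The arithmetic in this estimate can be checked directly. In the sketch below the two constants 0.85 and 0.8 are taken from the formula as given, and the function names `lz_scale` and `expected_distance` are hypothetical.

```python
import math


def lz_scale(n, alphabet_size=20):
    """Return n / log_a(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)


def expected_distance(n1, n2, c1=0.85, c2=0.8):
    """Evaluate c1 * c2 * (scale(n1 + n2) - scale(n2)) as in the estimate."""
    return c1 * c2 * (lz_scale(n1 + n2) - lz_scale(n2))


if __name__ == "__main__":
    # Lengths of 7KPR_1 (486) and 3NPC_1 (364) from the table above.
    print(round(expected_distance(486, 364), 1))
```

The value comes out close to 131, in line with the rough figure of 130 quoted in the text.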
Status Protein1 Protein2 d d1/2
Query variables 7KPR_1 3NPC_1 165 145.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
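The interpolation can be sketched in code under two assumptions that the page does not spell out: that min and max range over the two one-sided quantities \(c(xy)-c(x)\) and \(c(yx)-c(y)\) of Otu and Sayood, and that \(c\) is the LZ76 exhaustive-history complexity. The names `lz76_complexity` and `delta` are hypothetical.

```python
def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parsing of w."""
    n, i, c = len(w), 0, 0
    while i < n:
        length = 1
        # Extend while the component can be copied from an earlier start.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        c += 1
        i += length
    return c


def delta(x, y, alpha):
    """Return alpha * min + (1 - alpha) * max of the one-sided quantities."""
    a = lz76_complexity(x + y) - lz76_complexity(x)  # cost of y given x
    b = lz76_complexity(y + x) - lz76_complexity(y)  # cost of x given y
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    x, y = "ab", "abababab"
    print(delta(x, y, 0))    # alpha = 0 gives d (the max)
    print(delta(x, y, 0.5))  # alpha = 1/2 gives d_1 / 2
```

With \(\alpha=0\) only the larger one-sided quantity survives, and with \(\alpha=1/2\) the result is the average of the two, which matches the two cases in the display.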