CoV2D Browser™

Parikh vectors
4QES_1 5NFI_1 7NPF_1 Letter Amino acid
17 18 24 N Asparagine
16 8 11 H Histidine
19 10 17 P Proline
5 6 6 W Tryptophan
24 13 18 R Arginine
46 21 44 L Leucine
24 9 16 F Phenylalanine
25 17 22 S Serine
35 22 23 T Threonine
16 16 10 Y Tyrosine
29 17 27 E Glutamic acid
15 9 22 Q Glutamine
35 17 14 G Glycine
6 2 19 M Methionine
29 21 23 V Valine
52 19 35 A Alanine
3 6 4 C Cysteine
20 18 21 I Isoleucine
15 13 24 K Lysine
25 19 27 D Aspartic acid
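A Parikh vector counts how often each letter occurs in a sequence. The Python sketch below computes one in the row order of the table. The names `AMINO_ACIDS` and `parikh_vector` are ours, not the browser's. As a sanity check, the entries of each column above sum to the sequence length: 456, 281 and 407.

```python
from collections import Counter

# One-letter amino-acid codes in the row order of the Parikh-vector table.
AMINO_ACIDS = "NHPWRLFSTYEQGMVACIKD"

def parikh_vector(seq: str) -> list[int]:
    """Count each amino-acid letter of seq, returned in AMINO_ACIDS order."""
    counts = Counter(seq)
    return [counts[a] for a in AMINO_ACIDS]

# Toy input: the first 18 residues of the 4QES_1 sequence.
toy = "MPFITVGQENSTSIDLYY"
vec = parikh_vector(toy)
print(dict(zip(AMINO_ACIDS, vec)))
print(sum(vec) == len(toy))   # True: every letter here is one of the 20 codes
```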

>4QES_1|Chains A, B, C|Non-haem bromoperoxidase BPO-A2, Matrix protein 1 chimera|Streptomyces aureofaciens (1894)
>5NFI_1|Chain A[auth B]|Minor fimbrium anchoring subunit Mfa2|Porphyromonas gingivalis (strain ATCC 33277 / DSM 20709 / CIP 103683 / JCM 12257 / NCTC 11834 / 2561) (431947)
>7NPF_1|Chains A, B, C, D, E, F, G, H|AAA family ATPase|Vibrio cholerae (666)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4QES , Knot 191 456 0.85 40 229 426
MPFITVGQENSTSIDLYYEDHGTGTPVVLIHGFPLSGHSWERQSAALLDAGARVITYDRRGFGQSSQPTTGYDYDTFAADLNTVLETLDLQDAVLVGFSMGTGEVARYVSSYGTARIAAVAFLASLEPFLLKTDDNPDGAAPQEFFDGIVAAVKADRYAFYTGFFNDFYNLDENLGTRISEEAVRNSWNTAASGGFFAAAAAPTTWYTDFRADIPRIDVPALILHGTGDRTLPIENTARVFHKALPSAEYVEVEGAPHGLLWTHAEEVNTALLAFLAKAQEAQKQKLLTEVETYVLSIIPSGPLKAEIAQRLEDVFAGKNTDLEVLMEWLKTRPILSPLTKGILGFVFTLTVPSERGLQRRRFVQNALNGNGDPNNMDKAVKLYRKLKREITFHGAKEISLSYSAGALASCMGLIYNRMGAVTTEVAFGLVCATCEQIADSQHRSHRQLEHHHHHH
5NFI , Knot 129 281 0.86 40 192 269
CPRGVYVNFYSQTECAENPSYPAEVARLNVYAFDKDGILRSANVFEDVQLSAAKEWLIPLEKDGLYTIFAWGNIDDHYNIGEIKIGETTKQQVLMRLKQDGKWATNIDGTTLWYATSPVVELKNMEDGADQYIHTRANLREYTNRVTVSVDSLPHPENYEIKLASSNGSYRFDGTVAKADSTYYPGETKVVGDSTCRAFFTTLKLESGHENTLSVTHKPTGREIFRTDLVGAILSSQYAQNINLRCINDFDIRLVAHHCNCPDDTYVVVQIWINGWLIHSY
7NPF , Knot 178 407 0.87 40 234 395
MAMKREQTIENLYQLAQLTQQVQADRIEIVLEERRDEHFPPMSKALMETRSGLTRRKLDEAIAKMEEAGHQFTKNNANHYSISLSEAHMLMDAAGVPKFHERKKNNENKPWIINVQNQKGGTGKSMTAVHLAACLALNLDKRYRICLIDLDPQGSLRLFLNPQISLAEHTNIYSAVDIMLDNVPDGVQVDTEFLRKNVMLPTQYPNLKTISAFPEDAMFNAEAWQYLSQNQSLDIVRLLKEKLIDKIASDFDIIMIDTGPHVDPLVWNAMYASNALLIPCAAKRLDWASTVNFFQHLPTVYEMFPEDWKGLEFVRLMPTMFEDDNKKQVSVLTEMNYLLGDQVMMATIPRSRAFETCADTYSTVFDLTVNDFEGGKKTLATAQDAVQKSALELERVLHSHWSSLNQG
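The LZ column counts phrases in a Lempel–Ziv (1976) parsing. The following is a sketch of LZ76 complexity using the Kaspar–Schuster algorithm. It also computes the normalized ratio \(\mathrm{LZ}(w)/(n/\log_{20} n)\) from the table header. We have not checked that the browser uses exactly this LZ variant, so small numeric differences from the table are possible. The function names are our own.

```python
import math

def lz76(s: str) -> int:
    """Lempel-Ziv (1976) complexity of s: the number of phrases in its
    exhaustive history, counted with the Kaspar-Schuster algorithm."""
    n = len(s)
    if n <= 1:
        return n
    i, k, l = 0, 1, 1   # i: start of a candidate earlier copy, k: match length, l: start of current phrase
    c, k_max = 1, 1     # c: phrases so far (the first symbol is one), k_max: longest copy found for this phrase
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:       # the copy reaches the end of s: count the final phrase
                c += 1
                break
        else:
            k_max = max(k, k_max)
            i += 1
            if i == l:          # every earlier start tried: close the phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def normalized_lz(s: str, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_a n), with a = 20 for amino acids (needs n > 1)."""
    n = len(s)
    return lz76(s) / (n / math.log(n, alphabet_size))

print(lz76("0001101001000101"))   # 6
print(lz76("AAAAAA"))             # 2
```

The classic binary example 0001101001000101 parses as 0·001·10·100·1000·101, which gives 6 phrases. A run such as AAAAAA gives only 2, because everything after the first letter copies earlier material.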

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
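The following is a direct Python rendering of \(P_w(n)\) and \(p_w(n)\), offered as a sketch. The names `P` and `p` simply mirror the notation.

```python
def P(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(P(w, n))

w = "MPFITVGQENSTSIDLYY"   # prefix of f(4QES), used as a toy input
print([p(w, n) for n in (1, 2, 3)])   # [14, 17, 16]
```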

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For example, \(|P_{f(4QES_1)}(2) \setminus P_{f(5NFI_1)}(2)|=96\) and \(|P_{f(5NFI_1)}(2) \setminus P_{f(4QES_1)}(2)|=59\), so \(Z_2(f(4QES_1),f(5NFI_1))=96+59=155\).

Hydrophobic-polar version of Sequence 1 (4QES_1): 111101100000010100000101011111011110100100001111011101100000111000010010000011101001100101001111110110101100100010101111111101011110000010111100110111111010001100111001001000110010001100010011011111111110010001010110101111110101000111000101100111010010101110111100100100111111101001000011001000110111011101011001001111000010111011000111011001111111010110001100001100110101010010011010001000101011001010001111100111100011110001111110100001100000000010000000
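Below is a sketch of \(Z_k\), the full Jaccard distance it is the numerator of, and the hydrophobic-polar map. The page does not list which residues it treats as hydrophobic. Matching the printed 0/1 string against the start and end of 4QES_1 suggests that A, F, G, I, L, M, P, V, W map to 1, while C, Y and the polar residues seen there map to 0. `HYDROPHOBIC` is therefore an inferred assumption, and all names are ours. For the pair 4QES_1, 5NFI_1, the counts 96 and 59 add up to 155, the \(Z_2\) value in the table that follows.

```python
# Assumption: hydrophobic set inferred by matching the printed 4QES_1 string.
HYDROPHOBIC = frozenset("AFGILMPVW")

def P(w: str, k: int) -> set[str]:
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def Z(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)| (symmetric difference size)."""
    px, py = P(x, k), P(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by the size of the union of the two subword sets."""
    union = P(x, k) | P(y, k)
    return Z(x, y, k) / len(union) if union else 0.0

def hp(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in seq)

print(Z("ABCA", "BCD", 2), jaccard_distance("ABCA", "BCD", 2))
print(hp("MPFITVGQENSTSIDLYY"))
```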
Pair \(Z_2\) Length of longest common contiguous subword
4QES_1,5NFI_1 155 4
4QES_1,7NPF_1 175 4
5NFI_1,7NPF_1 182 4
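The Parikh table rules out reading the last column as a scattered (non-contiguous) subsequence. 4QES_1 and 5NFI_1 contain 46 and 21 leucines respectively, so their longest common subsequence has length at least 21, not 4. The reported 4 is therefore presumably the length of the longest common contiguous subword. Below is a sketch of both dynamic programs for contrast; the names are ours.

```python
def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y, O(|x||y|) DP."""
    best = 0
    prev = [0] * (len(y) + 1)   # prev[j]: longest common suffix of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def longest_common_subsequence(x: str, y: str) -> int:
    """Length of the longest common (not necessarily contiguous) subsequence."""
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]

print(longest_common_substring("AXBXC", "ABC"))    # 1
print(longest_common_subsequence("AXBXC", "ABC"))  # 3
```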

Newick tree

(
	7NPF_1:92.85,
	(
		4QES_1:77.5,5NFI_1:77.5
	):15.35
);
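Every leaf of this tree lies at the same distance from the root (77.5 + 15.35 = 92.85), so the tree is ultrametric. The following is a minimal sketch that parses the tree written in standard Newick syntax, `(7NPF_1:92.85,(4QES_1:77.5,5NFI_1:77.5):15.35);`, and checks the root-to-leaf distances. The parser is ours and handles only unquoted labels with optional branch lengths.

```python
import math

TREE = "(7NPF_1:92.85,(4QES_1:77.5,5NFI_1:77.5):15.35);"

def parse_newick(text: str):
    """Parse Newick into nested (name, branch_length, children) tuples.
    Minimal: no quoted labels, comments, or whitespace inside labels."""
    s = "".join(text.split()).rstrip(";")   # drop whitespace and the final ';'
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":                    # internal node: comma-separated children
            pos += 1
            children.append(node())
            while s[pos] == ",":
                pos += 1
                children.append(node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        start = pos                          # optional label
        while pos < len(s) and s[pos] not in ":,()":
            pos += 1
        name = s[start:pos]
        length = 0.0
        if pos < len(s) and s[pos] == ":":   # optional branch length
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return name, length, children

    return node()

def leaf_depths(tree, above: float = 0.0) -> dict[str, float]:
    """Map each leaf name to its distance from the root."""
    name, length, children = tree
    here = above + length
    if not children:
        return {name: here}
    depths: dict[str, float] = {}
    for child in children:
        depths.update(leaf_depths(child, here))
    return depths

def is_ultrametric(tree) -> bool:
    """True if all leaves are (numerically) equidistant from the root."""
    depths = list(leaf_depths(tree).values())
    return all(math.isclose(d, depths[0]) for d in depths)

print(leaf_depths(parse_newick(TREE)), is_ultrametric(parse_newick(TREE)))
```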

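Let \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\,\mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\) be the Otu–Sayood distance \(d\).
Let \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)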
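A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{737}{\log_{20} 737}-\frac{281}{\log_{20}281}\right)\approx 125\), where \(737=456+281\) is the combined length of \(f(4QES_1)\) and \(f(5NFI_1)\).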
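The following evaluates the estimate numerically as a sketch; `lz_scale` is our own name for \(n/\log_{20} n\). The result is a little under 126, so the 125 in the text is the truncated value. The factor 0.85 matches the LZ ratio of 4QES_1 in the table. The text does not say where 0.8 comes from, so it is copied as given.

```python
import math

def lz_scale(n: int) -> float:
    """n / log_20 n: the normalizing length from the LZ ratio column."""
    return n / math.log(n, 20)

len_4qes, len_5nfi = 456, 281
combined = len_4qes + len_5nfi            # 737
# 0.85 and 0.8 are taken from the formula as given.
expected = 0.85 * 0.8 * (lz_scale(combined) - lz_scale(len_5nfi))
print(f"{expected:.2f}")
```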
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 4QES_1 5NFI_1 156 124.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
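The sketch of \(\delta\) below rests on two assumptions. First, \(d\) and \(d_1\) are taken to be the unnormalized Otu–Sayood distances \(d=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\) and \(d_1=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). Second, LZ is computed with the Kaspar–Schuster LZ76 algorithm, which may not be the browser's exact variant. With these definitions, \(\alpha=0\) picks the larger one-sided term, and \(\alpha=1/2\) averages the two terms, giving \(d_1/2\). For the reported \(d=156\) and \(d_1/2=124.5\), the two terms are 156 and \(249-156=93\). All function names are ours.

```python
def lz76(s: str) -> int:
    """LZ76 complexity (Kaspar-Schuster phrase count)."""
    n = len(s)
    if n <= 1:
        return n
    i, k, l, c, k_max = 0, 1, 1, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:
                c += 1
                break
        else:
            k_max = max(k, k_max)
            i += 1
            if i == l:
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def one_sided_terms(x: str, y: str) -> tuple[int, int]:
    """LZ(xy) - LZ(x) and LZ(yx) - LZ(y): extra phrases for the second word after the first."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def delta_from_terms(a: float, b: float, alpha: float) -> float:
    """delta = alpha * min + (1 - alpha) * max."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def delta(x: str, y: str, alpha: float) -> float:
    return delta_from_terms(*one_sided_terms(x, y), alpha)

def d(x: str, y: str) -> int:
    return max(one_sided_terms(x, y))

def d1(x: str, y: str) -> int:
    return sum(one_sided_terms(x, y))

print(delta_from_terms(156, 93, 0.0), delta_from_terms(156, 93, 0.5))  # 156.0 124.5
```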