CoV2D Browser™

Parikh vectors
4KMV_1 1KWH_1 8PXV_1 Letter Amino acid
7 24 35 R Arginine
1 0 2 C Cysteine
7 25 27 Q Glutamine
11 38 35 D Aspartic acid
2 8 25 H Histidine
11 44 44 K Lysine
2 28 34 P Proline
1 16 12 W Tryptophan
12 32 60 A Alanine
6 26 19 N Asparagine
8 36 53 G Glycine
3 23 49 I Isoleucine
11 29 29 F Phenylalanine
12 15 42 S Serine
7 22 44 T Threonine
5 24 23 Y Tyrosine
6 27 38 E Glutamic acid
11 25 39 L Leucine
6 16 16 M Methionine
8 34 29 V Valine
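
The Parikh vector of a sequence records how many times each letter occurs in it. A minimal Python sketch of that count, assuming plain one-letter amino-acid strings; the name `parikh_vector` and the `ALPHABET` constant are illustrative choices and are not taken from the CoV2D code:

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
ALPHABET = "RCQDHKPWANGIFSTYELMV"

def parikh_vector(w, alphabet=ALPHABET):
    """Return the number of occurrences in w of each letter of alphabet."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# Illustration on the first 12 residues of 4KMV_1.
prefix = "GFKQDIATIRGD"
vec = dict(zip(ALPHABET, parikh_vector(prefix)))
```

The entries of a Parikh vector always sum to the sequence length, which gives a quick consistency check against the Length column further down.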

>4KMV_1|Chains A, B|Dehaloperoxidase A|Amphitrite ornata (129555)
>1KWH_1|Chain A|Macromolecule-Binding Periplasmic Protein|Sphingomonas sp. (90322)
>8PXV_1|Chains A, B, C|Beta-N-acetylhexosaminidase|Akkermansia muciniphila (239935)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4KMV , Knot 68 137 0.81 40 109 134
GFKQDIATIRGDLRTYAQDIFLAFLNKYPDERRYFKNYVGKSDQELKSMAKFGDHTEKVFNLMMEVADRATDCVPLASDANTLVQMKQHSSLTTGNFEKFFVALVEYMRASGQSFDSQSWDRFGKNLVSALSSAGMK
1KWH , Knot 201 492 0.84 38 254 467
KEATWVTDKPLTLKIHMHFRDKWVWDENWPVAKESFRLTNVKLQSVANKAATNSQEQFNLMMASGDLPDVVGGDNLKDKFIQYGQEGAFVPLNKLIDQYAPHIKAFFKSHPEVERAIKAPDGNIYFIPYVPDGVVARGYFIREDWLKKLNLKPPQNIDELYTVLKAFKEKDPNGNGKADEVPFIDRHPDEVFRLVNFWGARSSGSDNYMDFYIDNGRVKHPWAETAFRDGMKHVAQWYKEGLIDKEIFTRKAKAREQMFGGNLGGFTHDWFASTMTFNEGLAKTVPGFKLIPIAPPTNSKGQRWEEDSRQKVRPDGWAITVKNKNPVETIKFFDFYFSRPGRDISNFGVPGVTYDIKNGKAVFKDSVLKSPQPVNNQLYDMGAQIPIGFWQDYDYERQWTTPEAQAGIDMYVKGKYVMPGFEGVNMTREERAIYDKYWADVRTYMYEMGQAWVMGTKDVDKTWDEYQRQLKLRGLYQVLQMMQQAYDRQYKN
8PXV , Knot 255 655 0.84 40 278 602
MQEQIIPKPAEITLFTGSPARLTPDSLIITETQDKAFLDQAGQLQQMLSAGTGLPLPLKPAGQASKKAACIVIKKDPALAARGEEAYSIQSSPSGIILSAADARGIFYAGQSLVQMMPSVFHDRTGDKSAVRWNISETPFRITDYPRFSWRALMIDEARHFFGEKTIKQIIDQMALLKMNILHWHLTDDTGWRIEIKKYPRLTSIGSKRRESEIGTWNSGKSDGTPHEGFYTQEQIRDIVQYAARRNITIVPEIEMPGHASAAAVAYPFLSLKTPGEVPTTFIVNTAFDPTSEKTYAFLSDVLDEVTAIFPGRIIHIGGDEVRYDKQWKGVPEIEEFMKKNGMKSYADVQMHFTNRMSGIIAQKGRRMMGWNEIYGHDVNGDGGGKAGAKLDTNAVIQFWKGNTSLAKNAIRDGHDVINSLHTSTYLDYSYGSIPLQKAYGFEPVFPGLEKQYHSRVKGLGAQVWTEWISTPERLHYQAFPRACAFAEVGWTPAGKKDFPDFKKRLKAYSERMDLMGIKFARNVISQIDKSDFFNTPRIGTWTPATLTREEHSFDVTKLVKASGKHTVTLLYDKGAHAIEIESVALYENSREVSRDAHAGRSGAHKENIQYILNAPAPRQGATYTVKANFKGAGGRDSHGTVYFETPLEHHHHHH
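
The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below recomputes it from the \(\mathrm{LZ}(w)\) and \(n\) values listed above; the function name `normalized_lz` is hypothetical, and the base 20 is taken to be the size of the amino-acid alphabet:

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ(w), n) pairs copied from the table above.
rows = {"4KMV": (68, 137), "1KWH": (201, 492), "8PXV": (255, 655)}
ratios = {code: normalized_lz(lz, n) for code, (lz, n) in rows.items()}
```

The recomputed ratios are about 0.815, 0.845 and 0.843, so the tabulated 0.81, 0.84 and 0.84 appear to be truncated rather than rounded to two decimals.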

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
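
As a small illustration of these definitions, the following Python sketch collects the distinct contiguous subwords of a fixed length and counts them. The names `subwords` and `p` are illustrative only:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def p(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Illustration on the first 12 residues of 4KMV_1.
prefix = "GFKQDIATIRGD"
counts = [p(prefix, k) for k in (1, 2, 3)]
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), so for instance a length-137 sequence can have at most 136 distinct subwords of length 2.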

\(|P_{f(4KMV_1)}(2) \setminus P_{f(1KWH_1)}(2)|=26\), \(|P_{f(1KWH_1)}(2) \setminus P_{f(4KMV_1)}(2)|=171\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1: 11000110101010001001111110001000001000110000010011011000001101110110010001111001001101000001001010011111100101010010000100110011011001110
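
The hydrophobic-polar string can be reproduced by mapping each residue to 1 or 0. The page does not state which residues count as hydrophobic; the set used below (A, F, G, I, L, M, P, V, W) is an assumption inferred by comparing the binary string with the 4KMV sequence, and the name `hp_encode` is hypothetical:

```python
# Assumed hydrophobic class, inferred from the displayed string (not stated on the page).
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(w, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' if it is in the hydrophobic set, else '0'."""
    return "".join("1" if a in hydrophobic else "0" for a in w)

# First 26 residues of 4KMV_1.
encoded = hp_encode("GFKQDIATIRGDLRTYAQDIFLAFLN")
```

Other hydrophobicity scales classify some residues, such as cysteine or tyrosine, differently, so the set should be treated as a parameter.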
Pair \(Z_2\) Length of longest common subword
4KMV_1,1KWH_1 197 5
4KMV_1,8PXV_1 207 4
1KWH_1,8PXV_1 142 4
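
A sketch of how the two columns above could be computed. `z_k` follows the definition of \(Z_k\) directly. The second column is read here as the length of the longest common contiguous subword, which is an assumption suggested by the small values 5, 4 and 4. All function names are illustrative:

```python
def subword_set(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|, the size of the symmetric difference."""
    px, py = subword_set(x, k), subword_set(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x, y):
    """Length of the longest contiguous block shared by x and y (row-by-row DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block that ended at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best
```

For 4KMV_1 and 1KWH_1 the two one-sided differences stated above are 26 and 171, which add up to the tabulated \(Z_2 = 197\).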

Newick tree

 
(4KMV_1:10.22,(1KWH_1:71,8PXV_1:71):38.22);

Let \(d\) denote the Otu--Sayood distance \(d\).
Let \(d_1\) denote the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{629}{\log_{20} 629}-\frac{137}{\log_{20}137})=142.\)
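
The arithmetic behind the estimate of 142 can be checked step by step. Note that \(629 = 137 + 492\) is the combined length of 4KMV_1 and 1KWH_1; reading 629 as the length of the concatenated pair is an inference from the lengths listed earlier, not something the page states.

\[ \frac{629}{\log_{20} 629} \approx \frac{629}{2.151} \approx 292.4, \qquad \frac{137}{\log_{20} 137} \approx \frac{137}{1.642} \approx 83.4, \]
\[ (0.85)(0.8)\,(292.4 - 83.4) = 0.68 \times 209.0 \approx 142. \]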
Status Protein1 Protein2 d d1/2
Query variables 4KMV_1 1KWH_1 180 113
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
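
A minimal sketch of the interpolated distance \(\delta\), assuming the usual Otu--Sayood definitions \(d(x,y)=\max\{c(xy)-c(x),\, c(yx)-c(y)\}\) and \(d_1(x,y)=(c(xy)-c(x))+(c(yx)-c(y))\), where \(c\) is an LZ76-style complexity. These definitions are not spelled out on the page, although they agree with the two cases above. The parsing in `lz76` is one common variant and may differ from the one used to produce the tabulated values; all names are illustrative:

```python
def lz76(w):
    """Number of components in an LZ76-style parsing of w.

    Each component is the shortest prefix of the remaining text that has
    no occurrence starting at an earlier position (overlaps allowed).
    """
    i, c, n = 0, 0, len(w)
    while i < n:
        l = 1
        # Grow the candidate while a copy of it starts before position i.
        while i + l <= n and w.find(w[i:i + l], 0, i + l - 1) != -1:
            l += 1
        c += 1
        i += l
    return c

def increments(x, y):
    """The pair (c(xy) - c(x), c(yx) - c(y))."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def d(x, y):
    return max(increments(x, y))

def d1(x, y):
    return sum(increments(x, y))
```

With these definitions `delta(x, y, 0)` equals `d(x, y)` and `delta(x, y, 0.5)` equals `d1(x, y) / 2`, matching the case distinction in the displayed formula.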