CoV2D Browser™

Parikh vectors
7FQJ_1 7XMV_1 4HJX_1 Letter Amino acid
33 10 12 H Histidine
24 8 33 K Lysine
26 4 14 Y Tyrosine
27 26 16 D Aspartic acid
26 14 34 E Glutamic acid
27 19 13 S Serine
20 17 15 N Asparagine
9 11 8 F Phenylalanine
14 8 5 Q Glutamine
22 25 27 I Isoleucine
39 26 32 L Leucine
5 0 0 W Tryptophan
23 33 19 A Alanine
8 4 3 C Cysteine
16 10 11 M Methionine
24 13 12 P Proline
24 14 7 T Threonine
36 34 18 V Valine
17 25 15 R Arginine
24 20 20 G Glycine
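
Each column of the table is a Parikh vector: the number of occurrences of each amino-acid letter in one sequence. A minimal Python sketch of that count follows. The function name `parikh_vector` is illustrative, and the letter order is simply copied from the table rows; this is not the site's own code.

```python
from collections import Counter

# Letter order copied from the rows of the table above.
LETTERS = "HKYDESNFQILWACMPTVRG"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter."""
    counts = Counter(sequence)
    return {letter: counts[letter] for letter in LETTERS}

# Toy input (not one of the three proteins): a short His-tagged fragment.
vector = parikh_vector("LEHHHHHH")
print(vector["H"], vector["E"], vector["L"], vector["W"])
```

The entries of a Parikh vector sum to the sequence length; for instance, the 7FQJ_1 column sums to 444 and the 7XMV_1 column to 321.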

>7FQJ_1|Chains A, B, C|Legumain|Homo sapiens (9606)
>7XMV_1|Chains A, B, C, D, E, F|Ribose-phosphate pyrophosphokinase|Escherichia coli str. K-12 substr. MG1655 (511145)
>4HJX_1|Chains A, B|Tyrosine-tRNA ligase|Methanococcus Jannaschii DSM 2661 (243232)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7FQJ , Knot 185 444 0.84 40 250 423
MKLCILLAVVAFVGLSLGVPIDDPEDGGKHWVVIVAGSNGWYNYRHQADACHAYQIIHRNGIPDEQIVVMMYDDIAYSEDNPTPGIVINRPNGTDVYQGVPKDYTGEDVTPQNFLAVLRGDAEAVKGIGSGKVLKSGPQDHVFIYFTNHGSTGILVFPNEDLHVKDLNETIHYMYKHKMYRKMVFYIEACESGSMMNHLPDNINVYATTAANPRESSYACYYDEKRSTYLGDWYSVNWMEDSDVEDLTKETLHKQYHLVKSHTNTSHVMQYGQKTISTMKVMQFQGMKRKASSPVPLPPVTHLDLTPSPDVPLTIMKRKLMNTNDLEESRQLTEEIQRHLDARHLIEKSVRKIVSLLAASEAEVEQLLSERAPLTGHSCYPEALLHFRTHCFNWHSPTYEYALRHLYVLVNLCEKPYPLHRIKLSMDHVCLGHYVDHHHHHHHH
7XMV , Knot 140 321 0.84 38 192 310
MPDMKLFAGNATPELAQRIANRLYTSLGDAAVGRFSDGEVSVQINENVRGGDIFIIQSTCAPTNDNLMELVVMVDALRRASAGRITAVIPYFGYARQDRRVRSARVPITAKVVADFLSSVGVDRVLTVDLHAEQIQGFFDVPVDNVFGSPILLEDMLQLNLDNPIVVSPDIGGVVRARAIAKLLNDTDMAIIDKRRPRANVSQVMHIIGDVAGRDCVLVDDMIDTGGTLCKAAEALKERGAKRVFAYATHPIFSGNAANNLRNSVIDEVVVCDTIPLSDEIKSLPNVRTLTLSGMLAEAIRRISNEESISAMFEHHHHHHH
4HJX , Knot 135 314 0.82 38 177 295
MDEFEMIKRNTSEIISEEELREVLKKDEKSARIGFEPSGKIHLGHYLQIKKMIDLQNAGFDIIIYLADLGAYLNQKGELDEIRKIGDYNKKVFEAMGLKAKYVYGSENCLDKDYTLNVYRLALKTTLKRARRSMELIAREDENPKVAEVIYPIMQVNNIHYSGVDVAVGGMEQRKIHMLARELLPKKVVCIHNPVLTGLDGEGKMSSSKGNFIAVDDSPEEIRAKIKKAYCPAGVVEGNPIMEIAKYFLEYPLTIKRPEKFGGDLTVNSYEELESLFKNKELHPMDLKNAVAEELIKILEPIRKRLLEHHHHHH
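
The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below uses one common formulation of the Lempel-Ziv (1976) exhaustive-history parsing. Whether the site uses exactly this variant is an assumption, so values computed for real sequences may differ from the table. The names `lz76` and `normalized_lz` are illustrative.

```python
import math

def lz76(w: str) -> int:
    """Number of phrases in an exhaustive-history (LZ76-style) parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text
        # (the copy may overlap the phrase, but must start before position i).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz: int, n: int, alphabet: int = 20) -> float:
    """LZ(w) / (n / log_alphabet(n)), as in the table header."""
    return lz / (n / math.log(n, alphabet))

# Classic binary test string, parsed as 0|001|10|100|1000|101.
print(lz76("0001101001000101"))
# Ratio for the first table row (LZ = 185, n = 444).
print(round(normalized_lz(185, 444), 3))
```

For the first row the ratio evaluates to roughly 0.848, which the table lists as 0.84.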

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
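
As a small illustration of these definitions, the following Python sketch collects the distinct contiguous subwords of a given length and counts them. The function names are illustrative, and the input is a toy string rather than one of the proteins.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))

word = "ABAB"
for k in (1, 2, 3):
    print(k, sorted(subwords(word, k)), subword_complexity(word, k))
```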

\(|P_{f(7FQJ_1)}(2) \setminus P_{f(7XMV_1)}(2)|=115\), \(|P_{f(7XMV_1)}(2) \setminus P_{f(7FQJ_1)}(2)|=57\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
101011111111111011111001001100111111100110000001010010011000111000111110001100000101111100101001001110000100101001111101010110111010110011000111010001001111110001010010001001000010001110101000101100110010101001101000001000000000011010010110000100100001000001100000000110010001001011010110001001111111001010101011101100011000010000010001000101001100010011011110010100110001110100001011101000010100100001100101110100010110010101001011001000000000
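
The hydrophobic-polar string can be reproduced with a simple letter-to-bit map. The page does not state which residues it treats as hydrophobic; the set used below (A, F, G, I, L, M, P, V, W mapped to 1, everything else to 0) is an assumption inferred by comparing the start of Sequence 1 with the displayed string. The name `hp_encode` is illustrative.

```python
# Assumed hydrophobic set, inferred from the displayed string (not stated by the page).
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 28 residues of the 7FQJ_1 sequence.
prefix = "MKLCILLAVVAFVGLSLGVPIDDPEDGG"
print(hp_encode(prefix))
```

The printed bits agree with the first 28 characters of the hydrophobic-polar string shown for Sequence 1.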
Pair \(Z_2\) Length of longest common subword (contiguous)
7FQJ_1,7XMV_1 172 7
7FQJ_1,4HJX_1 167 6
7XMV_1,4HJX_1 149 7
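
The two columns can be sketched in Python as follows. \(Z_k\) is the size of the symmetric difference of the two subword sets, so the first row is \(115+57=172\). The last column is read here as the length of the longest common contiguous block, which is an interpretation: the listed values 7, 6, 7 match the shared poly-histidine tails. Function names and the toy inputs are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Size of the symmetric difference of the length-k subword sets."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_block(x: str, y: str) -> int:
    """Length of the longest contiguous block shared by x and y (dynamic programming)."""
    best = 0
    previous = [0] * (len(y) + 1)
    for a in x:
        current = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best

# Toy strings, not the full proteins.
print(z_distance("ABAB", "ABBA", 2))
print(longest_common_block("FEHHHHHHH", "LEHHHHHH"))
```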

Newick tree

 
(7FQJ_1:87.91,(4HJX_1:74.5,7XMV_1:74.5):13.41);

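Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)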
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{765}{\log_{20} 765}-\frac{321}{\log_{20}321}\right)\approx 121\), where \(765=444+321\) is the combined length of the two query sequences.
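
The arithmetic of this estimate can be checked with a few lines of Python. The helper name `lz_scale` is illustrative; the factors 0.85 and 0.8 are taken from the formula as given.

```python
import math

def lz_scale(n: int, alphabet: int = 20) -> float:
    """n / log_alphabet(n), the normalizing scale for LZ complexity."""
    return n / math.log(n, alphabet)

# 765 is the length used for the concatenation, 321 the length of 7XMV_1.
expected = 0.85 * 0.8 * (lz_scale(765) - lz_scale(321))
print(round(expected, 1))
```

The result rounds to 121.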
Status Protein1 Protein2 d d1/2
Query variables 7FQJ_1 7XMV_1 153 130.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
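
A short numerical check of this interpolation, as a sketch: it assumes that "min" and "max" denote the smaller and larger of the two directed complexity increments for a pair of sequences. The increments 153 and 108 are not listed on the page; they are back-computed from the table values \(d=153\) and \(d_1/2=130.5\) under that assumption. The name `delta` is illustrative.

```python
def delta(alpha: float, term_a: float, term_b: float) -> float:
    """alpha * min + (1 - alpha) * max of the two directed increments."""
    return alpha * min(term_a, term_b) + (1 - alpha) * max(term_a, term_b)

# Assumed increments, back-computed from d = 153 and d1/2 = 130.5.
larger, smaller = 153, 108

print(delta(0, larger, smaller))    # alpha = 0 gives d
print(delta(0.5, larger, smaller))  # alpha = 1/2 gives d1/2
```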