CoV2D Browser™

Parikh vectors
3NTO_1 5ADU_1 2IGW_1 Letter Amino acid
6 12 4 C Cysteine
12 34 7 P Proline
17 25 11 S Serine
11 29 6 R Arginine
14 37 9 N Asparagine
18 28 9 T Threonine
22 49 26 G Glycine
28 40 9 I Isoleucine
21 46 8 L Leucine
24 22 16 K Lysine
6 16 5 M Methionine
7 21 12 F Phenylalanine
4 16 1 W Tryptophan
25 52 8 A Alanine
23 33 9 D Aspartic acid
8 16 4 H Histidine
14 17 2 Y Tyrosine
39 36 14 V Valine
17 25 4 Q Glutamine
28 28 9 E Glutamic acid
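
As an illustration of how the table above can be obtained, here is a minimal sketch that counts each amino-acid letter in a sequence. The function name `parikh_vector` and the constant `AMINO_ACIDS` are hypothetical names chosen for this example; the letter order simply follows the rows of the table.

```python
from collections import Counter

# Letter order taken from the rows of the table above (an assumption of this sketch).
AMINO_ACIDS = "CPSRNTGILKMFWADHYVQE"


def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of `alphabet` to its count in `sequence`."""
    counts = Counter(sequence)
    # Letters that never occur are reported with count 0.
    return {letter: counts.get(letter, 0) for letter in alphabet}


if __name__ == "__main__":
    # First five residues of the 2IGW_1 sequence.
    print(parikh_vector("MSRSK"))
```

Applied to a full FASTA sequence, the resulting counts should correspond to one column of the table.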

>3NTO_1|Chain A|Inositol 2-dehydrogenase/D-chiro-inositol 3-dehydrogenase|Bacillus subtilis (1423)
>5ADU_1|Chains A[auth L], B[auth M]|Hydrogenase-1 large chain|Escherichia coli str. K-12 substr. MC4100 (1403831)
>2IGW_1|Chain A|Peptidyl-prolyl cis-trans isomerase 3|Caenorhabditis elegans (6239)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3NTO , Knot 144 344 0.81 40 199 330
MSLRIGVIGTGAIGKEHINRITNKLSGAEIVAVTDVNQEAAQKVVEQYQLNATVYPNDDSLLADENVDAVLVTSWGPAHESSVLKAIKAQKYVFCEVPLATTAEGCMRIVEEEIKVGKRLVQVGFMRRYDSGYVQLKEALDNHVIGEPLMIHCAHRNPTVGDNYTTDMAVVDTLVHEIDVLHWLVNDDYESVQVIYPKKSKNALPHLKDPQIVVIETKGGIVINAEIYVNCKYGYDIQCEIVGEDGIIKLPEPSSISLRKEGRFSTDILMDWQRRFVAAYDVEIQDFIDSIQKKGEVSGPTAWDGYIAAVTTDACVKAQESGQKEKVELKEKPEFYQSFTTVQN
5ADU , Knot 236 582 0.86 40 290 558
MSTQYETQGYTINNAGRRLVVDPITRIEGHMRCEVNINDQNVITNAVSCGTMFRGLEIILQGRDPRDAWAFVERICGVCTGVHALASVYAIEDAIGIKVPDNANIIRNIMLATLWCHNHLVHFYQLAGMDWIDVLDALKADPRKTSELAQSLSSWPKSSPGYFFDVQNRLKKFVEGGQLGIFRNGYWGHPQYKLPPEANLMGFAHYLEALDFQREIVKIHAVFGGKNPHPNWIVGGMPCAINIDESGAVGAVNMERLNLVQSIITRTADFINNVMIPDALAIGQFNKPWSEIGTGLSDKCVLSYGAFPDIANDFGEKSLLMPGGAVINGDFNNVLPVDLVDPQQVQEFVDHAWYRYPNDQVGRHPFDGITDPWYNPGDVKGSDTNIQQLNEQERYSWIKAPRWRGNAMEVGPLARTLIAYHKGDAATVESVDRMMSALNLPLSGIQSTLGRILCRAHEAQWAAGKLQYFFDKLMTNLKNGNLATASTEKWEPATWPTECRGVGFTEAPRGALGHWAAIRDGKIDLYQCVVPTTWNASPRDPKGQIGAYEAALMNTKMAIPEQPLEILRTLHSFNPCLACSTH
2IGW , Knot 82 173 0.81 40 120 165
MSRSKVFFDITIGGKASGRIVMELYDDVVPKTAGNFRALCTGENGIGKSGKPLHFKGSKFHRIIPNFMIQGGDFTRGNGTGGESIYGEKFPDENFKEKHTGPGVLSMANAGPNTNGSQFFLCTVKTEWLDGKHVVFGRVVEGLDVVKAVESNGSQSGKPVKDCMIADCGQLKA
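
The normalized column above divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below shows one way to compute both quantities. It assumes the Lempel-Ziv (1976) exhaustive-history parsing, in which each new component is the shortest block that has not yet occurred as a subword; the exact variant used to produce the table is not stated here, so `lz76_complexity` is an illustration and is not claimed to reproduce the tabulated values. Both function names are hypothetical.

```python
import math


def lz76_complexity(w):
    """Number of components in an LZ76-style exhaustive parsing of `w` (assumed variant)."""
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the block while it already occurs in the text preceding its last symbol.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components


def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n) for an alphabet of size b."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    print(lz76_complexity("0001101001000101"))
    print(normalized_lz(144, 344))
```

For the first row of the table, `normalized_lz(144, 344)` gives a value close to 0.82, in line with the tabulated 0.81 up to rounding or truncation.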

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
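
A minimal sketch of these definitions, writing \(k\) for the subword length. The names `subwords` and `subword_complexity` are hypothetical and chosen only for this example.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    # range() is empty when k > len(w), so the result is the empty set.
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """Number of distinct subwords of length k in w."""
    return len(subwords(w, k))


if __name__ == "__main__":
    print(sorted(subwords("ABAB", 2)))
    print(subword_complexity("ABAB", 1))
```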

\(|P_{f(3NTO_1)}(2) \setminus P_{f(5ADU_1)}(2)|=40\), \(|P_{f(5ADU_1)}(2) \setminus P_{f(3NTO_1)}(2)|=131\).

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (3NTO_1):
10101111101111000100100010110111100100011001100001010101000011100010111100111100001101101000110011110010101011000101100110111100000101010011000111011110010001011000000111100110010110111000000101101000001110100101111000111110101010000100100011100111011010010100010100011101000111100101001100100010101101101011110001010100010000101000101000100100
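
The hydrophobic-polar string above can be reproduced, at least on its opening symbols, by mapping each residue to 1 or 0. The sketch below assumes the hydrophobic class is {A, G, I, L, M, F, P, W, V} and everything else is polar; this classification is inferred from the start of the displayed string and is not stated in the source. The names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Assumed hydrophobic class, inferred from the displayed string (not given in the source).
HYDROPHOBIC = frozenset("AGILMFPWV")


def hp_encode(sequence, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)


if __name__ == "__main__":
    # First 18 residues of the 3NTO_1 sequence.
    print(hp_encode("MSLRIGVIGTGAIGKEHI"))
```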
Pair \(Z_2\) Length of longest common subword
3NTO_1,5ADU_1 171 4
3NTO_1,2IGW_1 173 3
5ADU_1,2IGW_1 222 3
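
The \(Z_2\) column can be read as the size of the symmetric difference of the two sets of 2-letter subwords; for the first pair, \(40+131=171\). A minimal sketch follows. The function names are hypothetical, and the denominator in `jaccard_distance` (the size of the union) is the usual Jaccard convention, assumed here because the source only names the numerator.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Z_k(x, y): number of length-k subwords occurring in exactly one of x and y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x, y, k):
    """Z_k(x, y) divided by the number of length-k subwords in x or y (assumed convention)."""
    union = subwords(x, k) | subwords(y, k)
    # Two words with no length-k subwords at all are treated as identical.
    return z_distance(x, y, k) / len(union) if union else 0.0


if __name__ == "__main__":
    print(z_distance("ABAB", "BABB", 2))
    print(jaccard_distance("ABAB", "BABB", 2))
```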

Newick tree

 
[
	2IGW_1:10.75,
	[
		3NTO_1:85.5,5ADU_1:85.5
	]:18.25
]

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{926 }{\log_{20} 926}-\frac{344}{\log_{20}344})\approx 156\), where \(926=344+582\) is the combined length of the two sequences.
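
The arithmetic in the estimate above can be checked with a short sketch. It reads 926 as the sum of the two lengths, 344 and 582, which is an interpretation rather than something the source states; the function name `expected_distance` and the way the factor \((0.85)(0.8)\) is passed as a single parameter are choices made for this example.

```python
import math


def expected_distance(n_x, n_y, factor=0.85 * 0.8, alphabet_size=20):
    """factor * ( (n_x+n_y)/log_b(n_x+n_y) - n_x/log_b(n_x) ), with b the alphabet size."""
    def scaled(n):
        return n / math.log(n, alphabet_size)

    return factor * (scaled(n_x + n_y) - scaled(n_x))


if __name__ == "__main__":
    print(round(expected_distance(344, 582)))
```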
Status Protein1 Protein2 d d1/2
Query variables 3NTO_1 5ADU_1 199 156
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
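
The interpolation in this formula can be sketched as follows. The arguments `a` and `b` are hypothetical names for the two quantities whose minimum and maximum appear in the formula; which quantities these are is not specified here.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    print(delta(0, 3, 5))    # alpha = 0 picks the maximum
    print(delta(0.5, 3, 5))  # alpha = 1/2 gives the average of the two values
```

With \(\alpha=0\) the result is the maximum, matching the case \(d\), and with \(\alpha=1/2\) it is half the sum of the two values, matching the case \(d_1/2\).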