CoV2D Browser™

Parikh vectors
1BJK_1 1UIU_1 4AUK_1 Letter Amino acid
22 28 20 K Lysine
6 10 11 M Methionine
12 22 27 R Arginine
16 28 18 D Aspartic acid
22 32 16 I Isoleucine
7 21 17 F Phenylalanine
18 38 20 P Proline
12 26 14 S Serine
18 25 31 V Valine
11 23 15 N Asparagine
24 32 28 E Glutamic acid
27 25 26 G Glycine
22 47 30 L Leucine
17 29 13 T Threonine
18 52 30 A Alanine
3 0 9 C Cysteine
5 9 10 W Tryptophan
15 19 13 Y Tyrosine
13 26 14 Q Glutamine
7 10 13 H Histidine
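
A Parikh vector simply records how many times each letter occurs in a word. The following is a minimal Python sketch of that count; the function name `parikh_vector` and the alphabet ordering are illustrative assumptions, not taken from the CoV2D code. Applied to the full FASTA sequences on this page, it should give the columns of the table above.

```python
from collections import Counter

# The 20 standard amino-acid letters; this ordering is an arbitrary assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(word, alphabet=AMINO_ACIDS):
    """Return {letter: count} for every letter of the alphabet, zeros included."""
    counts = Counter(word)
    return {letter: counts.get(letter, 0) for letter in alphabet}

if __name__ == "__main__":
    # First ten residues of the 1BJK_1 sequence shown on this page.
    print(parikh_vector("DVPVNLYRPN"))
```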

>1BJK_1|Chain A|FERREDOXIN--NADP+ REDUCTASE|Nostoc sp. (1168)
>1UIU_1|Chains A, B|Nickel-binding periplasmic protein|Escherichia coli (562)
>4AUK_1|Chains A, B|RIBOSOMAL RNA LARGE SUBUNIT METHYLTRANSFERASE M|ESCHERICHIA COLI (83333)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1BJK , Knot 134 295 0.86 40 197 289
DVPVNLYRPNAPFIGKVISNEPLVKEGGIGIVQHIKFDLTGGNLKYIEGQSIGIIPPGVDKNGKPEKLRLYSIASTRHGDDVDDKTISLCVRQLEYKHPESGETVYGVCSTYLTHIEPGSEVKITGPVGKEMLLPDDPEANVIMLATGTGIAPMRTYLWRMFKDAERAANPEYQFKGFSWLVFGVPTTPNILYKEELEEIQQKYPDNFRLTYAISREQKNPQGGRMYIQDRVAEHADQLWQLIKNQKTHTYICGLEGMEEGIDAALSAAAAKEGVTWSDYQKDLKKAGRWHVETY
1UIU , Knot 202 502 0.83 38 245 473
AAPDEITTAWPVNVGPLNPHLYTPNQMFAQSMVYEPLVKYQADGSVIPWLAKSWTHSEDGKTWTFTLRDDVKFSNGEPFDAEAAAENFRAVLDNRQRHAWLELANQIVDVKALSKTELQITLKSAYYPFLQELALPRPFRFIAPSQFKNHETMNGIKAPIGTGPWILQESKLNQYDVFVRNENYWGEKPAIKKITFNVIPDPTTRAVAFETGDIDLLYGNEGLLPLDTFARFSQNPAYHTQLSQPIETVMLALNTAKAPTNELAVREALNYAVNKKSLIDNALYGTQQVADTLFAPSVPYANLGLKPSQYDPQKAKALLEKAGWTLPAGKDIREKNGQPLRIELSFIGTDALSKSMAEIIRADMRQIGADVSLIGEEESSIYARQRDGRFGMIFHRTWGAPYDPHAFLSSMRVPSHADFQAQQGLADKPLIDKEIGEVLATHDETQRQALYRDILTRLHDEAVYLPISYISMMVVSKPELGNIPYAPIATEIPFEQIKPVKP
4AUK , Knot 162 375 0.85 40 223 354
MNKVVLLCRPGFEKECAAEITDKAGQREIFGFARVKENAGYVIYECYQPDDGDKLIRELPFSSLIFARQWFVVGELLQHLPPEDRITPIVGMLQGVVEKGGELRVEVADTNESKELLKFCRKFTVPLRAALRDAGVLANYETPKRPVVHVFFIAPGCCYTGYSYSNNNSPFYMGIPRLKFPADAPSRSTLKLEEAFHVFIPADEWDERLANGMWAVDLGACPGGWTYQLVKRNMWVYSVDNGPMAQSLMDTGQVTWLREDGFKFRPTRSNISWMVCDMVEKPAKVAALMAQWLVNGWCRETIFNLKLPMKKRYEEVSHNLAYIQAQLDEHGINAQIQARQLYHDREEVTVHVRRIWAAVGGRRDERSKGHHHHHH
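
The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below shows one common way to count LZ76-style phrases (each phrase is extended while it already occurs starting at an earlier position) together with that normalization. Whether the page uses exactly this parsing variant is an assumption, and the names `lz76_phrase_count` and `normalized_lz` are hypothetical.

```python
import math

def lz76_phrase_count(s):
    """Count phrases in an LZ76-style parsing (overlapping copies allowed)."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it already occurs starting before position i.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_alphabet_size(n))."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    # Table values for 1BJK: LZ(w) = 134, n = 295.
    print(normalized_lz(134, 295))
```

For the 1BJK row this gives a value close to the 0.86 listed in the table.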

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
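
As a minimal sketch of these definitions, assuming that \(P_w(n)\) collects the contiguous subwords of length \(n\), the set and its cardinality can be computed as follows. The function names are illustrative. The \(p_w(1)\) values in the table (40, 38, 40) exceed the 20-letter alphabet size, so that column evidently follows some other convention that this sketch does not attempt to reproduce.

```python
def subwords(w, n):
    """Set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """Cardinality p_w(n) of the set of length-n subwords."""
    return len(subwords(w, n))

if __name__ == "__main__":
    # First ten residues of the 1BJK_1 sequence shown on this page.
    print(subword_complexity("DVPVNLYRPN", 2))
```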

\(|P_{f(1BJK_1)}(2) \setminus P_{f(1UIU_1)}(2)|=51\), \(|P_{f(1UIU_1)}(2) \setminus P_{f(1BJK_1)}(2)|=99\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
0111010010111110110001110011111100101010110100101001111111100010100101001100001001000010101001000010010010110000100101100101011110011110010101111101011111000110110010011010001011011111110010110000100100001001010011000000101101010001100100110110000000010110110011011101111001101000000100110101000
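
The hydrophobic-polar string can be reproduced with a simple letter-to-bit map. The page does not state which residues count as hydrophobic; the set below is inferred by comparing the start of the binary string with the start of the 1BJK_1 sequence, so treat it as an assumption. The name `hp_encode` is hypothetical.

```python
# Residues mapped to 1; inferred from the displayed string, not documented on the page.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in sequence)

if __name__ == "__main__":
    # First ten residues of 1BJK_1; compare with the first ten bits of the string above.
    print(hp_encode("DVPVNLYRPN"))
```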
Pair \(Z_2\) Length of longest common subsequence
1BJK_1,1UIU_1 150 5
1BJK_1,4AUK_1 182 3
1UIU_1,4AUK_1 160 4
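
The \(Z_2\) column is the size of the symmetric difference of the two sets of length-2 subwords; for the first pair, \(51+99=150\). A minimal sketch of \(Z_k\) follows. The last column is labelled as a longest common subsequence, but values of 3 to 5 for sequences several hundred residues long look more like the length of a longest common contiguous subword, so the second function computes that. This reading is an assumption, and all function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for cx in x:
        cur = [0] * (len(y) + 1)
        for j, cy in enumerate(y, 1):
            if cx == cy:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

if __name__ == "__main__":
    print(z_distance("abab", "abba", 2))
    print(longest_common_subword_length("abcde", "xbcdy"))
```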

Newick tree

 
[
	4AUK_1:88.95,
	[
		1BJK_1:75,1UIU_1:75
	]:13.95
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{797 }{\log_{20} 797}-\frac{295}{\log_{20}295})=137\), where \(797=295+502\) is the combined length of the two sequences.
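
The arithmetic behind this estimate can be checked with a few lines of Python. The factors 0.85 and 0.8 are taken from the expression as given, without assuming where they come from; the function names are hypothetical.

```python
import math

def lz_scale(n, alphabet_size=20):
    """Return n / log_alphabet_size(n)."""
    return n / math.log(n, alphabet_size)

def expected_distance(len_x, len_y, factor_a=0.85, factor_b=0.8):
    """Evaluate factor_a * factor_b * (scale(len_x + len_y) - scale(len_x))."""
    return factor_a * factor_b * (lz_scale(len_x + len_y) - lz_scale(len_x))

if __name__ == "__main__":
    # Lengths of 1BJK_1 (295) and 1UIU_1 (502).
    print(round(expected_distance(295, 502)))
```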
Status Protein1 Protein2 d d1/2
Query variables 1BJK_1 1UIU_1 175 138

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
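
A small sketch of this interpolation, assuming that min and max range over the two complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), where \(c\) is a complexity measure such as \(\mathrm{LZ}\). The page does not spell this out, so both that reading and the function names are assumptions.

```python
def delta(alpha, a, b):
    """Interpolate between min(a, b) and max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def otu_sayood(x, y, complexity):
    """Return (d, d1/2) from the two complexity increments (assumed definition)."""
    a = complexity(x + y) - complexity(x)
    b = complexity(y + x) - complexity(y)
    return delta(0, a, b), delta(0.5, a, b)

if __name__ == "__main__":
    # Toy complexity (number of distinct letters), only to exercise the code.
    print(otu_sayood("ab", "bc", lambda s: len(set(s))))
```

Under this reading, the table values \(d=175\) and \(d_1/2=138\) would correspond to a smaller increment of \(2\cdot 138-175=101\).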