CoV2D Browser™

Parikh vectors
1NIE_1 7OKN_1 3KMX_1 Letter Amino acid
24 32 29 L Leucine
9 15 8 M Methionine
13 14 20 F Phenylalanine
4 5 7 W Tryptophan
33 51 24 A Alanine
7 29 16 Q Glutamine
21 25 23 E Glutamic acid
34 49 35 G Glycine
28 25 23 T Threonine
13 4 19 Y Tyrosine
34 33 34 V Valine
14 17 15 N Asparagine
1 2 6 C Cysteine
17 29 15 K Lysine
6 32 30 S Serine
12 20 19 R Arginine
20 25 21 D Aspartic acid
23 31 20 P Proline
15 3 7 H Histidine
12 20 24 I Isoleucine
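A Parikh vector records how many times each letter of the alphabet occurs in a word, ignoring order. The following is a minimal sketch of that count; the function name `parikh_vector` and the constant `ALPHABET` are illustrative, and the letter order is simply copied from the table above.

```python
from collections import Counter

# Letter order copied from the Parikh vector table (an illustrative choice).
ALPHABET = "LMFWAQEGTYVNCKSRDPHI"

def parikh_vector(w, alphabet=ALPHABET):
    """Return the number of occurrences of each alphabet letter in w."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# The first ten residues of 1NIE_1.
print(parikh_vector("AAGAAPVDIS"))
```

The entries of a Parikh vector always sum to the length of the word, which gives a quick consistency check on each table column.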

>1NIE_1|Chain A|NITRITE REDUCTASE|Achromobacter cycloclastes (223)
>7OKN_1|Chains A, AA[auth C], CA[auth G], C[auth E], EA[auth K], E[auth I], GA[auth O], G[auth M], I[auth Q], K[auth S], M[auth U], O[auth W], Q[auth Y], S[auth a], U[auth c], W[auth e], Y[auth g]|TraB|Salmonella enterica (28901)
>3KMX_1|Chains A, B|Beta-secretase 1|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1NIE , Knot 140 340 0.80 40 195 317
AAGAAPVDISTLPRVKVDLVKPPFVHAHDQVAKTGPRVVEFTMTIEEKKLVIDREGTEIHAMTFNGSVPGPLMVVHENDYVELRLINPDTNTLLHNIDFHAATGALGGGALTQVNPGEETTLRFKATKPGVFVYHCAPEGMVPWHVTSGMNGAIMVLPRDGLKDEKGQPLTYDKIYYVGEQDFYVPKDEAGNYKKYETPGEAYEDAVKAMRTLTPTHIVFNGAVGALTGDHALTAAVGERVLVVHSQANRDTRPHLIGGHGDYVWATGKFRNPPDLDQETWLIPGGTAGAAFYTFRQPGVYAYVNHNLIEAFELGAAGHFKVTGEWNDDLMTSVVKPASM
7OKN , Knot 185 461 0.82 40 232 426
MANVNKVVRRRQVALLIALVLGIGAGGAGTWMVSEMNLKKAPPAKAPKGEPAPDMTGVVNQSFDNKVQRSAIAEAQRLNKETQTEIKKLRTEMGLVSRDLKGSQDRIRELEDQNQLLQTQLEAGKNFDSLSAEPLPGALASQGKPAPAGNVPPPTSFWPAGGGQAPAAPVMTPIQRPGMMDSQEFSLPDTGPKKPRFPWISSGSFVEAIVVEGADANASVTGDKNTAPMQLRLTGKVQMPNDEEFDLTGCFVTLEAWGDVSSERAIVRSRSISCKLGDDDIDQKIAGHVSFMGKNGIKGEVVMRNGQILLYAGGAGFLDGIGKGIEKASSTTVGVGATASMSAADIGQAGLGGGVSSAAKTLSDYYIKRAEQYHPVIPIGAGNEVTLVFQDGFQLETLEEARAKAAARKKQNQPSASSTPAAMPGNTPDMLKQLQDFRVGDTVDPATGQVVTQWSHPQFEK
3KMX , Knot 169 395 0.85 40 228 379
EPGRRGSFVEMVDNLRGKSGQGYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQRQLSSTYRDLRKGVYVPYTQGKWEGELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNLFSLQLCGAGFPLNQSEVLASVGGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIVRVEINGQDLKMDCKEYNYDKSIVDSGTTNLRLPKKVFEAAVKSIKAASSTEKFPDGFWLGEQLVCWQAGTTPWNIFPVISLYLMGEVTNQSFRITILPQQYLRPVEDVATSQDDCYKFAISQSSTGTVMGAVIMEGFYVVFDRARKRIGFAVSACHVHDEFRTAAVEGPFVTLDMEDCGYNI
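The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns alone. The sketch below does so for the three rows; the function name `normalized_lz` is illustrative, and the LZ values are taken from the table rather than recomputed from the sequences.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs as listed in the table.
rows = {"1NIE": (140, 340), "7OKN": (185, 461), "3KMX": (169, 395)}
for code, (lz, n) in rows.items():
    print(code, round(normalized_lz(lz, n), 2))
```

Rounded to two decimals, the values agree with the tabulated 0.80, 0.82 and 0.85.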

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
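As a minimal sketch of these definitions, reading the argument of \(P_w\) as the subword length, the set of distinct subwords and its cardinality can be computed with a sliding window. The names `subwords` and `subword_complexity` are illustrative and do not come from the browser itself.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy example on a short word.
print(sorted(subwords("ABAB", 2)), subword_complexity("ABAB", 2))
```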

\(|P_{f(1NIE_1)}(2) \setminus P_{f(7OKN_1)}(2)|=61\), \(|P_{f(7OKN_1)}(2) \setminus P_{f(1NIE_1)}(2)|=98\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1111111010011010101101111010001100110110101010000111000100101101010111111110000010101101000011001010110111111110010110000101010011111000110111110100110111111100110000101100001001100010110001100000001101000110110010100111011111101001101111001111000100000101111010011101010011010000111111011111001001110101000110110111110101010100011001101101
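The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two sets of length-\(k\) subwords. A minimal sketch follows; the function names are illustrative, and the example uses toy words rather than the protein sequences.

```python
def subword_set(w, k):
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subword_set(x, k), subword_set(y, k)
    return len(px - py) + len(py - px)

# "BB" occurs only in the second word, so the symmetric difference has size 1.
print(z_k("ABAB", "ABBA", 2))
```

For the pair 1NIE_1, 7OKN_1 the two one-sided counts stated in the text, 61 and 98, add up to the tabulated \(Z_2=159\).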
Pair \(Z_2\) Length of longest common subword (contiguous)
1NIE_1,7OKN_1 159 4
1NIE_1,3KMX_1 169 4
7OKN_1,3KMX_1 148 5
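The last column is small enough (4 or 5) that it is presumably counting contiguous matches: by the Parikh vectors, 1NIE_1 and 7OKN_1 already share 33 alanines, so a non-contiguous common subsequence would be much longer. Under that assumption, the length can be computed with the standard dynamic program sketched below; the function name is illustrative.

```python
def longest_common_substring_length(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # match lengths ending at the previous row of x
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common block ending at x[i-2], y[j-2].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(longest_common_substring_length("ABCDEF", "ZCDEQ"))
```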

Newick tree

 
(
	1NIE_1:84.54,
	(
		7OKN_1:74,3KMX_1:74
	):10.54
);

Let \(d\) be the Otu–Sayood distance of that name.
Let \(d_1\) be the Otu–Sayood distance of that name. (This makes the 4TYN sequence AAAAAA a close match...)
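The sketch below illustrates how such distances can be computed. It rests on two assumptions that this page does not confirm: that the complexity \(c\) is the number of components in an exhaustive-history LZ76 parse, and that the distances take the usual Otu–Sayood forms \(d=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1=(c(xy)-c(x))+(c(yx)-c(y))\). The function names are illustrative, and the browser's own implementation may differ.

```python
def lz76(s):
    """Number of components in an LZ76 exhaustive-history parse of s (assumed variant)."""
    i, c, n = 0, 0, len(s)
    while i < n:
        l = 1
        # Grow the component while it can be copied from a start position before i.
        while i + l <= n and s[i:i + l] in s[:i + l - 1]:
            l += 1
        c += 1
        i += l
    return c

def otu_sayood_d(x, y):
    """d = max of the two one-sided complexity increments (assumed definition)."""
    return max(lz76(x + y) - lz76(x), lz76(y + x) - lz76(y))

def otu_sayood_d1(x, y):
    """d1 = sum of the two one-sided complexity increments (assumed definition)."""
    return (lz76(x + y) - lz76(x)) + (lz76(y + x) - lz76(y))

# Classic example: 0 | 001 | 10 | 100 | 1000 | 101 has six components.
print(lz76("0001101001000101"))
```

The substring search makes this sketch quadratic or worse in the sequence length, which is acceptable for proteins of a few hundred residues.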
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{801}{\log_{20} 801}-\frac{340}{\log_{20}340}\right)\approx 125\), where \(801=340+461\) is the combined length of the two query sequences.
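The arithmetic behind this estimate can be checked directly. The sketch below only evaluates the displayed expression; the factors 0.85 and 0.8 are taken from the text as given, and the helper name `n_over_log` is illustrative.

```python
import math

def n_over_log(n, base=20):
    """n / log_base(n), the normalizing term used for LZ-complexity."""
    return n / math.log(n, base)

estimate = 0.85 * 0.8 * (n_over_log(801) - n_over_log(340))
print(round(estimate, 1))
```

The result rounds to 125.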
Status Protein1 Protein2 d d1/2
Query variables 1NIE_1 7OKN_1 157 134.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
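As a worked instance of this interpolation, the sketch below evaluates \(\delta\) for the query pair 1NIE_1, 7OKN_1. It assumes that \(\min\) and \(\max\) refer to the two one-sided complexity increments, so that \(d=\max\) and \(d_1=\min+\max\). Under that assumption the tabulated \(d=157\) and \(d_1/2=134.5\) imply \(\min=2\cdot 134.5-157=112\), a value inferred here and not stated on the page.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# 157 is the tabulated d; 112 is inferred from d and d1/2 (an assumption).
lo, hi = 112, 157
print(delta(0, lo, hi), delta(0.5, lo, hi))
```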