CoV2D Browser™

Parikh vectors
7ZDR_1 6OBO_1 7ZSV_1 Letter Amino acid
7 1 2 W Tryptophan
30 22 4 I Isoleucine
99 22 13 L Leucine
10 1 10 K Lysine
13 3 3 M Methionine
36 17 3 T Threonine
21 12 8 F Phenylalanine
34 17 7 S Serine
11 17 5 N Asparagine
20 8 9 D Aspartic acid
38 13 6 Q Glutamine
27 15 9 E Glutamic acid
47 17 23 G Glycine
61 24 11 A Alanine
40 21 8 R Arginine
2 2 2 C Cysteine
35 15 15 V Valine
10 4 6 H Histidine
18 12 7 P Proline
14 14 7 Y Tyrosine

>7ZDR_1|Chain A[auth C]|ATP-binding/permease protein CydC|Escherichia coli K-12 (83333)
>6OBO_1|Chains A, B|Ricin A chain|Ricinus communis (3988)
>7ZSV_1|Chains A, B|Cytochrome c|Methylococcus capsulatus str. Bath (243233)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7ZDR , Knot 216 573 0.79 40 227 510
MRALLPYLALYKRHKWMLSLGIVLAIVTLLASIGLLTLSGWFLSASAVAGVAGLYSFNYMLPAAGVRGAAITRTAGRYFERLVSADATFRVLQHLRIYTFSKLLPLSPAGLARYRQGELLNRVVADVDTLDHLYLRVISPLVGAFVVIMVVTIGLSFLDFTLAFTLGGIMLLTLFLMPPLFYRAGKSTGQNLTHLRGQYRQQLTAWLQGQAELTIFGASDRYRTQLENTEIQWLEAQRRQSELTALSQAIMLLIGALAVILMLWMASGGVGGNAQPGALIALFVFCALAAFEALAPVTGAFQHLGQVIASAVRISDLTDQKPEVTFPDTQTRVADRVSLTLRDVQFTYPEQSQQALKGISLQVNAGEHIAILGRTGCGKSTLLQQLTRAWDPQQGEILLNDSPIASLNEAALRQTISVVPQRVHLFSATLRDNLLLASPGSSDEALSEILRRVGLEKLLEDAGLNSWLGEGGRQLSGGELRRLAIARALLHDAPLVLLDEPTEGLDATTESQILELLAEMMREKTVLMVTHRLRGLSRFQQIIVMDNGQIIEQGTHAELLARQGRYYQFKQGL
6OBO , Knot 113 257 0.81 40 163 249
QYPIINFTTAGATVQSYTNFIRAVRGRLTTGADVRHEIPVLPNRVGLPINQRFILVELSNHAELSVTLALDVTNAYVVGYRAGNSAYFFHPDNQEDAEAITHLFTDVQNRYTFAFGGNYDRLEQLAGNLRENIELGNGPLEEAISALYYYSTGGTQLPTLARSFIICIQMISEAARFQYIEGEMRTRIRYNRRSAPDPSVITLENSWGRLSTAIQESNQGAFASPIQLQRRNGSKFSVYDVSILIPIIALMVYRCAP
7ZSV , Knot 74 158 0.79 40 117 152
MNKPSFLLVGLLVVSGVLGAAETKVKYPDGFRSWYHVKSMVIQPGHPLENPVGGIHHVYANAEAIQGLRGGNYPDGAVLVFDLFDYQEDNHALVEGKRKLIGVMERDAKRFSATGGWGYEGFGEGKPDKRLVTDGGQGCFGCHAAQKESQYVFSRLRD
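
The normalized column can be recomputed from the LZ-complexity and length columns. The following is a minimal sketch, assuming the normalization is exactly \(\mathrm{LZ}(w)/(n/\log_{20} n)\) as written in the table header; the function name `normalized_lz` is illustrative and not part of the CoV2D code. Computed values may differ from the table in the last digit, depending on how the page rounds.

```python
import math


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n) for a word of length n over b letters."""
    return lz / (n / math.log(n, alphabet_size))


# (LZ-complexity, length) pairs copied from the table above.
rows = {"7ZDR": (216, 573), "6OBO": (113, 257), "7ZSV": (74, 158)}
for code, (lz, n) in rows.items():
    print(code, f"{normalized_lz(lz, n):.3f}")
```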

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
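
As an illustration of these definitions, here is a minimal sketch that reads \(P_w(n)\) as the set of distinct contiguous subwords of length \(n\). The names `subwords` and `subword_count` are hypothetical helpers, not taken from the CoV2D code.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_count(w: str, n: int) -> int:
    """p_w(n): the cardinality of P_w(n)."""
    return len(subwords(w, n))


# Small example: "ABAB" has exactly two distinct subwords of length 2.
print(sorted(subwords("ABAB", 2)))
print(subword_count("ABAB", 2))
```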

\(|P_{f(7ZDR_1)}(2) \setminus P_{f(6OBO_1)}(2)|=102\), \(|P_{f(6OBO_1)}(2) \setminus P_{f(7ZDR_1)}(2)|=38\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
101111011100000111011111111011101111010111101011111111001001111111011110001100100110101010110010100100111101111100001011001110100100101011011111111111101110110101110111111101111111100110001001001010000010111010101011110000000100001011010000001011001111111111111111110111110101111111111011111011111011100110111011010010000101011000001100101010010100100000110110101011001111100101000110010011010010111000111010011100010111001011010100011110110000110011001110011001110011101100101101001111011100111111001001101000001101110110000111100010110010011110010110010010111001000010011
Pair \(Z_2\) Length of longest common substring
7ZDR_1,6OBO_1 140 4
7ZDR_1,7ZSV_1 188 4
6OBO_1,7ZSV_1 158 4
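
The two columns of this table can be sketched as follows. This is an illustrative implementation, not the CoV2D code: `z_distance` follows the definition of \(Z_k\) given above, and `longest_common_block` assumes the last column measures the longest common contiguous block of letters (a value of 4 is too small to be a non-contiguous subsequence for sequences of these lengths). All function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """Distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of the k-subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_block(x: str, y: str) -> int:
    """Length of the longest contiguous block shared by x and y (row-wise DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for cx in x:
        cur = [0] * (len(y) + 1)
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


print(z_distance("ABAB", "ABBA", 2))
print(longest_common_block("ABCDE", "XBCDY"))
```

Applying `z_distance(x, y, 2)` to two FASTA sequences gives the sum of the two set-difference sizes, which for the first pair is 102 + 38 = 140 according to the values reported above.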

Newick tree

 
(
	7ZSV_1:91.74,
	(
		7ZDR_1:70,6OBO_1:70
	):21.74
);

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{830 }{\log_{20} 830}-\frac{257}{\log_{20}257})=157.\)
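
The arithmetic in the expected-distance estimate can be checked directly. The sketch below only evaluates the expression as printed; the factors 0.85 and 0.8 are taken from the text without further interpretation, and the helper name `scaled_length` is illustrative. Note that 830 = 573 + 257, the combined length of the two query sequences.

```python
import math


def scaled_length(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_b(n), the length scale used in the estimate."""
    return n / math.log(n, alphabet_size)


expected = 0.85 * 0.8 * (scaled_length(830) - scaled_length(257))
print(round(expected))
```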
Status Protein1 Protein2 d d1/2
Query variables 7ZDR_1 6OBO_1 187 136
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
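
A small sketch of this interpolation, assuming that min and max refer to the smaller and larger of the same two underlying quantities. The function name `delta` and the value 85 are not from the page: 85 is inferred from the table row above as 2 · 136 − 187, under the assumption that d = 187 is the max and d1/2 = 136 is the average of min and max.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """Interpolate between min(a, b) and max(a, b) with weight alpha on the min."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# Assumed pair of quantities consistent with d = 187 and d1/2 = 136.
lo, hi = 85, 187
print(delta(0, lo, hi))    # alpha = 0 recovers d (the max)
print(delta(0.5, lo, hi))  # alpha = 1/2 recovers d1/2 (the average)
```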