CoV2D Browser™

Parikh vectors
1ZBA_1 8RIC_1 6WJI_1 Letter Amino acid
9 25 9 Q Glutamine
11 32 6 I Isoleucine
5 20 7 F Phenylalanine
12 32 7 P Proline
13 47 7 S Serine
24 65 13 A Alanine
11 37 7 D Aspartic acid
1 1 0 C Cysteine
15 40 3 V Valine
21 42 11 T Threonine
1 12 2 W Tryptophan
10 18 4 Y Tyrosine
9 36 5 N Asparagine
8 11 2 H Histidine
3 12 2 M Methionine
13 13 6 R Arginine
9 23 12 K Lysine
6 17 4 E Glutamic acid
12 61 9 G Glycine
19 47 5 L Leucine
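
A Parikh vector counts how often each letter of the alphabet occurs in a word. The following is a minimal sketch of that count, assuming the 20-letter amino-acid alphabet in the row order of the table above; the names `ALPHABET` and `parikh_vector` are hypothetical and not taken from the site.

```python
from collections import Counter

# One-letter amino-acid codes in the row order of the table above.
ALPHABET = "QIFPSADCVTWYNHMRKEGL"

def parikh_vector(seq, alphabet=ALPHABET):
    """Return the number of occurrences of each alphabet letter in seq."""
    counts = Counter(seq)
    return [counts.get(letter, 0) for letter in alphabet]

# First 20 residues of the 1ZBA_1 sequence, used as a small example.
prefix = "TTTTGESADPVTTTVENYGG"
vector = parikh_vector(prefix)
print(dict(zip(ALPHABET, vector)))
```

The entries of a Parikh vector always sum to the length of the word, which gives a quick consistency check: the 1ZBA_1 column of the table sums to 212, the sequence length.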

>1ZBA_1|Chain A[auth 1]|Coat protein VP1|Foot-and-mouth disease virus (12110)
>8RIC_1|Chains A, B|Glucose oxidase|Trametes cinnabarina (5643)
>6WJI_1|Chains A, B, C, D, E, F|Nucleoprotein|Severe acute respiratory syndrome coronavirus 2 (2697049)
Protein code \(c\) | LZ-complexity \(\mathrm{LZ}(w)\) | Length \(n=\lvert w\rvert\) | \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) | \(p_w(1)\) | \(p_w(2)\) | \(p_w(3)\) | Sequence \(w=f(c)\)
1ZBA | 101 | 212 | 0.85 | 40 | 155 | 207 | TTTTGESADPVTTTVENYGGDTQVQRRHHTDVGFIMDRFVKINSLSPTHVIDLMQTHKHGIVGALLRAATYYFSDLEIVVRHDGNLTWVPNGAPEAALSNTSNPTAYNKAPFTRLALPYTAPHRVLATVYDGTNKYSASDSRSGDLGSIAARVATQLPASFNYGAIQAQAIHELLVRMKRAELYCPRPLLAIKVTSQDRYKQKIIAPAKQLL
8RIC | 236 | 591 | 0.85 | 40 | 251 | 543 | ASSGITSDPTVVNGQTYDYIVVGGGLTGTTVAARLAENSSLQILMIEAGGDDRTNPQIYDIYEYGAVFNGPLDWAWEADQGKVIHGGKTLGGSSSINGAAWTRGLNAQYDSWSSLLEPEEASVGWNWNNLFGYMKKAEAFSAPNDQQRAKGADSIASYHGTTGPVQATFPDEMYGGPQMPAFVNTVVNVTGMPHYKDLNGGTPNCVSITPLSINWHDDDHRSSSIEAYYTPVENNRQGWTLLIDHMATKVLFDGTNAPLTAVGIEFGASDATGNRYKAFARKEVILAAGAIQTPALLQLSGIGDSDVLGPLGISTLSDLKTVGKNLQEQTQNAIGAKGNGFDPDGHGPTDAIAFPNIYQVFGSQATSAVQTIQSSLSAWAKTQAAAGALSADALNTIYQTQADLIINHNAPVVELFFDSGFPDDVGIVMWPLLPFSRGNVTITSNNPFAKPSVNVNYFSVDFDLTMHIAGARLSRKLLGSPPLSSLLVGETVPGFKTVPNNGNGGTDADWKKWILKPGNSAGFASVAHPIGTAAMMKRSLGGVVDAQLKVYDTTNLRVVDASMMPLQISAHLSSTLYGVAEKAADLIKAAQ
6WJI | 62 | 121 | 0.82 | 38 | 98 | 118 | SNATKKSAAEASKKPRQKRTATKAYNVTQAFGRRGPEQTQGNFGDQELIRQGTDYKHWPQIAQFAPSASAFFGMSRIGMEVTPSGTWLTYTGAIKLDDKDPNFKDQVILLNKHIDAYKTFP
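
The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below shows one way such numbers could be computed, assuming the exhaustive-history parsing of Lempel and Ziv (1976). The site may use a different LZ variant, so this code is not guaranteed to reproduce the LZ column; the function names are hypothetical.

```python
import math

def lz76_complexity(w):
    """Number of phrases in an LZ76-style parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from a start
        # position before i (overlap with the phrase itself is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_b n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example, parsed as 0.001.10.100.1000.101 (six phrases).
print(lz76_complexity("0001101001000101"))
# Normalization of the 1ZBA row: LZ = 101, n = 212.
print(round(normalized_lz(101, 212), 2))
```

Applied to the (LZ, n) pairs in the table, `normalized_lz` gives 0.85, 0.85 and 0.82 after rounding, in agreement with the fourth column.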

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
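
As an illustration of these definitions, here is a minimal sketch that builds the set of distinct subwords of a given length and counts it. The function names and the toy word `ABAAB` are hypothetical and chosen only for the example.

```python
def subwords(w, n):
    """Set of distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """Cardinality of the set of distinct length-n subwords of w."""
    return len(subwords(w, n))

w = "ABAAB"
for n in (1, 2, 3):
    print(n, sorted(subwords(w, n)), subword_complexity(w, n))
```

A word of length \(m\) has at most \(m-n+1\) distinct subwords of length \(n\), which is why \(p_w(3)\) in the table stays just below the sequence length.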

\(|P_{f(1ZBA_1)}(2) \setminus P_{f(8RIC_1)}(2)|=36\), \(|P_{f(8RIC_1)}(2) \setminus P_{f(1ZBA_1)}(2)|=132\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the numerator of an LZ76-style (set-of-subwords) Jaccard distance for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\).

Hydrophobic-polar version of Sequence 1 (1ZBA_1): 00001001011000100011000100000001111100110100101001101100000111111101100010010111000101011101110111000001010001110011110011001110100100000100000101101110110011101001110101100111010010100101111101000000000111110011
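
The hydrophobic-polar string replaces each residue by a single bit. The sketch below assumes that 1 marks a hydrophobic residue and that the hydrophobic letters are A, F, G, I, L, M, P, V and W. This set was inferred by comparing the binary string with the 1ZBA_1 sequence letter by letter; it is not documented on the page, and the function name is hypothetical.

```python
# Hydrophobic letters inferred from the binary string on this page;
# this classification is an assumption, not a documented convention.
HYDROPHOBIC = set("AFGILMPVW")

def hydrophobic_polar(seq):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

# First 20 residues of the 1ZBA_1 sequence.
print(hydrophobic_polar("TTTTGESADPVTTTVENYGG"))
```

The printed string, 00001001011000100011, agrees with the first 20 bits of the hydrophobic-polar string.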
Pair \(Z_2\) Length of longest common subword
1ZBA_1,8RIC_1 168 4
1ZBA_1,6WJI_1 159 3
8RIC_1,6WJI_1 191 4
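
Both columns of this table can be sketched in a few lines. The code assumes that \(Z_k\) is the size of the symmetric difference of the two length-\(k\) subword sets, as defined above. It also assumes that the third column measures the longest common contiguous subword, a reading suggested by the small values 3 and 4. All function names and the toy inputs are hypothetical.

```python
def subwords(w, k):
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Size of the symmetric difference of the length-k subword sets."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous letters.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("ABAAB", "BABB", 2))
print(longest_common_subword("ABAAB", "BABB"))
```

For the first table row, the two one-sided differences 36 and 132 reported above add up to \(Z_2=168\).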

Newick tree

 
(8RIC_1:93.15,(1ZBA_1:79.5,6WJI_1:79.5):13.65);
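
In standard Newick notation, this tree is written with parentheses and a terminating semicolon. The sketch below serializes the tree and checks that every leaf lies at the same distance from the root. The nested-tuple representation and the helper names are hypothetical.

```python
def to_newick(node):
    """Serialize a (label_or_children, branch_length) pair to Newick text."""
    label, length = node
    if isinstance(label, str):
        text = label
    else:
        text = "(" + ",".join(to_newick(child) for child in label) + ")"
    return text if length is None else f"{text}:{length}"

def leaf_depths(node, depth=0.0):
    """Distance from the root to each leaf."""
    label, length = node
    depth += length or 0.0
    if isinstance(label, str):
        return {label: depth}
    depths = {}
    for child in label:
        depths.update(leaf_depths(child, depth))
    return depths

# The tree shown above; the root carries no branch length.
tree = ([("8RIC_1", 93.15),
         ([("1ZBA_1", 79.5), ("6WJI_1", 79.5)], 13.65)], None)
print(to_newick(tree) + ";")
print(leaf_depths(tree))
```

All three leaves sit at depth 93.15 up to floating-point rounding, since 79.5 + 13.65 = 93.15, so the tree is ultrametric.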

Let \(d\) and \(d_1\) denote the distances that Otu and Sayood call \(d\) and \(d_1\), respectively. (This makes the 4TYN sequence AAAAAA a close match...)
A rough estimate of the expected distance is \((0.85)(0.8)(\frac{803}{\log_{20} 803}-\frac{212}{\log_{20}212})\approx 163\), where \(803=212+591\) is the combined length of 1ZBA_1 and 8RIC_1.
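
The arithmetic in this estimate can be checked directly. The sketch below evaluates the expression as written; the variable names are hypothetical, and the factors 0.85 and 0.8 are copied from the formula without further interpretation.

```python
import math

def log20(n):
    """Logarithm of n to base 20."""
    return math.log(n) / math.log(20)

n_combined = 803  # 212 + 591, the two sequence lengths added together
n_first = 212     # length of 1ZBA_1

estimate = 0.85 * 0.8 * (n_combined / log20(n_combined)
                         - n_first / log20(n_first))
print(round(estimate, 2))
```

The expression evaluates to a little under 164, which agrees with the stated value 163 if the result is truncated rather than rounded.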
Status Protein1 Protein2 d d1/2
Query variables 1ZBA_1 8RIC_1 202 137

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
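
The interpolation can be made concrete with a short sketch. It assumes that min and max range over the two directed terms \(c(xy)-c(x)\) and \(c(yx)-c(y)\) of Otu and Sayood's construction, so that \(d\) is their maximum and \(d_1\) their sum. The function name is hypothetical. The directed values 202 and 72 are not reported on the page: 72 is inferred from the table row \(d=202\), \(d_1/2=137\) as \(2\cdot 137-202\).

```python
def delta(forward, backward, alpha):
    """alpha * min + (1 - alpha) * max of the two directed terms."""
    lo, hi = min(forward, backward), max(forward, backward)
    return alpha * lo + (1 - alpha) * hi

# Directed terms consistent with d = 202 and d1/2 = 137 for the pair
# 1ZBA_1, 8RIC_1; the value 72 is inferred, not reported.
forward, backward = 202, 72

print(delta(forward, backward, 0))    # alpha = 0 gives d (the maximum)
print(delta(forward, backward, 0.5))  # alpha = 1/2 gives d1 / 2 (the mean)
```

Intermediate values of \(\alpha\) move continuously between the two distances, and \(\alpha=1\) would give the minimum of the two directed terms.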