CoV2D Browser™

Parikh vectors
7OUZ_1 2GRH_1 6NDA_1 Letter Amino acid
25 18 9 A Alanine
14 4 8 R Arginine
4 1 8 C Cysteine
23 10 8 G Glycine
14 15 7 K Lysine
2 2 7 W Tryptophan
6 9 7 N Asparagine
12 13 8 D Aspartic acid
18 7 11 S Serine
9 6 5 T Threonine
18 14 6 V Valine
8 6 9 Q Glutamine
16 8 5 I Isoleucine
32 13 6 L Leucine
9 6 7 F Phenylalanine
19 3 10 E Glutamic acid
7 2 2 H Histidine
8 2 2 M Methionine
8 3 5 P Proline
5 4 5 Y Tyrosine
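The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch, assuming the letters are listed in the row order of the table above (the names `AMINO_ACIDS` and `parikh_vector` are illustrative, not taken from the CoV2D code):

```python
from collections import Counter

# One-letter codes in the row order of the table above (assumed ordering).
AMINO_ACIDS = "ARCGKWNDSTVQILFEHMPY"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Count the occurrences of each amino-acid letter in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

if __name__ == "__main__":
    # First ten residues of 7OUZ_1, used only as a small illustration.
    vec = parikh_vector("MELSFGARAE")
    print(vec["A"], vec["E"], sum(vec.values()))
```

The entries of a Parikh vector sum to the sequence length; for instance, the 7OUZ_1 column above sums to 257.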

>7OUZ_1|Chain A|Uridine 5'-monophosphate synthase|Homo sapiens (9606)
>2GRH_1|Chains A, B|Globin-1|Scapharca inaequivalvis (2784303)
>6NDA_1|Chains A, D, G, J, M, P|Snaclec rhodocetin subunit gamma|Calloselasma rhodostoma (8717)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7OUZ, Knot 116 257 0.83 40 173 250
MELSFGARAELPRIHPVASKLLRLMQKKETNLCLSADVSLARELLQLADALGPSICMLKTHVDILNDFTLDVMKELITLAKCHEFLIFEDRKFADIGNTVKKQYEGGIFKIASWADLVNAHVVPGSGVVKGLQEVGLPLHRGCLLIAEMSSTGSLATGDYTRAAVRMAEEHSEFVVGFISGSRVSMKPEFLHLTPGVQLEAGGDNLGQQYNSPQEVIGKRGSDIIIVGRGIISAADRLEAAEMYRKAAWEAYLSRLG
2GRH, Knot 73 146 0.83 40 112 144
PSVYDAAAQLTADVKKDLRDSWKVIGSDKKGNGVALVTTLFADNQETIGYFKRLGDVSQGMANDKLRGHSITLMYALQNFIDQLDNPDDLVCVVEKFAVNHITRKISAAEFGKINGPIKKVLASKNFGDKYANAWAKLVAVVQAAL
6NDA, Knot 68 135 0.82 40 111 132
DFNCLPGWSAYDQHCYQAFNEPKTWDEAERFCTEQAKRGHLVSIGSDGEADFVAQLVTNNIKRPELYVWIGLRDRRKEQQCSSEWSMSASIIYVNWNTGESQMCQGLARWTGFRKWDYSDCQAKNPFVCKFPSEC
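The LZ-complexity and the normalized ratio in the table can be sketched as follows. This assumes the exhaustive-history parsing of Lempel and Ziv (1976); the page does not state which variant it uses, so `lz76_complexity` may not reproduce the tabulated values exactly. Both function names are illustrative.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of phrases in an LZ76-style exhaustive-history parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it still occurs starting at an earlier
        # position (the earlier occurrence may overlap the phrase itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Compute LZ(w) / (n / log_k n) for an alphabet of size k."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    # Values taken from the 7OUZ row of the table: LZ(w) = 116, n = 257.
    print(normalized_lz(116, 257))
```

With the tabulated pairs (116, 257), (73, 146) and (68, 135), the ratio evaluates to values that agree with the displayed 0.83, 0.83 and 0.82 if the page truncates to two decimals.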

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(7OUZ_1)}(2) \setminus P_{f(2GRH_1)}(2)|=106\), \(|P_{f(2GRH_1)}(2) \setminus P_{f(7OUZ_1)}(2)|=45\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(7OUZ_1),f(2GRH_1))=106+45=151\).

Hydrophobic-polar version of Sequence 1:
10101110101101011100110110000001010101011001101101111010110001011001010110011011000011110000110110010000011110110110110101111011101100111110010111101000101101000011101100000111111010010101011010111010111001100000100111001001111101110110010110100011101010011
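The hydrophobic-polar string can be reproduced with a simple letter map. The page does not document which residues it treats as hydrophobic; the set below is an assumption, inferred by aligning the first 120 symbols of the string above with the 7OUZ sequence, and `HYDROPHOBIC` and `hp_string` are illustrative names.

```python
# Residues mapped to 1. Inferred from the displayed string, not a documented rule.
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in sequence)

if __name__ == "__main__":
    # First twenty residues of 7OUZ_1.
    print(hp_string("MELSFGARAELPRIHPVASK"))
```

Under this assumed set, the first twenty residues of 7OUZ_1 map to the first twenty symbols of the string shown above.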
Pair \(Z_2\) Length of longest common subword
7OUZ_1,2GRH_1 151 4
7OUZ_1,6NDA_1 190 3
2GRH_1,6NDA_1 155 4
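The quantity \(Z_k\) is the size of the symmetric difference of two subword sets, which can be sketched directly with Python sets. The function names `subwords` and `z_distance` are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct length-k subwords (contiguous) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

if __name__ == "__main__":
    # Toy example: P("AAB", 2) = {AA, AB}, P("ABB", 2) = {AB, BB}.
    print(z_distance("AAB", "ABB", 2))
```

Dividing \(Z_k(x,y)\) by \(|P_x(k)\cup P_y(k)|\) would give the Jaccard distance itself; the table reports only the numerator.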

Newick tree

 
(
	6NDA_1:90.11,
	(
		7OUZ_1:75.5,2GRH_1:75.5
	):14.61
);

Let \(d\) and \(d_1\) denote the Otu–Sayood distances \(d\) and \(d_1\). (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{403 }{\log_{20} 403}-\frac{146}{\log_{20}146})=77.1\)
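The arithmetic in this estimate can be checked with a few lines. Here 403 = 257 + 146 is the combined length of 7OUZ_1 and 2GRH_1; the factors 0.85 and 0.8 are copied verbatim from the formula, and their meaning is not stated on the page. The function name `expected_distance` is illustrative.

```python
import math

def n_over_log20(n: int) -> float:
    """Compute n / log_20(n)."""
    return n / (math.log(n) / math.log(20))

def expected_distance(n_concat: int, n_single: int,
                      a: float = 0.85, b: float = 0.8) -> float:
    """Evaluate a * b * (n_concat / log_20 n_concat - n_single / log_20 n_single)."""
    return a * b * (n_over_log20(n_concat) - n_over_log20(n_single))

if __name__ == "__main__":
    print(expected_distance(257 + 146, 146))
```

The result lies between 77.1 and 77.2, which agrees with the displayed 77.1 if the page truncates rather than rounds.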
Status Protein1 Protein2 d d1/2
Query variables 7OUZ_1 2GRH_1 96 73

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
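A minimal sketch of \(\delta\), assuming the usual Otu–Sayood definitions \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\), with \(c\) an LZ76-style complexity. It also assumes that min and max in the formula refer to these two one-sided differences, and the particular parsing used for \(c\) is an assumption, so the output need not match the tabulated 96 and 73.

```python
def lz76_complexity(w: str) -> int:
    """Number of phrases in an LZ76-style exhaustive-history parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while the phrase still occurs starting at an earlier position.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def one_sided(s: str, q: str, c=lz76_complexity) -> tuple[int, int]:
    """The two differences c(SQ) - c(S) and c(QS) - c(Q)."""
    return c(s + q) - c(s), c(q + s) - c(q)

def delta(s: str, q: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two one-sided differences."""
    lo, hi = sorted(one_sided(s, q))
    return alpha * lo + (1 - alpha) * hi

if __name__ == "__main__":
    s, q = "AAAAAA", "ABCABC"          # toy strings, not PDB sequences
    print(delta(s, q, 0.0))            # plays the role of d
    print(delta(s, q, 0.5))            # plays the role of d_1 / 2
```

Setting `alpha` to 0 recovers the maximum of the two differences, and setting it to 1/2 recovers their average, matching the two cases of the displayed formula.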