CoV2D Browser™

Parikh vectors
6RGT_1 8HUF_1 8WUY_1 Letter Amino acid
17 15 15 D Aspartic acid
20 18 20 A Alanine
20 18 43 L Leucine
5 4 1 M Methionine
25 11 6 T Threonine
11 6 6 H Histidine
18 10 12 I Isoleucine
25 18 13 K Lysine
9 12 19 P Proline
23 4 20 S Serine
7 3 2 W Tryptophan
26 22 12 V Valine
3 3 6 C Cysteine
14 10 2 N Asparagine
12 8 11 Q Glutamine
20 12 15 E Glutamic acid
15 14 14 G Glycine
10 11 13 F Phenylalanine
10 8 4 Y Tyrosine
12 9 14 R Arginine
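
A Parikh vector records how many times each letter occurs in a word. As a minimal sketch (the name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative and not taken from the CoV2D code), one column of the table above could be computed like this:

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "DALMTHIKPSWVCNQEGFYR"

def parikh_vector(seq: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter in seq."""
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}

# First ten residues of 6RGT_1, used only as a short illustration.
print(parikh_vector("MSAVKAARYG"))
```

The entries of a Parikh vector sum to the length of the word, which agrees with the table: the 6RGT_1 column sums to 302 and the 8HUF_1 column to 216.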

>6RGT_1|Chain A|Uricase|Aspergillus flavus (5059)
>8HUF_1|Chain A|GTP-binding nuclear protein Ran|Homo sapiens (9606)
>8WUY_1|Chains A, B|Nuclear receptor subfamily 4 group A member 1|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6RGT , Knot 136 302 0.85 40 204 291
MSAVKAARYGKDNVRVYKVHKDEKTGVQTVYEMTVCVLLEGEIETSYTKADNSVIVAADSIKNTIYITAKQNPVTPPELFGSILGTHFIEKYNHIHAAHVNIVCHRWTRMDIDGKPHPHSFIRDSEEKRNVQVDVVEGKGIDIKSSLSGLTVLKSTNSQFWGFLRDEYTTLKETWDRILSTDVDATWQWKNFSGLQEVRSHVPKFDATWATAREVTLKTFAEDNSASVQATMYKMAEQILARQQLIETVEYSLPNKHYFEIDLSWHKGLQNTGKNAEVFAPQSDPNGLIKCTVGRSSLKSKL
8HUF , Knot 102 216 0.84 40 154 210
MAAQGEPQVQFKLVLVGDGGTGKTTFVKRHLTGEFEKKYVATLGVEVHPLVFHTNRGPIKFNVWDTAGLEKFGGLRDGYYIQAQCAIIMFDVTSRVTYKNVPNWHRDLVRVCENIPIVLCGNKVDIKDRKVKAKSIVFHRKKNLQYYDISAKSNYNFEKPFLWLARKLIGDPNLEFVAMPAAAPPEVVMDPALAAQYEHDLEVAQTTALPDEDDDL
8WUY , Knot 109 248 0.80 40 152 235
SKPKQPPDASPANLLTSLVRAHLDSGPSTAKLDYSKFQELVLPHFGKEDAGDVQQFYDLLSGSLEVIRKWAEKIPGFAELSPADQDLLLESAFLELFILRLAYRSKPGEGKLIFCSGLVLHRLQCARGFGDWIDSILAFSRSLHSLLVDVPAFACLSALVLITDRHGLQEPRRVEELQNRIASCLKEHVAAVAGEPQPASCLSRLLGKLPELRTLCTQGLQRIFYLKLEDLVPPPPIIDKIFMDTLPF
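
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below assumes only that formula; the function name `normalized_lz` is hypothetical, and the LZ-complexities themselves are copied from the table rather than recomputed.

```python
import math

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n) for a word of length n over b letters."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ-complexity, length) triples copied from the table above.
for code, lz, n in [("6RGT", 136, 302), ("8HUF", 102, 216), ("8WUY", 109, 248)]:
    print(code, round(normalized_lz(lz, n), 3))
```

The tabulated values 0.85, 0.84 and 0.80 appear to be these ratios truncated (not rounded) to two decimals.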

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
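
As a small illustration of these definitions, assuming that a subword means a contiguous block of a fixed length (the function names are hypothetical):

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))

# "abab" has exactly two distinct subwords of length 2: "ab" and "ba".
print(sorted(subwords("abab", 2)), subword_complexity("abab", 2))
```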

\(|P_{f(6RGT_1)}(2) \setminus P_{f(8HUF_1)}(2)|=112\), \(|P_{f(8HUF_1)}(2) \setminus P_{f(6RGT_1)}(2)|=62\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
10110110010001010010000001100100101011101010000001000111110010001010100011011011101110011000001011010110001001010101010011000000001010110101101000101101100000011111000000100010011000101010100101100100011010101101001010011000010101010011001110001100100011000010101010011000100101111000101110001100010001
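
The page does not state which residues count as hydrophobic. The sketch below assumes the set A, F, G, I, L, M, P, V, W, which was inferred by comparing the opening residues of 6RGT_1 with the binary string above; both the set and the name `hp_encode` are assumptions.

```python
# Assumed hydrophobic residues (inferred from the data, not documented on the page).
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

# First twenty residues of 6RGT_1.
print(hp_encode("MSAVKAARYGKDNVRVYKVH"))
```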
Pair \(Z_2\) Length of longest common subword
6RGT_1,8HUF_1 174 3
6RGT_1,8WUY_1 180 4
8HUF_1,8WUY_1 166 4
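
Both columns can be sketched in a few lines. The code assumes that \(Z_k\) is the size of the symmetric difference of the two length-\(k\) subword sets, as defined above, and it reads the last column as the length of the longest common contiguous subword; the function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common block ending at x[i-2], y[j-2].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("abab", "abba", 2), longest_common_subword("abcde", "xbcdy"))
```

For the first pair, the two set differences quoted above (112 and 62) add up to the tabulated \(Z_2=174\).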

Newick tree

 
[
	6RGT_1:90.27,
	[
		8HUF_1:83,8WUY_1:83
	]:7.27
]
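
Standard Newick syntax uses parentheses and a terminating semicolon rather than the bracketed layout above. As a sketch, the same tree can be held as nested `(label_or_children, branch_length)` pairs and serialized; this representation and the name `to_newick` are illustrative choices, not the page's own format.

```python
def to_newick(node) -> str:
    """Serialize a (name, length) leaf or a (children, length) internal node."""
    label, length = node
    if isinstance(label, str):
        text = label
    else:
        text = "(" + ",".join(to_newick(child) for child in label) + ")"
    # The root carries no branch length.
    return text if length is None else f"{text}:{length:g}"

# Branch lengths copied from the tree above.
tree = ([("6RGT_1", 90.27), ([("8HUF_1", 83), ("8WUY_1", 83)], 7.27)], None)
newick = to_newick(tree) + ";"
print(newick)
```

Every leaf lies at the same distance from the root (83 + 7.27 = 90.27), so the tree is ultrametric.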

Let \(d\) and \(d_1\) denote the Otu--Sayood distances \(d\) and \(d_1\), respectively. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{518 }{\log_{20} 518}-\frac{216}{\log_{20}216})\approx 86.9\)
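
The arithmetic in this estimate can be checked directly. In the sketch below, 518 is taken to be the combined length 302 + 216 of 6RGT_1 and 8HUF_1, and the factors 0.85 and 0.8 are used exactly as they appear in the formula; the helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_b n, the normalizing term used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

n1, n2 = 302, 216  # lengths of 6RGT_1 and 8HUF_1
expected = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n2))
print(round(expected, 2))
```

The result is just under 87, which matches the quoted 86.9 if that value is truncated rather than rounded.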
Status Protein1 Protein2 d d1/2
Query variables 6RGT_1 8HUF_1 112 94.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
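
A numerical check of the interpolation, under the assumption that min and max denote the smaller and larger of two directed quantities \(a\) and \(b\). The value 112 is \(d\) from the query table; the value 77 is not shown on the page and is inferred here from \(d_1 = 2 \times 94.5 = 189 = 112 + 77\).

```python
def delta(alpha: float, a: float, b: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

a, b = 112, 77  # 77 is inferred from d and d1/2, not read off the page
print(delta(0, a, b))    # alpha = 0 gives the max, i.e. d
print(delta(0.5, a, b))  # alpha = 1/2 gives the mean, i.e. d1/2
```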