CoV2D Browser™

Parikh vectors
2ZRV_1 1RQP_1 1DDL_1 Letter Amino acid
34 35 0 A Alanine
32 27 0 G Glycine
39 17 0 I Isoleucine
16 11 0 F Phenylalanine
14 20 0 P Proline
0 3 0 C Cysteine
10 5 0 Q Glutamine
29 6 0 K Lysine
25 17 0 S Serine
8 13 0 Y Tyrosine
14 13 0 D Aspartic acid
32 22 0 E Glutamic acid
3 7 0 H Histidine
26 21 0 L Leucine
12 22 0 T Threonine
4 3 0 W Tryptophan
17 19 0 R Arginine
14 9 0 N Asparagine
12 6 0 M Methionine
27 23 0 V Valine
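
A column of the table above can be reproduced by counting letters. The following is a minimal sketch, not the browser's own code: the function name `parikh_vector` is hypothetical, and the alphabet order is simply assumed to follow the row order of the table.

```python
from collections import Counter

# One-letter amino acid codes, in the row order of the table above (assumed ordering).
ALPHABET = "AGIFPCQKSYDEHLTWRNMV"

def parikh_vector(sequence, alphabet=ALPHABET):
    """Return the number of occurrences of each alphabet letter in sequence."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# The RNA sequence 1DDL_1 contains no amino acid letters, so every entry is 0.
print(parikh_vector("UUUUUUU"))
```

Letters outside the alphabet (such as U) are ignored, which is consistent with the all-zero column for 1DDL_1.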

>2ZRV_1|Chains A, B, C, D|Isopentenyl-diphosphate delta-isomerase|Sulfolobus shibatae (2286)
>1RQP_1|Chains A, B, C|5'-fluoro-5'-deoxyadenosine synthase|Streptomyces cattleya (29303)
>1DDL_1|Chain A[auth D]|RNA (5'-R(P*UP*UP*UP*UP*UP*UP*U)-3')|
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2ZRV , Knot 157 368 0.84 38 197 345
MPDIVNRKVEHVEIAAFENVDGLSSSTFLNDVILVHQGFPGISFSEINTKTKFFRKEISVPVMVTGMTGGRNELGRINKIIAEVAEKFGIPMGVGSQRVAIEKAEARESFAIVRKVAPTIPIIANLGMPQLVKGYGLKEFQDAIQMIEADAIAVHLNPAQEVFQPEGEPEYQIYALEKLRDISKELSVPIIVKESGNGISMETAKLLYSYGIKNFDTSGQGGTNWIAIEMIRDIRRGNWKAESAKNFLDWGVPTAASIMEVRYSVPDSFLVGSGGIRSGLDAAKAIALGADIAGMALPVLKSAIEGKESLEQFFRKIIFELKAAMMLTGSKDVDALKKTSIVILGKLKEWAEYRGINLSIYEKVRKRE
1RQP , Knot 130 299 0.82 40 184 289
MAANSTRRPIIAFMSDLGTTDDSVAQCKGLMYSICPDVTVVDVCHSMTPWDVEEGARYIVDLPRFFPEGTVFATTTYPATGTTTRSVAVRIKQAAKGGARGQWAGSGAGFERAEGSYIYIAPNNGLLTTVLEEHGYLEAYEVTSPKVIPEQPEPTFYSREMVAIPSAHLAAGFPLSEVGRPLEDHEIVRFNRPAVEQDGEALVGVVSAIDHPFGNVWTNIHRTDLEKAGIGYGARLRLTLDGVLPFEAPLTPTFADAGEIGNIAIYLNSRGYLSIARNAASLAYPYHLKEGMSARVEAR
1DDL , Knot 2 7 0.18 2 1 1
UUUUUUU
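
The LZ-complexity and normalized ratio columns can be illustrated with a short sketch. It assumes one common convention for the Lempel-Ziv (1976) parsing, in which each component is the shortest piece of the remaining text that has not occurred earlier; the page does not state which variant it uses, and the function names are hypothetical.

```python
import math

def lz76_complexity(w):
    """Count components in a Lempel-Ziv (1976) style parsing of w.

    Each component is the shortest prefix of the remaining text that does not
    already occur starting at an earlier position (overlaps allowed). The last
    component may be a repeat of earlier text.
    """
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while the candidate still occurs starting before position i.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(w, alphabet_size=20):
    """LZ(w) divided by n / log_20(n); assumes len(w) >= 2."""
    n = len(w)
    return lz76_complexity(w) / (n / math.log(n, alphabet_size))

print(lz76_complexity("UUUUUUU"))  # parsing: U | UUUUUU
```

Under this convention the sequence UUUUUUU parses into two components, which agrees with the 1DDL row above.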

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
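
As a small sketch of these definitions, the set of length-\(k\) subwords and its cardinality can be computed with a sliding window. The function names `subwords` and `subword_complexity` are hypothetical.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# UUUUUUU has a single distinct subword of length 2 (UU) and of length 3 (UUU).
print(subword_complexity("UUUUUUU", 2), subword_complexity("UUUUUUU", 3))
```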

\(|P_{f(2ZRV_1)}(2) \setminus P_{f(1RQP_1)}(2)|=82\), \(|P_{f(1RQP_1)}(2) \setminus P_{f(2ZRV_1)}(2)|=69\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11011000100101111001011000011001111001111101001000001100010111110110110001101001110110011111111000111001010001111001110111110111101101011001001101101011110101100110101010001011001001000101111100010110100101100011001000101100111101100100101010010011011110110110100011001111011100110110111111011111111100110100010011001110101111101000101100001111101001100011010100010000
Pair \(Z_2\) Length of longest common subsequence
2ZRV_1,1RQP_1 151 4
2ZRV_1,1DDL_1 198 0
1RQP_1,1DDL_1 185 0
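
The \(Z_2\) column is the size of the symmetric difference of the two sets of length-2 subwords; for the first row, \(82+69=151\). A minimal sketch, with a hypothetical function name `z_k` and toy inputs rather than the protein sequences:

```python
def z_k(x, y, k):
    """Size of the symmetric difference of the length-k subword sets of x and y."""
    px = {x[i:i + k] for i in range(len(x) - k + 1)}
    py = {y[i:i + k] for i in range(len(y) - k + 1)}
    return len(px - py) + len(py - px)

# Toy example: ABAB has {AB, BA}; ABBA has {AB, BB, BA}; only BB differs.
print(z_k("ABAB", "ABBA", 2))
```

When two sequences share no length-\(k\) subword at all, as with a protein and the RNA sequence UUUUUUU, \(Z_k\) is just the sum of the two subword counts.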

Newick tree

 
[
	1DDL_1:10.67,
	[
		2ZRV_1:75.5,1RQP_1:75.5
	]:26.17
]

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{667 }{\log_{20} 667}-\frac{299}{\log_{20}299})\approx 102.\)
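
The arithmetic in this estimate can be checked directly. The sketch below only evaluates the expression as written; the helper name `ratio_term` is hypothetical, and the observation that 667 = 368 + 299 (the two sequence lengths added together) is an inference from the table, not something the page states.

```python
import math

def ratio_term(n, alphabet_size=20):
    """Return n / log_20(n), the normalizing quantity used in the estimate."""
    return n / math.log(n, alphabet_size)

# Evaluate (0.85)(0.8)(667/log_20 667 - 299/log_20 299).
estimate = 0.85 * 0.8 * (ratio_term(667) - ratio_term(299))
print(round(estimate))
```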
Status Protein1 Protein2 d d1/2
Query variables 2ZRV_1 1RQP_1 130 117.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
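
The interpolation can be sketched as follows, assuming that min and max are taken over two one-sided quantities (in the Otu–Sayood setting, presumably the complexity increases \(c(xy)-c(x)\) and \(c(yx)-c(y)\)). The function name `delta` and the input values 105 and 130 are illustrative: they are back-computed from the reported \(d=130\) and \(d_1/2=117.5\), not taken from the page.

```python
def delta(a, b, alpha):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# alpha = 0 gives the maximum (d); alpha = 1/2 gives the average (d1 / 2).
print(delta(105, 130, 0), delta(105, 130, 0.5))
```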