CoV2D Browser™

Parikh vectors
9AUV_1 3CNR_1 4AWG_1 Letter Amino acid
19 5 25 E Glutamic acid
18 7 14 K Lysine
1 1 2 W Tryptophan
49 12 12 A Alanine
19 5 15 R Arginine
2 0 4 C Cysteine
27 5 18 I Isoleucine
35 7 9 V Valine
13 3 3 Q Glutamine
12 1 6 H Histidine
16 5 9 M Methionine
14 10 5 P Proline
11 3 7 Y Tyrosine
25 6 12 S Serine
20 9 12 T Threonine
10 3 7 N Asparagine
21 5 11 D Aspartic acid
48 13 10 G Glycine
26 13 10 L Leucine
9 4 13 F Phenylalanine

>9AUV_1|Chains A, B, C, D, E, F, G, H|Inosine-5'-monophosphate dehydrogenase|Acinetobacter baumannii (470)
>3CNR_1|Chains A, B|Type IV fimbriae assembly protein|Xanthomonas axonopodis pv. citri (346)
>4AWG_1|Chains A, B, C, D|POLYMERASE PA|INFLUENZA A VIRUS (641501)
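
As a minimal sketch, the Parikh vector of a sequence (the count of each letter) can be computed as follows. The function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative choices and are not taken from the CoV2D code; the letter order simply follows the rows of the table above.

```python
from collections import Counter

# Letter order as in the rows of the Parikh vector table (illustrative constant).
AMINO_ACIDS = "EKWARCIVQHMPYSTNDGLF"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in sequence."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# Sequence of 3CNR_1 as listed on this page.
seq_3cnr = (
    "MSAMNARQGILSLALKDKPALYSAYMPFVKGGGIFVPTPKRYMLGDEVFLLLTLPDSSERLPVAGKVIW"
    "TTPAGAQGNRAAGIGVQFPDGPEGEAVRNKIETLLAGLTTSDKPTHTM"
)

for letter, count in zip(AMINO_ACIDS, parikh_vector(seq_3cnr)):
    print(letter, count)
```

For 3CNR_1 this should reproduce the middle column of the table, for instance one tryptophan (W) and no cysteine (C).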
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9AUV , Knot 158 395 0.79 40 200 366
MHHHHHHGENLYFQGSMLTIVQEALTFDDVLLLPAYSTVLPKDVSLKTRLTRGIYLNIPLVSAAMDTVTESRMAIAMAQNGGIGILHKNMDIAAQAAEVRRVKKFEAGKAESYPNSCKDDLGRLRVGAAVGTGADTPSRVEALVEAGVDVIVVDTAHGHSAGVIERVRWVKQNFPQVQVIGGNIATGDAALALLDAGADAVKVGIGPGSICTTRIVAGIGMPQISAIDSVASALKDQIPLIADGGIRFSGDMAKAIGAGASTIMVGSLLAGTEEAPGEVEFFQGRYYKAYRGMGSLGAMAGATGSADRYFQDSKAGAEKLVPEGIEGRVPYKGPMGNIVHQMMGGLRSSMGYTGSAVIEDLRQNAKFVKITSAGMSESHVHDVTITKEAPNYRVG
3CNR , Knot 60 117 0.81 38 92 114
MSAMNARQGILSLALKDKPALYSAYMPFVKGGGIFVPTPKRYMLGDEVFLLLTLPDSSERLPVAGKVIWTTPAGAQGNRAAGIGVQFPDGPEGEAVRNKIETLLAGLTTSDKPTHTM
4AWG , Knot 97 204 0.84 40 145 197
MGSGMAMEDFVRQCFNPMIVELAEKAMKEYGEDPKIETNKFAAICTHLEVCFMYSDFHFIDERGESIIVESGDPNALLKHRFEIIEGRDRIMAWTVVNSICNTTGVEKPKFLPDLYDYKENRFIEIGVTRREVHIYYLEKANKIKSEKTHIHIFSFTGEEMATKADYTLDEESRARIKTRLFTIRQEMASRSLWDSFRQSERGE
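
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be rechecked from the LZ-complexity and length columns. The sketch below also includes one common way of counting Lempel-Ziv (1976) phrases, the exhaustive-history parsing; whether CoV2D uses exactly this variant is an assumption, so the function is not claimed to reproduce the values 158, 60 and 97. The names `lz76_complexity` and `normalized_lz` are illustrative.

```python
import math

def lz76_complexity(w):
    """Count phrases in the LZ76 exhaustive-history parsing of w.

    Each phrase is extended while it still occurs starting at an earlier
    position (overlaps allowed); the first failing symbol closes the phrase.
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ-complexity, length) as listed in the table above.
for code, lz, n in [("9AUV", 158, 395), ("3CNR", 60, 117), ("4AWG", 97, 204)]:
    print(code, normalized_lz(lz, n))
```

Truncated to two decimals, the three ratios agree with the tabulated 0.79, 0.81 and 0.84.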

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
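
A minimal sketch of these definitions, reading \(P_w(n)\) as the set of contiguous subwords of length \(n\) (the function names `subwords` and `p` are illustrative):

```python
def subwords(w, n):
    """Set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w, n):
    """Number of distinct subwords of length n in w."""
    return len(subwords(w, n))

print(sorted(subwords("abab", 2)))  # ['ab', 'ba']
print(p("abab", 2))                 # 2
```

For a word of length \(m\) over 20 letters, `p(w, n)` is at most \(\min(20^n, m-n+1)\).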

\(|P_{f(9AUV_1)}(2) \setminus P_{f(3CNR_1)}(2)|=136\), \(|P_{f(3CNR_1)}(2) \setminus P_{f(9AUV_1)}(2)|=28\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
10000001001010101101100110100111111000111001010001001101011110111001000011111100111111000101110110100100101101000100000011010111111011001001011101110111100101001111001011000110101111011010111111011101101111110100001111111101011001101100011111011101010110111111001111011110001110101101000010011101111111010100010000111001110110101100111101100111110001100101110010001011010011100001001010001100011
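
The hydrophobic-polar string above can be sketched as a letter-by-letter recoding of Sequence 1 (9AUV_1). The hydrophobic set used below, A, G, I, L, M, F, P, V, W, is an assumption inferred by comparing the start of the binary string with the start of the sequence; it is not stated on this page. The name `hp_encode` is illustrative.

```python
# Assumed hydrophobic residues (inferred from the page's binary string, not documented).
HYDROPHOBIC = set("AGILMFPVW")

def hp_encode(sequence):
    """Map hydrophobic residues to '1' and all other residues to '0'."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in sequence)

# First 20 residues of 9AUV_1.
print(hp_encode("MHHHHHHGENLYFQGSMLTI"))  # 10000001001010101101
```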
Pair \(Z_2\) Length of longest common subword
9AUV_1,3CNR_1 164 4
9AUV_1,4AWG_1 177 4
3CNR_1,4AWG_1 153 4
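
Both columns of this table can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two subword sets. The last column is read here as the length of the longest common contiguous subword, which is an assumption; a longest common subsequence in the non-contiguous sense would typically be much longer for proteins of this size. Function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x, y):
    """Length of the longest contiguous subword shared by x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("abab", "abba", 2))                              # 1
# Fragments of 9AUV_1 and 3CNR_1 that share the subword AGIG.
print(longest_common_subword("IVAGIGMPQ", "RAAGIGVQF"))  # 4
```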

Newick tree

 
(
	9AUV_1:88.05,
	(
		3CNR_1:76.5,4AWG_1:76.5
	):11.55
);

Let \(d\) denote the Otu--Sayood distance of the same name.
Let \(d_1\) denote the Otu--Sayood distance of the same name. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{512 }{\log_{20} 512}-\frac{117}{\log_{20}117}\right)\approx 117.\)
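
The arithmetic of this estimate can be checked directly. The factors 0.85 and 0.8 are taken from the formula as given and are not explained further here; note that 512 equals 395 + 117, the sum of the lengths of 9AUV_1 and 3CNR_1. The helper name `bound` is illustrative.

```python
import math

def bound(n, alphabet_size=20):
    """n / log_{alphabet_size}(n), the normalizing term used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

expected = 0.85 * 0.8 * (bound(512) - bound(117))
print(expected)  # close to 117
```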
Status Protein1 Protein2 d d1/2
Query variables 9AUV_1 3CNR_1 139 89
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
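
A minimal sketch of \(\delta\), under two assumptions that are not stated on this page: that min and max range over the two one-sided quantities \(c(xy)-c(x)\) and \(c(yx)-c(y)\), as in the usual definition of the Otu--Sayood distances, and that \(c\) is the LZ76 exhaustive-history phrase count. Function names are illustrative.

```python
def lz76_complexity(w):
    """Count phrases in the LZ76 exhaustive-history parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two one-sided complexity gains."""
    gain_xy = lz76_complexity(x + y) - lz76_complexity(x)
    gain_yx = lz76_complexity(y + x) - lz76_complexity(y)
    low, high = min(gain_xy, gain_yx), max(gain_xy, gain_yx)
    return alpha * low + (1 - alpha) * high

print(delta("a", "b", 0))    # d      (alpha = 0)
print(delta("a", "b", 0.5))  # d_1/2  (alpha = 1/2)
```

With \(\alpha=0\) only the maximum survives, giving \(d\); with \(\alpha=1/2\) the result is the average of the two gains, that is \(d_1/2\).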