CoV2D Browser™

Parikh vectors
8WKI_1 4OVX_1 7BDT_1 Letter Amino acid
24 6 9 Q Glutamine
27 18 24 L Leucine
9 8 9 M Methionine
2 3 2 W Tryptophan
6 11 13 R Arginine
46 9 8 N Asparagine
20 21 15 D Aspartic acid
16 13 5 F Phenylalanine
29 15 12 V Valine
41 38 27 A Alanine
0 3 2 C Cysteine
9 20 33 E Glutamic acid
17 13 10 I Isoleucine
44 7 10 T Threonine
11 11 10 Y Tyrosine
39 22 16 G Glycine
2 9 3 H Histidine
15 28 17 K Lysine
12 5 7 P Proline
34 17 20 S Serine
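
Each column above is the Parikh vector of one sequence: the number of occurrences of every amino-acid letter, ignoring order. A minimal Python sketch of that count follows. The function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative assumptions, not the browser's own code.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "QLMWRNDFVACEITYGHKPS"


def parikh_vector(sequence: str) -> dict:
    """Count how often each amino-acid letter occurs in the sequence."""
    counts = Counter(sequence)
    # Letters that never occur are reported as 0, as for C in 8WKI_1.
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}


if __name__ == "__main__":
    # First 20 residues of the 8WKI_1 sequence, used only as a short demo.
    prefix = "MSFSQAVSGLNAAATNLDVI"
    for letter, count in parikh_vector(prefix).items():
        print(letter, count)
```

The entries of a Parikh vector always sum to the sequence length; for the 8WKI_1 column the tabulated counts sum to 403.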

>8WKI_1|Chains AA[auth ZG], A[auth ZF], BA[auth ZI], B[auth ZH], CA[auth ZK], C[auth ZJ], D[auth ZL], E[auth ZM], F[auth ZN], G[auth ZO], H[auth ZP], I[auth ZQ], J[auth ZR], K[auth ZS], L[auth ZT], M[auth ZU], N[auth ZV], O[auth ZW], P[auth ZX], Q[auth ZY], R[auth ZZ], S[auth Za], T[auth Zb], U[auth Zc], V[auth Zd], W[auth Ze], X[auth Zf], Y[auth Zg], Z[auth Zh]|Flagellar hook protein FlgE|Salmonella enterica subsp. enterica serovar Typhimurium str. LT2 (99287)
>4OVX_1|Chain A|Xylose isomerase domain protein TIM barrel|Planctomyces limnophilus (521674)
>7BDT_1|Chain A|14-3-3 protein sigma|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8WKI , Knot 162 403 0.80 38 190 368
MSFSQAVSGLNAAATNLDVIGNNIANSATYGFKSGTASFADMFAGSKVGLGVKVAGITQDFTDGTTTNTGRGLDVAISQNGFFRLVDSNGSVFYSRNGQFKLDENRNLVNMQGMQLTGYPATGTPPTIQQGANPAPITIPNTLMAAKSTTTASMQINLNSTDPVPSKTPFSVSDADSYNKKGTVTVYDSQGNAHDMNVYFVKTKDNEWAVYTHDSSDPAATAPTTASTTLKFNENGILESGGTVNITTGTINGATAATFSLSFLNSMQQNTGANNIVATNQNGYKPGDLVSYQINNDGTVVGNYSNEQEQVLGQIVLANFANNEGLASQGDNVWAATQASGVALLGTAGSGNFGKLTNGALEASNVDLSKELVNMIVAQRNYQSNAQTIKTQDQILNTLVNLR
4OVX , Knot 120 277 0.81 40 175 266
AALQTSASPFEISLAQWSLHKAFFDKKADPMDFAKIAKEEFGINAIEYVNQFYKGKAEDQAFLADLKKRADDHGVKSLLIMCDGEGALGDADEAKRKKAVENHYKWVAAAKYLGCHSIRVNAQSGGSYDEQLARAADGLRRLTEFAATHDINVIVENHGGLSSNGAWLAAVMKKVDHPRCGTLPDFGNFRVSKDEMYDRYKGVEELMPFAKAVSAKSHDFDAAGNEIHTDYRKMMKIVASFGYKGYVGIEYEGSKISEADGIKATKKLLETVRSEMA
7BDT , Knot 115 252 0.84 40 170 243
AMGSMERASLIQKAKLAEQAERYEDMAAFMKGAVEKGEELSCEERNLLSVAYKNVVGGQRAAWRVLSSIEQKSNEEGSEEKGPEVREYREKVETELQGVCDTVLGLLDSHLIKEAGDAESRVFYLKMKGDYYRYLAEVATGDDKKRIIDSARSAYQEAMDISKKEMPPTNPIRLGLALNFSVFHYEIANSPEEAISLAKTTFDEAMADLHTLSEDSYKDSTLIMQLLRDNLTLWTADNAGEEGGEAPQEPQS
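
The fourth column of the table normalizes the LZ-complexity by \(n/\log_{20} n\), with 20 the size of the amino-acid alphabet. The sketch below recomputes that column from the tabulated \(\mathrm{LZ}(w)\) and \(n\). The helper name `normalized_lz` is an assumption made for illustration, and the LZ values are taken from the table rather than recomputed.

```python
import math


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n) for a word of length n over b letters."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # (LZ-complexity, length) pairs copied from the table above.
    rows = {"8WKI": (162, 403), "4OVX": (120, 277), "7BDT": (115, 252)}
    for code, (lz, n) in rows.items():
        print(f"{code} {normalized_lz(lz, n):.2f}")
```

Formatted to two decimals, the three ratios agree with the tabulated 0.80, 0.81 and 0.84.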

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
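
A direct reading of this definition can be sketched in Python as follows, taking a subword to be a contiguous block of a fixed length. The names `subwords` and `subword_complexity` are illustrative assumptions.

```python
def subwords(w: str, k: int) -> set:
    """Distinct contiguous subwords (intervals) of length k in w."""
    # range() is empty when k > len(w), so the result is then the empty set.
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w: str, k: int) -> int:
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))


if __name__ == "__main__":
    print(sorted(subwords("ABAB", 2)))
    print(subword_complexity("ABAB", 2))
```

Under this reading the count for length 1 cannot exceed 20 on a protein alphabet, so the tabulated \(p_w(1)\) values of 38 and 40 presumably follow a different convention that this page does not state.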

\(|P_{f(8WKI_1)}(2) \setminus P_{f(4OVX_1)}(2)|=93\), \(|P_{f(4OVX_1)}(2) \setminus P_{f(8WKI_1)}(2)|=78\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), that is, the numerator of an LZ76-style (set of subwords) Jaccard distance for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (8WKI_1):
1010011011011100101110011001001100101011011110011111011110001001000001011011100011101100010110000101010000011010110101011010110100110111101100111100000101010100001110001101001000000101010000101001010110000001110000000111011001000101000111001101010010101101101010110010000110011100001001101100010001011100000000111011110110001110010011110010111111011010110100111010010100011011110000000100100000110011010

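
The binary string above is consistent with mapping each residue of 8WKI_1 to 1 if it is hydrophobic and to 0 if it is polar. The sketch below uses a hydrophobic set inferred from that string: A, F, G, I, L, M, P, V and W map to 1. Treating C as hydrophobic is an assumption, because cysteine does not occur in 8WKI_1, and the name `hp_string` is illustrative.

```python
# Residues that map to 1 in the string above. "C" is an assumption:
# cysteine is absent from 8WKI_1, so the string cannot confirm it.
HYDROPHOBIC = set("AFGILMPVW") | {"C"}


def hp_string(sequence: str) -> str:
    """Encode a protein sequence as 1 (hydrophobic) / 0 (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in sequence)


if __name__ == "__main__":
    # First 20 residues of 8WKI_1.
    print(hp_string("MSFSQAVSGLNAAATNLDVI"))
```

On the first 20 residues of 8WKI_1 this reproduces the first 20 digits of the string above.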
Pair \(Z_2\) Length of longest common subword
8WKI_1,4OVX_1 171 3
8WKI_1,7BDT_1 168 4
4OVX_1,7BDT_1 135 4
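
Both columns of this table can be sketched from sets of subwords. The code below is a minimal illustration, not the browser's implementation. It reads the last column as the length of the longest common contiguous subword, which is an assumption suggested by the small values 3 and 4. The names `z_distance` and `longest_common_subword_length` are hypothetical.

```python
def subwords(w: str, k: int) -> set:
    """Distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_subword_length(x: str, y: str) -> int:
    """Largest k such that x and y share a contiguous subword of length k."""
    k = 0
    # If a common subword of length k + 1 exists, so does one of length k,
    # so it is enough to grow k until the intersection becomes empty.
    while subwords(x, k + 1) & subwords(y, k + 1):
        k += 1
    return k


if __name__ == "__main__":
    print(z_distance("ABAB", "ABBA", 2))
    print(longest_common_subword_length("ABCDE", "XBCDY"))
```

The tabulated values are mutually consistent: \(Z_2=171=93+78\) for the pair 8WKI_1, 4OVX_1, and \(190-93=175-78=97\) is the number of 2-letter subwords the two sequences share.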

Newick tree

 
(8WKI_1:89.77,(7BDT_1:67.5,4OVX_1:67.5):22.27);

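Let \(d\) be the Otu--Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\), where \(xy\) denotes the concatenation of \(x\) and \(y\).
Let \(d_1\) be the Otu--Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)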
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{680 }{\log_{20} 680}-\frac{277}{\log_{20}277})\approx 112.\)
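
The arithmetic can be checked step by step. Here \(680=403+277\) is the combined length of the two query sequences. The origin of the factors 0.85 and 0.8 is not stated on this page, so they are taken as given.
\[ \frac{680}{\log_{20} 680}\approx 312.3,\qquad \frac{277}{\log_{20} 277}\approx 147.5,\qquad (0.85)(0.8)(312.3-147.5)\approx 0.68\times 164.8\approx 112.1. \]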
Status Protein1 Protein2 d d1/2
Query variables 8WKI_1 4OVX_1 140 118.5
Was not able to put for d
Was not able to put for d1

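In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
where \(\min\) and \(\max\) are the smaller and the larger of the two terms whose maximum is \(d\) and whose sum is \(d_1\).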
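
The interpolation can be sketched in Python under two assumptions that this page does not confirm: that the LZ-complexity is the phrase count of the LZ76 exhaustive-history parsing, and that the two terms being combined are \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), as in Otu and Sayood's definitions. All function names are illustrative, and the browser's own implementation may differ.

```python
def lz76(w: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it still occurs starting at an earlier
        # position (the earlier occurrence may overlap the phrase itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def increments(x: str, y: str) -> tuple:
    """Complexity gained by appending y to x, and by appending x to y."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)


def otu_sayood(x: str, y: str) -> tuple:
    """Return (d, d1): the maximum and the sum of the two increments."""
    a, b = increments(x, y)
    return max(a, b), a + b


def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two increments."""
    lo, hi = sorted(increments(x, y))
    return alpha * lo + (1 - alpha) * hi


if __name__ == "__main__":
    x, y = "ACGT", "AAAAAA"
    d, d1 = otu_sayood(x, y)
    print(d, d1, delta(x, y, 0), delta(x, y, 0.5))
```

With \(\alpha=0\) the function `delta` returns \(d\), and with \(\alpha=1/2\) it returns \(d_1/2\), matching the two cases of the formula.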