CoV2D Browser™

Parikh vectors
8OXM_1 3ZEC_1 6HRD_1 Letter Amino acid
1 25 16 T Threonine
0 33 35 A Alanine
0 25 10 I Isoleucine
2 50 34 L Leucine
2 19 15 P Proline
2 18 22 S Serine
0 14 5 N Asparagine
1 14 16 D Aspartic acid
1 30 7 Q Glutamine
0 15 16 H Histidine
0 15 20 R Arginine
2 25 23 E Glutamic acid
0 18 12 K Lysine
0 16 4 Y Tyrosine
0 5 0 W Tryptophan
0 21 38 V Valine
0 6 1 C Cysteine
0 13 24 G Glycine
0 6 8 M Methionine
1 10 11 F Phenylalanine
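
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch in Python follows; the names `ALPHABET` and `parikh_vector` are illustrative, and the letter order is assumed to be the row order of the table above.

```python
from collections import Counter

# Letter order assumed from the rows of the table above.
ALPHABET = "TAILPSNDQHREKYWVCGMF"

def parikh_vector(w: str, alphabet: str = ALPHABET) -> list[int]:
    """Return the number of occurrences of each alphabet letter in w."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# Sequence of 8OXM_1 as listed on this page.
print(parikh_vector("EPPLSQETFSDL"))
# → [1, 0, 0, 2, 2, 2, 0, 1, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 1]
```

The printed list reproduces the 8OXM_1 column of the table.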

>8OXM_1|Chains A[auth E], C[auth F]|Cellular tumor antigen p53|Homo sapiens (9606)
>3ZEC_1|Chains A, B|ADENOSINE MONOPHOSPHATE-PROTEIN TRANSFERASE SOFIC|SHEWANELLA ONEIDENSIS (70863)
>6HRD_1|Chains A, B, C, D, E, F|3-hydroxybutyryl-CoA dehydrogenase|Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv) (83332)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8OXM , Knot 9 12 0.62 16 11 10
EPPLSQETFSDL
3ZEC , Knot 158 378 0.82 40 214 352
MHHHHHHEWQAEQAYNHLPPLPLDSKLAELAETLPILKACIPARAALAELKQAGELLPNQGLLINLLPLLEAQGSSEIGNIVTTTDKLFQYAQEDSQADPMTKEALRYRTALYQCFTQLSNRPLCVTTALEICSTIKSVQMDVRKVPGTSLTNQATGEVIYTPPAGESVIRDLLSNWEAFLHNQDDVDPLIKMAMAHYQFEAIHPFIDGNGRTGRVLNILYLIDQQLLSAPILYLSRYIVAHKQDYYRLLLNVTTQQEWQPWIIFILNAVEQTAKWTTHKIAAARELIAHTTEYVRQQLPKIYSHELVQVIFEQPYCRIQNLVESGLAKRQTASVYLKQLCDIGVLEEVQSGKEKLFVHPKFVTLMTKDSNQFSRYAL
6HRD , Knot 127 317 0.77 38 161 284
MGSSHHHHHHSSGLVPRGSHMSDAIQRVGVVGAGQMGSGIAEVSARAGVEVTVFEPAEALITAGRNRIVKSLERAVSAGKVTERERDRALGLLTFTTDLNDLSDRQLVIEAVVEDEAVKSEIFAELDRVVTDPDAVLASNTSSIPIMKVAAATKQPQRVLGLHFFNPVPVLPLVELVRTLVTDEAAAARTEEFASTVLGKQVVRCSDRSGFVVNALLVPYLLSAIRMVEAGFATVEDVDKAVVAGLSHPMGPLRLSDLVGLDTLKLIADKMFEEFKEPHYGPPPLLLRMVEAGQLGKKSGRGFYTYAAALEHHHHHH
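
The LZ-complexity column and the normalized ratio can be sketched as follows. This assumes the Lempel-Ziv (1976) exhaustive-history parsing, in which each new component is the shortest string not already occurring at an earlier starting position; whether the page uses exactly this variant is an assumption, although it does reproduce the 8OXM row. The function names are illustrative.

```python
import math

def lz76_complexity(w: str) -> int:
    """Count the components of the LZ76 exhaustive-history parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the candidate while a copy of it starts at an earlier position
        # (the earlier copy is allowed to overlap the candidate).
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(w: str, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_20 n); requires len(w) >= 2."""
    n = len(w)
    return lz76_complexity(w) / (n / math.log(n, alphabet_size))

w = "EPPLSQETFSDL"  # sequence of 8OXM as listed above
print(lz76_complexity(w), round(normalized_lz(w), 2))  # → 9 0.62
```

Here the parsing of the 8OXM sequence is E, P, PL, S, Q, ET, F, SD, L, which gives nine components.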

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
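
As a small illustration of these definitions, the sketch below collects the distinct length-\(k\) subwords of a word with a sliding window. The function names `subwords` and `subword_complexity` are illustrative and not taken from the page.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """p_w(k): the number of distinct subwords of length k in w."""
    return len(subwords(w, k))

w = "EPPLSQETFSDL"  # sequence of 8OXM as listed above
print(subword_complexity(w, 2), subword_complexity(w, 3))  # → 11 10
```

For a word of length 12 there are at most 11 subwords of length 2 and 10 of length 3, so this sequence attains the maximum in both cases.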

For example, \(|P_{f(8OXM_1)}(2) \setminus P_{f(3ZEC_1)}(2)|=3\) and \(|P_{f(3ZEC_1)}(2) \setminus P_{f(8OXM_1)}(2)|=206\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1: 011100001001
Pair \(Z_2\) Length of longest common subsequence
8OXM_1,3ZEC_1 209 3
8OXM_1,6HRD_1 158 3
3ZEC_1,6HRD_1 167 6
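
The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets. A minimal sketch, with illustrative function names and a toy pair of words rather than the protein sequences:

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: P("ABAB", 2) = {AB, BA} and P("ABBA", 2) = {AB, BB, BA}.
print(z_k("ABAB", "ABBA", 2))  # → 1
```

For the first pair in the table, the two set differences reported above are 3 and 206, which sum to the listed \(Z_2 = 209\).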

Newick tree

 
[
	3ZEC_1:99.23,
	[
		8OXM_1:79,6HRD_1:79
	]:20.23
]

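Let \(d\) and \(d_1\) be the Otu–Sayood distances of those names. Writing \(c\) for the LZ-complexity and \(xy\) for the concatenation of \(x\) and \(y\), \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=c(xy)-c(x)+c(yx)-c(y)\). (This makes the 4TYN sequence AAAAAA a close match...)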
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{390 }{\log_{20} 390}-\frac{12}{\log_{20}12})\approx 123.\)
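
The arithmetic of this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are taken from the text as given, and 390 appears to be the combined length 378 + 12 of the two query sequences; the helper name `random_scale` is illustrative.

```python
import math

def random_scale(n: int, alphabet_size: int = 20) -> float:
    """n / log_20(n), the normalizing scale used for LZ-complexity above."""
    return n / math.log(n, alphabet_size)

# Factors 0.85 and 0.8 are copied from the text, not derived here.
expected = 0.85 * 0.8 * (random_scale(390) - random_scale(12))
print(round(expected))  # → 123
```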
Status Protein1 Protein2 d d1/2
Query variables 8OXM_1 3ZEC_1 153 78.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
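
The interpolation above can be sketched in code. The sketch assumes the Otu–Sayood definitions \(d=\max\) and \(d_1=\) sum of the two complexity gains \(c(xy)-c(x)\) and \(c(yx)-c(y)\), with \(c\) computed by an LZ76 exhaustive-history parsing; the function names are illustrative, and the page's own implementation may differ in details.

```python
def lz76_complexity(w: str) -> int:
    """Count the components of the LZ76 exhaustive-history parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the candidate while a copy of it starts at an earlier position.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def complexity_gains(x: str, y: str) -> tuple[int, int]:
    """Return (c(xy) - c(x), c(yx) - c(y))."""
    c = lz76_complexity
    return c(x + y) - c(x), c(y + x) - c(y)

def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1): the max and the sum of the two complexity gains."""
    a, b = complexity_gains(x, y)
    return max(a, b), a + b

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two complexity gains."""
    a, b = complexity_gains(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "EPPLSQETFSDL", "MGSSHHHHHHSSGLVPRGSH"  # 8OXM and a prefix of 6HRD
d, d1 = otu_sayood(x, y)
print(delta(x, y, 0) == d, delta(x, y, 0.5) == d1 / 2)  # → True True
```

Setting \(\alpha=0\) keeps only the larger gain, which is \(d\), while \(\alpha=1/2\) averages the two gains, which is \(d_1/2\).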
