CoV2D Browser™

Parikh vectors
2HKM_1 4HAQ_1 7OFB_1 Letter Amino acid
9 51 16 D Aspartic acid
7 20 21 L Leucine
4 15 4 F Phenylalanine
10 22 16 N Asparagine
15 50 36 G Glycine
4 17 12 I Isoleucine
4 21 17 Y Tyrosine
7 29 18 A Alanine
4 9 22 R Arginine
14 16 11 C Cysteine
2 14 9 Q Glutamine
3 13 3 K Lysine
12 34 23 S Serine
12 39 17 T Threonine
3 9 6 W Tryptophan
5 20 14 E Glutamic acid
7 7 10 H Histidine
1 10 8 M Methionine
7 14 14 P Proline
5 21 27 V Valine
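The Parikh vector of a sequence is simply its per-letter count. A minimal sketch of how one column of the table above could be computed; the function name `parikh_vector` and the constant `LETTERS` are illustrative and not taken from the CoV2D code:

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
LETTERS = "DLFNGIYARCQKSTWEHMPV"

def parikh_vector(seq: str) -> dict[str, int]:
    """Count how often each amino-acid letter occurs in seq."""
    counts = Counter(seq)
    return {letter: counts.get(letter, 0) for letter in LETTERS}

# Toy input: the first 18 residues of the 2HKM_1 sequence listed on this page.
example = "AGGGGSSSGADHISLNPD"
vec = parikh_vector(example)
print(vec["G"], vec["S"], vec["A"], vec["D"])
```

Applied to a full sequence, the values of the returned dictionary sum to the sequence length.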

>2HKM_1|Chains A[auth D], B[auth H]|Aromatic amine dehydrogenase|Alcaligenes faecalis (511)
>4HAQ_1|Chains A, B|GH7 family protein|Limnoria quadripunctata (161573)
>7OFB_1|Chain A|Kelch-like ECH-associated protein 1|Mus musculus (10090)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2HKM , Knot 62 135 0.75 40 103 129
AGGGGSSSGADHISLNPDLANEDEVNSCDYWRHCAVDGFLCSCCGGTTTTCPPGSTPSPISWIGTCHNPHDGKDYLISYHDCCGKTACGRCQCNTQTRERPGYEFFLHNDVNWCMANENSTFHCTTSVLVGLAKN
4HAQ , Knot 181 431 0.85 40 231 404
QQAGTETEEYHLPLTWERDGSSVSASVVIDSNWRWTHSTEDTTNCYDGNEWDSTLCPDADTCTENCAIDGVDQGTWGDTYGITASGSKLTLSFVTEGEYSTDIGSRVFLMADDDNYEIFNLLDKEFSFDVDASNLPCGLNGALYFVSMDEDGGTSKYSTNTAGAKYGTGYCDAQCPHDMKFIAGKANSDGWTPSDNDQNAGTGEMGACCHEMDIWEANSQAQSYTAHVCSVDGYTPCTGTDCGDNGDDRYKGVCDKDGCDYAAYRLGQHDFYGEGGTVDSGSTLTVITQFITGGGGLNEIRRIYQQGGQTIQNAAVNFPGDVDPYDSITEDFCVDIKRYFGDTNDFDAKGGMSGMSNALKKGMVLVMSLWDDHYANMLWLDATYPVDSTEPGALRGPCSTDSGDPADVEANFPGSTVTFSNIKIGPIQSYD
7OFB , Knot 123 304 0.77 40 177 263
GPKVGRLIYTAGGYFRQSLSYLEAYNPSNGSWLRLADLQVPRSGLAGCVVGGLLYAVGGRNNSPDGNTDSSALDCYNPMTNQWSPCASMSVPRNRIGVGVIDGHIYAVGGSHGCIHHSSVERYEPERDEWHLVAPMLTRRIGVGVAVLNRLLYAVGGFDGTNRLNSAECYYPERNEWRMITPMNTIRSGAGVCVLHNCIYAAGGYDGQDQLNSVERYDVETETWTFVAPMRHHRSALGITVHQGKIYVLGGYDGHTFLDSVECYDPDSDTWSEVTRMTSGRSGVGVAVTMEPCRKQIDQQNCTC

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
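As a sketch of these definitions, taking \(n\) as the subword length, \(P_w(n)\) can be built as a set of slices and \(p_w(n)\) read off as its size. The function names are illustrative, not those of the CoV2D code:

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w: str, n: int) -> int:
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subwords(w, n))

# Toy example: ABAB has exactly two distinct subwords of length 2.
print(sorted(subwords("ABAB", 2)), subword_complexity("ABAB", 2))
```

A word of length \(|w|\) has at most \(|w|-n+1\) distinct subwords of length \(n\), which bounds the values this function can return.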

\(|P_{f(2HKM_1)}(2) \setminus P_{f(4HAQ_1)}(2)|=33\), \(|P_{f(4HAQ_1)}(2) \setminus P_{f(2HKM_1)}(2)|=161\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
111110001100101010110000100000100011011100001100000111001011011100001001000110000001001010000000000011001110001010110000010000011111100
Pair \(Z_2\) Length of longest common subword
2HKM_1,4HAQ_1 194 4
2HKM_1,7OFB_1 170 3
4HAQ_1,7OFB_1 186 3
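Both columns of this table can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two length-\(k\) subword sets, so for the first pair \(33+161=194\). The last column is read here as the length of the longest common contiguous subword, which is an assumption about how the page computes it; the function names are illustrative:

```python
def subwords(w: str, k: int) -> set[str]:
    """Distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|, i.e. the symmetric difference size."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous block shared by x and y (row-by-row DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block that ended one position earlier in both words.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy example: ABAB and ABBA differ only in the subword BB.
print(z_k("ABAB", "ABBA", 2), longest_common_subword("ABAB", "ABBA"))
```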

Newick tree

 
(
	4HAQ_1:98.13,
	(
		2HKM_1:85,7OFB_1:85
	):13.13
);

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{566}{\log_{20} 566}-\frac{135}{\log_{20}135})\approx 125.\)
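The arithmetic in this estimate can be checked directly. In the sketch below, 566 is the combined length \(135+431\) of the two query sequences, the factors 0.85 and 0.8 are copied from the formula as given, and the helper name `scaled_length` is illustrative:

```python
import math

def scaled_length(n: int, alphabet_size: int = 20) -> float:
    """n / log_b(n), the normalizing term used in the LZ-complexity ratio."""
    return n / math.log(n, alphabet_size)

# Constants taken verbatim from the formula in the text.
estimate = 0.85 * 0.8 * (scaled_length(566) - scaled_length(135))
print(round(estimate, 1))
```

The result is a little under 126, in line with the value of 125 quoted on the page.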
Status Protein1 Protein2 d d1/2
Query variables 2HKM_1 4HAQ_1 159 104

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
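A minimal sketch of how d, d1 and \(\delta\) could be computed, under two assumptions that the page does not spell out: that the complexity \(c\) is the number of components in the LZ76 exhaustive-history parsing, and that min and max in \(\delta\) range over the two increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), following Otu and Sayood's definitions \(d=\max\) and \(d_1=\) their sum. The function names are illustrative, and the values may differ from the page's if it uses another Lempel-Ziv variant:

```python
def lz76(s: str) -> int:
    """Number of components in the LZ76 exhaustive-history parsing of s."""
    i, count = 0, 0
    while i < len(s):
        length = 1
        # Grow the candidate while it already occurs starting before position i.
        while i + length <= len(s) and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def increments(x: str, y: str) -> tuple[int, int]:
    """Extra components needed to parse y after x, and x after y."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1): the larger increment and the sum of both."""
    a, b = increments(x, y)
    return max(a, b), a + b

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max over the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

d, d1 = otu_sayood("AAAAAA", "AC")
print(d, d1, delta("AAAAAA", "AC", 0.5))
```

With this reading, `delta(x, y, 0)` equals d and `delta(x, y, 0.5)` equals d1/2, matching the two cases in the displayed formula. The parsing here uses repeated substring search, which is slow on long inputs but adequate for sequences of a few hundred residues.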