CoV2D Browser™

Parikh vectors
5CDY_1 1DKF_1 6ABA_1 Letter Amino acid
11 7 17 M Methionine
3 13 6 P Proline
4 4 14 Y Tyrosine
13 12 12 D Aspartic acid
4 5 15 Q Glutamine
18 33 32 L Leucine
10 14 9 K Lysine
7 9 16 F Phenylalanine
31 23 32 A Alanine
11 9 10 N Asparagine
26 11 22 G Glycine
1 7 8 H Histidine
15 14 17 S Serine
1 4 2 C Cysteine
14 20 5 E Glutamic acid
17 9 18 T Threonine
1 2 8 W Tryptophan
21 14 23 V Valine
15 11 6 R Arginine
21 12 24 I Isoleucine
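
A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch of how one column of the table above could be computed; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative and not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard amino-acid letters (illustrative constant, not from the site).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of the alphabet to its count in sequence."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# First 16 residues of the 5CDY_1 sequence, used as a small example.
example = parikh_vector("MSFEGKIALVTGASRG")
print(example["G"], example["A"], example["W"])  # 3 2 0
```

Applying the same function to a full FASTA sequence would give one column of the table, with one row per letter.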

>5CDY_1|Chains A, B, C, D|3-oxoacyl-[acyl-carrier protein] reductase|Yersinia pestis (632)
>1DKF_1|Chain A|PROTEIN (RETINOID X RECEPTOR-ALPHA)|Mus musculus (10090)
>6ABA_1|Chain A|Chloride pumping rhodopsin|Nonlabens marinus S1-08 (1454201)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5CDY , Knot 110 244 0.82 40 155 233
MSFEGKIALVTGASRGIGRAIAELLVERGACVIGTATSEKGAEAISAYLGENGKGLMLNVVDPTSIDTVLATIRAEFGEVDILVNNAGITRDNLLMRMKDDEWQDIIDTNLTSVFRLSKAVMRAMMKKRFGRIITIGSVVGTMGNAGQVNYAAAKAGVIGFSKSLAREVASRGITVNVVAPGFIETDMTRTLTDDQRAGILAQVPANRLGDAKEIASAVAFLASDEASYISGETLHVNGGMYMI
1DKF , Knot 109 233 0.85 40 163 221
SANEDMPVEKILEAELAVEPKTETYVEANMGLNPSSPNDPVTNICQAADKQLFTLVEWAKRIPHFSELPLDDQVILLRAGWNELLIASASHRSIAVKDGILLATGLHVHRNSAHSAGVGAIFDRVLTELVSKMRDMQMDKTELGCLRAIVLFNPDSKGLSNPAEVEALREKVYASLEAYCKHKYPEQPGRFAKLLLRLPALRSIGLKCLEHLFFFKLIGDTPIDTFLMEMLEA
6ABA , Knot 129 296 0.82 40 175 283
MASMTGGQQMGRDPNSMKNIESLFDYSAGQFEFIDHLLTMGVGVHFAALIFFLVVSQFVAPKYRIATALSCIVMVSAGLILNSQAVMWTDAYAYVDGSYQLQDLTFSNGYRYVNWMATIPCLLLQLLIVLNLKGKELFSTATWLILAAWGMIITGYVGQLYEVDDIAQLMIWGAVSTAFFVVMNWIVGTKIFKNRATMLGGTDSTITKVFWLMMFAWTLYPIAYLVPAFMNNADGVVLRQLLFTIADISSKVIYGLMITYIAIQQSAAAGYVPAQQALGRIGMDSKAALEHHHHHH
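
The normalized column divides the LZ-complexity by \(n/\log_{20} n\). As a hedged sketch, the code below counts phrases in a Lempel-Ziv (1976) style exhaustive parsing and applies that normalization. Whether the site uses exactly this parsing variant is an assumption, so the phrase counts it produces need not match the table; the names `lz76_complexity` and `normalized_lz` are hypothetical.

```python
import math

def lz76_complexity(w):
    """Count the phrases in an LZ76-style exhaustive parsing of w (one common variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it already occurs in the text before its last symbol.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n) for alphabet size b."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example, parsed as 0 | 001 | 10 | 100 | 1000 | 101.
print(lz76_complexity("0001101001000101"))  # 6

# Normalization applied to the tabulated values for 5CDY (LZ = 110, n = 244).
print(normalized_lz(110, 244))  # about 0.827; the table lists 0.82
```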

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
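
The set of distinct subwords of a given length, and its cardinality, can be sketched in a few lines. The function names `subwords` and `subword_complexity` and the parameter name `k` are illustrative choices, not the site's own.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

print(sorted(subwords("ABAB", 2)))    # ['AB', 'BA']
print(subword_complexity("ABAB", 2))  # 2
```

If `k` exceeds the length of `w`, the range is empty and the functions return the empty set and 0.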

\(|P_{f(5CDY_1)}(2) \setminus P_{f(1DKF_1)}(2)|=81\), \(|P_{f(1DKF_1)}(2) \setminus P_{f(5CDY_1)}(2)|=89\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \), the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1010101111011001110111011100110111010000110110101100101111011010010011101010110101110011100001110100001001100010011010011101110001101101101110110110100111011111100011001100110101111111000100010000011111011100110100110111111000100101001010111011

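
The hydrophobic-polar string replaces each residue by 1 (hydrophobic) or 0 (polar). The page does not state which residues count as hydrophobic; the set used below (A, F, G, I, L, M, P, V, W) is an assumption inferred by comparing the displayed binary string with the 5CDY sequence, and the names `HYDROPHOBIC` and `hydrophobic_polar` are hypothetical.

```python
# Assumed hydrophobic residues, inferred from the displayed string (not stated by the source).
HYDROPHOBIC = set("AFGILMPVW")

def hydrophobic_polar(sequence):
    """Map each residue to '1' if assumed hydrophobic, otherwise '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 16 residues of the 5CDY_1 sequence.
print(hydrophobic_polar("MSFEGKIALVTGASRG"))  # 1010101111011001
```

The output agrees with the first 16 symbols of the string shown for Sequence 1.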
Pair \(Z_2\) Length of longest common subword
5CDY_1,1DKF_1 170 4
5CDY_1,6ABA_1 148 3
1DKF_1,6ABA_1 166 3
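
Both columns of the table above can be sketched directly from the definitions. \(Z_k\) is the size of the symmetric difference of the two length-\(k\) subword sets, which is consistent with the first row: \(81 + 89 = 170\). The last column is read here as the length of the longest common contiguous subword, an assumption based on the small values 3 and 4. All function names are illustrative.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Return |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block shared by x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("ABAB", "ABBA", 2))                    # 1
print(longest_common_subword_length("ABCDE", "XBCDY"))  # 3
```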

Newick tree

 
[
	1DKF_1:87.08,
	[
		5CDY_1:74,6ABA_1:74
	]:13.08
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{477 }{\log_{20} 477}-\frac{233}{\log_{20}233})\approx 70.4\)
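
The arithmetic in the estimate above can be checked numerically. In this sketch, 477 appears to be the combined length \(244 + 233\) of the two sequences; the factors 0.85 and 0.8 are taken from the formula as given, and the helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n, alphabet_size=20):
    """Return n / log_b(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

# 477 = 244 + 233, the lengths of 5CDY_1 and 1DKF_1 (interpretation assumed).
expected = 0.85 * 0.8 * (lz_scale(477) - lz_scale(233))
print(expected)  # close to 70.48, which the page shows as 70.4
```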
Status Protein1 Protein2 d d1/2
Query variables 5CDY_1 1DKF_1 88 87
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
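
As a hedged illustration of the interpolation above, the sketch below assumes that min and max refer to the two quantities \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), so that \(d\) is their maximum and \(d_1\) their sum. The page does not spell this out, and the LZ76 parsing variant and all function names are likewise assumptions.

```python
def lz76_complexity(w):
    """Count phrases in an LZ76-style exhaustive parsing of w (one common variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it already occurs in the text before its last symbol.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def otu_sayood_terms(x, y, complexity=lz76_complexity):
    """Return (min, max) of the two directed complexity increments (assumed meaning)."""
    forward = complexity(x + y) - complexity(x)
    backward = complexity(y + x) - complexity(y)
    return min(forward, backward), max(forward, backward)

def delta(x, y, alpha):
    """Interpolate: alpha * min + (1 - alpha) * max."""
    low, high = otu_sayood_terms(x, y)
    return alpha * low + (1 - alpha) * high

x, y = "0001101001", "1000101"
low, high = otu_sayood_terms(x, y)
d = high          # assumed d: the maximum
d1 = low + high   # assumed d1: the sum
print(delta(x, y, 0) == d, delta(x, y, 0.5) == d1 / 2)  # True True
```

With these assumptions, \(\alpha=0\) recovers \(d\) and \(\alpha=1/2\) recovers \(d_1/2\), matching the two cases in the displayed formula.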