CoV2D Browser™

Parikh vectors
7CYF_1 2FFG_1 1VAP_1 Letter Amino acid
1 1 14 C Cysteine
10 5 15 K Lysine
12 4 3 M Methionine
44 3 5 A Alanine
10 2 8 D Aspartic acid
11 7 3 Q Glutamine
16 11 5 E Glutamic acid
2 6 1 H Histidine
45 7 4 L Leucine
25 4 4 V Valine
8 5 7 N Asparagine
37 6 5 I Isoleucine
18 6 4 F Phenylalanine
12 3 9 Y Tyrosine
12 2 4 R Arginine
36 3 12 G Glycine
20 3 5 P Proline
29 4 5 S Serine
22 5 7 T Threonine
4 0 3 W Tryptophan
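Each column of the table above is a Parikh vector: the number of occurrences of each amino-acid letter in one sequence. A minimal sketch of that count follows; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not taken from the CoV2D code.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "CKMADQEHLVNIFYRGPSTW"


def parikh_vector(sequence: str) -> dict[str, int]:
    """Count the occurrences of each amino-acid letter in the sequence."""
    counts = Counter(sequence)
    # Letters that never occur get an explicit 0, as in the table.
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}


# 2FFG_1 sequence as listed on this page.
seq_2ffg = (
    "MSQLMGIITRLQSLQETAEAANEPMQRYFEVNGEKICSVKYFEKNQTFEL"
    "TVFQKGEKPNTYPFDNIDMVSIEIFELLQLEHHHHHH"
)
vector = parikh_vector(seq_2ffg)
print(vector["H"], vector["W"], sum(vector.values()))  # 6 0 87
```

The entries sum to the sequence length, which gives a quick consistency check against the Length column further down.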

>7CYF_1|Chains A, B, C|Slr1512 protein|Synechocystis sp. PCC 6803 substr. Kazusa (1111708)
>2FFG_1|Chains A, B|ykuJ|Bacillus subtilis (1423)
>1VAP_1|Chains A, B|PHOSPHOLIPASE A2|Agkistrodon piscivorus piscivorus (8716)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7CYF , Knot 154 374 0.81 40 191 343
MDFLSNFLTDFVGQLQSPTLAFLIGGMVIAALGTQLVIPEAISTIIVFMLLTKIGLTGGMAIRNSNLTEMLLPVAFSVILGILIVFIARFTLAKLPNVRTVDALATGGLFGAVSGSTMAAALTTLEESKISYEAWAGALYPFMDIPALVTAIVVANIYLNKRKRKSAAASIEESFSKQPVAAGDYGDQTDYPRTRQEYLSQQEPEDNRVKIWPIIEESLQGPALSAMLLGLALGIFTKPESVYEGFYDPLFRGLLSILMLIMGMEAWSRIGELRKVAQWYVVYSLIAPIVHGFIAFGLGMIAHYATGFSLGGVVVLAVIAASSSDISGPPTLRAGIPSANPSAYIGSSTAIGTPIAIGVCIPLFIGLAQTLGAG
2FFG , Knot 44 87 0.75 38 71 79
MSQLMGIITRLQSLQETAEAANEPMQRYFEVNGEKICSVKYFEKNQTFELTVFQKGEKPNTYPFDNIDMVSIEIFELLQLEHHHHHH
1VAP , Knot 63 123 0.82 40 96 120
NLFQFEKLIKKMTGKSGMLWYSAYGCYCGWGGQGRPKDATDRCCFVHDCCYGKVTGCNPKMDIYTYSVDNGNIVCGGTNPCKKQICECDRAAAICFRDNLKTYDSKTYWKYPKKNCKEESEPC

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
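As a small illustration of these definitions, the set of distinct subwords of a fixed length and its cardinality can be computed with a sliding window. This is a sketch on a toy word; the helper names `subwords` and `subword_complexity` are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w: str, k: int) -> int:
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))


example = "abaab"
print(sorted(subwords(example, 2)))  # ['aa', 'ab', 'ba']
print([subword_complexity(example, k) for k in (1, 2, 3)])  # [2, 3, 3]
```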

\(|P_{f(7CYF_1)}(2) \setminus P_{f(2FFG_1)}(2)|=153\), \(|P_{f(2FFG_1)}(2) \setminus P_{f(7CYF_1)}(2)|=33\).

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), that is, an LZ76-style (set of subwords) numerator of the Jaccard distance for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (7CYF_1):
10110011001110100101111111111111100111101100111111100111011111000010011111110111111111110101101101001011101111111010011111001000010001111110111011111011111010100000001110100010001111100100000100000010000100001011111000101111011111111111001001001100111011101111111101100110100110101100111111011111111111001011011111111111100001011101011110101010110001110111111011111111001111
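The hydrophobic-polar string is a letter-by-letter recoding of the sequence. The page does not state which residues count as hydrophobic; the set used below (A, F, G, I, L, M, P, V, W mapped to 1, everything else to 0) is an assumption inferred by aligning the displayed string with the 7CYF_1 sequence, so treat it as a sketch rather than the site's definition.

```python
# Assumed hydrophobic residues, inferred from the displayed string (not stated by the page).
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


prefix = "MDFLSNFLTDFVGQLQSPTL"  # first 20 residues of 7CYF_1
print(hp_encode(prefix))  # 10110011001110100101
```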
Pair \(Z_2\) Length of longest common substring (contiguous subword)
7CYF_1,2FFG_1 186 3
7CYF_1,1VAP_1 195 3
2FFG_1,1VAP_1 139 3
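The quantity \(Z_k\) in the table is the size of a symmetric difference of two subword sets, so it maps directly onto set operations. The sketch below shows this on toy words; the function names are hypothetical, and the division by the union size (a standard Jaccard distance) is included only to show what \(Z_k\) is the numerator of.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k divided by the size of the union of the two subword sets."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0


print(z_distance("abaab", "abba", 2))        # 2  ('aa' and 'bb' are unshared)
print(jaccard_distance("abaab", "abba", 2))  # 0.5
```

For the first row of the table, the two one-sided differences reported above are 153 and 33, which sum to the listed \(Z_2 = 186\).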

Newick tree

 
[
	7CYF_1:10.43,
	[
		2FFG_1:69.5,1VAP_1:69.5
	]:32.93
]

Let d be the Otu–Sayood distance of that name.
Let d1 be the Otu–Sayood distance of that name. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{461 }{\log_{20} 461}-\frac{87}{\log_{20}87})\approx 113.\)
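The arithmetic in the expected-distance estimate can be checked directly. The sketch below only evaluates the displayed expression; the constants 0.85 and 0.8 are taken from the page as given, and the helper name `length_term` is an assumption. Note that 461 = 374 + 87, the combined length of the two query sequences.

```python
import math


def length_term(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_b n for alphabet size b."""
    return n / math.log(n, alphabet_size)


expected = 0.85 * 0.8 * (length_term(461) - length_term(87))
print(round(expected, 1))  # 113.4
```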
Status Protein1 Protein2 d d1/2
Query variables 7CYF_1 2FFG_1 139 85.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
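A worked instance of this interpolation with the values in the query row: the case \(\alpha=0\) gives \(\max = d = 139\), and the case \(\alpha=1/2\) gives \((\min+\max)/2 = d_1/2 = 85.5\), so \(\min = 32\) under this reading. The sketch below assumes that "min" and "max" are the smaller and larger of the two quantities being combined; the names `smaller` and `larger` are illustrative.

```python
def delta(alpha: float, smaller: float, larger: float) -> float:
    """Interpolate: alpha * min + (1 - alpha) * max."""
    return alpha * smaller + (1 - alpha) * larger


d = 139.0       # value of d from the query row
half_d1 = 85.5  # value of d1/2 from the query row

larger = d                       # alpha = 0 case: delta = max = d
smaller = 2 * half_d1 - larger   # alpha = 1/2 case: (min + max) / 2 = d1/2

print(smaller)                                               # 32.0
print(delta(0, smaller, larger), delta(0.5, smaller, larger))  # 139.0 85.5
```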