CoV2D Browser™

Parikh vectors
5OWO_1 1TBC_1 2JNG_1 Letter Amino acid
16 1 7 A Alanine
6 3 3 N Asparagine
5 4 3 H Histidine
4 0 5 M Methionine
19 5 7 S Serine
14 8 0 K Lysine
4 2 5 Y Tyrosine
7 8 8 R Arginine
0 7 0 C Cysteine
7 8 4 Q Glutamine
15 8 10 G Glycine
10 4 3 I Isoleucine
5 4 4 T Threonine
13 4 7 D Aspartic acid
23 2 12 E Glutamic acid
22 2 7 L Leucine
6 1 4 F Phenylalanine
9 12 4 P Proline
0 1 3 W Tryptophan
16 2 9 V Valine
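
A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch of how a column of the table above could be recomputed; the function name `parikh_vector` and the alphabetical ordering of the 20 letters are assumptions made for illustration, not something the page specifies.

```python
from collections import Counter

# Assumption: the 20 standard amino-acid letters, listed alphabetically.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(w: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter in w."""
    counts = Counter(w)
    # Counter returns 0 for letters that never occur.
    return {a: counts[a] for a in AMINO_ACIDS}


# The 1TBC_1 sequence as listed on this page.
TBC = ("LDPVDPNIEPWNHPGSQPKTACNRCHCKKCCYHCQVCFITKGLGISYGRKK"
       "RRQRRRPSQGGQTHQDPIPKQPSSQPRGDPTGPKE")

v = parikh_vector(TBC)
print(v["C"], v["P"], v["M"])  # cysteine, proline, methionine counts
```

The entries sum to the sequence length, which gives a quick consistency check against the Length column further down.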

>5OWO_1|Chains A, B, C, D|Cytoplasmic dynein 1 heavy chain 1|Homo sapiens (9606)
>1TBC_1|Chain A|TAT PROTEIN|Human immunodeficiency virus 1 (11676)
>2JNG_1|Chain A|Cullin-7|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5OWO , Knot 92 201 0.81 36 133 190
MSEPGGGGGEDGSAGLEVSAVQNVADVSVLQKHLRKLVPLLLEDGGEAPAALEAALEEKSALEQMRKFLSDPQVHTVLVERSTLKEDVGDEGEEEKEFISYNINIDIHYGVKSNSLAFIKRTPVIDADKPVSSQLRVLTLSEDSPYETLHSFISNAVAPFFKSYIRESGKADRDGDKMAPSVEKKIAELEMGLLHLQQNIE
1TBC , Knot 46 86 0.79 38 71 83
LDPVDPNIEPWNHPGSQPKTACNRCHCKKCCYHCQVCFITKGLGISYGRKKRRQRRRPSQGGQTHQDPIPKQPSSQPRGDPTGPKE
2JNG , Knot 57 105 0.84 36 89 103
GSHMRSEFASGNTYALYVRDTLQPGMRVRMLDDYEEISAGDEGEFRQSNNGVPPVQVFWESTGRTYWVHWHMLEILGFEEDIEDMVEADEYQGAVASRVLGRALP
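
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns alone. The sketch below does this for the three rows above; the function name `normalized_lz` is hypothetical. The page's two-decimal values appear to be truncated rather than rounded, so the last digit may differ by one.

```python
import math


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_q n) for a word of length n over q letters."""
    return lz / (n / math.log(n, alphabet_size))


# (code, LZ-complexity, length) as listed in the table above.
rows = [("5OWO", 92, 201), ("1TBC", 46, 86), ("2JNG", 57, 105)]
for code, lz, n in rows:
    print(code, normalized_lz(lz, n))
```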

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
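
As an illustration of these definitions, here is a minimal sketch that enumerates the distinct subwords of a fixed length, writing \(k\) for the subword length. The names `subwords` and `subword_count` are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    # There are len(w) - k + 1 windows; the set removes duplicates.
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_count(w: str, k: int) -> int:
    """Return the number of distinct subwords of length k in w."""
    return len(subwords(w, k))


print(subwords("ABAB", 2))
print(subword_count("AAAA", 3))
```

If \(k\) exceeds the length of \(w\), the range is empty and the count is 0.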

\(|P_{f(5OWO_1)}(2) \setminus P_{f(1TBC_1)}(2)|=111\), \(|P_{f(1TBC_1)}(2) \setminus P_{f(5OWO_1)}(2)|=49\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (5OWO_1):
100111111001011101011001101011000100111111001101111101110000110010011001010011100001000110010000011000101010011000011110001110100110001011010000100010011001111110001000101000100111010001101011110100010
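
The hydrophobic-polar string can be reproduced by mapping each residue to 1 (hydrophobic) or 0 (polar). The sketch below uses a letter classification inferred by comparing the binary string with the 5OWO_1 sequence: A, F, G, I, L, M, P, V map to 1, and D, E, H, K, N, Q, R, S, T, Y map to 0. The letters C and W do not occur in 5OWO_1, so treating them as hydrophobic is an assumption, as is the function name `hp_encode`.

```python
# Letters observed to map to 1 in the 5OWO_1 encoding on this page.
HYDROPHOBIC = set("AFGILMPV")
# Assumption: C and W are absent from 5OWO_1; they are treated as hydrophobic here.
HYDROPHOBIC |= set("CW")


def hp_encode(w: str) -> str:
    """Return the hydrophobic-polar encoding of w as a string of 1s and 0s."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)


# First ten residues of 5OWO_1.
print(hp_encode("MSEPGGGGGE"))  # -> 1001111110
```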
Pair \(Z_2\) Length of longest common subword
5OWO_1,1TBC_1 160 3
5OWO_1,2JNG_1 140 5
1TBC_1,2JNG_1 132 2
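
Both columns of the table above can be sketched in a few lines: \(Z_k\) is the size of the symmetric difference of the two subword sets, and the last column is the length of the longest contiguous block shared by the two sequences. The function names are hypothetical, and the dynamic-programming routine is one standard way to find a longest common contiguous subword, not necessarily the method used by the page.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_subword(x: str, y: str) -> int:
    """Return the length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length ending at y[j-1]
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


# Fragments of 5OWO_1 and 2JNG_1 that share the block GDEGE.
print(longest_common_subword("DVGDEGEEEKEF", "SAGDEGEFRQ"))  # -> 5
print(z_k("ABC", "BCD", 2))  # {AB} and {CD} are unshared -> 2
```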

Newick tree

 
[
	5OWO_1:77.98,
	[
		2JNG_1:66,1TBC_1:66
	]:11.98
]

Let \(d\) be the Otu-Sayood distance \(d\), and let \(d_1\) be the Otu-Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{287}{\log_{20} 287}-\frac{86}{\log_{20}86}\right)\approx 63.9\), where \(287=201+86\) is the combined length of 5OWO_1 and 1TBC_1.
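
The arithmetic behind this estimate can be checked directly. The sketch below takes the factors 0.85 and 0.8 from the expression as given, without interpreting them, and uses the lengths 201 (5OWO_1) and 86 (1TBC_1); the helper name `capacity` is hypothetical.

```python
import math


def capacity(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_q n, the normalizing term used on this page."""
    return n / math.log(n, alphabet_size)


n1, n2 = 201, 86  # lengths of 5OWO_1 and 1TBC_1
expected = 0.85 * 0.8 * (capacity(n1 + n2) - capacity(n2))
print(expected)
```

The result lies just below 64, consistent with the displayed 63.9 if the page truncates to one decimal.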
Status Protein1 Protein2 d d1/2
Query variables 5OWO_1 1TBC_1 85 60
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
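
To make the interpolation concrete, the sketch below assumes that \(\min\) and \(\max\) range over the two quantities \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), as in the usual statement of the Otu-Sayood distances, so that \(\alpha=0\) gives \(d\) and \(\alpha=1/2\) gives \(d_1/2\). The function `lz76` is a straightforward quadratic-time count of phrases in an LZ76-style parsing; the page does not say which LZ variant it uses, so this is not claimed to reproduce the values of \(d\) and \(d_1/2\) reported above.

```python
def lz76(w: str) -> int:
    """Return the number of phrases in an LZ76-style parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it already occurs starting before position i
        # (the earlier occurrence may overlap the current phrase).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def delta(x: str, y: str, alpha: float) -> float:
    """Return alpha * min + (1 - alpha) * max of the two LZ increments."""
    a = lz76(x + y) - lz76(x)
    b = lz76(y + x) - lz76(y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


print(lz76("0001101001000101"))  # parsing 0|001|10|100|1000|101 -> 6
print(delta("AB", "AAAA", 0))    # alpha = 0 gives d
print(delta("AB", "AAAA", 0.5))  # alpha = 1/2 gives d_1 / 2
```

A word such as AAAAAA parses into only two phrases, which is why appending it to another sequence adds very little complexity.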