CoV2D Browser™

Parikh vectors
2VIX_1 5SZX_1 3PQI_1 Letter Amino acid
12 2 11 A Alanine
27 0 10 E Glutamic acid
6 0 6 M Methionine
11 7 29 T Threonine
1 0 0 W Tryptophan
33 0 20 S Serine
9 0 6 Y Tyrosine
15 0 21 V Valine
15 0 7 R Arginine
14 0 14 N Asparagine
11 0 6 Q Glutamine
7 0 5 H Histidine
36 0 12 L Leucine
17 0 17 D Aspartic acid
2 6 2 C Cysteine
7 0 13 P Proline
12 3 32 G Glycine
26 0 18 I Isoleucine
20 0 10 K Lysine
13 0 8 F Phenylalanine
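
Each column of the table above is a Parikh vector: it records how often each letter occurs in a sequence and ignores letter order. A minimal sketch of the count for the 5SZX_1 column follows; the function name `parikh_vector` is chosen here for illustration and is not taken from the page.

```python
from collections import Counter

def parikh_vector(w, alphabet):
    """Return the number of occurrences in w of each letter of alphabet, in order."""
    counts = Counter(w)  # a Counter returns 0 for letters that never occur
    return [counts[a] for a in alphabet]

# The 5SZX_1 sequence as listed on this page.
dna = "TCTTCATCGCTCAGTGCT"
print(dict(zip("ATCG", parikh_vector(dna, "ATCG"))))
```

For this sequence the counts are A 2, T 7, C 6, G 3, which agrees with the 5SZX_1 column; every other letter has count 0.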

>2VIX_1|Chains A, B, C|PROTEIN MXIC|SHIGELLA FLEXNERI (623)
>5SZX_1|Chain A[auth C]|DNA (5'-D(*TP*CP*TP*TP*CP*AP*TP*(5CM)P*GP*CP*TP*CP*AP*GP*TP*GP*CP*T)-3')|synthetic construct (32630)
>3PQI_1|Chain A|gene product 138|Bacteriophage phi92 (38018)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2VIX , Knot 127 294 0.81 40 168 277
HSSGLVPRGSHMSQERILDGEEDEINHKIFDLKRTLKDNLPLDRDFIDRLKRYFKDPSDQVLALRELLNEKDLTAEQVELLTKIINEIISGSEKSVNAGINSAIQAKLFGNKMKLEPQLLRACYRGFIMGNISTTDQYIEWLGNFGFNHRHTIVNFVEQSLIVDMDSEKPSCNAYEFGFVLSKLIAIKMIRTSDVIFMKKLESSSLLKDGSLSAEQLLLTLLYIFQYPSESEQILTSVIEVSRASHEDSVVYQTYLSSVNESPHDIFKSESEREIAINILRELVTSAYKKELSR
5SZX , Knot 9 18 0.48 8 10 14
TCTTCATCGCTCAGTGCT
3PQI , Knot 112 247 0.83 38 161 234
GSMDALNFVTAIRGLINEQVAEVHTSLPVRVIGVDYGSKTVTLESIVKNTRSTEDEIDYPTFHDVPFMVNGGGTGRISFPIKAGDIGVVVFSERDPSNAFQTDGDTASSGTLIQPCGLYPACFIPKIATATDSSEEVDSEKVIISNNKQTYASFDPNGNISVYNTQGMKIDMTPNSIVLTDAGGGKLTLQGGTMTYKGGTVNLNGLTITPDGRMTDSGGIGLHTHTHPVRGVETGGSTVTSDKPNGG
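
The page does not state which Lempel-Ziv variant defines \(\mathrm{LZ}(w)\). The sketch below assumes the LZ76 exhaustive-history parsing, in which each new component is the shortest block that has not already occurred starting at an earlier position (overlaps allowed). The function names are chosen here for illustration.

```python
import math

def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parsing of w (assumed variant)."""
    n = len(w)
    i = 0      # start of the current component
    count = 0
    while i < n:
        length = 1
        # Extend the component while it already occurs in the text read so far,
        # excluding its own last letter (so overlapping copies are allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(w, alphabet_size=20):
    """LZ(w) divided by n / log_20(n), the ratio tabulated above."""
    n = len(w)
    return lz76_complexity(w) / (n / math.log(n, alphabet_size))

dna = "TCTTCATCGCTCAGTGCT"  # the 5SZX sequence from the table
print(lz76_complexity(dna), round(normalized_lz(dna), 2))
```

Under this assumption the 5SZX sequence parses into 9 components (T, C, TT, CA, TCG, CTC, AG, TG, CT) with a ratio of about 0.48, in agreement with its row in the table.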

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
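
As an illustration of these definitions, the following sketch collects the distinct length-\(k\) subwords of a word in a Python set and counts them. The names `subwords` and `subword_complexity` are chosen here and are not from the page.

```python
def subwords(w, k):
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """p_w(k): the number of distinct subwords of length k in w."""
    return len(subwords(w, k))

dna = "TCTTCATCGCTCAGTGCT"  # the 5SZX sequence from the table
print(subword_complexity(dna, 2), subword_complexity(dna, 3))
```

For the 5SZX sequence this gives 10 subwords of length 2 and 14 of length 3, matching the \(p_w(2)\) and \(p_w(3)\) entries in its row.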

\(|P_{f(2VIX_1)}(2) \setminus P_{f(5SZX_1)}(2)|=166\), \(|P_{f(5SZX_1)}(2) \setminus P_{f(2VIX_1)}(2)|=8\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), that is, the numerator of an LZ76-style (set of subwords) Jaccard distance for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
000111101001000011010000100011010001000111000110010001001000111100110000101001011001100110100001011100110101110010101011010001111101000000101110111000001101100011101000010001001111100111101100001111001000011001010100111011011001000001100110100100000110000100100010011000000011101100110010000100
Pair \(Z_2\) Length of longest common subsequence
2VIX_1,5SZX_1 174 2
2VIX_1,3PQI_1 149 4
5SZX_1,3PQI_1 161 3
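
The \(Z_2\) column can be read as the size of a symmetric difference of subword sets; for the first pair, 174 = 166 + 8. A minimal sketch of \(Z_k\) follows, run on two short toy words that are hypothetical and not taken from the page.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Hypothetical toy words: P_x(2) = {TC, CT, TT}, P_y(2) = {CT, TT, TA}.
print(z("TCTT", "CTTA", 2))
```

Here only TC and TA are unshared, so \(Z_2 = 2\). Dividing by the size of the union of the two sets would give the Jaccard distance itself.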

Newick tree

 
[
	5SZX_1:86.69,
	[
		2VIX_1:74.5,3PQI_1:74.5
	]:12.19
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{312 }{\log_{20} 312}-\frac{18}{\log_{20}18})\approx 97.9\), where 312 = 294 + 18 is the combined length of the two sequences.
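
The arithmetic in this estimate can be checked with a few lines of Python. The factors 0.85 and 0.8 are taken from the formula as given; the page does not explain them, and the helper name `lz_scale` is chosen here for illustration.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_20(n), the normalizing quantity used for LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)

# 312 and 18 are the two lengths appearing in the formula above.
expected = 0.85 * 0.8 * (lz_scale(312) - lz_scale(18))
print(expected)
```

The result is just under 98, consistent with the quoted 97.9 up to rounding.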
Status Protein1 Protein2 d d1/2
Query variables 2VIX_1 5SZX_1 126 66.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
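
A short sketch of this interpolation follows. The inputs 126 and 7 are not stated on the page; they are inferred from the table values \(d=126\) and \(d_1/2=66.5\) by assuming the case formula above, so that the maximum is 126 and the minimum is \(2(66.5)-126=7\).

```python
def delta(a, b, alpha):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Inferred (assumed) one-sided values for the pair 2VIX_1, 5SZX_1.
a, b = 126, 7
print(delta(a, b, 0), delta(a, b, 0.5))
```

With \(\alpha=0\) this returns the larger value, 126, and with \(\alpha=1/2\) the average, 66.5, reproducing the two table entries under the stated assumption.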