CoV2D Browser™

Parikh vectors
1OVP_1 6SXC_1 9LEE_1 Letter Amino acid
12 18 0 T Threonine
5 8 0 D Aspartic acid
3 23 0 I Isoleucine
5 16 0 F Phenylalanine
11 17 0 S Serine
1 9 0 K Lysine
0 11 0 M Methionine
1 4 0 W Tryptophan
14 10 0 N Asparagine
7 15 0 Q Glutamine
11 16 97 G Glycine
0 5 0 H Histidine
2 16 0 R Arginine
1 5 0 Y Tyrosine
3 9 0 P Proline
17 26 0 V Valine
10 19 122 A Alanine
0 1 64 C Cysteine
3 12 0 E Glutamic acid
8 37 0 L Leucine
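
The table above gives, for each sequence, its Parikh vector: the number of occurrences of each letter. A minimal sketch of that count follows; the names `LETTERS` and `parikh_vector` are illustrative and not taken from the CoV2D code. Letters outside the 20-letter alphabet (such as U in an RNA sequence) are simply not counted, which appears consistent with the 9LEE_1 column having only three nonzero entries.

```python
from collections import Counter

# Row order of the table above (one-letter amino acid codes).
LETTERS = "TDIFSKMWNQGHRYPVACEL"

def parikh_vector(word, alphabet=LETTERS):
    """Return the number of occurrences of each alphabet letter in word."""
    counts = Counter(word)
    return [counts[letter] for letter in alphabet]

# First ten residues of the 1OVP_1 sequence listed on this page.
prefix = "ATQGVFTLPA"
vector = parikh_vector(prefix)
print(dict(zip(LETTERS, vector)))
```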

>1OVP_1|Chain A|hypothetical protein LecB|Pseudomonas aeruginosa (287)
>6SXC_1|Chain A|Ion transport protein|Magnetococcus marinus MC-1 (156889)
>9LEE_1|Chains A, B, C, D, E, F, G, H, I, J|Sag-18RS21 Golld RNA|Streptococcus agalactiae 18RS21 (342613)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1OVP , Knot 53 114 0.73 34 79 109
ATQGVFTLPANTRFGVTAFANSSGTQTVNVLVNNETAATFSGQSTNNAVIGTQVLNSGSSGKVQVQVSVNGRPSDLVSAQVILTNELNFALVGSEDGTDNDYNDAVVVINWPLG
6SXC , Knot 127 277 0.86 40 178 269
GSHMSRKIRDLIESKRFQNVITAIIVLNGAVLGLLTDTTLSASSQNLLERVDQLCLTIFIVEISLKIYAYGVRGFFRSGWNLFDFVIVAIALMPAQGSLSVLRTFRIFRVMRLVSVIPTMRRVVQGMLLALPGVGSVAALLTVVFYIAAVMATNLYGATFPEWFGDLSKSLYTLFQVMTLESWSMGIVRPVMNVHPNAWVFFIPFIMLTTLTVLNLFIGIIVDAMAITKEQEEEAKTGHHQEPISQTLLHLGDRLDRIEKQLAQNNELLQRQQPQKK
9LEE , Knot 86 378 0.45 8 16 62
GGAGUAGGCGUUGCGCAUUUUGUUGCUCAAAAGGCGACGAAACGCAAGGCAAUGCACGUCUGCGAUACACGAAAACAAUGCUAUUUGUUGAAAAUAUUGGAAUAAAGCAAAAGUCAUUGCCCGUCGCAAACGAAAGUGUGCUUCGGUAGCUAGGCUACCUGCUAGAGUCUCGCAAGGAUAAUAGCAAAGUCAAAGAGUAAAGCAGCUUAGACCUUUAGCGGGGUUUUCGUUAAUUGAAAAAUGGCUUAGUAGUUUGCGGCGUAACGAGUGGUUAGCGAUACUAACCGCGCAUGGUUGUUACUUGAAGGGAUUUGAGUGGAUAAAAAACUAAAACAUAAGGUUUUGAAAGACAACUGACUAAACGUGUAAUCUCAGCGU
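
The normalized column in the table divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below recomputes that ratio from the tabulated values and also shows one common way to count LZ76 phrases (exhaustive-history parsing). Whether the page uses exactly this LZ variant is an assumption, and the function names are illustrative.

```python
import math

def lz76(word):
    """Count phrases in the LZ76 exhaustive-history parsing of word."""
    i, phrases, n = 0, 0, len(word)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text
        # (the copy may overlap the phrase itself, as in LZ76).
        while i + length <= n and word[i:i + length] in word[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_b n) with b = alphabet_size, as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

# Classic test string; it parses as 0|001|10|100|1000|101.
print(lz76("0001101001000101"))

# (code, LZ(w), n) as listed in the table.
for code, lz, n in [("1OVP", 53, 114), ("6SXC", 127, 277), ("9LEE", 86, 378)]:
    print(code, normalized(lz, n))
```

The recomputed ratios agree with the table to within rounding of the second decimal.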

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
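
As a small illustration of these definitions, the following sketch collects the distinct contiguous subwords of a given length and counts them. The helper names `subwords` and `p` are illustrative only.

```python
def subwords(word, n):
    """P_w(n): the set of distinct length-n subwords (contiguous blocks) of word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def p(word, n):
    """p_w(n): the number of distinct length-n subwords of word."""
    return len(subwords(word, n))

w = "GGAGUAGG"  # first eight symbols of the 9LEE_1 sequence
print([p(w, n) for n in (1, 2, 3)])  # [3, 5, 6]
```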

\(|P_{f(1OVP_1)}(2) \setminus P_{f(6SXC_1)}(2)|=34\), \(|P_{f(6SXC_1)}(2) \setminus P_{f(1OVP_1)}(2)|=133\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). For the pair above, \(Z_2 = 34 + 133 = 167\).

Hydrophobic-polar version of Sequence 1:
100111011100011101110001000101110000110101000001111001100100101010101010100110101110001011111000100000001111101111
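
The hydrophobic-polar string can be reproduced with a two-class residue map. In the sketch below, the residues A, F, G, I, L, P, V, W are mapped to 1 because that is what the displayed string shows for 1OVP_1; C, H and M do not occur in that sequence, so treating C and M as hydrophobic and H as polar is an assumption, as is the name `hp_encode`.

```python
# Residues encoded as 1 (hydrophobic). The first group is read off the
# displayed string; C and M are an assumption, since 1OVP_1 contains neither.
HYDROPHOBIC = set("AFGILPVW") | set("CM")

def hp_encode(sequence):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

print(hp_encode("ATQGVFTLPA"))  # first ten residues of 1OVP_1 -> 1001110111
```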
Pair \(Z_2\) Length of longest common subword (substring)
1OVP_1,6SXC_1 167 3
1OVP_1,9LEE_1 93 2
6SXC_1,9LEE_1 190 2
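
Both columns of this table can be sketched in a few lines. \(Z_k\) is the size of a symmetric difference of subword sets. The third column is read here as the length of the longest common contiguous subword, which is an assumption based on how small the tabulated values are. Function names are illustrative.

```python
def subwords(word, k):
    """Set of distinct length-k contiguous subwords of word."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def z(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x, y):
    """Length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

x = "ATQGVFTLPA"  # first ten residues of 1OVP_1
y = "GGAGUAGG"    # first eight symbols of 9LEE_1
print(z(x, y, 2), longest_common_subword(x, y))  # 14 1
```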

Newick tree

 
(
	6SXC_1:99.72,
	(
		1OVP_1:46.5,9LEE_1:46.5
	):53.22
);
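
A quick consistency check on the branch lengths, assuming the listing encodes a rooted tree in which each number is the length of the branch above that node: every leaf should then be at the same distance from the root. The nested-tuple encoding below is a hand transcription for illustration, not a Newick parser.

```python
# Each node is (label_or_children, branch_length); the root has length 0.
tree = ([("6SXC_1", 99.72),
         ([("1OVP_1", 46.5), ("9LEE_1", 46.5)], 53.22)], 0.0)

def leaf_depths(node, depth=0.0):
    """Return {leaf label: distance from the root} for a nested-tuple tree."""
    content, length = node
    depth += length
    if isinstance(content, str):
        return {content: depth}
    depths = {}
    for child in content:
        depths.update(leaf_depths(child, depth))
    return depths

depths = leaf_depths(tree)
print(depths)
```

All three leaves sit at depth 99.72 (up to floating-point rounding), so the tree is ultrametric.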

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{391 }{\log_{20} 391}-\frac{114}{\log_{20}114})=84.4\), where \(391 = 114 + 277\) is the combined length of the two query sequences.
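
The arithmetic in this estimate can be checked directly. The sketch below only evaluates the formula as written; the factors 0.85 and 0.8 are taken from the text as given, and the helper name `scale` is illustrative.

```python
import math

def scale(n, alphabet_size=20):
    """n / log_b n, the normalizing quantity used for LZ complexity on this page."""
    return n / math.log(n, alphabet_size)

estimate = 0.85 * 0.8 * (scale(391) - scale(114))
print(round(estimate, 1))  # 84.4
```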
Status Protein1 Protein2 d d1/2
Query variables 1OVP_1 6SXC_1 111 76.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
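
To make the interpolation concrete, the sketch below assumes the usual Otu–Sayood definitions: with \(a = c(xy)-c(x)\) and \(b = c(yx)-c(y)\) for an LZ complexity \(c\), \(d = \max(a,b)\) and \(d_1 = a+b\). Under that assumption the tabulated values \(d = 111\) and \(d_1/2 = 76.5\) imply increments of 111 and 42. The value 42 is inferred here, not printed on the page.

```python
def delta(a, b, alpha):
    """alpha * min + (1 - alpha) * max of the two complexity increments."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Increments inferred from the table (assumption): max = d = 111,
# min = 2 * 76.5 - 111 = 42.
a, b = 111, 42

print(delta(a, b, 0))    # 111, the distance d
print(delta(a, b, 0.5))  # 76.5, i.e. d1 / 2
```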
