CoV2D BrowserTM

Parikh vectors
1ZOB_1 4RCZ_1 7PDO_1 Letter Amino acid
33 18 24 R Arginine
8 14 5 Q Glutamine
24 32 20 E Glutamic acid
25 41 12 I Isoleucine
15 11 8 F Phenylalanine
2 7 4 W Tryptophan
9 14 5 N Asparagine
27 16 21 D Aspartic acid
12 8 4 Y Tyrosine
50 24 31 A Alanine
5 4 2 C Cysteine
8 6 9 H Histidine
52 34 44 L Leucine
11 33 6 K Lysine
23 26 17 S Serine
22 22 24 T Threonine
27 41 31 V Valine
47 34 31 G Glycine
14 9 6 M Methionine
19 21 22 P Proline
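
The Parikh vector of a sequence records how often each letter occurs. A minimal sketch of that count, assuming the 20 standard one-letter amino acid codes; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and do not come from this page:

```python
from collections import Counter

# The 20 standard one-letter amino acid codes (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence):
    """Map each amino acid letter to its number of occurrences."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# First ten residues of the 1ZOB_1 sequence, used only as a short demo input.
prefix = "MSLNDDATFW"
vector = parikh_vector(prefix)
print(vector["D"], vector["M"], vector["K"])  # prints: 2 1 0
```

Summing a column of the table gives the sequence length; for 1ZOB_1 the entries add up to 433.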

>1ZOB_1|Chain A|2,2-dialkylglycine decarboxylase|Burkholderia cepacia (292)
>4RCZ_1|Chain A|Translation initiation factor 2 subunit gamma|Sulfolobus solfataricus (273057)
>7PDO_1|Chain A|Glucosyl-3-phosphoglycerate synthase|Mycolicibacterium hassiacum DSM 44199 (1122247)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1ZOB , Knot 179 433 0.83 40 217 401
MSLNDDATFWRNARHHLVRYGGTFEPMIIERAKGSFVYDADGRAILDFTSGQMSAVLGHCHPEIVSVIGEYAGKLDHLFSEMLSRPVVDLATRLANITPPGLDRALLLSTGAESNEAAIRMAKLVTGKYEIVGFAQSWHGMTGAAASATYSAGRKGVGPAAVGSFAIPAPFTYRPRFERNGAYDYLAELDYAFDLIDRQSSGNLAAFIAEPILSSGGIIELPDGYMAALKRKCEARGMLLILDEAQTGVGRTGTMFACQRDGVTPDILTLSKTLGAGLPLAAIVTSAAIEERAHELGYLFYTTHVSDPLPAAVGLRVLDVVQRDGLVARANVMGDRLRRGLLDLMERFDCIGDVRGRGLLLGVEIVKDRRTKEPADGLGAKITRECMNLGLSMNIVQLPGMGGVFRIAPPLTVSEDEIDLGLSLLGQAIERAL
4RCZ , Knot 171 415 0.82 40 226 391
MAWPKVQPEVNIGVVGHVDHGKTTLVQAITGIWTSKHSEELKRGMTIKLGYAETNIGVCESCKKPEAYVTEPSCKSCGSDDEPKFLRRISFIDAPGHEVLMATMLSGAALMDGAILVVAANEPFPQPQTREHFVALGIIGVKNLIIVQNKVDVVSKEEALSQYRQIKQFTKGTWAENVPIIPVSALHKINIDSLIEGIEEYIKTPYRDLSQKPVMLVIRSFDVNKPGTQFNELKGGVIGGSIIQGLFKVDQEIKVLPGLRVEKQGKVSYEPIFTKISSIRFGDEEFKEAKPGGLVAIGTYLDPSLTKADNLLGSIITLADAEVPVLWNIRIKYNLLERVVGAKEMLKVDPIRAKETLMLSVGSSTTLGIVTSVKKDEIEVELRRPVAVWSNNIRTVISRQIAGRWRMIGWGLVEI
7PDO , Knot 137 326 0.81 40 171 303
MTLVPDLTATDLARHRWLTDNSWTRPTWTVAELEAAKAGRTISVVLPALNEEETVGGVVETIRPLLGGLVDELIVLDSGSTDDTEIRAMAAGARVISREVALPEVAPQPGKGEVLWRSLAATTGDIIVFIDSDLIDPDPMFVPKLVGPLLLSEGVHLVKGFYRRPLKTSGSEDAHGGGRVTELVARPLLAALRPELTCVLQPLGGEYAGTRELLMSVPFAPGYGVEIGLLVDTYDRLGLDAIAQVNLGVRAHRNRPLTDLAAMSRQVIATLFSRCGVPDSGVGLTQFFADGDGFSPRTSEVSLVDRPPMNTLRGKLAAALEHHHHH
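
The LZ-complexity column and its normalized form can be sketched as follows. This assumes the exhaustive-history parsing of Lempel and Ziv (1976), in which each new phrase is extended for as long as it still occurs starting at an earlier position. The page does not say which variant it uses, so the counts from this sketch are not guaranteed to match the table. The function names are illustrative.

```python
import math

def lz76_complexity(w):
    """Number of phrases in an LZ76-style parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it also occurs starting before position i;
        # the earlier occurrence may overlap the phrase itself.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_20(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76_complexity("0001101001000101"))  # parses as 0|001|10|100|1000|101
print(normalized_lz(179, 433))              # tabulated LZ and length for 1ZOB
```

For the three tabulated rows, `normalized_lz` gives values whose first two decimals are 0.83, 0.82 and 0.81, which suggests that the table truncates rather than rounds.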

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
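
A minimal sketch of these definitions, reading the argument of \(P_w\) as a subword length. The helper names `subwords` and `subword_count` are illustrative, and the values tabulated on this page are not reproduced here.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_count(w, k):
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))

print(sorted(subwords("banana", 3)))  # ['ana', 'ban', 'nan']
print(subword_count("banana", 3))     # 3
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), so the count is empty-set zero once \(k\) exceeds \(n\).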

\(|P_{f(1ZOB_1)}(2) \setminus P_{f(4RCZ_1)}(2)|=70\), \(|P_{f(4RCZ_1)}(2) \setminus P_{f(1ZOB_1)}(2)|=79\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) numerator of the Jaccard distance between \(P_x(k)\) and \(P_y(k)\).

Hydrophobic-polar version of Sequence 1:
1010001011001000110011010111100101011001010111010010101111000101101110011010011001100111011001101011110011110011000011101101101000111110010110111101000110011111111011111110001010001100011010011011000001011111101110011110110101111000001011111100100111001011100001101011010001111111111100111000100110110000100111111110110110001111010111001001110110010011010101111110110000000110111101000010111010110111111110111110100001011101110110011
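
The hydrophobic-polar string can be reproduced with a simple letter map. The page does not state which residues it treats as hydrophobic. The set below (A, F, G, I, L, M, P, V, W) is an assumption, inferred by comparing the start of the binary string with the start of the 1ZOB_1 sequence. The names `HYDROPHOBIC` and `hp_encode` are illustrative.

```python
# Residues mapped to 1; inferred from the page's own output, not documented there.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Encode a protein sequence as a binary string: 1 = hydrophobic, 0 = polar."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First twenty residues of 1ZOB_1.
print(hp_encode("MSLNDDATFWRNARHHLVRY"))  # 10100010110010001100
```

The first 60 residues of 1ZOB_1 contain all 20 amino acids, and the encoding agrees with the binary string above on that prefix.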
Pair \(Z_2\) Length of longest common subword
1ZOB_1,4RCZ_1 149 4
1ZOB_1,7PDO_1 130 5
4RCZ_1,7PDO_1 155 4
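
Both columns of this table can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two subword sets; for the first pair, \(70+79=149\). The last column is read here as the length of the longest common contiguous subword. That reading is an assumption, based on values of 4 and 5 being typical for contiguous matches between sequences of this length. Function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): subwords of length k found in exactly one of x and y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common suffix ending at x[i-1] and y[j-1].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("abab", "abba", 2))                 # 1 (only "bb" is unshared)
print(longest_common_subword_length("abab", "abba"))  # 2
```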

Newick tree

 
[
	4RCZ_1:79.34,
	[
		1ZOB_1:65,7PDO_1:65
	]:14.34
]

Let \(d\) be the distance that Otu and Sayood denote by d.
Let \(d_1\) be the distance that Otu and Sayood denote by d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{848 }{\log_{20} 848}-\frac{415}{\log_{20}415})\approx 115.\)
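
The arithmetic in this estimate can be checked directly. Here 848 is the sum of the two sequence lengths, 433 and 415. The factors 0.85 and 0.8 are taken from the expression as given, without further interpretation. The helper name `lz_scale` is illustrative.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The scale n / log_20(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

expected = 0.85 * 0.8 * (lz_scale(848) - lz_scale(415))
print(expected)       # just under 116
print(int(expected))  # 115 after truncation
```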
Status Protein1 Protein2 d d1/2
Query variables 1ZOB_1 4RCZ_1 146 144

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
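
The interpolation can be checked against the query row for 1ZOB_1 and 4RCZ_1. This sketch assumes that min and max are taken over two directional quantities, here given the hypothetical names `a` and `b`. With \(d=146\) and \(d_1/2=144\), the smaller quantity would be \(2\cdot 144-146=142\). That value is inferred and does not appear on the page.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

a, b = 142, 146            # 146 is the tabulated d; 142 is inferred (assumption)
print(delta(0, a, b))      # 146, the value of d
print(delta(0.5, a, b))    # 144.0, the value of d1/2
```

At \(\alpha=0\) only the maximum contributes, and at \(\alpha=1/2\) the result is the average of the two quantities, which is half their sum.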