CoV2D Browser™

Parikh vectors
7MCN_1 8RUQ_1 5AQI_1 Letter Amino acid
32 7 26 I Isoleucine
12 6 12 P Proline
15 2 6 H Histidine
35 10 27 T Threonine
18 7 23 E Glutamic acid
28 6 30 G Glycine
19 13 28 K Lysine
13 3 10 Y Tyrosine
30 6 31 V Valine
11 1 18 N Asparagine
13 8 12 Q Glutamine
28 4 28 D Aspartic acid
0 1 2 C Cysteine
36 12 26 L Leucine
7 2 7 M Methionine
10 4 18 F Phenylalanine
30 6 22 S Serine
30 19 37 A Alanine
13 18 22 R Arginine
0 0 1 W Tryptophan
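
As a minimal sketch of how such a table can be produced, the Parikh vector of a word is just its vector of letter counts. The function name `parikh_vector` and the constant `ALPHABET` are illustrative names, not part of the CoV2D site; the letter order follows the rows of the table above.

```python
from collections import Counter

# The 20 amino-acid letters in the row order of the table above.
ALPHABET = "IPHTEGKYVNQDCLMFSARW"

def parikh_vector(sequence, alphabet=ALPHABET):
    """Return the letter counts of `sequence`, ordered as in `alphabet`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# The first nine residues of the 8RUQ_1 sequence, used as a short example.
example = "ARTKQTARK"
for letter, count in zip(ALPHABET, parikh_vector(example)):
    if count:
        print(letter, count)
```

The entries of a Parikh vector sum to the length of the word, which gives a quick consistency check against the Length column.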

>7MCN_1|Chain A[auth H]|Bifunctional cystathionine gamma-lyase/homocysteine desulfhydrase|Staphylococcus aureus (1280)
>8RUQ_1|Chain A|Histone H3.2|Xenopus laevis (8355)
>5AQI_1|Chains A, C|HEAT SHOCK COGNATE 71 KDA PROTEIN|HOMO SAPIENS (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7MCN , Knot 158 380 0.82 36 197 359
SNKKTKLIHGGHTTDDYTGAVTTPIYQTSTYLQDDIGDLRQGYEYSRTANPTRSSVESVIATLENGKHGFAFSSGVAAISAVVMLLDKGDHIILNSDVYGGTYRALTKVFTRFGIEVDFVDTTHTDSIVQAIRPTTKMLFIETPSNPLLRVTDIKKSAEIAKEHGLISVVDNTFMTPYYQNPLDLGIDIVLHSATKYLGGHSDVVAGLVATSDDKLAERLAFISNSTGGILGPQDSYLLVRGIKTLGLRMEQINRSVIEIIKMLQAHPAVQQVFHPSIESHLNHDVHMAQADGHTGVIAFEVKNTESAKQLIKATSYYTLAESLGAVESLISVPALMTHASIPADIRAKEGITDGLVRISVGIEDTEDLVDDLKQALDTL
8RUQ , Knot 65 135 0.78 38 102 126
ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPGTVALREIRRYQKSTELLIRKLPFQRLVREIAQDFKTDLRFQSSAVMALQEASEAYLVALFEDTNLCAIHAKRVTIMPKDIQLARRIRGERA
5AQI , Knot 164 386 0.84 40 214 369
GPLGSMSKGPAVGIDLGTTYSCVGVFQHGKVEIIANDQGNRTTPSYVAFTDTERLIGDAAKNQVAMNPTNTVFDAKRLIGRRFDDAVVQSDMKHWPFMVVNDAGRPKVQVEYKGETKSFYPEEVSSMVLTKMKEIAEAYLGKTVTNAVVTVPAYFNDSQRQATKDAGTIAGLNVLRIINEPTAAAIAYGLDKKVGAERNVLIFDLGGGTFDVSILTIEDGIFEVKSTAGDTHLGGEDFDNRMVNHFIAEFKRKHKKDISENKRAVRRLRTACERAKRTLSSSTQASIEIDSLYEGIDFYTSITRARFEELNADLFRGTLDPVEKALRDAKLDKSQIHDIVLVGGSTRIPKIQKLLQDFFNGKELNKSINPDEAVAYGAAVQAAILS
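
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below also includes one common way to count LZ76 components (the exhaustive history); whether the site uses exactly this variant is an assumption, and the names `lz76_complexity` and `normalized_lz` are illustrative.

```python
import math

def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive history of w (assumed variant)."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the component while it can be copied from earlier text;
        # the copy may overlap the component but must start before position i.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_b n), with b the alphabet size (20 amino acids)."""
    return lz / (n / math.log(n, alphabet_size))

# Recompute the normalized column from the (LZ, n) pairs in the table above.
for code, lz, n in [("7MCN", 158, 380), ("8RUQ", 65, 135), ("5AQI", 164, 386)]:
    print(code, round(normalized_lz(lz, n), 4))
```

The three ratios come out near 0.824, 0.788 and 0.845, which agrees with the tabulated 0.82, 0.78 and 0.84 if the page truncates rather than rounds.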

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
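
A minimal sketch of these definitions, reading "subword" as a contiguous block of a fixed length \(k\). The function names are illustrative, and the example word is the first nine residues of the 8RUQ_1 sequence rather than a full entry.

```python
def subwords(w, k):
    """P_w(k): the set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """p_w(k): the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

w = "ARTKQTARK"
for k in (1, 2, 3):
    print(k, subword_complexity(w, k))
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), so \(p_w(k)\) is bounded by both that count and the number of possible words of length \(k\) over the alphabet.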

\(|P_{f(7MCN_1)}(2) \setminus P_{f(8RUQ_1)}(2)|=131\), \(|P_{f(8RUQ_1)}(2) \setminus P_{f(7MCN_1)}(2)|=36\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
00000011011000000011100110000001000110100100000010100001001110100100111100111110111111001001110001011000110011001110101100000001101101000111100100111010010001011000111011000110100001101110111001000111000111111100000110011110000111111000011101100111010010001101101101011100110101000100010110101001111101000001001101000001100111100110111110010111010100110011101011100000110010011001
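
The hydrophobic-polar string can be reproduced by mapping each residue to 1 or 0. In the sketch below, the letters A, F, G, I, L, M, P and V are inferred by aligning the binary string above with the 7MCN_1 sequence. C and W do not occur in that sequence, so placing them in the hydrophobic class is an assumption. The names `HYDROPHOBIC` and `hp_encode` are illustrative.

```python
# Residues mapped to 1. A, F, G, I, L, M, P, V are inferred from the displayed
# string; C and W do not occur in 7MCN_1, so including them is an assumption.
HYDROPHOBIC = frozenset("ACFGILMPVW")

def hp_encode(sequence, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)

# First 26 residues of the 7MCN_1 sequence.
prefix = "SNKKTKLIHGGHTTDDYTGAVTTPIY"
print(hp_encode(prefix))  # 00000011011000000011100110
```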
Pair \(Z_2\) Length of longest common subword
7MCN_1,8RUQ_1 167 4
7MCN_1,5AQI_1 137 4
8RUQ_1,5AQI_1 170 4
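
Both columns of this table can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two subword sets; for the first pair it is \(131+36=167\). The last column is read here as the length of the longest common contiguous block, which is an assumption based on how small the values are for sequences of these lengths. The function names are illustrative.

```python
def subwords(w, k):
    """The set of distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        # cur[j] = length of the common block ending at this letter of x and y[j-1]
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("abab", "abba", 2))
print(longest_common_subword("abcde", "xbcdy"))
```

The dynamic program uses time proportional to the product of the two lengths, which is negligible for sequences of a few hundred residues.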

Newick tree

 
(8RUQ_1:88.88,(7MCN_1:68.5,5AQI_1:68.5):20.38);

Let \(d\) denote the Otu–Sayood distance of that name.
Let \(d_1\) denote the Otu–Sayood distance of that name. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{515}{\log_{20} 515}-\frac{135}{\log_{20}135}\right)\approx 112.\)
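
The arithmetic in this estimate can be checked directly. In the sketch below, 515 is taken to be \(380+135\), the length of the two query sequences concatenated, which is an inference from the Length column. The constants 0.85 and 0.8 are copied from the expression as given, and the function names are illustrative.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The normalizing quantity n / log_b n for a word of length n."""
    return n / math.log(n, alphabet_size)

def expected_distance(n_concat, n_single, c1=0.85, c2=0.8):
    """c1 * c2 * (scale of the concatenation minus scale of one sequence)."""
    return c1 * c2 * (lz_scale(n_concat) - lz_scale(n_single))

# 380 + 135 = 515 is the combined length of 7MCN_1 and 8RUQ_1.
print(round(expected_distance(380 + 135, 135), 2))
```

With the constants as displayed, the expression evaluates to roughly 111.95.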
Status Protein1 Protein2 d d1/2
Query variables 7MCN_1 8RUQ_1 143 95

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
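
A minimal sketch of this interpolation, assuming that min and max range over the two Otu–Sayood increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), where \(c\) is the LZ-complexity, so that \(d\) is their maximum and \(d_1\) their sum. The function name `delta` and the argument names are illustrative. The numbers 143 and 47 are derived by taking the tabulated \(d=143\) and \(d_1/2=95\) at face value, which gives a smaller increment of \(2\cdot 95-143=47\).

```python
def delta(inc_xy, inc_yx, alpha):
    """alpha * min + (1 - alpha) * max of the two increments."""
    lo, hi = min(inc_xy, inc_yx), max(inc_xy, inc_yx)
    return alpha * lo + (1 - alpha) * hi

# Increments assumed from the query row: max = d = 143, min = 2 * 95 - 143 = 47.
print(delta(47, 143, 0))    # alpha = 0 gives the maximum, d
print(delta(47, 143, 0.5))  # alpha = 1/2 gives the average, d1 / 2
```

Intermediate values of \(\alpha\) move continuously between the maximum and the average of the two increments.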