CoV2D Browser™

Parikh vectors
6JOM_1 9AYZ_1 5OKM_1 Letter Amino acid
9 3 25 R Arginine
3 1 6 C Cysteine
20 7 30 G Glycine
7 10 11 H Histidine
29 0 28 I Isoleucine
5 1 9 W Tryptophan
11 21 14 A Alanine
27 4 22 N Asparagine
38 18 42 L Leucine
39 11 27 K Lysine
23 7 26 F Phenylalanine
11 9 34 T Threonine
15 3 16 Y Tyrosine
21 8 25 D Aspartic acid
15 1 16 Q Glutamine
23 4 32 E Glutamic acid
22 13 30 V Valine
4 2 8 M Methionine
9 7 18 P Proline
20 11 42 S Serine
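The table above counts how often each amino acid letter occurs in each sequence. A minimal sketch of that count follows; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative choices, not names used by the site.

```python
from collections import Counter

# One-letter amino acid codes, in the row order of the table above.
AMINO_ACIDS = "RCGHIWANLKFTYDQEVMPS"

def parikh_vector(word, alphabet=AMINO_ACIDS):
    """Map each alphabet letter to its number of occurrences in word."""
    counts = Counter(word)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# Sequence 9AYZ_1 as listed on this page.
seq_9ayz = (
    "VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAV"
    "AHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR"
)
vec = parikh_vector(seq_9ayz)
print(vec["C"], vec["W"], vec["I"])
```

The entries of a Parikh vector sum to the length of the word whenever every letter of the word belongs to the alphabet.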

>6JOM_1|Chains A, B|Lipoate--protein ligase|Mycoplasma hyopneumoniae J (262719)
>9AYZ_1|Chains A, C, E, G|Hemoglobin subunit alpha|Homo sapiens (9606)
>5OKM_1|Chains A, B, C, D, E, F, G, H|Phosphatidylinositol 3,4,5-trisphosphate 5-phosphatase 2|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6JOM , Knot 147 351 0.81 40 194 327
MYLIEPKRNGKWVFDGAILLAIQYWAIKNLKLDETIVFPYICDPHVQIGYFQNPSVEVNLELLKQKNIEVVRRDTGGGAIYLDRNGVNFCFSFPYEKNKNLLGNYAQFYDPVIKVLQNIGIKNVQFSGKNDLQIEGKKVSGAAMSLVNDRIYAGFSLLYDVDFDFIGKILTPNQKKIEAKGIKSVSQRVTNLKNKLSKEYQNFSIFEIKDLFLTEFLKVNSVEKFKKYELTDSDWVQIDKMVAEKYKNWDFVWGLSPNYSFNRSIRTKVGTITFSLEINEGKISKIKISGDFFPKKSLLELENFLMGTKLTQDQLLNRLKDAKLEDYFSQKIDEEEICNLLLNLEHHHHHH
9AYZ , Knot 69 141 0.80 38 99 137
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
5OKM , Knot 189 461 0.83 40 244 439
GPEPDMISVFIGTWNMGSVPPPKNVTSWFTSKGLGKTLDEVTVTIPHDIYVFGTQENSVGDREWLDLLRGGLKELTDLDYRPIAMQSLWNIKVAVLVKPEHENRISHVSTSSVKTGIANTLGNKGAVGVSFMFNGTSFGFVNCHLTSGNEKTARRNQNYLDILRLLSLGDRQLNAFDISLRFTHLFWFGDLNYRLDMDIQEILNYISRKEFEPLLRVDQLNLEREKHKVFLRFSEEEISFPPTYRYERGSRDTYAWHKQKPTGVRTNVPSWCDRILWKSYPETHIICNSYGCTDDIVTSDHSPVFGTFEVGVTSQFISKKGLSKTSDQAYIEFESIEAIVKTASRTKFFIEFYSTCLEEYKKSFENDAQSSDNINFLKVQWSSRQLPTLKPILADIEYLQDQHLLLTVKSMDGYESYGECVVALKSMIGSTAQQFLTFLSHRGEETGNIRGSMKVRVPTER
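The page does not say which Lempel-Ziv variant produces the LZ-complexity column. The sketch below assumes the usual LZ76 exhaustive-history parsing, in which each new phrase is the shortest prefix of the remaining word that has not started at any earlier position. The function names are illustrative, so the output may not match the site's values exactly.

```python
import math

def lz76_complexity(word):
    """Number of phrases in the LZ76 exhaustive-history parsing of word."""
    i, phrases, n = 0, 0, len(word)
    while i < n:
        length = 1
        # Grow the phrase while it still occurs starting at an earlier
        # position (the earlier copy may overlap the phrase itself).
        while i + length <= n and word[i:i + length] in word[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76_complexity("0001101001000101"))  # parses as 0|001|10|100|1000|101
print(normalized_lz(147, 351))              # 6JOM row of the table
```

With the table's values for 6JOM (LZ 147, length 351), the normalized ratio lies between 0.81 and 0.82, which agrees with the 0.81 shown if the site truncates rather than rounds.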

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
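As a small illustration of these definitions, the set of distinct subwords of a given length and its cardinality can be computed as follows. The helper names `subwords` and `subword_complexity` are illustrative.

```python
def subwords(word, k):
    """Set of distinct contiguous subwords of length k in word."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def subword_complexity(word, k):
    """Number of distinct subwords of length k in word."""
    return len(subwords(word, k))

print(sorted(subwords("ABAB", 2)))    # the distinct subwords of length 2
print(subword_complexity("ABAB", 2))
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), since it has only that many positions where one can start.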

\(|P_{f(6JOM_1)}(2) \setminus P_{f(9AYZ_1)}(2)|=139\), \(|P_{f(9AYZ_1)}(2) \setminus P_{f(6JOM_1)}(2)|=44\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\).

Hydrophobic-polar version of Sequence 1 (6JOM_1):
101101000101110111111100111001010001111010010101101001010101011000010110000111110100011010101100000011100101001110110011100101010001010100101111011000101110110010101110110100001010110010001001000100000010110100111001101001001000010000110100111000001011111010001000100011010101010010100101010111000110100111100100001100100101000100010000100111010000000
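The page does not state which residues it treats as hydrophobic. Comparing the first 60 residues of 6JOM_1 with the binary string above suggests the set A, F, G, I, L, M, P, V, W. The sketch below takes that inferred set as an assumption, and the names `HYDROPHOBIC` and `hp_encode` are illustrative.

```python
# Assumed hydrophobic residues, inferred from the 6JOM_1 string on this page.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Encode hydrophobic residues as '1' and all other residues as '0'."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in sequence)

# First 20 residues of 6JOM_1.
print(hp_encode("MYLIEPKRNGKWVFDGAILL"))
```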
Pair \(Z_2\) Length of longest common subword
6JOM_1,9AYZ_1 183 4
6JOM_1,5OKM_1 156 4
9AYZ_1,5OKM_1 199 4
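The \(Z_2\) column can be checked against the two set differences quoted earlier: \(139+44=183\), the value in the first row. A minimal sketch of \(Z_k\), with illustrative helper names:

```python
def subwords(word, k):
    """Set of distinct contiguous subwords of length k in word."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: "ABBA" contains the subword "BB", which "ABAB" does not.
print(z_distance("ABAB", "ABBA", 2))
```

Dividing \(Z_k(x,y)\) by \(|P_x(k)\cup P_y(k)|\) would give the Jaccard distance itself; the table reports only the numerator.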

Newick tree

 
(9AYZ_1:10.76,(6JOM_1:78,5OKM_1:78):22.76);

Let \(d\) and \(d_1\) be the Otu--Sayood distances \(d\) and \(d_1\), respectively. (The latter makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{492}{\log_{20} 492}-\frac{141}{\log_{20}141})\approx 103.\)
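The arithmetic in this estimate can be checked directly. Here 492 is presumably the combined length \(351+141\) of the two query sequences, and the helper name `lz_scale` is illustrative.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The quantity n / log_b(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

expected = 0.85 * 0.8 * (lz_scale(492) - lz_scale(141))
print(round(expected, 2))
```

The expression evaluates to roughly 103.65, which matches the 103 shown if the page truncates rather than rounds.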
Status Protein1 Protein2 d d1/2
Query variables 6JOM_1 9AYZ_1 129 89.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
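Read against the query row above (\(d=129\), \(d_1/2=89.5\)), the formula pins down both endpoints: \(\alpha=0\) gives \(\max=d\), and \(\alpha=1/2\) gives \((\min+\max)/2=d_1/2\). A small sketch, with `lo` and `hi` as illustrative names for min and max:

```python
def delta(alpha, lo, hi):
    """Interpolate between hi (alpha = 0) and lo (alpha = 1)."""
    return alpha * lo + (1 - alpha) * hi

d = 129.0        # value of d in the query row
half_d1 = 89.5   # value of d1/2 in the query row

hi = d                   # delta at alpha = 0 is the max
lo = 2 * half_d1 - d     # because (lo + hi) / 2 = d1 / 2

print(lo, delta(0, lo, hi), delta(0.5, lo, hi))
```

For this pair the recovered minimum is 50, so \(\delta\) ranges from 129 at \(\alpha=0\) down to 50 at \(\alpha=1\).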