CoV2D BrowserTM

Parikh vectors
6QXK_1 6HKT_1 4MQU_1 Letter Amino acid
20 4 27 D Aspartic acid
11 8 39 Q Glutamine
38 12 69 L Leucine
4 3 17 M Methionine
24 6 47 S Serine
7 0 6 W Tryptophan
21 18 26 R Arginine
6 2 9 C Cysteine
24 7 42 E Glutamic acid
13 3 21 H Histidine
11 13 34 K Lysine
14 18 50 A Alanine
8 1 16 N Asparagine
23 8 44 G Glycine
8 10 38 T Threonine
7 3 8 Y Tyrosine
19 7 41 I Isoleucine
14 4 28 F Phenylalanine
19 6 28 P Proline
22 6 46 V Valine
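
The Parikh vector of a word records how many times each letter occurs in it. The following is a minimal Python sketch of how one column of the table above could be recomputed from a sequence. The names `parikh_vector` and `AMINO_ACIDS` are illustrative and not taken from the CoV2D code; the letter order is simply the row order of the table.

```python
from collections import Counter

# Letter order copied from the rows of the table above (an illustrative choice).
AMINO_ACIDS = "DQLMSWRCEHKANGTYIFPV"

def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in w."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# Example: pair each letter with its count for a short made-up word.
vector = parikh_vector("GSHMA")
print(dict(zip(AMINO_ACIDS, vector)))
```

The entries of a Parikh vector always sum to the length of the word, which gives a quick sanity check on each column.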

>6QXK_1|Chain A[auth B]|Serine/threonine-protein kinase pim-1|Homo sapiens (9606)
>6HKT_1|Chains A, AA[auth E], EA[auth O], E[auth K], IA[auth U], K[auth Y], MA[auth a], O[auth e], QA[auth k], S[auth o], UA[auth u], W[auth y]|Histone H3.1|Homo sapiens (9606)
>4MQU_1|Chains A, B|Glucokinase regulatory protein|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6QXK 135 313 0.82 40 196 296
SMLLSKINSLAHLRAAPCNDLHATKLAPGKEKEPLESQYQVGPLLGSGGFGSVYSGIRVSDNLPVAIKHVEKDRISDWGELPNGTRVPMEVVLLKKVSSGFSGVIRLLDWFERPDSFVLILERPEPVQDLFDFITERGALQEELARSFFWQVLEAVRHCHNCGVLHRDIKDENILIDLNRGELKLIDFGSGALLKDTVYTDFDGTRVYSPPEWIRYHRYHGRSAAVWSLGILLYDMVCGDIPFEHDEEIIGGQVFFRQRVSSECQHLIRWCLALRPSDRPTFEEIQNHPWMQDVLLPQETAEIHLHSLSPGPS
6HKT 68 139 0.80 38 107 131
GSHMARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPGTVALREIRRYQKSTELLIRKLPFQRLVREIAQDFKTDLRFQSSAVMALQEACEAYLVGLFEDTNLCAIHAKRVTIMPKDIQLARRIRGERA
4MQU 252 636 0.85 40 271 588
MHHHHHHDEVDMPGTKRFQHVIETPEPGKWELSGYEAAVPITEKSNPLTQDLDKADAENIVRLLGQCDAEIFQEEGQALSTYQRLYSESILTTMVQVAGKVQEVLKEPDGGLVVLSGGGTSGRMAFLMSVSFNQLMKGLGQKPLYTYLIAGGDRSVVASREGTEDSALHGIEELKKVAAGKKRVIVIGISVGLSAPFVAGQMDCCMNNTAVFLPVLVGFNPVSMARNDPIEDWSSTFRQVAERMQKMQEKQKAFVLNPAIGPEGLSGSSRMKGGSATKILLETLLLAAHKTVDQGIAASQRCLLEILRTFERAHQVTYSQSPKIATLMKSVSTSLEKKGHVYLVGWQTLGIIAIMDGVECIHTFGADFRDVRGFLIGDHSDMFNQKAELTNQGPQFTFSQEDFLTSILPSLTEIDTVVFIFTLDDNLTEVQTIVEQVKEKTNHIQALAHSTVGQTLPIPLKKLFPSIISITWPLLFFEYEGNFIQKFQRELSTKWVLNTVSTGAHVLLGKILQNHMLDLRISNSKLFWRALAMLQRFSGQSKARCIESLLRAIHFPQPLSDDIRAAPISCHVQVAHEKEQVIPIALLSLLFRCSITEAQAHLAAAPSVCEAVRSALAGPGQKRTADPLEILEPDVQ
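
The fourth column of the table normalizes the LZ-complexity by \(n/\log_{20} n\). A short Python sketch of that normalization, taking the LZ values from the table as given (the function name `normalized_lz` is hypothetical, and no particular LZ variant is implemented here):

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n) for a word of length n over b letters."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ(w), n) triples copied from the table above.
for code, lz, n in [("6QXK", 135, 313), ("6HKT", 68, 139), ("4MQU", 252, 636)]:
    print(code, round(normalized_lz(lz, n), 3))
```

The computed ratios agree with the tabulated ones in their first two decimals, which suggests the table truncates rather than rounds.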

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
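
As a minimal sketch of these definitions in Python, reading \(P_w(n)\) as the set of contiguous length-\(n\) windows of \(w\) (the function names are illustrative):

```python
def subword_set(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subword_set(w, n))

# Small example on a toy word rather than a protein sequence.
print(sorted(subword_set("ABAB", 2)))   # ['AB', 'BA']
print(subword_complexity("ABAB", 2))    # 2
```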

\(|P_{f(6QXK_1)}(2) \setminus P_{f(6HKT_1)}(2)|=135\) and \(|P_{f(6HKT_1)}(2) \setminus P_{f(6QXK_1)}(2)|=46\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for this pair, \(Z_2=135+46=181\).

Hydrophobic-polar version of Sequence 1 (6QXK_1): 0111001001101011100010100111100001100000111111011110100110100011111001000010011011010011101111001001101110110110010011111001011001101100011100011001110110110000001110001000011101001010110110111100010001010010011011000000100111101111100110101110000011110111000100000011010111010001010010001110011110001010100101110
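
The hydrophobic-polar string can be reproduced with a simple letter-to-bit map. The page does not state which residues it treats as hydrophobic; the set `AFGILMPVW` below is an assumption, inferred by comparing the opening residues of the 6QXK_1 sequence with the bit string above. The names `HYDROPHOBIC` and `hp_string` are illustrative.

```python
# Assumed hydrophobic class (1); every other residue is treated as polar (0).
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(w):
    """Map each residue to '1' if assumed hydrophobic, else '0'."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)

# First 22 residues of the 6QXK_1 sequence.
print(hp_string("SMLLSKINSLAHLRAAPCNDLH"))  # 0111001001101011100010
```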
Pair \(Z_2\) Length of longest common (contiguous) subword
6QXK_1,6HKT_1 181 4
6QXK_1,4MQU_1 157 4
6HKT_1,4MQU_1 204 4
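
The \(Z_k\) column is the size of the symmetric difference of two subword sets. A minimal Python sketch, assuming subwords are contiguous windows of length \(k\) (function names are illustrative):

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: {AB, BA} versus {AB, BB} differ in one subword on each side.
print(z_distance("ABAB", "ABBB", 2))  # 2
```

For 6QXK_1 and 6HKT_1 the two one-sided counts 135 and 46 are consistent with the \(p_w(2)\) values 196 and 107 listed earlier, since \(196-135=107-46=61\) subwords of length 2 are shared.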

Newick tree

 
[
	6HKT_1:10.69,
	[
		6QXK_1:78.5,4MQU_1:78.5
	]:23.19
]
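
The bracketed tree above corresponds to a parenthesized string in the standard Newick format. A small Python sketch of that serialization follows; representing each node as a `(label_or_children, branch_length)` pair is an assumption made for illustration only.

```python
def to_newick(node):
    """Serialize a (label_or_children, branch_length) pair to Newick text."""
    content, length = node
    if isinstance(content, str):
        text = content  # leaf: the label itself
    else:
        text = "(" + ",".join(to_newick(child) for child in content) + ")"
    return text if length is None else f"{text}:{length}"

# Tree copied from the listing above; the root has no branch length.
tree = ([("6HKT_1", 10.69),
         ([("6QXK_1", 78.5), ("4MQU_1", 78.5)], 23.19)], None)
newick = to_newick(tree) + ";"
print(newick)  # (6HKT_1:10.69,(6QXK_1:78.5,4MQU_1:78.5):23.19);
```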

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{452}{\log_{20} 452}-\frac{139}{\log_{20}139})=93.2\), where \(452=313+139\) is the combined length of the 6QXK_1 and 6HKT_1 sequences.
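
The arithmetic in this estimate can be checked with a few lines of Python. The factors 0.85 and 0.8 are copied from the formula as given; the function name `expected_distance` and its parameter names are illustrative.

```python
import math

def log20(n):
    """Base-20 logarithm."""
    return math.log(n) / math.log(20)

def expected_distance(n_concat, n_single, f1=0.85, f2=0.8):
    """f1 * f2 * (n_concat / log20(n_concat) - n_single / log20(n_single))."""
    return f1 * f2 * (n_concat / log20(n_concat) - n_single / log20(n_single))

value = expected_distance(452, 139)
print(round(value, 1))  # 93.2
```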
Status Protein1 Protein2 d d1/2
Query variables 6QXK_1 6HKT_1 118 83.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
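
A minimal Python sketch of this interpolation. It assumes, following the usual Otu–Sayood definitions, that \(d\) is the larger and \(d_1\) the sum of two directional complexity differences; the example values 49 and 118 are derived from the table entries \(d=118\) and \(d_1/2=83.5\) under that assumption (\(2\cdot 83.5-118=49\)).

```python
def delta(forward, backward, alpha):
    """alpha * min + (1 - alpha) * max of two directional quantities."""
    lo, hi = min(forward, backward), max(forward, backward)
    return alpha * lo + (1 - alpha) * hi

print(delta(49, 118, 0))    # 118, the value of d
print(delta(49, 118, 0.5))  # 83.5, the value of d1/2
```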