CoV2D Browser™

Parikh vectors
8XQN_1 2LMK_1 4ZCG_1 Letter Amino acid
18 5 23 R Arginine
16 2 13 N Asparagine
21 1 18 T Threonine
15 0 13 Y Tyrosine
10 2 4 C Cysteine
29 3 23 E Glutamic acid
26 2 19 I Isoleucine
32 6 14 K Lysine
20 2 14 F Phenylalanine
31 5 43 A Alanine
31 4 16 D Aspartic acid
14 5 16 Q Glutamine
17 5 28 G Glycine
6 0 7 H Histidine
10 0 5 M Methionine
28 7 35 L Leucine
4 3 17 P Proline
20 3 21 S Serine
3 0 1 W Tryptophan
19 2 23 V Valine
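
The counts in the table above can be reproduced directly from a sequence. The following is a minimal sketch: the function name `parikh_vector` and the constant `LETTERS` are illustrative names, not something the page defines, and the 2LMK_1 sequence is copied from the listing further down this page.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
LETTERS = "RNTYCEIKFADQGHMLPSWV"

def parikh_vector(seq, letters=LETTERS):
    """Return the number of occurrences of each letter, in the order of `letters`."""
    counts = Counter(seq)
    return [counts[a] for a in letters]

# 2LMK_1 sequence as listed on this page.
seq_2lmk = "GSNPDPQEVQRALARILCALGELDKLVKDQANAGQQEFKLPKDFTGRSKCRSLGRIK"
vec = parikh_vector(seq_2lmk)
print(dict(zip(LETTERS, vec)))
```

The entries of a Parikh vector sum to the sequence length, which gives a quick sanity check against the Length column.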

>8XQN_1|Chain A|Guanine nucleotide-binding protein G(i) subunit alpha-1|Homo sapiens (9606)
>2LMK_1|Chain A|Exocrine gland-secreting peptide 1|Mus musculus (10090)
>4ZCG_1|Chain A|Gamma-glutamyltranspeptidase 1 heavy chain|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8XQN , Knot 159 370 0.84 40 208 350
MDYKDDDDKENLYFQSMGCTLSAEDKAAVERSKMIDRNLREDGEKAAREVKLLLLGAGESGKSTIVKQMKIIHEAGYSEEECKQYKAVVYSNTIQSIIAIIRAMGRLKIDFGDSARADDARQLFVLAGAAEEGFMTAELAGVIKRLWKDSGVQACFNRSREYQLNDSAAYYLNDLDRIAQPNYIPTQQDVLRTRVKTTGIVETHFTFKDLHFKMFDVGAQRSERKKWIHCFEGVTAIIFCVALSDYDLVLAEDEEMNRMHESMKLFDSICNNKWFTDTSIILFLNKKDLFEEKIKKSPLTICYPEYAGSNTYEEAAAYIQCQFEDLNKRKDTKEIYTHFTCSTDTKNVQFVFDAVTDVIIKNNLKDCGLF
2LMK , Knot 32 57 0.75 32 48 55
GSNPDPQEVQRALARILCALGELDKLVKDQANAGQQEFKLPKDFTGRSKCRSLGRIK
4ZCG , Knot 156 353 0.86 40 206 340
SASKEPDNHVYTRAAVAADAKQCSKIGRDALRDGGSAVDAAIAALLCVGLMNAHSMGIGGGLFLTIYNSTTRKAEVINAREVAPRLAFATMFNSSEQSQKGGLSVAVPGEIRGYELAHQRHGRLPWARLFQPSIQLARQGFPVGKGLAAALENKRTVIEQQPVLCEVFCRDRKVLREGERLTLPQLADTYETLAIEGAQAFYNGSLTAQIVKDIQAAGGIVTAEDLNNYRAELIEHPLNISLGDAVLYMPSAPLSGPVLALILNILKGYNFSRESVESPEQKGLTYHRIVEAFRFAYAKRTLLGDPKFVDVTEVVRNMTSEFFAAQLRAQISDDTTHPISYYKPEFYTPDDGG
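
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. This is a sketch only: `normalized_lz` is a hypothetical helper name, and truncating to two decimals is merely what appears to match the displayed values, not documented behaviour of the page.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"8XQN": (159, 370), "2LMK": (32, 57), "4ZCG": (156, 353)}

for code, (lz, n) in rows.items():
    ratio = normalized_lz(lz, n)
    # Truncate (not round) to two decimals; assumed to mirror the table's display.
    print(code, math.floor(ratio * 100) / 100)
```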

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
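
The subword set and its cardinality can be sketched in a few lines. The names `subwords` and `subword_complexity` are illustrative, and the toy word is not taken from the page.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

w = "ABAAB"  # toy example
for k in (1, 2, 3):
    print(k, sorted(subwords(w, k)), subword_complexity(w, k))
```

For a word of length \(|w|\) there are at most \(|w|-k+1\) subwords of length \(k\), so the count can never exceed that bound.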

\(|P_{f(8XQN_1)}(2) \setminus P_{f(2LMK_1)}(2)|=177\), \(|P_{f(2LMK_1)}(2) \setminus P_{f(8XQN_1)}(2)|=17\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1000000000010100110010100011100001100010001001100101111111001000110010110011000000000011100001001111101110101011001010010011111111001110101111100110001101010000000100011001001001101001100001100010001110001010010101101110000000110010110111101110000111100001001000101100100001100001111100001100010001101001001100000011101000100100000000100010000000010111011001110001000111
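
A hydrophobic-polar encoding of this kind maps each residue to 1 or 0. The page does not state which residues it treats as hydrophobic; the set below is an assumption, inferred by aligning the binary string above with the 8XQN_1 sequence, and `hp_encode` is a hypothetical name.

```python
# Assumed hydrophobic residues (inferred from the page's output, not documented).
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq):
    """Map hydrophobic residues to '1' and all other residues to '0'."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in seq)

# First 36 residues of 8XQN_1, copied from the sequence listing.
prefix = "MDYKDDDDKENLYFQSMGCTLSAEDKAAVERSKMID"
print(hp_encode(prefix))
```

Under this assumed classification the output agrees with the first 36 symbols of the binary string shown above.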
Pair \(Z_2\) Length of longest common subword
8XQN_1,2LMK_1 194 3
8XQN_1,4ZCG_1 168 4
2LMK_1,4ZCG_1 196 3
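
Both columns of this table can be sketched as follows. \(Z_k\) is the size of the symmetric difference of the two subword sets. The last column is read here as the longest common contiguous subword, which is an assumption based on how small the listed values are. Function names are illustrative and the inputs are toy strings, not the page's sequences.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Size of the symmetric difference of the length-k subword sets of x and y."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x, y):
    """Length of the longest contiguous block shared by x and y (row-by-row DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common block ending at the previous pair of positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("ABAB", "ABBB", 2), longest_common_subword("ABCD", "XBCY"))
```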

Newick tree

 
[
	2LMK_1:10.60,
	[
		8XQN_1:84,4ZCG_1:84
	]:17.60
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{427}{\log_{20} 427}-\frac{57}{\log_{20}57}\right)\approx 114.\)
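
The arithmetic in this estimate can be checked numerically. Note that \(427 = 370 + 57\) is the combined length of the two query sequences; the helper name `lz_scale` is hypothetical, and the factors 0.85 and 0.8 are taken from the formula as given.

```python
import math

def lz_scale(n, b=20):
    """n / log_b(n), the normalizing quantity used for LZ-complexity on this page."""
    return n / math.log(n, b)

# Constants 0.85 and 0.8 and the lengths 427 and 57 come from the formula above.
expected = 0.85 * 0.8 * (lz_scale(427) - lz_scale(57))
print(expected)
```

The result lies between 114 and 115, in line with the quoted value of 114.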
Status Protein1 Protein2 d d1/2
Query variables 8XQN_1 2LMK_1 146 83.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
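
The interpolation above can be sketched in code under two stated assumptions. First, that the Otu–Sayood distances are built from the two terms \(c(xy)-c(x)\) and \(c(yx)-c(y)\), with \(d\) their maximum and \(d_1\) their sum, which is consistent with the case distinction above. Second, that \(c\) is the LZ76 exhaustive-history complexity; this section does not say which LZ variant the page uses. All function names are illustrative.

```python
def lz76(s):
    """Number of components in the LZ76 exhaustive-history parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the component while it can still be copied from earlier text
        # (the source may overlap the component, but must start before position i).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two one-sided complexity increases."""
    a = lz76(x + y) - lz76(x)
    b = lz76(y + x) - lz76(y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "aaaa", "ab"  # toy strings, not the page's sequences
print(delta(x, y, 0))    # plays the role of d
print(delta(x, y, 0.5))  # plays the role of d1 / 2
```

With \(\alpha=0\) only the larger term survives, and with \(\alpha=1/2\) the result is the average of the two terms, matching the two cases of the formula.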