CoV2D Browser™

Parikh vectors
7LZD_1 1SJQ_1 7UHR_1 Letter Amino acid
12 3 5 F Phenylalanine
13 8 14 V Valine
14 2 18 D Aspartic acid
5 8 10 H Histidine
17 8 29 L Leucine
7 4 6 M Methionine
16 8 13 S Serine
11 7 20 T Threonine
22 5 15 R Arginine
15 7 8 N Asparagine
21 8 23 G Glycine
10 6 18 P Proline
12 4 9 Q Glutamine
23 6 10 E Glutamic acid
9 7 8 I Isoleucine
3 0 5 W Tryptophan
10 3 8 Y Tyrosine
15 5 46 A Alanine
17 0 2 C Cysteine
26 6 8 K Lysine
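
The counts in this table can be recomputed from a FASTA sequence with a short sketch such as the one below. The function name `parikh_vector` and the alphabet ordering are illustrative assumptions and are not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard amino-acid letters (ordering chosen here for illustration only).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter in `sequence`."""
    counts = Counter(sequence)
    # Letters that never occur are reported as 0, as the table does
    # for W and C in 1SJQ_1.
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# First 12 residues of 1SJQ_1, copied from the sequence listed on this page.
print(parikh_vector("MRGSHHHHHHGS"))
```

Summing the values of the returned dictionary gives the sequence length, which is a quick consistency check against the length column further down the page.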

>7LZD_1|Chain A|Histone-lysine N-methyltransferase SETD2|Homo sapiens (9606)
>1SJQ_1|Chain A|Polypyrimidine tract-binding protein 1|Homo sapiens (9606)
>7UHR_1|Chain A|Putative metallo-beta-lactamase l1 (Beta-lactamase type ii) (Ec 3.5.2.6) (Penicillinase)|Stenotrophomonas maltophilia K279a (522373)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7LZD 126 278 0.85 40 189 269
GETSVPPGSALVGPSCVMDDFRDPQRWKECAKQGKMPCYFDLIEENVYLTERKKNKSHRDIKRMQCECTPLSKDERAQGEIACGEDCLNRLLMIECSSRCPNGDYCSNRRFQRKQHADVEVILTEKKGWGLRAAKDLPSNTFVLEYCGEVLDHKEFKARVKEYARNKNIHYYFMALKNDEIIDATQKGNCSRFMNHSCEPNCETQKWTVNGQLRVGFFTTKLVPSGSELTFDYQFQRYGKEAQKCFCGSANCRGYLGGENRVSIRAAGGKMKKERSRK
1SJQ 53 105 0.78 36 87 99
MRGSHHHHHHGSGVPSRVIHIRKLPIDVTEGEVISLGLPFGKVTNLLMLKGKNQAFIEMNTEEAANTMVNYYTSVTPVLRGQPIYIQFSNHKELKTDSSPNQARA
7UHR 115 275 0.78 40 157 259
SNAASAAEAPLPQLRAYTVDASWLQPMAPLQVADHTWQIGTEDLTALLVQTAEGAVLLDGGMPQMAGHLLDNMKLRGVAPQDLRLILLSHAHADHAGPVAELKRRTGAHVAANAETAVLLARGGSNDLHFGDGITYPPASADRIIMDGEVVTVGGIAFTAHFMPGHTPGSTAWTWTDTRDGKPVRIAYADSLSAPGYQLKGNPRYPRLIEDYKRSFATVRALPCDLLLTPHPGASNWNYAVGSKASAEALTCNAYADAAEKKFDAQLARETAGTR
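
As a rough illustration of the fourth column, the sketch below normalizes an LZ-complexity value by \(n/\log_{20} n\). It also includes one common way of counting Lempel-Ziv (1976) phrases; which exact parsing variant this page uses is not stated, so `lz76_complexity` is an assumption and is not claimed to reproduce the tabulated LZ values. Both function names are hypothetical.

```python
import math

def lz76_complexity(w: str) -> int:
    """Count phrases in an exhaustive-history (Lempel-Ziv 1976) parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text.
        # The copy may overlap the phrase but must start before position i.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_A n) for alphabet size A."""
    return lz / (n / math.log(n, alphabet_size))

# Values from the 7LZD row: LZ(w) = 126, n = 278.
print(round(normalized_lz(126, 278), 2))  # 0.85
```

The same call with the 1SJQ and 7UHR rows, (53, 105) and (115, 275), gives 0.78 in both cases, in agreement with the table.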

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
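
A minimal sketch of these definitions, assuming that a subword means a contiguous block of a fixed length; the names `subwords` and `subword_complexity` are illustrative only.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

print(subwords("ABAB", 2))            # the two subwords AB and BA
print(subword_complexity("ABAB", 2))  # 2
```

A set is used so that repeated occurrences of the same subword are counted once, which is what "distinct" requires.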

\(|P_{f(7LZD_1)}(2) \setminus P_{f(1SJQ_1)}(2)|=137\), \(|P_{f(1SJQ_1)}(2) \setminus P_{f(7LZD_1)}(2)|=35\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For this pair, \(Z_2 = 137 + 35 = 172\).

Hydrophobic-polar version of Sequence 1:
10001111011111001100100100100010010110010110001010000000000010010000011000001010110100010011110000001010000000100000101011100001111011001100011100010110000101010001000010001111000011010001000011000001000000101010101111000111010010100010001001000101010001011100010101111010000000
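
The hydrophobic-polar string can be approximated with a sketch like the following. The residue set in `HYDROPHOBIC` is an assumption: it was inferred by comparing the first 60 residues of 7LZD_1 with the 0/1 string above and is not a classification documented on this page. The name `hp_encode` is hypothetical.

```python
# Assumed hydrophobic residues, inferred from the start of the 7LZD_1 encoding.
# All other letters are treated as polar.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First ten residues of 7LZD_1.
print(hp_encode("GETSVPPGSA"))  # 1000111101
```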
Pair \(Z_2\) Length of longest common subword
7LZD_1,1SJQ_1 172 3
7LZD_1,7UHR_1 190 3
1SJQ_1,7UHR_1 174 3
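
The \(Z_2\) column is the size of the symmetric difference of the two subword sets; for the first pair it is 137 + 35 = 172. A minimal sketch, with the hypothetical names `subword_set` and `z_distance`:

```python
def subword_set(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subword_set(x, k), subword_set(y, k)
    return len(px - py) + len(py - px)

# Toy example: ABBA contains BB, which ABAB does not; nothing else differs.
print(z_distance("ABAB", "ABBA", 2))  # 1
```

Dividing this value by the size of the union of the two sets would give a Jaccard distance, which is why the text calls \(Z_k\) a numerator.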

Newick tree

 
(7UHR_1:92.72,(7LZD_1:86,1SJQ_1:86):6.72);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{383 }{\log_{20} 383}-\frac{105}{\log_{20}105})=85.2\).
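
The arithmetic in this estimate can be checked with the sketch below. The factors 0.85 and 0.8 are taken from the formula as given; reading 383 as the combined length 278 + 105 of the two sequences is an assumption, and the function name `expected_distance` is hypothetical.

```python
import math

def expected_distance(n_long: int, n_short: int,
                      c1: float = 0.85, c2: float = 0.8) -> float:
    """Evaluate c1 * c2 * (n_long / log20(n_long) - n_short / log20(n_short))."""
    def scaled(n: int) -> float:
        return n / math.log(n, 20)
    return c1 * c2 * (scaled(n_long) - scaled(n_short))

print(round(expected_distance(383, 105), 1))  # 85.2
```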
Status Protein1 Protein2 d d1/2
Query variables 7LZD_1 1SJQ_1 110 74

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
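
The interpolation can be sketched as follows, assuming that min and max refer to the smaller and larger of two directional quantities. The inputs 38 and 110 are not stated on the page: they are the values implied by the query row if \(d = 110\) is the maximum and \(d_1/2 = 74\) is the mean, so that the minimum is \(2 \cdot 74 - 110 = 38\). The name `delta` is hypothetical.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    low, high = min(a, b), max(a, b)
    return alpha * low + (1 - alpha) * high

print(delta(38, 110, 0))    # 110, the case delta = d
print(delta(38, 110, 0.5))  # 74.0, the case delta = d1/2
```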