CoV2D Browser™

Parikh vectors
7ZEJ_1 8WOU_1 1YRH_1 Letter Amino acid
12 37 7 I Isoleucine
21 22 8 K Lysine
13 11 6 F Phenylalanine
29 22 14 S Serine
11 13 9 R Arginine
8 20 7 N Asparagine
14 22 8 D Aspartic acid
35 25 20 G Glycine
8 22 10 Q Glutamine
9 9 7 M Methionine
20 22 9 P Proline
15 15 5 Y Tyrosine
36 24 13 V Valine
29 43 29 A Alanine
6 6 0 C Cysteine
17 21 13 E Glutamic acid
4 11 8 H Histidine
35 31 16 L Leucine
19 20 18 T Threonine
0 3 4 W Tryptophan
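A Parikh vector is the count of each letter in a word. Below is a minimal Python sketch that lists the counts in the same row order as the table above. It is an illustration only and is not the browser's own code. The names `AMINO_ORDER` and `parikh_vector` are hypothetical.

```python
from collections import Counter

# Row order of the Parikh-vector table above (one-letter amino-acid codes).
AMINO_ORDER = "IKFSRNDGQMPYVACEHLTW"

def parikh_vector(seq):
    """Return how often each amino acid occurs in seq, listed in AMINO_ORDER."""
    counts = Counter(seq)
    return [counts[aa] for aa in AMINO_ORDER]

if __name__ == "__main__":
    v = parikh_vector("MSLTAPVKLAIVFYSSTGTG")
    # The entries of a Parikh vector always sum to the sequence length.
    print(v, sum(v))
```

The same property holds for each column of the table. For example, the 7ZEJ_1 column sums to 341, which is the length given for 7ZEJ below.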

>7ZEJ_1|Chains A, B|Prostaglandin reductase 3|Homo sapiens (9606)
>8WOU_1|Chains A, B|Aminotransferase|Legionella pneumophila (446)
>1YRH_1|Chains A, B, C, D, E, F, G, H|trp repressor binding protein WrbA|Deinococcus radiodurans (1299)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7ZEJ, Knot 147 341 0.83 38 186 318
SMMQKLVVTRLSPNFREAVTLSRDCPVPLPGDGDLLVRNRFVGVNASDINYSAGRYDPSVKPPFDIGFEGIGEVVALGLSASARYTVGQAVAYMAPGSFAEYTVVPASIATPVPSVKPEYLTLLVSGTTAYISLKELGGLSEGKKVLVTAAAGGTGQFAMQLSKKAKCHVIGTCSSDEKSAFLKSLGCDRPINYKTEPVGTVLKQEYPEGVDVVYESVGGAMFDLAVDALATKGRLIVIGFISGYQTPTGLSPVKAGTLPAKLLKKSASVQGFFLNHYLSKYQAAMSHLLEMCVSGDLVCEVDLGDLSPEGRFTGLESIFRAVNYMYMGKNTGKIVVELPH
8WOU, Knot 168 399 0.84 40 225 373
MHHHHHHMDIALAKRVQKVKPSPTLAVAAKAAQMKAQGLDIIGLGTGEPDFDTPQHIKLAAISAIEAGDTKYTAVDGIVELKEAVKNKFKRDNELDYQLNQILVSVGGKQSCYNLCQAYLNPGDEVIIPAPYWVSYPDMVLLADGVPVIIETTPAQRYKINAQQLEQAITPKTRMIFLNSPSNPSGIAYTQNELKELGDVLKKHPQILIATDDMYEHIIWSQPFTNILNACPELYDRTIVLNGVSKAYAMTGWRIGYAAGPAPLINAMKTIQSQSTSNPCSIAQRAAVAALNGSNESIEEMVNAFHQRHDYVADRLQSIDGIEVIPADGTFYIFPSVQAIIEKRGYANDIEFSEKLLNEVGVALVPGSAFGTEGCIRISFATGIDTLKDALNRLQRFCS
1YRH, Knot 94 211 0.79 38 139 199
MSLTAPVKLAIVFYSSTGTGYAMAQEAAEAGRAAGAEVRLLKVRETAPQDVIDGQDAWKANIEAMKDVPEATPADLEWAEAIVFSSPTRFGGATSQMRAFIDTLGGLWSSGKLANKTFSAMTSAQNVNGGQETTLQTLYMTAMHWGAVLTPPGYTDEVIFKSGGNPYGASVTANGQPLLENDRASIRHQVRRQVELTAKLLEGGSHHHHHH
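The page does not say which Lempel-Ziv variant it uses for LZ(w). The sketch below assumes the 1976 Lempel-Ziv complexity, computed with the well-known Kaspar-Schuster scan. It also computes the normalized ratio \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) that appears in the table. The function names `lz76` and `normalized_lz` are hypothetical.

```python
import math

def lz76(s):
    """Lempel-Ziv (1976) complexity of s: the number of components in its
    exhaustive history, computed with the Kaspar-Schuster algorithm."""
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            # Symbols match: extend the candidate copy that starts at position i.
            k += 1
            if l + k > n:
                # The last component runs to the end of s.
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:
                # Every earlier start position has been tried, so the component
                # consists of the longest copy plus one new symbol.
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_{alphabet_size} n), the ratio column of the table."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    print(lz76("0001101001000101"))  # 6, from the parsing 0.001.10.100.1000.101
    print(normalized_lz(147, 341))   # about 0.839
```

Applied to the tabulated values, `normalized_lz(147, 341)` is about 0.839. The table shows 0.83, so the ratios there appear to be truncated rather than rounded. All three rows are consistent with truncation.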

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence of the entry with four-character Protein Data Bank code \(c\).
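The definitions of \(P_w(n)\) and \(p_w(n)\) can be implemented directly. The following is a minimal sketch that assumes \(n \ge 1\). The names `subwords` and `subword_complexity` are hypothetical.

```python
def subwords(w, n):
    """P_w(n): the set of distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))

if __name__ == "__main__":
    print(sorted(subwords("SMMQKL", 2)))    # ['KL', 'MM', 'MQ', 'QK', 'SM']
    print(subword_complexity("SMMQKL", 2))  # 5
```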

\(|P_{f(7ZEJ_1)}(2) \setminus P_{f(8WOU_1)}(2)|=65\) and \(|P_{f(8WOU_1)}(2) \setminus P_{f(7ZEJ_1)}(2)|=104\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \). This is the size of the symmetric difference of the two sets of length-\(k\) subwords. It is an LZ76-style (set of subwords) quantity, and it is the numerator of the Jaccard distance between \(P_x(k)\) and \(P_y(k)\). For example, \(Z_2(f(7ZEJ_1), f(8WOU_1)) = 65 + 104 = 169\).

Hydrophobic-polar version of Sequence 1 (7ZEJ_1): 01100111001010100110100001111110101110001111010010001100010101110111011101111110101000110111011110110001111011011101010010111010010101001111001001110111110101110100010001110000000011100110001100000111011000010110110001111110111011100101111111010001011011011011101100010101111000100001110011010101011001011010101010110011011001011000101110110
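The page does not state which residues count as hydrophobic. The binary string above agrees with the start of the 7ZEJ sequence if 1 is written for A, F, G, I, L, M, P, V, W and 0 for every other residue. The first 70 symbols were checked against this rule. W does not occur in 7ZEJ_1, so its assignment is an assumption. The sketch below uses this inferred set. The names `HYDROPHOBIC` and `hp_encode` are hypothetical.

```python
# Assumed hydrophobic class, inferred from the 7ZEJ_1 string; not stated on the page.
HYDROPHOBIC = frozenset("AFGILMPVW")

def hp_encode(seq):
    """Map each residue to '1' (assumed hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

if __name__ == "__main__":
    print(hp_encode("SMMQKLVVTR"))  # 0110011100, the first ten symbols above
```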
Pair \(Z_2\) Length of longest common substring (contiguous subword)
7ZEJ_1,8WOU_1 169 5
7ZEJ_1,1YRH_1 159 4
8WOU_1,1YRH_1 164 6
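The two columns of this table can be sketched in Python as follows. `z_k` follows the definition of \(Z_k\) above. `jaccard_distance` divides it by the size of the union. `longest_common_substring` uses the standard dynamic program for contiguous matches. All three names are hypothetical, and this code is not taken from the browser.

```python
def subwords(w, k):
    """Distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """Z_k(x, y) divided by the size of the union of P_x(k) and P_y(k)."""
    px, py = subwords(x, k), subwords(y, k)
    union = px | py
    return len(px ^ py) / len(union) if union else 0.0

def longest_common_substring(x, y):
    """Length of the longest contiguous subword shared by x and y, in O(|x||y|) time."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common run that ends at x[i-1] and y[j-1].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

if __name__ == "__main__":
    print(z_k("ABAB", "BABB", 2))                          # 1
    print(longest_common_substring("XABCDY", "ZABCDW"))    # 4
```

As a check against the tables, take \(p_w(2)=186\) for 7ZEJ and \(p_w(2)=225\) for 8WOU, with \(Z_2=169\). The union of their length-2 subword sets then has \((186+225+169)/2=290\) elements. The corresponding Jaccard distance is therefore about \(169/290 \approx 0.58\).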

Newick tree

(8WOU_1:84.47,(7ZEJ_1:79.5,1YRH_1:79.5):4.97);
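The tree can be read back as distances between leaves. The sketch below parses the Newick string (8WOU_1:84.47,(7ZEJ_1:79.5,1YRH_1:79.5):4.97); and sums the branch lengths along the path between two leaves. The parser handles only simple Newick strings like this one: no quoting, comments, or internal whitespace. The names `parse_newick`, `leaf_info` and `patristic` are hypothetical.

```python
import math

NEWICK = "(8WOU_1:84.47,(7ZEJ_1:79.5,1YRH_1:79.5):4.97);"

def parse_newick(text):
    """Parse a simple Newick string into nested dicts {name, length, children}."""
    s = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1
            children.append(node())
            while s[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip the closing ")"
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        name, length = s[start:pos], 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return {"name": name, "length": length, "children": children}

    return node()

def leaf_info(tree):
    """Map each leaf name to (root-to-leaf length, chain of (id, depth) of internal ancestors)."""
    out = {}

    def walk(n, depth, ancestors):
        depth += n["length"]
        if not n["children"]:
            out[n["name"]] = (depth, ancestors)
            return
        here = ancestors + ((id(n), depth),)
        for child in n["children"]:
            walk(child, depth, here)

    walk(tree, 0.0, ())
    return out

def patristic(tree, a, b):
    """Sum of branch lengths on the path between leaves a and b."""
    info = leaf_info(tree)
    (da, anc_a), (db, anc_b) = info[a], info[b]
    lca_depth = 0.0
    for (ida, depth), (idb, _) in zip(anc_a, anc_b):
        if ida != idb:
            break
        lca_depth = depth  # deepest shared ancestor so far
    return da + db - 2 * lca_depth

if __name__ == "__main__":
    t = parse_newick(NEWICK)
    print(patristic(t, "7ZEJ_1", "1YRH_1"))  # about 159.0
    print(patristic(t, "8WOU_1", "7ZEJ_1"))  # about 168.94
```

The path from 7ZEJ_1 to 1YRH_1 has length \(2 \times 79.5 = 159\), which equals their \(Z_2\) value in the table. Every leaf sits at depth 84.47 from the root, since \(4.97 + 79.5 = 84.47\).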

Let \(d\) and \(d_1\) denote the two Otu--Sayood distances of those names. (The distance \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{740}{\log_{20} 740}-\frac{341}{\log_{20} 341}\right)\approx 109\), where \(740 = 341 + 399\) is the combined length of 7ZEJ_1 and 8WOU_1.
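The estimate is plain arithmetic with the same normalizer \(n/\log_{20} n\) used in the LZ table, and it can be reproduced as below. The factors 0.85 and 0.8 are copied from the formula as given and are not derived here. The name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_{alphabet_size} n, the normalizer used in the LZ table."""
    return n / math.log(n, alphabet_size)

# 0.85 and 0.8 are the two factors quoted in the estimate, taken as given.
estimate = 0.85 * 0.8 * (lz_scale(341 + 399) - lz_scale(341))
print(round(estimate))  # 109
```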
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 7ZEJ_1 8WOU_1 139 126
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
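The interpolation can be sketched in Python under some assumptions. First, the sketch follows our reading of the Otu--Sayood definitions: \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\), and \(d_1\) is the sum of the same two terms. Under that reading, min and max in the formula above range over those two terms. Second, it takes LZ to be the LZ76 complexity computed with the Kaspar-Schuster scan. The names `conditional_terms`, `delta`, `d` and `d1` are hypothetical.

```python
def lz76(s):
    """LZ76 complexity via the Kaspar-Schuster algorithm (assumed LZ variant)."""
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def conditional_terms(x, y):
    """The two terms LZ(xy) - LZ(x) and LZ(yx) - LZ(y)."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two terms."""
    a, b = conditional_terms(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def d(x, y):
    """Assumed Otu-Sayood d: the larger of the two terms."""
    return max(conditional_terms(x, y))

def d1(x, y):
    """Assumed Otu-Sayood d1: the sum of the two terms."""
    return sum(conditional_terms(x, y))

if __name__ == "__main__":
    x, y = "0001101001000101", "1110010011"
    print(delta(x, y, 0.0), d(x, y))       # alpha = 0 gives d
    print(delta(x, y, 0.5), d1(x, y) / 2)  # alpha = 1/2 gives d1/2
```

Setting \(\alpha=0\) keeps only the max, and \(\alpha=1/2\) averages min and max, which gives half their sum. These two settings reproduce the two cases of the formula.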