CoV2D Browser™

Parikh vectors
6WCA_1 5PIB_1 3UUZ_1 Letter Amino acid
22 15 9 Q Glutamine
16 18 3 H Histidine
7 23 14 K Lysine
30 20 10 T Threonine
10 19 10 Y Tyrosine
19 14 6 D Aspartic acid
26 27 24 G Glycine
41 14 15 V Valine
30 18 2 R Arginine
3 20 15 N Asparagine
13 7 12 C Cysteine
24 20 16 I Isoleucine
12 11 2 M Methionine
32 18 35 S Serine
52 25 15 A Alanine
17 26 5 E Glutamic acid
77 24 14 L Leucine
34 19 5 F Phenylalanine
34 18 7 P Proline
5 8 4 W Tryptophan
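The Parikh vector of a sequence counts how many times each letter occurs and ignores their order. Below is a minimal Python sketch. It assumes the sequence is a plain string of one-letter amino acid codes. The name `parikh_vector` is hypothetical, and the letter order simply follows the rows of the table above. The 3UUZ_1 sequence is copied from this page.

```python
from collections import Counter

# Row order of the Parikh-vector table (one-letter amino acid codes).
AMINO_ACIDS = "QHKTYDGVRNCIMSAELFPW"

def parikh_vector(seq: str) -> list[int]:
    """Count occurrences of each amino acid in seq, in AMINO_ACIDS order."""
    counts = Counter(seq)
    # Counter returns 0 for letters that never occur.
    return [counts[a] for a in AMINO_ACIDS]

# 3UUZ_1 (cationic trypsin), as listed on this page.
SEQ_3UUZ = (
    "IVGGYTCGANTVPYQVSLNSGYHFCGGSLINSQWVVSAAHCYKSGIQVRLGEDNINVVEGNEQFISASKS"
    "IVHPSYNSETYNNDIMLIKLKSAASLNSRVASISLPTSCASAGTQCLISGWGNTKSSGTSYPDVLKCLKA"
    "PILSDSSCKSASSFIITSNMFCAGYLEGGKDACQGDSGGPVVCSGKLQGIVSWGSGCAQKNKPGFYTKLC"
    "NYVSWIKQTIASN"
)

if __name__ == "__main__":
    # Prints one "letter count" line per amino acid, matching the 3UUZ_1 column.
    for letter, count in zip(AMINO_ACIDS, parikh_vector(SEQ_3UUZ)):
        print(letter, count)
```

The entries of a Parikh vector always sum to the sequence length. The columns above sum to 504, 364 and 223, which are the lengths reported in the complexity table.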

>6WCA_1|Chains A, B|Endosomal/lysosomal potassium channel TMEM175|Homo sapiens (9606)
>5PIB_1|Chain A|Lysine-specific demethylase 4D|Homo sapiens (9606)
>3UUZ_1|Chains A, B|Cationic trypsin|Bos taurus (9913)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6WCA , Knot 196 504 0.80 40 233 453
MSQPRTPEQALDTPGDCPPGRRDEDAGEGIQCSQRMLSFSDALLSIIATVMILPVTHTEISPEQQFDRSVQRLLATRIAVYLMTFLIVTVAWAAHTRLFQVVGKTDDTLALLNLACMMTITFLPYTFSLMVTFPDVPLGIFLFCVCVIAIGVVQALIVGYAFHFPHLLSPQIQRSAHRALYRRHVLGIVLQGPALCFAAAIFSLFFVPLSYLLMVTVILLPYVSKVTGWCRDRLLGHREPSAHPVEVFSFDLHEPLSKERVEAFSDGVYAIVATLLILDICEDNVPDPKDVKERFSGSLVAALSATGPRFLAYFGSFATVGLLWFAHHSLFLHVRKATRAMGLLNTLSLAFVGGLPLAYQQTSAFARQPRDELERVRVSCTIIFLASIFQLAMWTTALLHQAETLQPSVWFGGREHVLMFAKLALYPCASLLAFASTCLLSRFSVGIFHLMQIAVPCAFLLLRLLVGLALATLRVLRGLARPEHPPPAPTGQDDPQSQLLPAPC
5PIB , Knot 160 364 0.86 40 229 347
MHHHHHHSSGVDLGTENLYFQSMETMKSKANCAQNPNCNIMIFHPTKEEFNDFDKYIAYMESQGAHRAGLAKIIPPKEWKARETYDNISEILIATPLQQVASGRAGVFTQYHKKKKAMTVGEYRHLANSKKYQTPPHQNFEDLERKYWKNRIYNSPIYGADISGSLFDENTKQWNLGHLGTIQDLLEKECGVVIEGVNTPYLYFGMWKTTFAWHTEDMDLYSINYLHLGEPKTWYVVPPEHGQRLERLARELFPGSSRGCGAFLRHKVALISPTVLKENGIPFNRITQEAGEFMVTFPYGYHAGFNHGFNCAEAINFATPRWIDYGKMASQCSCGEARVTFSMDAFVRILQPERYDLWKRGQDR
3UUZ , Knot 100 223 0.80 40 143 215
IVGGYTCGANTVPYQVSLNSGYHFCGGSLINSQWVVSAAHCYKSGIQVRLGEDNINVVEGNEQFISASKSIVHPSYNSETYNNDIMLIKLKSAASLNSRVASISLPTSCASAGTQCLISGWGNTKSSGTSYPDVLKCLKAPILSDSSCKSASSFIITSNMFCAGYLEGGKDACQGDSGGPVVCSGKLQGIVSWGSGCAQKNKPGFYTKLCNYVSWIKQTIASN
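The LZ-complexity column counts the phrases in a Lempel–Ziv (1976) parsing. The page does not say which variant of the parsing it uses, so the Python sketch below rests on an assumption. It counts phrases in the exhaustive-history parsing in the style of Kaspar and Schuster. Each phrase is the longest prefix of the remaining input that can be copied from an earlier start position, plus one new symbol, and a final incomplete phrase is also counted. Other conventions can differ by one, so this sketch need not reproduce the table's \(\mathrm{LZ}(w)\) values exactly. The function names are hypothetical.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of w (simple O(n^3) version)."""
    n, i, phrases = len(w), 0, 0
    while i < n:
        l = 1
        # Grow the phrase while w[i:i+l] also occurs starting at some position < i
        # (overlap with the phrase itself is allowed).
        while i + l <= n and w[i:i + l] in w[:i + l - 1]:
            l += 1
        # The phrase is the copyable prefix plus one new symbol (or the tail of w).
        phrases += 1
        i += l
    return phrases

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_k n), the ratio shown in the table's fourth column."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    print(lz76_complexity("0001101001000101"))  # prints 6: 0|001|10|100|1000|101
    print(normalized_lz(196, 504))              # 6WCA row of the table
```

Applied to the table's own \(\mathrm{LZ}(w)\) and \(n\), `normalized_lz` gives about 0.808, 0.865 and 0.809 for 6WCA, 5PIB and 3UUZ. This suggests the ratio column is truncated to two decimals rather than rounded.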

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
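The definitions of \(P_w(n)\) and \(p_w(n)\) translate directly into Python. In this sketch, `subwords` and `p` are hypothetical names, and \(w\) is assumed to be a plain string.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))

if __name__ == "__main__":
    print(sorted(subwords("ABAB", 2)))  # ['AB', 'BA']
    print(p("ABAB", 3))                 # 2 ('ABA' and 'BAB')
```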

\(|P_{f(6WCA_1)}(2) \setminus P_{f(5PIB_1)}(2)|=95\) and \(|P_{f(5PIB_1)}(2) \setminus P_{f(6WCA_1)}(2)|=91\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). This is an LZ76-style (set-of-subwords) numerator for the Jaccard distance between these two sets. For example, \(Z_2(f(6WCA_1),f(5PIB_1))=95+91=186\).

Hydrophobic-polar version of Sequence 1 (6WCA_1): 100100100110011001110000011011000001101001110111011111100001010001000100111001110110111101111100011011100000111101101101011100101110110111111110101111111011111011011011010100010011000011111101111011111101111110011110111110100101100001110001010110110101001100001011001101111011110100001101001000101011111010110111011011011111110001110100100111110010111111111100000111001000100101000111110110111100111001001010111110001111101110101011111000110010111101101111011111011111111010110111010011111010001000111110
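The page does not state which residues count as hydrophobic. When the binary string is compared with the first 120 residues of 6WCA_1, the result is consistent with treating A, F, G, I, L, M, P, V and W as hydrophobic (1) and the other eleven amino acids as polar (0). The sketch below assumes that classification. It is inferred from the string, not documented on the page, and the name `hp_string` is hypothetical.

```python
# ASSUMPTION: hydrophobic set inferred by matching the first 120 residues of
# 6WCA_1 against the binary string; not a documented rule of the page.
HYDROPHOBIC = frozenset("AFGILMPVW")

def hp_string(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

if __name__ == "__main__":
    # First 30 residues of 6WCA_1.
    print(hp_string("MSQPRTPEQALDTPGDCPPGRRDEDAGEGI"))  # 100100100110011001110000011011
```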
Pair \(Z_2\) Length of longest common substring (contiguous subword)
6WCA_1,5PIB_1 186 5
6WCA_1,3UUZ_1 190 3
5PIB_1,3UUZ_1 204 3
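Below is a Python sketch of the two pairwise quantities. The first is \(Z_k\), the size of the symmetric difference of the \(k\)-subword sets, together with the full Jaccard distance it is the numerator of. The second is the length of the longest common contiguous subword, computed with the standard dynamic program. All function names are hypothetical. If the sequences are taken exactly as listed, `z_k(x, y, 2)` on f(6WCA_1) and f(5PIB_1) should give the page's 186 = 95 + 91.

```python
def kgrams(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(kgrams(x, k) ^ kgrams(y, k))

def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by the size of the union; 0.0 when both sets are empty."""
    union = kgrams(x, k) | kgrams(y, k)
    return z_k(x, y, k) / len(union) if union else 0.0

def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Longest common suffix of x[:i] and y[:j].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

if __name__ == "__main__":
    print(z_k("ABAB", "BABC", 2))                    # 1: only 'BC' is unshared
    print(jaccard_distance("ABAB", "BABC", 2))       # 1/3
    print(longest_common_substring("ABAB", "BABC"))  # 3 ('BAB')
```

The dynamic program runs in \(O(|x||y|)\) time and keeps only one previous row. That cost is negligible for sequences of a few hundred residues.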

Newick tree

 
(
	3UUZ_1:10.34,
	(
		6WCA_1:93,5PIB_1:93
	):7.34
);

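Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)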
A rough estimate of the expected distance, where \(868=504+364\) is the combined length of the two sequences, is \((0.85)(0.8)\left(\frac{868}{\log_{20} 868}-\frac{364}{\log_{20}364}\right)\approx 135.\)
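The estimate's arithmetic can be checked directly. The sketch below uses the normalizing term \(n/\log_{20} n\) from the complexity table. It takes the factors 0.85 and 0.8 exactly as written in the formula, because their origin is not stated. The helper name `expected_lz` is hypothetical.

```python
import math

def expected_lz(n: int, alphabet_size: int = 20) -> float:
    """n / log_k n, the normalizing term used in the LZ table."""
    return n / math.log(n, alphabet_size)

# 868 = 504 + 364, the combined length of 6WCA_1 and 5PIB_1.
# The factors 0.85 and 0.8 are copied from the formula as given.
estimate = 0.85 * 0.8 * (expected_lz(504 + 364) - expected_lz(364))
print(estimate)  # about 135.6; the text reports 135
```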
Status Protein1 Protein2 d d1/2
Query variables 6WCA_1 5PIB_1 169 148

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
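The interpolation \(\delta\) acts on the two conditional complexities \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), which are the quantities combined in the Otu–Sayood distances. The sketch takes these two numbers as inputs; in practice they would come from an LZ76 routine. The example values are worked backwards from the table row for 6WCA_1 and 5PIB_1. Under this reading, \(d=169\) and \(d_1/2=148\) force the two differences to be 169 and \(2\cdot 148-169=127\). The name `delta` is hypothetical.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """alpha*min(a, b) + (1 - alpha)*max(a, b).

    a, b: the differences LZ(xy) - LZ(x) and LZ(yx) - LZ(y).
    alpha = 0 gives d = max(a, b); alpha = 1/2 gives (a + b)/2 = d1/2.
    """
    lo, hi = min(a, b), max(a, b)
    return alpha * lo + (1 - alpha) * hi

# Differences implied by d = 169 and d1/2 = 148 in the table row.
print(delta(169, 127, 0))    # 169   (d)
print(delta(169, 127, 0.5))  # 148.0 (d1/2)
```

Setting \(\alpha=1\) gives the minimum of the two differences, so \(\alpha\) moves \(\delta\) continuously from the larger conditional complexity down to the smaller one.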