CoV2D Browser™

Parikh vectors
5KYB_1 1MZA_1 9MON_1 Letter Amino acid
30 11 22 D Aspartic acid
31 23 19 K Lysine
14 6 12 F Phenylalanine
10 5 16 M Methionine
16 19 26 T Threonine
17 6 16 Y Tyrosine
12 13 29 A Alanine
7 8 6 C Cysteine
12 13 30 I Isoleucine
34 18 27 L Leucine
2 3 4 W Tryptophan
21 10 11 Q Glutamine
25 6 28 E Glutamic acid
14 10 9 H Histidine
13 13 19 P Proline
26 18 21 V Valine
14 8 18 R Arginine
14 7 12 N Asparagine
22 22 28 G Glycine
17 21 24 S Serine

>5KYB_1|Chains A, B|Ubiquitin carboxyl-terminal hydrolase 7|Homo sapiens (9606)
>1MZA_1|Chain A|pro-granzyme K|Homo sapiens (9606)
>9MON_1|Chains A, B, C, D, E, F, G|Actin, alpha cardiac muscle 1|Mus musculus (10090)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5KYB , Knot 152 351 0.84 40 211 343
GSHMKKHTGYVGLKNQGATCYMNSLLQTLFFTNQLRKAVYMMPTEGDDSSKSVPLALQRVFYELQHSDKPVGTKKLTKSFGWETLDSFMQHDVQELCRKLLDNVENKMKGTCVEGTIPKLFRGKMVSYIQCKEVDYRSDRREDYYDIQLSIKGKKNIFESFVDYVAVEQLDGDNKYDAGEHGLQEAEKGVKFLTLPPVLHLQLMRFMYDPQTDQNIKINDRFEFPEQLPLDEFLQKTDPKDPANYILHAVLVHSGDNHGGHYVVYLNPKGDGKWCKFDDDVVSRCTKEEAIEHNYGGHDDDLSVRHCTNAYMLVYIRESKLSEVLQAVTDHDIPQQLVERLQEEKRIEAQK
1MZA , Knot 106 240 0.80 40 154 230
MEIIGGKEVSPHSRPFMASIQYGGHHVCGGVLIDPQWVLTAAHCQYRFTKGQSPTVVLGAHSLSKNEASKQTLEIKKFIPFSRVTSDPQSNDIMLVKLQTAAKLNKHVKMLHIRSKTSLRSGTKCKVTGWGATDPDSLRPSDTLREVTVTVLSRKLCNSQSYYNGDPFITKDMVCAGDAKGQKDSCKGDAGGPLICKGVFHAIVSGGHECGVATKPGIYTLLTKKYQTWIKSNLVPPHTN
9MON , Knot 160 377 0.84 40 218 361
MCDDEETTALVCDNGSGLVKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQSKRGILTLKYPIEHGIITNWDDMEKIWHHTFYNELRVAPEEHPTLLTEAPLNPKANREKMTQIMFETFNVPAMYVAIQAVLSLYASGRTTGIVLDSGDGVTHNVPIYEGYALPHAIMRLDLAGRDLTDYLMKILTERGYSFVTTAEREIVRDIKEKLCYVALDFENEMATAASSSSLEKSYELPDGQVITIGNERFRCPETLFQPSFIGMESAGIHETTYNSIMKCDIDIRKDLYANNVLSGGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQEYDEAGPSIVHRKCF
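
The LZ-complexity and normalized columns above can be illustrated with a minimal Python sketch. It assumes the classical Lempel-Ziv (1976) exhaustive-history parsing; the page does not state which LZ variant it uses, so this code is not guaranteed to reproduce the tabulated LZ values. The function names `lz76` and `normalized_lz` are hypothetical.

```python
import math

def lz76(s: str) -> int:
    """Count components in an LZ76-style parsing of s (assumed variant)."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the current component while it still occurs earlier in s
        # (the earlier occurrence may overlap the component itself).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Compute LZ(w) / (n / log_20 n), the normalization in the table header."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example: 0|001|10|100|1000|101 has 6 components.
print(lz76("0001101001000101"))
# Normalization applied to the tabulated values for 5KYB (LZ = 152, n = 351).
print(normalized_lz(152, 351))
```

The normalization uses a base-20 logarithm because protein sequences are written over a 20-letter amino acid alphabet.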

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
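
As an illustration of these definitions, the following Python sketch computes the set of length-\(n\) subwords and its cardinality. The names `subwords` and `subword_complexity` are hypothetical, and the code assumes that subwords are contiguous.

```python
def subwords(w: str, n: int) -> set[str]:
    """Return P_w(n): the set of distinct contiguous subwords of length n."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w: str, n: int) -> int:
    """Return p_w(n), the cardinality of P_w(n)."""
    return len(subwords(w, n))

# Toy example: "abab" has two distinct subwords of length 2, "ab" and "ba".
print(sorted(subwords("abab", 2)))
print(subword_complexity("abab", 2))
```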

\(|P_{f(5KYB_1)}(2) \setminus P_{f(1MZA_1)}(2)|=122\), \(|P_{f(1MZA_1)}(2) \setminus P_{f(5KYB_1)}(2)|=65\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
100100001011100011000100110011100010011011100100000011111001100100000111000100011100100110001001000110010001010010101101101011001000010000000000001010101000110011001110010100000110011001001101101111101011011001000001010001011001110011000010011001101111001000110011010101010100100011000000011000011000010100000101110100001001101100001100110010000010100
Pair \(Z_2\) Length of longest common subword (substring)
5KYB_1,1MZA_1 187 4
5KYB_1,9MON_1 177 4
1MZA_1,9MON_1 196 5
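
The two columns of this table can be sketched in Python as follows. Two assumptions apply: \(Z_k\) is computed as the size of the symmetric difference of the two subword sets, and the last column is read as the length of the longest common contiguous substring, since values of 4 and 5 would be implausibly small for a non-contiguous subsequence of sequences this long. The function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    return len(subwords(x, k) ^ subwords(y, k))  # symmetric difference

def longest_common_substring_length(x: str, y: str) -> int:
    """Length of the longest contiguous substring shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common suffix ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy example: only "bb" is in one 2-subword set but not the other.
print(z_k("abab", "abba", 2))
print(longest_common_substring_length("abab", "abba"))
```

For the first pair, the set differences stated above give \(Z_2 = 122 + 65 = 187\), matching the first row.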

Newick tree

 
(1MZA_1:98.08,(5KYB_1:88.5,9MON_1:88.5):9.58);

Let d and d1 denote the Otu--Sayood distances d and d1, respectively. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{591 }{\log_{20} 591}-\frac{240}{\log_{20}240})=99.4\), where \(591=351+240\) is the sum of the two sequence lengths.
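
The arithmetic in this estimate can be checked with a short Python sketch. The source does not explain the factors 0.85 and 0.8, so they are passed through as opaque parameters; the names `ratio`, `scale`, and `expected_distance` are hypothetical.

```python
import math

def expected_distance(n_total: int, n_single: int,
                      ratio: float = 0.85, scale: float = 0.8,
                      alphabet_size: int = 20) -> float:
    """Evaluate ratio * scale * (g(n_total) - g(n_single)), g(n) = n / log_20 n."""
    def g(n: int) -> float:
        return n / math.log(n, alphabet_size)
    return ratio * scale * (g(n_total) - g(n_single))

# 591 = 351 + 240: the lengths of 5KYB_1 and 1MZA_1 added together.
print(round(expected_distance(351 + 240, 240), 1))
```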
Status Protein1 Protein2 d d1/2
Query variables 5KYB_1 1MZA_1 125 103.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
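
As a worked check on the query values, suppose that \(\min\) and \(\max\) above are the two quantities whose maximum is \(d\) and whose average is \(d_1/2\) (an assumption; the page does not spell this out). Then \(d=125\) and \(d_1/2=103.5\) give
\[ \max = 125, \qquad \min = 2(103.5) - 125 = 82, \qquad d_1 = \min + \max = 207. \]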