CoV2D Browser™

Parikh vectors
1JHL_1 6WCT_1 1IBU_1 Letter Amino acid
10 10 5 T Threonine
1 18 6 V Valine
1 6 2 M Methionine
6 11 2 P Proline
9 16 5 L Leucine
5 3 5 Y Tyrosine
4 4 7 N Asparagine
7 12 3 E Glutamic acid
1 12 0 H Histidine
4 10 0 F Phenylalanine
16 14 5 S Serine
3 14 7 D Aspartic acid
2 1 0 C Cysteine
5 16 2 Q Glutamine
10 15 9 G Glycine
9 11 6 I Isoleucine
6 7 6 K Lysine
2 3 1 W Tryptophan
4 31 6 A Alanine
3 15 4 R Arginine
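Each column of the table above is a Parikh vector: the number of occurrences of each amino-acid letter in a sequence. A minimal Python sketch follows, assuming the 20-letter alphabet in the row order of the table; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not part of the CoV2D code.

```python
from collections import Counter

# Row order of the table above (an assumption about presentation only).
AMINO_ACIDS = "TVMPLYNEHFSDCQGIKWAR"


def parikh_vector(sequence: str, alphabet: str = AMINO_ACIDS) -> dict[str, int]:
    """Count how often each letter of `alphabet` occurs in `sequence`."""
    counts = Counter(sequence)
    # Counter returns 0 for absent letters, so every letter gets an entry.
    return {letter: counts[letter] for letter in alphabet}


# Sequence of 1IBU_1 as listed on this page.
seq_1ibu = (
    "SELDAKLNKLGVDRIAISPYKQWTRGYMEPGNIGNGYVTGLKVDAGVRDKSDNNVLDGIVSYDRAETKNAYIGQINMTTAS"
)
vector = parikh_vector(seq_1ibu)
print(vector)
```

The entries of a Parikh vector sum to the sequence length, which gives a quick consistency check against the Length column further down.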

>1JHL_1|Chain A[auth L]|IGG1-KAPPA D11.15 FV (LIGHT CHAIN)|Mus musculus (10090)
>6WCT_1|Chains A, B, C, D|Guanylate kinase|Stenotrophomonas maltophilia (strain K279a) (522373)
>1IBU_1|Chains A, C, E|HISTIDINE DECARBOXYLASE BETA CHAIN|Lactobacillus sp. (1593)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1JHL , Knot 54 108 0.78 40 84 102
DIELTQSPSYLVASPGETITINCRASKSISKSLAWYQEKPGKTNNLLIYSGSTLQSGIPSRFSGSGSGTDFTLTISSLEPEDFAMYICQQHNEYPWTFGGGTKLEIKR
6WCT , Knot 102 229 0.80 40 153 217
MAHHHHHHMSAPSKPSDAVARGTLYIVAAPSGAGKSSIVNATLARDPQIALSISFTSRAMRPGEVNGQHYHFVSAEKFEQMIAAGDFFEHAWVHGDWKGTARQSVEPQLAAGQDVLLEIDWQGAQQVRQLVPGTVTVFILPPSKQALQDRMRKRGQDSEAVIAQRLGAARDEMLHFNEFDYVIVNEVFDTAVDELCAIFTASRLRREAQKVRHAGLIQALLTPDPGATD
1IBU , Knot 44 81 0.79 34 71 79
SELDAKLNKLGVDRIAISPYKQWTRGYMEPGNIGNGYVTGLKVDAGVRDKSDNNVLDGIVSYDRAETKNAYIGQINMTTAS
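The fourth column normalizes the LZ-complexity by \(n/\log_{20} n\). The sketch below recomputes that ratio from the LZ and length values listed above; it does not compute LZ-complexity itself, and the function name `normalized_lz` is an illustrative assumption.

```python
import math


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n) for a word of length n over b letters."""
    return lz / (n / math.log(n, alphabet_size))


# (LZ-complexity, length) pairs copied from the table above.
rows = {"1JHL": (54, 108), "6WCT": (102, 229), "1IBU": (44, 81)}
for code, (lz, n) in rows.items():
    print(code, normalized_lz(lz, n))
```

The computed ratios lie close to the tabulated 0.78, 0.80 and 0.79; the tabulated values appear to be cut off after two decimals rather than rounded.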

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
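As an illustration, reading \(P_w(k)\) as the set of distinct contiguous length-\(k\) subwords of \(w\), the following Python sketch computes \(P_w(k)\) and \(p_w(k)\) for a toy word. The function names are hypothetical, chosen only for this example.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return P_w(k): the distinct contiguous subwords of w of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_count(w: str, k: int) -> int:
    """Return p_w(k), the cardinality of P_w(k)."""
    return len(subwords(w, k))


toy = "ABABB"
for k in (1, 2, 3):
    print(k, sorted(subwords(toy, k)), subword_count(toy, k))
```

For the toy word, the length-2 subwords are AB, BA and BB, so \(p_w(2)=3\) even though the word has four length-2 windows.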

\(|P_{f(1JHL_1)}(2) \setminus P_{f(6WCT_1)}(2)|=45\), \(|P_{f(6WCT_1)}(2) \setminus P_{f(1JHL_1)}(2)|=114\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1: 010100010011101100101000100010001110000110000111001001001110010101010010101001010011101000000011011110010100
Pair \(Z_2\) Length of longest common subword
1JHL_1,6WCT_1 159 3
1JHL_1,1IBU_1 121 2
6WCT_1,1IBU_1 156 3
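The two columns of this table can be sketched in Python as follows. \(Z_k\) is taken as the size of the symmetric difference of the two length-\(k\) subword sets, and the last column is read as the length of the longest contiguous block shared by both sequences, which is the reading under which values as small as 2 or 3 are plausible. Function names are illustrative assumptions, and a toy pair of words stands in for the protein sequences.

```python
def subwords(w: str, k: int) -> set[str]:
    """Distinct contiguous subwords of w of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_subword_length(x: str, y: str) -> int:
    """Largest k such that x and y share a contiguous subword of length k."""
    k = 0
    # A shared subword of length k + 1 implies a shared one of length k,
    # so the first k without a shared subword ends the search.
    while subwords(x, k + 1) & subwords(y, k + 1):
        k += 1
    return k


x, y = "ABAB", "BABB"
print(z_distance(x, y, 2), longest_common_subword_length(x, y))
```

For the first row of the table, the two one-sided counts quoted earlier add up to the tabulated value: \(45 + 114 = 159\).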

Newick tree

 
(6WCT_1:83.96,(1JHL_1:60.5,1IBU_1:60.5):23.46);

Let d and d1 denote the two Otu--Sayood distances of those names. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{337 }{\log_{20} 337}-\frac{108}{\log_{20}108})=70.9\)
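The arithmetic in this estimate can be checked with a few lines of Python. The sketch assumes only the numbers in the formula; note that \(337 = 108 + 229\), presumably the length of the two query sequences concatenated. The function names are hypothetical.

```python
import math


def scaled_length(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_b n, the normalizer used for LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)


def rough_expected_distance(n_total: int, n_first: int,
                            c1: float = 0.85, c2: float = 0.8) -> float:
    """Evaluate c1 * c2 * (n_total / log n_total - n_first / log n_first)."""
    return c1 * c2 * (scaled_length(n_total) - scaled_length(n_first))


value = rough_expected_distance(337, 108)
print(round(value, 2))
```

The result is just under 71, in line with the 70.9 quoted above if that figure is cut off after one decimal.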
Status Protein1 Protein2 d d1/2
Query variables 1JHL_1 6WCT_1 91 66.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
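As a worked instance with the query values \(d=91\) and \(d_1/2=66.5\) from the table above, and assuming \(\mathrm{max}\) and \(\mathrm{min}\) denote the larger and smaller of the two one-sided quantities (the reading consistent with the two cases shown), the tabulated numbers determine both:
\[ \mathrm{max} = d = 91, \qquad \tfrac12(\mathrm{min}+\mathrm{max}) = d_1/2 = 66.5 \;\Longrightarrow\; \mathrm{min} = 2(66.5) - 91 = 42, \quad d_1 = \mathrm{min}+\mathrm{max} = 133. \]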