CoV2D Browser™

Parikh vectors
1TCD_1 5URW_1 9HSI_1 Letter Amino acid
9 10 5 P Proline
12 8 10 S Serine
5 0 3 W Tryptophan
6 1 8 Y Tyrosine
18 14 16 E Glutamic acid
17 7 8 G Glycine
13 9 7 Q Glutamine
2 4 4 M Methionine
24 12 14 V Valine
13 6 5 R Arginine
5 8 19 D Aspartic acid
7 5 4 F Phenylalanine
21 24 12 L Leucine
13 15 17 K Lysine
4 0 1 C Cysteine
5 0 5 H Histidine
16 3 11 I Isoleucine
17 8 12 T Threonine
33 17 8 A Alanine
9 13 10 N Asparagine
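The Parikh vector of a sequence simply records how often each letter occurs. A minimal sketch of that count follows; the function name `parikh_vector` and the alphabetical letter order are assumptions made for illustration, not the site's own code.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes (ordering chosen here for illustration).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence):
    """Map each amino-acid letter to its number of occurrences in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# Toy input: the first six residues of 5URW_1.
vec = parikh_vector("MSKESS")
print({letter: n for letter, n in vec.items() if n > 0})
```

The entries of a Parikh vector always sum to the sequence length, which gives a quick consistency check on a column of the table above.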

>1TCD_1|Chains A, B|TRIOSEPHOSPHATE ISOMERASE|Trypanosoma cruzi (5693)
>5URW_1|Chains AA[auth 1C], A[auth 1A], CA[auth 1G], C[auth 1E], E[auth 1I], G[auth 1K], I[auth 2A], KA[auth 2E], K[auth 2C], MA[auth 2G], OA[auth 2I], QA[auth 2K], SA[auth 3C], S[auth 3A], UA[auth 3G], U[auth 3E], W[auth 3I], Y[auth 3K]|TssB|Myxococcus xanthus (strain DK 1622) (246197)
>9HSI_1|Chain A|choline-phosphate cytidylyltransferase|Plasmodium falciparum (5833)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1TCD , Knot 111 249 0.82 40 160 237
SKPQPIAAANWKCNGSESLLVPLIETLNAATFDHDVQCVVAPTFLHIPMTKARLTNPKFQIAAQNAITRSGAFTGEVSLQILKDYGISWVVLGHSERRLYYGETNEIVAEKVAQACAAGFHVIVCVGETNEEREAGRTAAVVLTQLAAVAQKLSKEAWSRVVIAYEPVWAIGTGKVATPQQAQEVHELLRRWVRSKLGTDIAAQLRILYGGSVTAKNARTLYQMRDINGFLVGGASLKPEFVEIIEATK
5URW , Knot 78 164 0.80 34 112 160
MSKESSVAPTERVNIVYKPATGNAQEQVELPLKVLMLGDFTGQEDARPLEQRAPINVDKANFNEVMAQQNLKVTLTAADKLSADPNATMNVSLQFKNLNDFSPESVVNQVPELKKLLELRSALNALKGPLGNLPAFRKKLQALLADEDGRKALIKELGLTEETK
9HSI , Knot 84 179 0.81 40 133 168
GHMAVPDDDDDDDNSNDESEYESSQMDSEKNKGSIKNSKNVVIYADGVYDMLHLGHMKQLEQAKKLFENTTLIVGVTSDNETKLFKGQVVQTLEERTETLKHIRWVDEIISPCPWVVTPEFLEKYKIDYVAHDDIPYANNQKEDIYAWLKRAGKFKATQRTEGVSTTDLIVRILKNYED
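The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns alone. The sketch below does only that arithmetic; the helper name `normalized_lz` is hypothetical, and the LZ values are taken from the table rather than recomputed.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n) for a word of length n over b letters."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"1TCD": (111, 249), "5URW": (78, 164), "9HSI": (84, 179)}
for code, (lz, n) in rows.items():
    print(code, f"{normalized_lz(lz, n):.4f}")
```

The recomputed ratios agree with the tabulated 0.82, 0.80 and 0.81 to within 0.01; the tabulated values look consistent with truncation to two decimals rather than rounding.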

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with the 4-symbol Protein Data Bank code \(c\).
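Reading \(n\) in \(P_w(n)\) as the subword length, the definitions translate directly into a sliding window over \(w\). The following is a minimal sketch under that reading; the names `subwords` and `subword_complexity` are chosen here for illustration.

```python
def subwords(w, n):
    """P_w(n): the set of distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subwords(w, n))

w = "MSKESS"  # toy input: first six residues of 5URW_1
for n in (1, 2, 3):
    print(n, subword_complexity(w, n))
```

For a word over 20 letters, \(p_w(n)\) can never exceed \(\min(20^n, |w|-n+1)\), since there are only \(|w|-n+1\) windows of length \(n\).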

\(|P_{f(1TCD_1)}(2) \setminus P_{f(5URW_1)}(2)|=101\), \(|P_{f(5URW_1)}(2) \setminus P_{f(1TCD_1)}(2)|=53\).

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\).

Hydrophobic-polar version of Sequence 1:
001011111010001000111111001011010001001111011011100101001010111001100011101010101100011011111000001001000011100110101111011101100000001100111110011111001000110011110011111101011010010010011001100011001110101101101010010010010010111111101010110110100

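The hydrophobic-polar string can be reproduced by mapping each residue to 1 or 0. The page does not state which residues count as hydrophobic; the set used below (A, F, G, I, L, M, P, V, W) is an assumption inferred by aligning the displayed string with the start of the 1TCD sequence, and `hp_encode` is a hypothetical helper name.

```python
# Assumption: residues encoded as 1, inferred from the displayed string,
# not taken from any documentation of the site.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Encode a protein sequence as a binary hydrophobic (1) / polar (0) string."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 20 residues of the 1TCD sequence shown above.
print(hp_encode("SKPQPIAAANWKCNGSESLL"))  # 00101111101000100011
```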
Pair \(Z_2\) Length of longest common subsequence
1TCD_1,5URW_1 154 4
1TCD_1,9HSI_1 169 3
5URW_1,9HSI_1 145 3
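As a check on the first row, \(Z_2 = 101 + 53 = 154\) for the pair 1TCD_1, 5URW_1. The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets, and dividing it by the size of their union gives the usual Jaccard distance. A minimal sketch, with function names chosen here for illustration:

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """Z_k(x, y) divided by |P_x(k) union P_y(k)|; defined as 0.0 if both sets are empty."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0

# Toy words: P_x(2) = {AB, BA}, P_y(2) = {AB, BB}.
print(z_distance("ABAB", "ABBB", 2))        # 2
print(jaccard_distance("ABAB", "ABBB", 2))  # 2/3
```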

Newick tree

 
[
	1TCD_1:83.43,
	[
		5URW_1:72.5,9HSI_1:72.5
	]:10.93
]
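The bracketed structure above can be serialized into a standard Newick string. The sketch below assumes a representation of each node as a pair `(label_or_children, branch_length)`; that representation and the name `to_newick` are illustrative choices, not the site's internal format.

```python
def to_newick(node):
    """Serialize a (label_or_children, branch_length) node; length None means no length."""
    content, length = node
    if isinstance(content, str):
        text = content  # leaf: just its label
    else:
        text = "(" + ",".join(to_newick(child) for child in content) + ")"
    return text if length is None else f"{text}:{length}"

# The tree shown above; the root carries no branch length.
tree = ([("1TCD_1", 83.43),
         ([("5URW_1", 72.5), ("9HSI_1", 72.5)], 10.93)], None)
print(to_newick(tree) + ";")  # (1TCD_1:83.43,(5URW_1:72.5,9HSI_1:72.5):10.93);
```

Every leaf sits at the same distance from the root, since \(72.5 + 10.93 = 83.43\), so the tree is ultrametric.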

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{413}{\log_{20} 413}-\frac{164}{\log_{20}164}\right)\approx 74.1\) (note that \(413=249+164\), the sum of the lengths of 1TCD_1 and 5URW_1).
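The arithmetic in this estimate is easy to reproduce. The sketch below evaluates the expression exactly as written; the constants 0.85 and 0.8 are taken from the formula without further interpretation, and `n_over_log` is a hypothetical helper name.

```python
import math

def n_over_log(n, base=20):
    """Return n / log_base(n), the normalizing term used in the estimate."""
    return n / math.log(n, base)

expected = 0.85 * 0.8 * (n_over_log(413) - n_over_log(164))
print(f"{expected:.2f}")
```

Evaluated in floating point this comes to about 74.17, consistent with the quoted 74.1 if the page truncates to one decimal.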
Status Protein1 Protein2 d d1/2
Query variables 1TCD_1 5URW_1 95 75.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
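The interpolation can be illustrated with the Otu-Sayood definitions \(d=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1=(c(SQ)-c(S))+(c(QS)-c(Q))\), where \(c\) is an LZ76 phrase count. The sketch below is an assumption-laden illustration: the parsing in `lz76` is one common formulation of LZ76, and it is not known whether the site uses this exact variant, so the tabulated values are not expected to be reproduced.

```python
def lz76(w):
    """Number of phrases in an LZ76 parsing of w (each phrase is the shortest
    prefix of the remainder that does not occur in the text before its last letter)."""
    i, count, n = 0, 0, len(w)
    while i < n:
        j = i + 1
        # Extend the candidate phrase while it already occurs earlier in w.
        while j <= n and w[i:j] in w[:j - 1]:
            j += 1
        count += 1
        i = j
    return count

def increments(s, q):
    """Return (c(sq) - c(s), c(qs) - c(q))."""
    return lz76(s + q) - lz76(s), lz76(q + s) - lz76(q)

def delta(s, q, alpha):
    """alpha * min + (1 - alpha) * max of the two complexity increments."""
    a, b = increments(s, q)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

s, q = "MSKESSVAPTERVNIV", "SKPQPIAAANWKCNGS"  # short prefixes of 5URW and 1TCD
a, b = increments(s, q)
print(delta(s, q, 0) == max(a, b))          # d
print(delta(s, q, 0.5) == (a + b) / 2)      # d1 / 2
```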