CoV2D Browser™

Parikh vectors
4KRL_1 6TZI_1 5UGS_1 Letter Amino acid
8 6 14 R Arginine
3 28 8 N Asparagine
17 22 31 G Glycine
2 10 11 M Methionine
2 24 14 P Proline
7 26 18 V Valine
12 20 36 A Alanine
7 17 11 E Glutamic acid
3 15 7 F Phenylalanine
2 0 1 C Cysteine
6 24 10 Q Glutamine
15 19 21 S Serine
10 23 13 T Threonine
4 3 4 W Tryptophan
8 20 6 Y Tyrosine
6 16 15 D Aspartic acid
3 19 20 I Isoleucine
7 29 28 L Leucine
5 36 9 K Lysine
6 4 12 H Histidine
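
The Parikh vector of a sequence records how many times each letter occurs in it. A minimal Python sketch, assuming the 20 standard one-letter amino-acid codes as the alphabet (the name `parikh_vector` is illustrative and not taken from the CoV2D code):

```python
from collections import Counter

# Assumed alphabet: the 20 standard one-letter amino-acid codes.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Count every amino-acid letter in the sequence, keeping zero counts."""
    counts = Counter(sequence)
    return {letter: counts[letter] for letter in AMINO_ACIDS}

# The last nine residues of 4KRL_1 (a His-tagged tail).
tail = "ALEHHHHHH"
vector = parikh_vector(tail)
print(vector["H"], vector["A"], vector["C"])  # prints: 6 1 0
```

Applying the same function to a full FASTA sequence gives one column of the table above.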

>4KRL_1|Chain A[auth B]|Nanobody/VHH domain 7D12|Lama glama (9844)
>6TZI_1|Chains A, B, C, D|Beta-lactamase|Acinetobacter baumannii (470)
>5UGS_1|Chains A, B, C, D, E, F[auth G]|Enoyl-[acyl-carrier-protein] reductase [NADH]|Mycobacterium tuberculosis (1773)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4KRL , Knot 67 133 0.82 40 103 125
QVKLEESGGGSVQTGGSLRLTCAASGRTSRSYGMGWFRQAPGKEREFVSGISWRGDSTGYADSVKGRFTISRDNAKNTVDLQMNSLKPEDTAIYYCAAAAGSAWYGTLYEYDYWGQGTQVTVSSALEHHHHHH
6TZI , Knot 153 361 0.83 38 209 348
MDNTPKDQEIKKLVDQNFKPLLEKYDVPGMAVGVIQNNKKYEMYYGLQSVQDKKAVNSNTIFELGSVSKLFTATAGGYAKNKGKISFDDTPGKYWKELKNTPIDQVNLLQLATYTSGNLALQFPDEVQTDQQVLTFFKDWKPKNPIGEYRQYSNPSIGLFGKVVALSMNKPFDQVLEKTIFPALGLKHSYVNVPKTQMQNYAFGYNQENQPIRVNPGPLDAPAYGVKSTLPDMLSFIHANLNPQKYPTDIQRAINETHQGRYQVNTMYQALGWEEFSYPATLQTLLDSNSEQIVMKPNKVTAISKEPSVKMYHKTGSTSGFGTYVVFIPKENIGLVMLTNKRIPNEERIKAAYVVLNAIKK
5UGS , Knot 122 289 0.79 40 173 270
MGSSHHHHHHSSGLVPRGSHMTGLLDGKRILVSGIITDSSIAFHIARVAQEQGAQLVLTGFDRLRLIQRITDRLPAKAPLLELDVQNEEHLASLAGRVTEAIGAGNKLDGVVHSIGFMPQTGMGINPFFDAPYADVSKGIHISAYSYASMAKALLPIMNPGGSIVGMDFDPSRAMPAYNWMTVAKSALESVNRFVAREAGKYGVRSNLVAAGPIRTLAMSAIVGGALGEEAGAQIQLLEEGWDQRAPIGWNMKDATPVAKTVCALLSDWLPATTGDIIYADGGAHTQLL
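
The LZ-complexity column and its normalization \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be illustrated as follows. This is a sketch that assumes \(\mathrm{LZ}(w)\) is the number of phrases in the Lempel-Ziv (1976) exhaustive-history parsing; the page does not state which variant it uses, so the function `lz76` may not reproduce the tabulated values exactly.

```python
import math

def lz76(s: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of s (assumed variant)."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from the text
        # preceding its last symbol (overlapping copies are allowed).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_A(n), with A = 20 for amino acids."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76("0001101001000101"))          # 0|001|10|100|1000|101, prints 6
print(round(normalized_lz(67, 133), 2))  # 4KRL row of the table, prints 0.82
```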

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
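
The subword sets and their cardinalities defined above can be computed directly. A short Python sketch, where `k` is the subword length and the function names are illustrative:

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))

print(sorted(subwords("abab", 2)))    # prints ['ab', 'ba']
print(subword_complexity("abab", 3))  # 'aba' and 'bab', prints 2
```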

\(|P_{f(4KRL_1)}(2) \setminus P_{f(6TZI_1)}(2)|=39\), \(|P_{f(6TZI_1)}(2) \setminus P_{f(4KRL_1)}(2)|=145\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\). For the pair above, \(Z_2=39+145=184\).

Hydrophobic-polar version of Sequence 1:
0101000111010011010100110100000011111001110000110110101000101001010101000010001010100101000110001111101101010000011010010100110000000
Pair \(Z_2\) Length of longest common subword (contiguous)
4KRL_1,6TZI_1 184 3
4KRL_1,5UGS_1 162 6
6TZI_1,5UGS_1 168 4
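
Both columns of the table above can be sketched in Python. The code assumes that the second column measures the longest contiguous common subword, which is consistent with the value 6 for 4KRL_1 and 5UGS_1 (both contain the tag HHHHHH); function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common subword ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("abab", "abba", 2))                                # only 'bb' differs, prints 1
print(longest_common_subword("ALEHHHHHH", "MGSSHHHHHHSS"))  # prints 6
```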

Newick tree

 
(6TZI_1:90.33,(4KRL_1:81,5UGS_1:81):9.33);

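Let \(d\) denote the Otu--Sayood distance of the same name.
Let \(d_1\) denote the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{494}{\log_{20} 494}-\frac{133}{\log_{20}133}\right)\approx 106\), where \(494=133+361\) is the combined length of the two sequences.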
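
The arithmetic in the expected-distance estimate can be checked with a few lines of Python. The helper name `lz_scale` is illustrative; the constants 0.85 and 0.8 are taken from the formula above as given.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """The quantity n / log_A(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

# 494 = 133 + 361, the combined length of 4KRL_1 and 6TZI_1.
expected = 0.85 * 0.8 * (lz_scale(494) - lz_scale(133))
print(round(expected, 1))  # approximately 106.8, reported on the page as 106
```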
Status Protein1 Protein2 d d1/2
Query variables 4KRL_1 6TZI_1 136 92

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
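
A Python sketch of the interpolation \(\delta\). It rests on two assumptions that the page does not spell out: that \(\min\) and \(\max\) range over the two complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\) from Otu and Sayood's definitions of \(d\) and \(d_1\), and that \(c\) is the LZ76 phrase count. The names `lz76`, `increments`, and `delta` are illustrative.

```python
def lz76(s: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing (assumed variant)."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def increments(x: str, y: str) -> tuple[int, int]:
    """Extra phrases needed to parse y after x, and x after y."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "abab", "abba"
d = delta(x, y, 0)             # alpha = 0 gives d (the maximum)
half_d1 = delta(x, y, 0.5)     # alpha = 1/2 gives d_1 / 2 (the average)
print(d, half_d1)
```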