CoV2D Browser™

Parikh vectors
4LNS_1 2AWE_1 7UDP_1 Letter Amino acid
24 0 33 A Alanine
3 0 9 C Cysteine
9 0 15 M Methionine
19 0 27 R Arginine
15 0 18 Q Glutamine
11 0 11 H Histidine
17 0 21 I Isoleucine
14 0 13 T Threonine
6 0 2 W Tryptophan
13 0 16 N Asparagine
21 0 22 D Aspartic acid
21 3 17 G Glycine
11 0 24 F Phenylalanine
32 0 20 V Valine
25 0 32 E Glutamic acid
39 0 42 L Leucine
22 0 20 K Lysine
15 0 19 P Proline
24 0 20 S Serine
10 0 12 Y Tyrosine
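
A Parikh vector simply counts how often each letter of the alphabet occurs in a sequence. The following is a minimal sketch, not the page's own code: the names `ALPHABET` and `parikh_vector` are hypothetical, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# Letter order copied from the rows of the table above (an assumption about
# how the page orders its alphabet; any fixed order would work).
ALPHABET = "ACMRQHITWNDGFVELKPSY"


def parikh_vector(sequence: str, alphabet: str = ALPHABET) -> list[int]:
    """Return the number of occurrences of each alphabet letter in `sequence`.

    Letters outside `alphabet` (such as the RNA base U) are ignored.
    """
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]


if __name__ == "__main__":
    rna = "UGGUGU"  # the 2AWE_1 sequence shown on this page
    for letter, count in zip(ALPHABET, parikh_vector(rna)):
        if count:
            print(letter, count)
```

For the 2AWE_1 sequence only G is counted, which agrees with the single nonzero entry in the 2AWE_1 column.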

>4LNS_1|Chain A|Asparagine synthetase a|Trypanosoma brucei brucei (185431)
>2AWE_1|Chains A, B, C, D, E, F, G, H|5'-R(*UP*(BGM)P*GP*UP*GP*U)-3'|
>7UDP_1|Chain A|Atypical kinase COQ8A, mitochondrial|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4LNS 153 351 0.85 40 206 331
MGDDGYSSYVLLQEQILTVKRSFSEALEKELNLVEVRAPILFRVGDGTQDNLSGFEKAVQVPVKAIPNASFEVVHSLAKWKRRTLANYKFAPGHGLYTHMTALRVDDVLDNIHSVVVDQWDWEMVMKDDQRNLAFLKEVVCKVYAAIRKTELAVCEKYKQKPILPETIQFVHAEHLLLAYPNLTAKEREREIAREYGAVFLIGIGAVLSSGDRHDARAPDYDDWTSPVEASQVVFPRTSKPIPTMNSLSSLKGLNGDILLYNPTLDDSLEVSSMGIRVNAEALRHQISLTGDDSLLKSEWHQQLLNGEFPQTVGGGIGQSRMVMFMLRKKHIGEVQCSVWPEEIRKKHNLL
2AWE 4 6 0.39 4 3 4
UGGUGU
7UDP 172 393 0.87 40 230 382
SSEANAERIVRTLCKVRGAALKLGQMLSIQDDAFINPHLAKIFERVRQSADFMPLKQMMKTLNNDLGPNWRDKLEYFEERPFAAASIGQVHLARMKGGREVAMKIQYPGVAQSINSDVNNLMAVLNMSNMLPEGLFPEHLIDVLRRELALECDYQREAACARKFRDLLKGHPFFYVPEIVDELCSPHVLTTELVSGFPLDQAEGLSQEIRNEICYNILVLCLRELFEFHFMQTDPNWSNFFYDPQQHKVALLDFGATREYDRSFTDLYIQIIRAAADRDRETVRAKSIEMKFLTGYEVKVMEDAHLDAILILGEAFASDEPFDFGTQSTTEKIHNLIPVMLRHRLVPPPEETYSLHRKMGGSFLICSKLKARFPCKAMFEEAYSNYCKRQAQQ
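
The columns of this table can be sketched in a few lines. This is an illustration under assumptions, not the page's implementation: it assumes \(\mathrm{LZ}(w)\) is the number of components in the Lempel-Ziv (1976) exhaustive history of \(w\), and the function names `subword_count`, `lz76` and `normalized_lz` are hypothetical.

```python
import math


def subword_count(w: str, k: int) -> int:
    """Number of distinct length-k subwords (contiguous factors) of w."""
    return len({w[i:i + k] for i in range(len(w) - k + 1)})


def lz76(w: str) -> int:
    """Number of components in the LZ76 exhaustive history of w (assumed variant)."""
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the candidate while it already occurs in the text that
        # precedes its own last letter (overlapping copies are allowed).
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        components += 1
        i += length
    return components


def normalized_lz(w: str, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_alphabet_size(n); requires len(w) > 1."""
    n = len(w)
    return lz76(w) / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    w = "UGGUGU"  # the 2AWE sequence from the table
    print(lz76(w), len(w), subword_count(w, 2), subword_count(w, 3))
    print(normalized_lz(w))
```

On the short 2AWE sequence this sketch gives 4 components, 3 distinct subwords of length 2 and 4 of length 3, in line with that row of the table; the longer sequences were not checked by hand.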

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(4LNS_1)}(2) \setminus P_{f(2AWE_1)}(2)|=205\), \(|P_{f(2AWE_1)}(2) \setminus P_{f(4LNS_1)}(2)|=2\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For this pair, \(Z_2=205+2=207\).

Hydrophobic-polar version of Sequence 1:
110010000111000110100010011000101101011111011010000101100110111011101010110011010000110001111011000101101001100100111001010111000000111100110010111000011100000001111001011010011110101010000001100011111111111100100001011000010011010011110000111010010010110101110010100010100111010101100010101000110001000110101100111111000111111000011010001110010000011
Pair \(Z_2\) Length of longest common subword (substring)
4LNS_1,2AWE_1 207 2
4LNS_1,7UDP_1 164 3
2AWE_1,7UDP_1 231 2
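
The \(Z_2\) column is the size of the symmetric difference of the two sets of length-2 subwords. A minimal sketch follows; the helper names `subwords` and `z_distance` are hypothetical, and the second input string in the example is a toy value rather than a sequence from this page.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set P_w(k) of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) - P_y(k)| + |P_y(k) - P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


if __name__ == "__main__":
    # Toy example: "UGGUGU" has subwords {UG, GG, GU}; "GGU" has {GG, GU}.
    print(z_distance("UGGUGU", "GGU", 2))
```

Dividing \(Z_k(x,y)\) by \(|P_x(k)\cup P_y(k)|\) would give the usual Jaccard distance; the page reports only the numerator.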

Newick tree

 
(
	2AWE_1:11.44,
	(
		4LNS_1:82,7UDP_1:82
	):35.44
);

Let \(d\) and \(d_1\) denote the Otu-Sayood distances of those names. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{357}{\log_{20} 357}-\frac{6}{\log_{20}6}\right)\approx 116.\)
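
The arithmetic in this estimate can be checked directly. The sketch below only evaluates the stated expression; the helper name `lz_scale` is hypothetical, and the constants 0.85, 0.8, 357 and 6 are taken as given from the line above.

```python
import math


def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """The normalising quantity n / log_alphabet_size(n)."""
    return n / math.log(n, alphabet_size)


# Constants copied from the expression above.
expected = 0.85 * 0.8 * (lz_scale(357) - lz_scale(6))

if __name__ == "__main__":
    print(expected)
```

The expression evaluates to a little under 117, so the figure 116 appears to be the truncated value.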
Status Protein1 Protein2 d d1/2
Query variables 4LNS_1 2AWE_1 151 76.5
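
For orientation, the two distances can be sketched as follows. This assumes the unnormalised Otu-Sayood definitions \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\), with \(c\) the LZ76 exhaustive-history complexity. The function names are hypothetical, and this is not claimed to be the code that produced the table.

```python
def lz76(w: str) -> int:
    """Number of components in the LZ76 exhaustive history of w (assumed variant)."""
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while the candidate occurs in the text before its last letter.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        components += 1
        i += length
    return components


def otu_sayood(s: str, q: str) -> tuple[int, int]:
    """Return (d, d1) for the pair (s, q), without normalisation."""
    s_then_q = lz76(s + q) - lz76(s)  # new components needed for q after s
    q_then_s = lz76(q + s) - lz76(q)  # new components needed for s after q
    return max(s_then_q, q_then_s), s_then_q + q_then_s


if __name__ == "__main__":
    d, d1 = otu_sayood("UGGUGU", "UGGUGU")
    print(d, d1 / 2)
```

Note that in this sketch a sequence compared with itself need not have distance 0: appending a second copy of UGGUGU adds one component.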

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
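
As a worked check against the table, assume \(\min\) and \(\max\) denote the smaller and larger of the two one-sided terms, so that \(d=\max\) and \(d_1=\min+\max\). For the pair 4LNS_1, 2AWE_1, the reported values \(d=151\) and \(d_1/2=76.5\) then give
\[ \max = 151, \qquad \min = 2(76.5)-151 = 2, \qquad \delta = 2\alpha + 151(1-\alpha), \]
which returns \(\delta=151\) at \(\alpha=0\) and \(\delta=76.5\) at \(\alpha=1/2\).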