CoV2D Browser™

Parikh vectors
7KDD_1 2LMF_1 8UZU_1 Letter Amino acid
48 0 5 Y Tyrosine
46 1 9 D Aspartic acid
48 2 9 G Glycine
46 2 10 I Isoleucine
71 2 7 L Leucine
60 0 13 A Alanine
37 1 4 Q Glutamine
19 0 5 M Methionine
87 1 7 S Serine
68 0 6 T Threonine
8 0 0 W Tryptophan
75 1 6 V Valine
52 3 6 R Arginine
54 0 4 N Asparagine
33 3 2 F Phenylalanine
29 0 3 P Proline
16 0 0 C Cysteine
48 2 11 E Glutamic acid
23 0 2 H Histidine
39 5 7 K Lysine
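A Parikh vector only records how often each letter occurs, ignoring order. A minimal Python sketch that uses the row order of the table above (the name `parikh_vector` is hypothetical, not part of the CoV2D browser):

```python
from collections import Counter

# Row order of the table above (one-letter amino acid codes).
ORDER = "YDGILAQMSTWVRNFPCEHK"

def parikh_vector(seq: str, order: str = ORDER) -> list[int]:
    """Count the occurrences of each letter of `order` in `seq`."""
    counts = Counter(seq)
    # Counter returns 0 for letters that never occur.
    return [counts[a] for a in order]

if __name__ == "__main__":
    lmf = "LLGDFFRKSKEKIGKEFKRIVQR"  # f(2LMF_1), LL-37
    print(dict(zip(ORDER, parikh_vector(lmf))))
```

For 2LMF_1 this reproduces the middle column of the table, and its entries sum to the sequence length 23.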

>7KDD_1|Chains A, D[auth B], G[auth C]|Envelope glycoprotein B|Human cytomegalovirus (strain Towne) (10363)
>2LMF_1|Chain A|Antibacterial protein LL-37|Homo sapiens (9606)
>8UZU_1|Chains A, B, C, D|Group 1 truncated hemoglobin|Shewanella benthica KT99 (314608)
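The three headers follow the pattern `>ID_ENTITY|CHAINS|NAME|ORGANISM (TAXID)`. The sketch below splits a header on `|` and peels off the trailing taxonomy id. The layout is inferred from these three lines only, and `parse_header`/`PdbHeader` are hypothetical names, not an RCSB API.

```python
import re
from typing import NamedTuple, Optional

class PdbHeader(NamedTuple):
    entity: str           # PDB code plus entity number, e.g. "2LMF_1"
    chains: str           # e.g. "Chain A" or "Chains A, B, C, D"
    name: str             # molecule name
    organism: str         # source organism, without the taxonomy id
    taxid: Optional[int]  # number in the final parentheses, if present

# Trailing "(digits)" at the end of the organism field.
_TAXID = re.compile(r"\s*\((\d+)\)\s*$")

def parse_header(line: str) -> PdbHeader:
    """Split a header shaped like >ID_ENTITY|CHAINS|NAME|ORGANISM (TAXID)."""
    entity, chains, name, organism = line.strip().lstrip(">").split("|", 3)
    m = _TAXID.search(organism)
    taxid = int(m.group(1)) if m else None
    if m:
        organism = organism[:m.start()]
    return PdbHeader(entity, chains, name, organism, taxid)
```

Only the last parenthesized number is treated as the taxonomy id, so "Human cytomegalovirus (strain Towne) (10363)" keeps "(strain Towne)" in the organism name.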
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7KDD, Knot 343 907 0.85 20 310 835
MESRIWCLVVCVNLCIVCLGAAVSSSSTRGTSATHSHHSSHTTSAAHSRSGSVSQRVTSSQTVSHGVNETIYNTTLKYGDVVGVNTTKYPYRVCSMAQGTDLIRFERNIVCTSMKPINEDLDEGIMVVYKRNIVAHTFKVRVYQKVLTFRRSYAYIHTTYLLGSNTEYVAPPMWEIHHINSHSQCYSSYSRVIAGTVFVAYHRDSYENKTMQLMPDDYSNTHSTRYVTVKDQWHSRGSTWLYRETCNLNCMVTITTARSKYPYHFFATSTGDVVDISPFYNGTNRNASYFGENADKFFIFPNYTIVSDFGRPNSALETHRLVAFLERADSVISWDIQDEKNVTCQLTFWEASERTIRSEAEDSYHFSSAKMTATFLSKKQEVNMSDSALDCVRDEAINKLQQIFNTSYNQTYEKYGNVSVFETTGGLVVFWQGIKQKSLVELERLANRSSLNLTHNRTKRSTDGNNATHLSNMESVHNLVYAQLQFTYDTLRGYINRALAQIAEAWCVDQRRTLEVFKELSKINPSAILSAIYNKPIAARFMGDVLGLASCVTINQTSVKVLRDMNVKESPGRCYSRPVVIFNFANSSYVQYGQLGEDNEILLGNHRTEECQLPSLKIFIAGNSAYEYVDYLFKRMIDLSSISTVDSMIALDIDPLENTDFRVLELYSQKELRSSNVFDLEEIMREFNSYKQRVKYVEDKVVDPLPPYLKGLDDLMSGLGAAGKAVGVAIGAVGGAVASVVEGVATFLKNPFGAFTIILVAIAVVIIIYLIYTRQRRLCMQPLQNLFPYLVSADGTTVTSGNTKDTSLQAPPSYEESVYNSGRKGPGPPSSDASTAAPPYTNEQAYQMLLALVRLDAEQRAQQNGTDSLDGQTGTQDKGQKPNLLDRLRHRKNGYRHLKDSDEEENV
2LMF, Knot 15 23 0.68 11 21 21
LLGDFFRKSKEKIGKEFKRIVQR
8UZU, Knot 62 116 0.84 18 99 114
SLYERLGGEQKIARIAADIFDTHATNPTVASRYKDSDRERVIKMVTEFLSAGTGGPQDYTGKSMPEAHRSMNINEAEYAAVIDDIMVALDKNEVGDQEKQELLMIAYSLKGEIIGA
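The table does not say which LZ variant it uses. The sketch below assumes the Lempel–Ziv (1976) exhaustive-history parse, counting the final, possibly incomplete phrase; on 2LMF_1 it gives 15, the value in the table, but agreement on the other rows is not checked here. The normalizer \(n/\log_{20} n\) uses base 20, the size of the amino acid alphabet. Function names are hypothetical.

```python
import math

def lz76(w: str) -> int:
    """Phrase count of the Lempel-Ziv (1976) exhaustive-history parse of w.

    A phrase starting at i is extended while w[i:i+k] still occurs in
    w[:i+k-1], i.e. also starts at some earlier position (overlap allowed).
    The final phrase may be incomplete; it is counted as well.
    """
    n, i, c = len(w), 0, 0
    while i < n:
        k = 1
        while i + k <= n and w[i:i + k] in w[:i + k - 1]:
            k += 1
        c += 1
        i += k
    return c

def lz_ratio(w: str, base: int = 20) -> float:
    """LZ(w) / (n / log_base n); needs n >= 2 so that log_base n > 0."""
    n = len(w)
    if n < 2:
        raise ValueError("sequence must have length >= 2")
    return lz76(w) / (n / math.log(n, base))

if __name__ == "__main__":
    w = "LLGDFFRKSKEKIGKEFKRIVQR"  # f(2LMF_1)
    print(lz76(w), f"{lz_ratio(w):.4f}")
```

The printed ratios appear to be truncated rather than rounded: \(343\log_{20}907/907\approx 0.8597\) is shown as 0.85, and \(62\log_{20}116/116\approx 0.848\) as 0.84.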

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the amino acid sequence given in FASTA format for the Protein Data Bank entry with 4-character code \(c\).
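The definitions translate directly into a set comprehension over all length-\(n\) windows. A minimal sketch (the names `P` and `p` simply mirror the notation and are not from the site's code):

```python
def P(w: str, n: int) -> set[str]:
    """P_w(n): distinct subwords (contiguous intervals) of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(P(w, n))

if __name__ == "__main__":
    w = "LLGDFFRKSKEKIGKEFKRIVQR"  # f(2LMF_1)
    print([p(w, n) for n in (1, 2, 3)])
```

For 2LMF_1 this gives \(p_w(1)=11\), \(p_w(2)=21\) and \(p_w(3)=21\). Since proteins use at most 20 letters, \(p_w(1)\le 20\) for every sequence.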

\(|P_{f(7KDD_1)}(2) \setminus P_{f(2LMF_1)}(2)|=289\) and \(|P_{f(2LMF_1)}(2) \setminus P_{f(7KDD_1)}(2)|=0\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\); for this pair, \(Z_2=289+0=289\).

Hydrophobic-polar version of Sequence 1 (7KDD_1):
1000110111010101101111100000010010000000000011000010100010000010011000100001001011110000010010011010011010001100010110001001111100001110010101000110100001010000111000001111110100100000000000011110111100000000001011100000000000101000100010011000000100110100100001001110001011010110010000100110010011111000110011010011000011111001001101010000010001011010000100010000010010101011000001010001100100011001001100000000000101011000111111101100001101001100001010000000000100100100100100110101010000101010011101101101000001011001001010111011000111101110111110010100001011001010001100000111110110000100101100001111000000001101011111001000100110011010010010011110101100001011010000010000110100110010000001001000110111101011001101111110111111111111110110111011001111101111111111110110000001010110011101101010010010000001011100000100010011111000100111100000100111111010100010001000101001000010010110010000010001000000001
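\(Z_k\) is the size of the symmetric difference of the two subword sets. The sketch below computes it, together with a hydrophobic-polar (1/0) encoding. The hydrophobic set `AFGILMPVW` is an assumption: the page does not list it, but this set reproduces the first 110 symbols of the 7KDD_1 string above. All names are hypothetical.

```python
LMF = "LLGDFFRKSKEKIGKEFKRIVQR"  # f(2LMF_1)
UZU = ("SLYERLGGEQKIARIAADIFDTHATNPTVASRYKDSDRERVIKMVTEF"
       "LSAGTGGPQDYTGKSMPEAHRSMNINEAEYAAVIDDIMVALDKNEVGDQ"
       "EKQELLMIAYSLKGEIIGA")  # f(8UZU_1)

def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y): number of length-k subwords in exactly one of x, y."""
    return len(subwords(x, k) ^ subwords(y, k))

# ASSUMPTION: hydrophobic residues. Not stated on the page; chosen because
# it reproduces the first 110 symbols of the 7KDD_1 binary string.
HYDROPHOBIC = frozenset("AFGILMPVW")

def hp_encode(seq: str, hydrophobic: frozenset = HYDROPHOBIC) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in hydrophobic else "0" for a in seq)

if __name__ == "__main__":
    print(z(LMF, UZU, 2))
    print(hp_encode("MESRIWCLVV"))
```

On 2LMF_1 and 8UZU_1, `z(LMF, UZU, 2)` returns 100, as in the table: 21 + 99 pairs, of which 10 are shared and so cancel twice.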
Pair \(Z_2\) Length of longest common substring (contiguous subword)
7KDD_1,2LMF_1 289 3
7KDD_1,8UZU_1 219 4
2LMF_1,8UZU_1 100 2
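The lengths in the last column (2 to 4) are only plausible for contiguous matches, so the sketch computes the longest common substring with the standard dynamic program, in \(O(|x|\,|y|)\) time and \(O(|y|)\) memory. The function name is hypothetical.

```python
def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest word that occurs contiguously in both x and y.

    cur[j] is the length of the longest common suffix of the current
    prefix of x and of y[:j]; prev holds the row for the previous prefix.
    """
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best:
                    best = cur[j]
        prev = cur
    return best

if __name__ == "__main__":
    lmf = "LLGDFFRKSKEKIGKEFKRIVQR"  # f(2LMF_1)
    uzu = ("SLYERLGGEQKIARIAADIFDTHATNPTVASRYKDSDRERVIKMVTEF"
           "LSAGTGGPQDYTGKSMPEAHRSMNINEAEYAAVIDDIMVALDKNEVGDQ"
           "EKQELLMIAYSLKGEIIGA")  # f(8UZU_1)
    print(longest_common_substring(lmf, uzu))
```

For 2LMF_1 and 8UZU_1 it returns 2, matching the table: the two sequences share ten 2-letter subwords but no 3-letter one.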

Newick tree

(
	7KDD_1:14.19,
	(
		8UZU_1:50,2LMF_1:50
	):95.19
);

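Let \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\) be the Otu–Sayood distance \(d\).
Let \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)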
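A sketch of \(d\) and \(d_1\) built on an LZ76 phrase count. It assumes the usual Otu–Sayood definitions (\(d\) is the larger of the two one-sided increments, \(d_1\) their sum) and that the browser uses this LZ variant; both are assumptions, not taken from the page, and the function names are hypothetical.

```python
def lz76(w: str) -> int:
    """Phrase count of the Lempel-Ziv (1976) exhaustive-history parse of w."""
    n, i, c = len(w), 0, 0
    while i < n:
        k = 1
        # Extend the phrase while it also starts at an earlier position.
        while i + k <= n and w[i:i + k] in w[:i + k - 1]:
            k += 1
        c += 1
        i += k
    return c

def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1) for sequences x and y.

    ASSUMED definitions: with a = LZ(xy) - LZ(x) and b = LZ(yx) - LZ(y),
    d = max(a, b) and d1 = a + b.
    """
    a = lz76(x + y) - lz76(x)
    b = lz76(y + x) - lz76(y)
    return max(a, b), a + b
```

Neither quantity vanishes on identical inputs: for x = y = "AB" the sketch gives d = 1 and d_1 = 2, because LZ("ABAB") = 3 while LZ("AB") = 2.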
A rough estimate of the expected distance, where \(930=907+23\) is the length of the concatenated sequence, is \((0.85)(0.8)\left(\frac{930}{\log_{20} 930}-\frac{23}{\log_{20}23}\right)\approx 262.\)
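The estimate can be checked numerically. The factors 0.85 and 0.8 are taken as given, and `lz_scale` is a hypothetical helper for \(n/\log_{20} n\):

```python
import math

def lz_scale(n: int, base: int = 20) -> float:
    """n / log_base n, the normalizer of the LZ ratio column."""
    return n / math.log(n, base)

# 907 + 23 = 930 is the length of f(7KDD_1) followed by f(2LMF_1).
estimate = 0.85 * 0.8 * (lz_scale(907 + 23) - lz_scale(23))
print(f"{estimate:.1f}")
```

The unrounded value is about 262.2.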
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 7KDD_1 2LMF_1 331 169
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min\{a,b\} + (1-\alpha) \max\{a,b\}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2, \end{cases} \]
where \(a=\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(b=\mathrm{LZ}(yx)-\mathrm{LZ}(y)\).
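Taking \(a\) and \(b\) to be the two one-sided increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\) (an assumption about what min and max range over), \(\delta\) interpolates linearly between them. The query row above has \(d=331\) and \(d_1/2=169\); assuming 169 is exact, \(d_1=338\) and the smaller increment is \(338-331=7\). The sketch checks that both cases of the formula come back (the name `delta` is hypothetical):

```python
def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b), for 0 <= alpha <= 1."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Increments inferred (under the assumed definitions) from the query row:
# d = 331 = max(a, b) and d1 = 2 * 169 = 338 = a + b, hence min(a, b) = 7.
a, b = 7, 331
print(delta(a, b, 0.0), delta(a, b, 0.5))
```

At \(\alpha=1\) the formula returns the smaller increment, so \(\delta\) sweeps the whole interval \([\min\{a,b\},\max\{a,b\}]\).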