CoV2D BrowserTM

Parikh vectors
9KSO_1 8VNJ_1 7KAE_1 Letter Amino acid
17 0 5 D Aspartic acid
9 0 2 M Methionine
28 0 1 F Phenylalanine
27 0 0 P Proline
20 0 5 Q Glutamine
34 0 6 E Glutamic acid
35 4 3 G Glycine
58 0 8 L Leucine
31 0 3 K Lysine
25 7 4 T Threonine
18 6 8 A Alanine
16 0 3 N Asparagine
8 4 0 C Cysteine
29 0 2 S Serine
5 0 1 W Tryptophan
12 0 1 Y Tyrosine
26 0 0 V Valine
13 0 3 R Arginine
14 0 8 H Histidine
29 0 2 I Isoleucine
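
A minimal sketch of how one column of the table above could be recomputed: the Parikh vector of a sequence is its per-letter occurrence count. The function name `parikh_vector` and the constant `ALPHABET` are illustrative choices, not names used by the site; the letter order is copied from the table rows.

```python
from collections import Counter

# Letter order copied from the rows of the table above (assumed to be the display order only).
ALPHABET = "DMFPQEGLKTANCSWYVRHI"

def parikh_vector(sequence, alphabet=ALPHABET):
    """Return the number of occurrences of each alphabet letter, in alphabet order."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

seq_8vnj = "TTGACTCTCTTAAGAGAGTCA"
vector = parikh_vector(seq_8vnj)
nonzero = {letter: c for letter, c in zip(ALPHABET, vector) if c}
print(nonzero)  # {'G': 4, 'T': 7, 'A': 6, 'C': 4}
```

For the 8VNJ_1 sequence this reproduces the four nonzero entries in the 8VNJ_1 column (G, T, A, C).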

>9KSO_1|Chain A|UDP-glycosyltransferase 79B30-like|Cicer arietinum (3827)
>8VNJ_1|Chains A[auth C], B[auth D]|DNA (5'-D(*TP*TP*GP*AP*CP*TP*CP*TP*CP*TP*TP*AP*AP*GP*AP*GP*AP*GP*TP*CP*A)-3')|synthetic construct (32630)
>7KAE_1|Chains A, B|Regulatory protein rop|Escherichia coli (562)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9KSO , Knot 189 454 0.85 40 242 435
SSLHIAMFPWFAMGHLIPYLHLSNKLAKRGHKISFFTPKKTQTKLEQFNLYPNLITFYPLNVPHVDGLPFGAETTSDVAISLGPILMTAMDQTQNQIELLLTQLKPQIIFFDFVFWLPKITQRLGIKSFLYFIINPATISYTTSPPRMFEAENLTEVDLMKPPKGYPTSFNLQSHEAKHLASTRKIEFGSGIPFSVRSYNCLSLTDAIGFKGCREIEGPYVDYLQEQFGKPVLLSGPVLPEQSKTALDEKWGSWLGGFKDGSLVYCALGSELKLKQDQFHELLLGLELTGFPFLAILKPPVGFETIEDALPEGFKERVKEKGIVHSGWIQQQLILEHPSVGCFVTHCGAGSITEGLVNNCQMVLLPQLNGDYIINARIMGRHLKVGVEVKKGEEDGLFTKESVYEAVKIVMDDENEIGREVRSNHTKVRNLLLRHDLESSCLDTFCEKLQELVS
8VNJ , Knot 10 21 0.48 8 11 15
TTGACTCTCTTAAGAGAGTCA
7KAE , Knot 36 65 0.77 34 51 59
MGHHHHHHGTKQEKTALNMARFIQNQTQTLLEKLNELDADEQADIAESLHDHADELYRSALARWG
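
The LZ-complexity and normalized columns above can be illustrated with the following sketch. It assumes the classical Lempel-Ziv (1976) exhaustive-history parsing, in which each new component is the shortest prefix of the remaining text that cannot be copied from an earlier starting position (overlap allowed); the page does not state which variant it uses, so `lz76_complexity` is an assumption. The normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) is taken from the table header; `normalized_lz` is a hypothetical helper name.

```python
import math

def lz76_complexity(w):
    """Count the components of an LZ76-style exhaustive-history parsing of w."""
    n = len(w)
    i = 0
    count = 0
    while i < n:
        length = 1
        # Grow the candidate while it can be copied from a start position < i.
        # An occurrence starting before i must end before index i + length - 1.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n), with b = 20 as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

seq_8vnj = "TTGACTCTCTTAAGAGAGTCA"
lz = lz76_complexity(seq_8vnj)
print(lz, round(normalized_lz(lz, len(seq_8vnj)), 2))
```

Under this assumed parsing the 8VNJ_1 sequence splits into 10 components, and the normalization applied to the tabulated (LZ, length) pairs gives the tabulated ratios after rounding to two decimals.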

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
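
As a sketch of the definition above, the subword set and its cardinality can be computed by sliding a fixed-length window over the word. The names `subwords` and `subword_complexity` are illustrative, and `k` stands for the subword length.

```python
def subwords(w, k):
    """Return the set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

seq_8vnj = "TTGACTCTCTTAAGAGAGTCA"
print(subword_complexity(seq_8vnj, 2), subword_complexity(seq_8vnj, 3))  # 11 15
```

For the 8VNJ_1 sequence, lengths 2 and 3 give 11 and 15, the values listed in the table.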

\(|P_{f(9KSO_1)}(2) \setminus P_{f(8VNJ_1)}(2)|=236\), \(|P_{f(8VNJ_1)}(2) \setminus P_{f(9KSO_1)}(2)|=5\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
0010111111111101110101000110010010110100000010010101011010110110101111110000011101111110110000001011100101011110111111010001110011011101101000001101101001001011011010100101000010011000010110111101000001010011110100010110100100011011110111110000011000110111110010110011100101000010011111010111111110111110010011101100010001110011100011100101101100011101001110000111110101001101011100101110100100011100001001101110000011001000000100111000100001001000100110

Pair \(Z_2\) Length of longest common substring
9KSO_1,8VNJ_1 241 3
9KSO_1,7KAE_1 221 4
8VNJ_1,7KAE_1 58 2
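
The \(Z_k\) column can be sketched as the size of the symmetric difference of the two subword sets. The function names `subwords` and `z_distance` are illustrative and not taken from the site.

```python
def subwords(w, k):
    """Return the set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

seq_8vnj = "TTGACTCTCTTAAGAGAGTCA"
seq_7kae = "MGHHHHHHGTKQEKTALNMARFIQNQTQTLLEKLNELDADEQADIAESLHDHADELYRSALARWG"
print(z_distance(seq_8vnj, seq_7kae, 2))  # 58
```

The two sequences share only the 2-letter subwords GT and TA, so \(Z_2 = (11-2)+(51-2) = 58\), in agreement with the last row of the table.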

Newick tree

 
[
	9KSO_1:13.43,
	[
		7KAE_1:29,8VNJ_1:29
	]:10.43
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{475 }{\log_{20} 475}-\frac{21}{\log_{20}21})\approx 142.\)
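
The arithmetic in the estimate above can be checked with a short sketch. The function names and the parameter names `c1` and `c2` are hypothetical; the page does not explain the factors 0.85 and 0.8, so they are passed through as plain constants, and the lengths 475 and 21 are the ones that appear in the formula.

```python
import math

def lz_scale(n, alphabet_size=20):
    """Return n / log_b(n), the scale used to normalize LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)

def expected_distance(n_long, n_short, c1=0.85, c2=0.8):
    """Evaluate c1 * c2 * (n_long / log_20 n_long - n_short / log_20 n_short)."""
    return c1 * c2 * (lz_scale(n_long) - lz_scale(n_short))

print(expected_distance(475, 21))  # between 142 and 143
```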
Status Protein1 Protein2 d d1/2
Query variables 9KSO_1 8VNJ_1 184 96
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
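
A small sketch of the interpolation above. It assumes that min and max refer to the smaller and larger of two directional quantities, so that \(d\) is their maximum and \(d_1\) their sum. The inputs 8 and 184 are back-computed from the reported values \(d=184\) and \(d_1/2=96\) under that assumption and do not appear on the page.

```python
def delta(alpha, a, b):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    lo, hi = min(a, b), max(a, b)
    return alpha * lo + (1 - alpha) * hi

# Assumed directional quantities: max = d = 184, min = 2 * 96 - 184 = 8.
print(delta(0, 8, 184))    # 184, the value of d
print(delta(0.5, 8, 184))  # 96.0, the value of d1/2
```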