CoV2D Browser™

Parikh vectors
3DGV_1 6CJW_1 1XTO_1 Letter Amino acid
25 20 21 A Alanine
15 11 16 H Histidine
14 16 19 P Proline
9 4 6 W Tryptophan
23 17 19 R Arginine
16 17 21 D Aspartic acid
27 14 6 K Lysine
41 23 13 S Serine
15 7 16 T Threonine
18 9 15 N Asparagine
7 13 8 C Cysteine
9 5 11 M Methionine
14 14 11 F Phenylalanine
33 22 16 V Valine
11 16 14 Q Glutamine
27 28 15 E Glutamic acid
18 22 32 G Glycine
25 19 13 I Isoleucine
25 30 35 L Leucine
29 9 4 Y Tyrosine
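
A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch, not the site's own code; the names `ALPHABET` and `parikh_vector` are introduced here for illustration, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# The 20 amino-acid letters in the row order of the table above
# (illustrative constant, not taken from the site's code).
ALPHABET = "AHPWRDKSTNCMFVQEGILY"

def parikh_vector(seq, alphabet=ALPHABET):
    """Return the number of occurrences in seq of each letter of alphabet."""
    counts = Counter(seq)
    return [counts[a] for a in alphabet]

# Small illustration: the first ten residues of 3DGV_1.
vector = parikh_vector("FQRGQVLSAL")
print(dict(zip(ALPHABET, vector)))
```

As a sanity check on the table, the entries of each column sum to the length of the corresponding sequence (401 for 3DGV_1 and 316 for 6CJW_1).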

>3DGV_1|Chains A, B, C|Carboxypeptidase B2|Bos taurus (9913)
>6CJW_1|Chain A|MAP kinase-interacting serine/threonine-protein kinase 2|Homo sapiens (9606)
>1XTO_1|Chain A|Coenzyme PQQ synthesis protein B|Pseudomonas putida (160488)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3DGV , Knot 172 401 0.85 40 228 381
FQRGQVLSALPRTSRQVQILQNVTTTYKIVLWQPVAAEYIVKGYEVHFFVNASDVSNVKAHLNASRIPFRVLVENVEDLIRQQTSNDTISPRASSSYYEQYHSLNEIYSWIEVMTERYPDMVEKIHIGSSYEKYPLYVLKVSKKEQRAKNAMWIDCGIHAREWISPAFCLWFVGSVTYYYGKEKMHTNLLKHMDFYIMPVVNVDGYDYTWKKDRMWRKNRSLHEKNACVGTDLNRNFASKHWCGEGASSSSCSEIYCGTYPESEPEVKAVADFLRRNIKHIKAYISMHSYSQKIVFPYSYSRSRSKDHEELSLVAREAVFAMENIHRNIRYTHGSGSESLYLAPGGSDDWIYDLGIKYSFTFELRDKGKYGFLLPESYIRPTCSEALVAVAKIASHVVKNV
6CJW , Knot 139 316 0.84 40 204 307
GSTDSFSGRFEDVYQLQEDVLGEGAHARVQTCINLITSQEYAVKIIEKQPGHIRSRVFREVEMLYQCQGHRNVLELIEFFEEEDRFYLVFEKMRGGSILSHIHKRRHFNELEASVVVQDVASALDFLHNKGIAHRDLKPENILCEHPNQVSPVKICDFGLGSGIKLNGDCSPISTPELLTPCGSAEYMAPEVVEAFSEEASIYDKRCDLWSLGVILYILLSGYPPFVGRCGSDCGWDRGEACPACQNMLFESIQEGKYEFPDKDWAHISCAAKDLISKLLVRDAKQRLSAAQVLQHPWVQGCAPENTLPTPMVLQR
1XTO , Knot 136 311 0.83 40 193 292
MYIQVLGSAAGGGFPQWNCNCVNCKGYRDGTLKATARTQSSIALSDDGVHWILCNASPDIRAQLQAFAPMQPARALRDTGINAIVLLDSQIDHTTGLLSLREGCPHQVWCTDMVHQDLTTGFPLFNMLSHWNGGLQWNRIELEGSFVIDACPNLKFTPFPLRSAAPPYSPHRFDPHPGDNLGLMVEDTRTGGKLFYAPGLGQVDEKLLAMMHGADCLLVDGTLWEDDEMQRRGVGTRTGREMGHLAQNGPGGMLEVLDGFPRQRKVLIHINNTNPILDENSPERAEVLRRGVEVAFDGMSIELLEHHHHHH
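
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below does only that arithmetic; the function name `normalized_lz` is hypothetical, and the LZ values are taken from the table rather than recomputed from the sequences.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_k(n), where k is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ-complexity, length) as listed in the table above.
rows = [("3DGV", 172, 401), ("6CJW", 139, 316), ("1XTO", 136, 311)]
for code, lz, n in rows:
    print(code, round(normalized_lz(lz, n), 4))
```

The results agree with the tabulated 0.85, 0.84 and 0.83 if the table truncates, rather than rounds, to two decimals.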

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
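
As an illustration of these definitions, the set of length-\(k\) subwords and its cardinality can be computed with a sliding window. This is a sketch with hypothetical function names, not the site's implementation.

```python
def subwords(w, k):
    """Set of distinct contiguous substrings of w having length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy example: the word ABAB has two distinct subwords of length 2.
print(sorted(subwords("ABAB", 2)))
print(subword_complexity("ABAB", 2))
```

For a word of length \(n\) over 20 letters, the count of length-\(k\) subwords is at most \(\min(20^k,\, n-k+1)\).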

\(|P_{f(3DGV_1)}(2) \setminus P_{f(6CJW_1)}(2)|=101\), \(|P_{f(6CJW_1)}(2) \setminus P_{f(3DGV_1)}(2)|=77\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
10010110111000001011001000001111011110011010010111010010010101010011101110010011000000001010100000000001001001101100001011001011000000110110100000010011110011010011011101111101000010001000110010101111101010000100001100000100001011001000110001010110000000100100100010101110110001001010101000000111100000000000001011100111110010001000010100010111110001100111000101010001001111100010100001111110110011001
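
The page does not state which residues count as hydrophobic. Comparing the start of the 3DGV_1 sequence with the binary string suggests that A, F, G, I, L, M, P, V and W map to 1 and all other residues to 0; that mapping is an inference, and the names below are hypothetical.

```python
# Assumed hydrophobic set, inferred by aligning the 3DGV_1 sequence with the
# binary string; the page itself does not spell out the mapping.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq):
    """Map each residue to '1' (assumed hydrophobic) or '0' (otherwise)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in seq)

# First 50 residues of 3DGV_1.
prefix = "FQRGQVLSALPRTSRQVQILQNVTTTYKIVLWQPVAAEYIVKGYEVHFFV"
print(hp_encode(prefix))
```

Under this assumption the output reproduces the first 50 symbols of the binary string.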
Pair \(Z_2\) Length of longest common subword (contiguous substring)
3DGV_1,6CJW_1 178 4
3DGV_1,1XTO_1 189 3
6CJW_1,1XTO_1 181 3
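
Both columns of this table can be sketched in a few lines. The code assumes that the last column counts contiguous common subwords, which is what values as small as 3 or 4 suggest for sequences of this length; the function names are hypothetical.

```python
def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    px = {x[i:i + k] for i in range(len(x) - k + 1)}
    py = {y[i:i + k] for i in range(len(y) - k + 1)}
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous substring shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1], y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy example on short words.
print(z_distance("ABAB", "ABBA", 2))
print(longest_common_subword_length("ABAB", "ABBA"))
```

For the first pair, the two set differences reported earlier (101 and 77) add up to the tabulated \(Z_2 = 178\).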

Newick tree

 
[
	1XTO_1:93.66,
	[
		3DGV_1:89,6CJW_1:89
	]:4.66
]

Let \(d\) denote the Otu–Sayood distance \(d\).
Let \(d_1\) denote the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{717}{\log_{20} 717}-\frac{316}{\log_{20} 316}\right)\approx 110\).
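
The arithmetic behind this estimate can be checked directly. Note that 717 = 401 + 316, the combined length of the two query sequences; the factors 0.85 and 0.8 are taken from the text as given, and the helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The normalizing quantity n / log_k(n) for alphabet size k."""
    return n / math.log(n, alphabet_size)

n1, n2 = 401, 316  # lengths of 3DGV_1 and 6CJW_1
estimate = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n2))
print(round(estimate))
```

The result rounds to 110, in line with the figure quoted in the text.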
Status Protein 1 Protein 2 \(d\) \(d_1/2\)
Query variables 3DGV_1 6CJW_1 141 125
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
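
The interpolation can be illustrated numerically. The sketch assumes that min and max are taken over two directed quantities, called `a` and `b` here (hypothetical names). The sample inputs 141 and 109 are not from the page: they are chosen so that their maximum and mean reproduce the tabulated \(d = 141\) and \(d_1/2 = 125\).

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Hypothetical inputs chosen to match the table row for 3DGV_1 and 6CJW_1.
a, b = 141, 109
print(delta(0, a, b))    # alpha = 0 gives the maximum, playing the role of d
print(delta(0.5, a, b))  # alpha = 1/2 gives the mean, playing the role of d_1/2
```

Setting \(\alpha = 1\) would give the minimum of the two quantities.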