CoV2D Browser™

Parikh vectors
5GMM_1 | 1XCD_1 | 7UBJ_1 | Letter | Amino acid
11 | 9 | 3 | H | Histidine
20 | 41 | 10 | L | Leucine
18 | 27 | 13 | K | Lysine
17 | 23 | 5 | P | Proline
30 | 22 | 7 | S | Serine
17 | 22 | 13 | V | Valine
9 | 12 | 5 | Q | Glutamine
16 | 22 | 10 | G | Glycine
10 | 21 | 5 | I | Isoleucine
6 | 1 | 2 | W | Tryptophan
17 | 29 | 5 | N | Asparagine
13 | 19 | 5 | E | Glutamic acid
1 | 6 | 6 | C | Cysteine
11 | 10 | 2 | F | Phenylalanine
8 | 9 | 7 | Y | Tyrosine
19 | 16 | 19 | A | Alanine
7 | 9 | 11 | R | Arginine
14 | 14 | 10 | T | Threonine
14 | 13 | 6 | D | Aspartic acid
3 | 4 | 3 | M | Methionine
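
A Parikh vector simply records how many times each letter occurs in a word. The following is a minimal sketch of how one column of the table could be recomputed from a FASTA sequence; the function name `parikh_vector` and the alphabetical letter ordering are illustrative assumptions, not something taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard amino-acid letters in alphabetical order.
# (Assumed ordering for illustration; the table lists them differently.)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}


# First ten residues of the 5GMM_1 sequence, used as a small example.
example = parikh_vector("MASPDWGYDD")
print(example["D"], example["M"], sum(example.values()))  # 3 1 10
```

Applied to a full sequence, the counts should sum to the sequence length, which gives a quick sanity check on each column.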

>5GMM_1|Chains A, B|Carbonic anhydrase 1|Homo sapiens (9606)
>1XCD_1|Chain A|Decorin|Bos taurus (9913)
>7UBJ_1|Chains A, B|Antitermination protein Q|Escherichia phage Lambda (2681611)
Protein code \(c\) | LZ-complexity \(\mathrm{LZ}(w)\) | Length \(n=\lvert w\rvert\) | \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) | \(p_w(1)\) | \(p_w(2)\) | \(p_w(3)\) | Sequence \(w=f(c)\)
5GMM, Knot | 121 | 261 | 0.86 | 40 | 179 | 253 | MASPDWGYDDKNGPEQWSKLYPIANGNNQSPVDIKTSETKHDTSLKPISVSYNPATAKEIINVGHSFHVNFEDNDNRSVLKGGPFSDSYRLFQFHFHWGSTNEHGSEHTVDGVKYSAELHVAHWNSAKYSSLAEAASKADGLAVIGVLMKVGEANPKLQKVLDALQAIKTKGKRAPFTNFDPSTLLPSSLDFWTYPGSLTHPPLYESVTWIICKESISVSSEQLAQFRSLLSNVEGDNAVPMQHNNRPTQPLKGRTVRASF
1XCD, Knot | 136 | 329 | 0.79 | 40 | 186 | 303 | DEASGIGPEEHFPEVPEIEPMGPVCPFRCQCHLRVVQCSDLGLEKVPKDLPPDTALLDLQNNKITEIKDGDFKNLKNLHTLILINNKISKISPGAFAPLVKLERLYLSKNQLKELPEKMPKTLQELRVHENEITKVRKSVFNGLNQMIVVELGTNPLKSSGIENGAFQGMKKLSYIRIADTNITTIPQGLPPSLTELHLDGNKITKVDAASLKGLNNLAKLGLSFNSISAVDNGSLANTPHLRELHLNNNKLVKVPGGLADHKYIQVVYLHNNNISAIGSNDFCPPGYNTKKASYSGVSLFSNPVQYWEIQPSTFRCVYVRAAVQLGNY
7UBJ, Knot | 72 | 147 | 0.81 | 40 | 117 | 142 | SDKQKAINYLMQFAHKVSGKYRGVAKLEGNTKAKVLQVLATFAYADYCRSAATPGARCRDCHGTGRAVDIAKTKLWGRVVEKECGRCKGVGYSRMPASAAYRAVTMLIPNLTQPTWSRTVKPLYDALVVQCHKEESIADNILNAVTR
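
The LZ-complexity and its normalization by \(n/\log_{20} n\) can be sketched as follows. This assumes the classical Lempel–Ziv (1976) exhaustive-history parsing; the page does not say which parsing variant it uses, so this sketch is not guaranteed to reproduce the tabulated values of \(\mathrm{LZ}(w)\). The function names are illustrative.

```python
import math


def lz76_complexity(w: str) -> int:
    """Number of phrases in an LZ76-style parsing of w (assumed variant).

    Each phrase is the shortest prefix of the unparsed remainder that does
    not already occur as a subword of the text preceding its last letter.
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_alphabet_size(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))


# Classical binary example from the LZ76 literature: 0|001|10|100|1000|101
print(lz76_complexity("0001101001000101"))  # 6
# Normalization applied to the tabulated 5GMM values (LZ = 121, n = 261).
print(normalized_lz(121, 261))
```

The normalization uses base 20 because the alphabet consists of 20 amino-acid letters.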

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
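
As a small illustration of these definitions, writing \(k\) for the subword length, the set of subwords and its cardinality can be computed with a sliding window. The names `subwords` and `subword_complexity` are hypothetical helpers chosen for this sketch.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct length-k subwords (contiguous blocks) of w."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w: str, k: int) -> int:
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))


print(sorted(subwords("ABAB", 2)))    # ['AB', 'BA']
print(subword_complexity("ABAB", 1))  # 2
```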

\(|P_{f(5GMM_1)}(2) \setminus P_{f(1XCD_1)}(2)|=77\), \(|P_{f(1XCD_1)}(2) \setminus P_{f(5GMM_1)}(2)|=84\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
110101100000110010010111010000110100000000001011010001101001101100101010000000110111100000110101011000001000010110001010110100100001101100101111111110110101010011011011000100111001010011100101100110100111000101110000101000011010011001010011110000010011010010101
Pair | \(Z_2\) | Length of longest common substring
5GMM_1,1XCD_1 | 161 | 3
5GMM_1,7UBJ_1 | 158 | 3
1XCD_1,7UBJ_1 | 171 | 4
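
The two columns of this table can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two subword sets; for the first pair this is \(77+84=161\). For the last column, values of 3 and 4 are far too small for a non-contiguous common subsequence of proteins this long, so the sketch assumes the column counts the longest common contiguous subword. Function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_substring_length(x: str, y: str) -> int:
    """Length of the longest contiguous block shared by x and y.

    Row-by-row dynamic programme: cur[j] is the length of the longest
    common suffix of the prefixes of x and y ending at the current letters.
    """
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


print(z_distance("ABAB", "ABBA", 2))                      # 1
print(longest_common_substring_length("ABCDE", "XBCDY"))  # 3
```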

Newick tree

 
[
	1XCD_1:84.34,
	[
		5GMM_1:79,7UBJ_1:79
	]:5.34
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{590}{\log_{20} 590}-\frac{261}{\log_{20} 261}\right)\approx 92.8\).
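
The arithmetic behind this estimate can be checked directly. Note that \(590 = 261 + 329\), presumably the length of the concatenation of the two query sequences. The factors 0.85 and 0.8 are taken verbatim from the formula; the page does not explain them, and the function name below is a hypothetical one chosen for this sketch.

```python
import math


def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """The quantity n / log_alphabet_size(n) used to normalize LZ(w)."""
    return n / math.log(n, alphabet_size)


def expected_distance(n_x: int, n_y: int, a: float = 0.85, b: float = 0.8) -> float:
    """a * b * (scale of the concatenated length minus scale of n_x).

    The constants a and b are copied from the page's formula (their
    meaning is not stated there).
    """
    return a * b * (lz_scale(n_x + n_y) - lz_scale(n_x))


print(round(expected_distance(261, 329), 1))  # 92.8
```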
Status | Protein1 | Protein2 | \(d\) | \(d_1/2\)
Query variables | 5GMM_1 | 1XCD_1 | 118 | 107

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
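
The interpolation can be made concrete with a short sketch. It assumes the usual Otu–Sayood definitions, which the page does not spell out: with \(c\) a complexity measure, \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=(c(xy)-c(x))+(c(yx)-c(y))\). Here "min" and "max" are read as the smaller and larger of those two terms. The values 118 and 96 below are back-calculated from the tabulated \(d=118\) and \(d_1/2=107\), not reported by the page.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


def otu_sayood(c_x: int, c_y: int, c_xy: int, c_yx: int) -> tuple[int, int]:
    """Return (d, d1) from the complexities of x, y, xy and yx.

    Assumed definitions: d is the max and d1 the sum of the two terms
    c(xy) - c(x) and c(yx) - c(y).
    """
    a = c_xy - c_x
    b = c_yx - c_y
    return max(a, b), a + b


# Terms consistent with the table: max = 118 and (118 + 96) / 2 = 107.
print(delta(0, 118, 96))    # 118 -> d
print(delta(0.5, 118, 96))  # 107.0 -> d1 / 2
```

Setting \(\alpha=0\) recovers the maximum of the two terms and \(\alpha=1/2\) their average, matching the two cases in the displayed formula.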