CoV2D Browser™

Parikh vectors
5OBR_1 1DDM_1 1QMH_1 Letter Amino acid
21 15 24 R Arginine
9 5 14 Q Glutamine
18 5 7 K Lysine
4 2 5 M Methionine
14 2 17 P Proline
7 1 8 N Asparagine
23 11 24 E Glutamic acid
17 11 37 G Glycine
34 10 37 L Leucine
4 2 1 W Tryptophan
16 9 36 A Alanine
10 8 8 D Aspartic acid
13 3 17 I Isoleucine
18 10 22 S Serine
13 5 22 T Threonine
12 2 4 Y Tyrosine
3 9 3 C Cysteine
10 5 14 H Histidine
12 6 13 F Phenylalanine
14 14 34 V Valine
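
Each column of the table above is a Parikh vector: the number of occurrences of every letter in one sequence. A minimal sketch of that count follows; the function name `parikh_vector` and the alphabetical ordering of the 20 standard amino-acid letters are assumptions made for illustration, not the page's own code.

```python
from collections import Counter

# The 20 standard amino-acid letters, in alphabetical order (an assumed ordering;
# the table above lists them in a different order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return the letter counts of seq, one entry per letter of alphabet."""
    counts = Counter(seq)
    return [counts[a] for a in alphabet]

# First ten residues of the 1DDM_1 sequence, used only as a small example.
prefix = "HQWQADEEAV"
for letter, count in zip(AMINO_ACIDS, parikh_vector(prefix)):
    if count:
        print(letter, count)
```

The entries of a Parikh vector always sum to the length of the sequence, which gives a quick sanity check on any row-by-row transcription of the table.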

>5OBR_1|Chain A|Aurora kinase A|Homo sapiens (9606)
>1DDM_1|Chain A|NUMB PROTEIN|Drosophila melanogaster (7227)
>1QMH_1|Chains A, B|RNA 3'-TERMINAL PHOSPHATE CYCLASE|ESCHERICHIA COLI (83333)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5OBR , Knot 124 272 0.85 40 170 264
GSMGSKRQWALEDFEIGRPLGKGKFGNVYLAREKQSKFILALKVLFKAQLEKAGVEHQLRREVEIQSHLRHPNILRLYGYFHDATRVYLILEYAPLGTVYRELQKLSKFDEQRTATYITELANALSYCHSKRVIHRDIKPENLLLGSAGELKIADFGWSVHAPSSRRTTLCGTLDYLPPEMIEGRMHDEKVDLWSLGVLCYEFLVGKPPFEANTYQETYKRISRVEFTFPDFVTEGARDLISRLLKHNPSQRPMLREVLEHPWITANSSKPS
1DDM , Knot 69 135 0.83 40 107 132
HQWQADEEAVRSATCSFSVKYLGCVEVFESRGMQVCEEALKVLRQSRRRPVRGLLHVSGDGLRVVDDETKGLIVDQTIEKVSFCAPDRNHERGFSYICRDGTTRRWMCHGFLACKDSGERLSHAVGCAFAVCLER
1QMH , Knot 144 347 0.81 40 175 324
MVKRMIALDGAQGEGGGQILRSALSLSMITGQPFTITSIRAGRAKPGLLRQHLTAVKAATEICGATVEGAELGSQRLLFRPGTVRGGDYRFAIGSAGSCTLVLQTVLPALWFADGPSRVEVSGGTDNPSAPPADFIRRVLEPLLAKIGIHQQTTLLRHGFYPAGGGVVATEVSPVASFNTLQLGERGNIVQMRGEVLLAGVPRHVAEREIATLAGSFSLHEQNIHNLPRDQGPGNTVSLEVESENITERFFVVGEKRVSAEVVAAQLVKEVKRYLASTAAVGEYLADQLVLPMALAGAGEFTVAHPSCHLLTNIAVVERFLPVRFSLIETDGVTRVSIEGSHHHHHH
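
The LZ-complexity column and the normalized ratio \(\mathrm{LZ}(w)/(n/\log_{20} n)\) can be sketched as follows. This is a simple quadratic-time version of the Lempel–Ziv (1976) exhaustive-history parsing; which exact parsing variant the page uses is not stated, so the function `lz76_complexity` is an assumption and may not reproduce the table values exactly. The helper `normalized_lz` only evaluates the ratio in the header.

```python
import math

def lz76_complexity(s):
    """Number of phrases in a Lempel-Ziv (1976) style parsing of s.

    Each phrase is the shortest block starting at position i that has not
    already occurred as a subword starting earlier (overlap is allowed).
    """
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend while s[i:i+length] already occurs in s[:i+length-1].
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_alphabet_size(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

# Classic textbook example: parses as 0|001|10|100|1000|101.
print(lz76_complexity("0001101001000101"))   # 6
# Ratio for the 5OBR row, using the table's LZ(w)=124 and n=272.
print(round(normalized_lz(124, 272), 2))     # 0.85
```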

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
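
As an illustration of these definitions, the sketch below collects the distinct subwords of a given length with a sliding window. The names `subwords` and `subword_complexity` are hypothetical, and the example uses a short toy word rather than the table's sequences.

```python
def subwords(w, n):
    """Set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """Number of distinct subwords of length n in w."""
    return len(subwords(w, n))

# First five residues of the 5OBR sequence, as a toy word.
w = "GSMGS"
print(sorted(subwords(w, 2)))       # ['GS', 'MG', 'SM']
print(subword_complexity(w, 3))     # 3
```

A word of length \(m\) has at most \(m-n+1\) distinct subwords of length \(n\), which bounds every value this function can return.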

\(|P_{f(5OBR_1)}(2) \setminus P_{f(1DDM_1)}(2)|=121\), \(|P_{f(1DDM_1)}(2) \setminus P_{f(5OBR_1)}(2)|=58\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
10110000111001011011101011010110000001111101110101001110001000101000100101101010100100101110011110100010010010000010010011011000000011000101001111011010110111010110000001010100111011010100001011011110001111011101000000000100101011011001100110011000100011100110011101000010
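
A hydrophobic-polar encoding replaces each residue by 1 (hydrophobic) or 0 (polar). The page does not state which residues it treats as hydrophobic; the set `GAVLIMFWP` below is an assumption, inferred by comparing the start of the 5OBR sequence with the start of the binary string shown here, and other classifications exist.

```python
# Assumed hydrophobic residues (inferred from the displayed string, not documented).
HYDROPHOBIC = set("GAVLIMFWP")

def hp_encode(seq, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' if it is in the hydrophobic set, else '0'."""
    return "".join("1" if residue in hydrophobic else "0" for residue in seq)

# First 21 residues of the 5OBR sequence.
print(hp_encode("GSMGSKRQWALEDFEIGRPLG"))   # 101100001110010110111
```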
Pair \(Z_2\) Length of longest common subword (contiguous)
5OBR_1,1DDM_1 179 5
5OBR_1,1QMH_1 151 4
1DDM_1,1QMH_1 158 3
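
The two columns of this table can be sketched in a few lines. `z_k` is the size of the symmetric difference of the two subword sets, as in the definition of \(Z_k\); the last column is read here as the length of the longest common contiguous subword, which is an assumption based on how small the reported values are. All function names are hypothetical.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|, i.e. the symmetric difference size."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common block that ended at x[i-2], y[j-2].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_k("ABAB", "ABBA", 2))                            # 1 (only 'BB' differs)
print(longest_common_subword_length("ABCDE", "XBCDY"))   # 3 ('BCD')
```

For the first row of the table, the two one-sided counts 121 and 58 add up to the reported \(Z_2\) value of 179.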

Newick tree

 
(
	1DDM_1:87.18,
	(
		5OBR_1:75.5,1QMH_1:75.5
	):11.68
);

Let \(d\) denote the Otu–Sayood distance \(d\).
Let \(d_1\) denote the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{407 }{\log_{20} 407}-\frac{135}{\log_{20}135})\approx 81.9\)
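
The arithmetic in this estimate can be checked directly. The sketch below only evaluates the displayed expression; the variable names are hypothetical, and 407 is taken to be the combined length 272 + 135 of the two query sequences.

```python
import math

def log20(x):
    """Logarithm to base 20 (the size of the amino-acid alphabet)."""
    return math.log(x, 20)

n_joint = 272 + 135   # 407, length of the two sequences concatenated
n_short = 135         # length of the 1DDM_1 sequence

expected = 0.85 * 0.8 * (n_joint / log20(n_joint) - n_short / log20(n_short))
print(round(expected, 1))   # 81.9
```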
Status Protein1 Protein2 d d1/2
Query variables 5OBR_1 1DDM_1 105 77.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
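
The interpolation can be illustrated numerically. In the sketch below, `a` and `b` stand for the two one-sided quantities whose minimum and maximum enter the formula; the values 105 and 50 are illustrative assumptions, chosen because they reproduce the row \(d=105\), \(d_1/2=77.5\) in the table above, and are not stated on the page.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

a, b = 105, 50   # assumed one-sided values, consistent with the table row

print(delta(0, a, b))     # 105  -> plays the role of d
print(delta(0.5, a, b))   # 77.5 -> plays the role of d_1 / 2
```

With \(\alpha=0\) only the maximum survives, and with \(\alpha=1/2\) the result is the average of the two quantities, which is why the second case equals half of their sum.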