CoV2D Browser™

Parikh vectors
5ORW_1 1JVR_1 3FZX_1 Letter Amino acid
10 5 0 H Histidine
12 5 11 F Phenylalanine
16 8 10 S Serine
4 2 2 W Tryptophan
20 5 8 R Arginine
10 3 15 D Aspartic acid
2 2 4 C Cysteine
23 3 14 E Glutamic acid
17 5 12 K Lysine
12 4 15 Y Tyrosine
17 9 17 A Alanine
9 7 9 Q Glutamine
13 9 8 I Isoleucine
34 14 9 L Leucine
14 28 9 P Proline
7 4 20 N Asparagine
15 6 16 G Glycine
3 1 11 M Methionine
13 10 15 T Threonine
14 7 13 V Valine
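
A Parikh vector records how often each letter occurs in a word. The following is a minimal Python sketch, assuming the letter order of the table above; the name `parikh_vector` is illustrative and not taken from the site.

```python
from collections import Counter

# Letter order copied from the rows of the table above.
ALPHABET = "HFSWRDCEKYAQILPNGMTV"

def parikh_vector(word, alphabet=ALPHABET):
    """Return the number of occurrences of each alphabet letter in word."""
    counts = Counter(word)  # missing letters count as 0
    return [counts[letter] for letter in alphabet]

# Toy example: two H, one F, one W, everything else zero.
vector = parikh_vector("HHFW")
print(dict(zip(ALPHABET, vector)))
```

When every letter of the word belongs to the alphabet, the entries sum to the length of the word; for instance, the 5ORW_1 column above sums to 265.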

>5ORW_1|Chain A|Aurora kinase A|Homo sapiens (9606)
>1JVR_1|Chain A|HUMAN T-CELL LEUKEMIA VIRUS TYPE II MATRIX PROTEIN|Human T-lymphotropic virus 2 (11909)
>3FZX_1|Chain A|Putative exported protein|Bacteroides fragilis (272559)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5ORW , Knot 119 265 0.83 40 166 258
QWALEDFEIGRPLGKGKFGNVYLAREKQSKFILALKVLFKAQLEKAGVEHQLRREVEIQSHLRHPNILRLYGYFHDATRVYLILEYAPLGTVYRELQKLSKFDEQRTATYITELANALSYCHSKRVIHRDIKPENLLLGSAGELKIADFGWSVHAPSSRRTTLAGTLDYLPPEMIEGRMHDEKVDLWSLGVLCYEFLVGKPPFEANTYQETYKRISRVEFTFPDFVTEGARDLISRLLKHNPSQRPMLREVLEHPWITANSSKPS
1JVR , Knot 64 137 0.76 40 95 122
HMGQIHGLSPTPIPKAPRGLSTHHWLNFLQAAYRLQPGPSDFDFQQLRRFLKLALKTPIWLNPIDYSLLASLIPKGYPGRVVEIINILVKNQVSPSAPAAPVPTPICPTTTPPPPPPPSPEAHVPPPYVEPTTTQCF
3FZX , Knot 101 218 0.83 38 161 215
GAQNQDCAFFFPNQEGEQITRNCYTADGKLTNILVYRVDQAYEYPSGMEVVANYTFADAAGKTLNSGQMVARCSDGNFSMSMGDVATFPTALNMMNADVYMMGDLMNYPDAFSNPMNPGDDDEFDDGTLRLYQKGNKNNRAEISVFDREFVTTETVNTPAGAFYCTKVKYEMNIWTPKETIKGYGYEWYAPNIGIVRSEQYNNKKELQSYSVLERIKK
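
The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below computes that ratio and also includes one common LZ76-style parsing (exhaustive history, overlaps allowed). The parsing variant is an assumption: the page does not say which one it uses, so `lz76` may not reproduce the table's LZ values exactly. Both function names are illustrative.

```python
import math

def lz76(s):
    """Count components of an LZ76-style parsing of s.

    Each component is the shortest prefix of the remaining text that does
    not already occur starting at an earlier position (overlap allowed).
    """
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend while the candidate already occurs in the text before its last symbol.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """Return lz / (n / log_alphabet_size(n)), the ratio shown in the table."""
    return lz / (n / math.log(n, alphabet_size))

# Ratios recomputed from the LZ and length columns of the table above.
for lz, n in [(119, 265), (64, 137), (101, 218)]:
    print(lz, n, normalized_lz(lz, n))
```

The recomputed ratios come out at roughly 0.836, 0.767 and 0.833, so the table values appear to be truncated to two decimals rather than rounded.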

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
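
These definitions translate directly into code when the subword length is fixed (written `k` here). A minimal sketch with illustrative function names:

```python
def subwords(w, k):
    """Return the set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy example: ABAB has subwords {A, B}, {AB, BA}, {ABA, BAB}.
for k in (1, 2, 3):
    print(k, sorted(subwords("ABAB", k)))
```

Note that a word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), which bounds each complexity value from above.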

\(|P_{f(5ORW_1)}(2) \setminus P_{f(1JVR_1)}(2)|=129\), \(|P_{f(1JVR_1)}(2) \setminus P_{f(5ORW_1)}(2)|=58\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
0111001011011101011010110000001111101110101001110001000101000100101101010100100101110011110100010010010000010010011011000000011000101001111011010110111010110000001110100111011010100001011011110001111011101000000000100101011011001100110011000100011100110011101000010

Pair \(Z_2\) Length of longest common subsequence
5ORW_1,1JVR_1 187 4
5ORW_1,3FZX_1 171 4
1JVR_1,3FZX_1 176 4
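
The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets, so for the first pair it is \(129+58=187\), as in the table. A small Python sketch of the definition (function names are illustrative):

```python
def subwords(w, k):
    """Return the set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: the only 2-letter subword not shared is BB.
print(z_distance("ABAB", "ABBA", 2))
```

Dividing by the size of the union of the two sets would give the usual Jaccard distance; the page reports only the numerator.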

Newick tree

 
[
	1JVR_1:92.48,
	[
		5ORW_1:85.5,3FZX_1:85.5
	]:6.98
]
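
The tree above is printed with square brackets, whereas standard Newick notation uses parentheses and a terminating semicolon. The sketch below rebuilds the standard string from a nested representation; the tuple encoding and the name `to_newick` are assumptions made for illustration.

```python
def to_newick(node):
    """Serialize a node given as (label_or_children, branch_length).

    A leaf has a string label; an internal node has a list of child nodes.
    A branch length of None (used for the root) is omitted.
    """
    content, length = node
    if isinstance(content, list):
        text = "(" + ",".join(to_newick(child) for child in content) + ")"
    else:
        text = content
    return text if length is None else f"{text}:{length}"

# Branch lengths copied from the tree above.
tree = ([("1JVR_1", 92.48),
         ([("5ORW_1", 85.5), ("3FZX_1", 85.5)], 6.98)], None)
newick = to_newick(tree) + ";"
print(newick)  # (1JVR_1:92.48,(5ORW_1:85.5,3FZX_1:85.5):6.98);
```

The branch lengths are consistent with an ultrametric tree: \(85.5+6.98=92.48\), so every leaf is at the same distance from the root.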

Let \(d\) denote the Otu--Sayood distance of the same name.
Let \(d_1\) denote the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{402 }{\log_{20} 402}-\frac{137}{\log_{20}137})=79.8\)
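
The arithmetic in this estimate can be checked numerically. Note that \(402=265+137\), the combined length of the two query sequences. The factors 0.85 and 0.8 are taken from the formula as given; the helper name below is illustrative.

```python
import math

def n_over_log(n, base=20):
    """Return n / log_base(n)."""
    return n / math.log(n, base)

# (0.85)(0.8)(402/log_20(402) - 137/log_20(137))
expected = 0.85 * 0.8 * (n_over_log(402) - n_over_log(137))
print(round(expected, 1))  # 79.8
```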
Status Protein1 Protein2 d d1/2
Query variables 5ORW_1 1JVR_1 103 75
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
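
The interpolation can be made concrete with a short sketch. Here `a` and `b` are placeholders for the two quantities whose minimum and maximum are taken; the page does not spell them out, so they are assumptions of this example. With \(\alpha=0\) the result is the maximum (the case labelled \(d\)), and with \(\alpha=1/2\) it is half the sum (the case labelled \(d_1/2\)).

```python
def delta(a, b, alpha):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Placeholder inputs, not values from the page.
a, b = 3, 5
print(delta(a, b, 0))    # the maximum, 5
print(delta(a, b, 0.5))  # half the sum, 4.0
```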