CoV2D Browser™

Parikh vectors
3ACS_1 6SIR_1 5SVW_1 Letter Amino acid
51 12 3 S Serine
80 15 4 A Alanine
55 11 10 D Aspartic acid
29 3 4 Q Glutamine
31 13 14 I Isoleucine
46 6 4 Y Tyrosine
68 16 10 V Valine
5 1 4 C Cysteine
56 11 11 L Leucine
21 14 3 K Lysine
35 6 8 F Phenylalanine
10 7 3 M Methionine
53 20 8 T Threonine
16 4 0 W Tryptophan
47 6 10 R Arginine
51 10 9 E Glutamic acid
80 15 13 G Glycine
26 4 3 H Histidine
37 7 5 N Asparagine
45 6 11 P Proline
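
A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch of that count, not the site's own code; the name `TABLE_ORDER` and the function `parikh_vector` are hypothetical, and the letter order is assumed to follow the rows of the table above.

```python
from collections import Counter

# Letter order taken from the rows of the table above (an assumption of this sketch).
TABLE_ORDER = "SADQIYVCLKFMTWREGHNP"

def parikh_vector(sequence, alphabet=TABLE_ORDER):
    """Return the number of occurrences of each alphabet letter in `sequence`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# First eight residues of 5SVW_1, used only as a small illustration.
fragment = "GGPIPYPV"
vector = parikh_vector(fragment)
print(dict(zip(TABLE_ORDER, vector)))
```

The entries of a Parikh vector sum to the length of the word, which gives a quick consistency check against the Length column further down.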

>3ACS_1|Chains A, B|Cellobiose Phosphorylase|Cellvibrio gilvus (593907)
>6SIR_1|Chains A, B, C, D|Nucleotide cyclase|Catenaria anguillulae PL171 (765915)
>5SVW_1|Chains A, B, C, D|Adagio protein 1|Arabidopsis thaliana (3702)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3ACS , Knot 319 842 0.85 40 297 759
MGSSHHHHHHSSGLVPRGSHMRYGHFDDAAREYVITTPHTPYPWINYLGSEQFFSLLSHQAGGYSFYRDAKMRRLTRYRYNNIPADAGGRYLYVNDGGDVWTPSWLPVKADLDHFEARHGLGYSRITGERNGLKVETLFFVPLGENAEVQKVTVTNTSDAPKTATLFSFVEFCLWNAQDDQTNYQRNLSIGEVEVEQDGPHGSAIYHKTEYRERRDHYAVFGVNTRADGFDTDRDTFVGAYNSLGEASVPRAGKSADSVASGWYPIGSHSVAVTLQPGESRDLVYVLGYLENPDEEKWADDAHQVVNKAPAHALLGRFATSEQVDAALEALNSYWTNLLSTYSVSSTDEKLDRMVNIWNQYQCMVTFNMSRSASFFETGIGRGMGFRDSNQDLLGFVHLIPERARERIIDIASTQFADGSAYHQYQPLTKRGNNDIGSGFNDDPLWLIAGVAAYIKESGDWGILDEPVPFDNEPGSEVPLFEHLTRSFQFTVQNRGPHGLPLIGRADFNDCLNLNCFSTTPGESFQTTENQAGGVAESVFIAAQFVLYGAEYATLAERRGLADVATEARKYVDEVRAAVLEHGWDGQWFLRAYDYYGNPVGTDAKPEGKIWIEPQGFAVMAGIGVGEGPDDADAPAVKALDSVNEMLGTPHGLVLQYPAYTTYQIELGEVSTYPPGYKENGGIFCHNNPWVIIAETVVGRGAQAFDYYKRITPAYREDISDTHKLEPYVYAQMIAGKEAVRAGEAKNSWLTGTAAWNFVAVSQYLLGVRPDYDGLVVDPQIGPDVPSYTVTRVARGATYEITVTNSGAPGARASLTVDGAPVDGRTVPYAPAGSTVRVEVTV
6SIR , Knot 89 187 0.83 40 137 181
GPATEAKEYESVTVFFSDITNFTVISSRTSTKDMMATLNKLWLEYDAIAKRWGVYKVETIGDAYLGVTGAPEVVPDHADRAVNFALDIIEMIKTFKTATGESINIRIGLNSGPVTAGVLGDLNPHWCLVGDTVNTASRMESTSKAGHIHISDSTYQMIKGKFVTQPLDLMEVKGKGKMQTYWVTARK
5SVW , Knot 68 137 0.81 38 116 135
GGPIPYPVGNLLHTAPCGFIVTDAVEPDQPIIYVNTVFEMVTGYRAEEVLGRNCRFLQCRGPFAKRRHPLVDSMVVSEIRKCIDEGIEFQGELLNFRKDGSPLMNRLRLTPIYGDDDTITHIIGIQFFIETDIDLGP
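
The LZ-complexity and normalised-ratio columns can be illustrated as follows. This is a sketch under assumptions: it uses the exhaustive-history parsing of Lempel and Ziv (1976), which may differ in detail from the variant used to produce the table, and the function names `lz76_complexity` and `normalized_lz` are hypothetical.

```python
import math

def lz76_complexity(w):
    """Count phrases in an LZ76-style parse of w.

    Each phrase is the shortest prefix of the remaining text that does not
    occur starting at an earlier position (overlapping matches are allowed).
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_alphabet_size(n)), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

# Values read from the 3ACS row of the table above.
print(round(normalized_lz(319, 842), 2))
```

With the table's values for 3ACS (LZ-complexity 319, length 842) the ratio comes out close to the listed 0.85.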

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
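
As a minimal sketch of these definitions, reading \(P_w(k)\) as the set of distinct length-\(k\) subwords of \(w\), the set and its cardinality can be computed directly. The function names `subwords` and `subword_complexity` are hypothetical and not taken from the site.

```python
def subwords(w, k):
    """Return P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Return p_w(k), the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Small illustration on a toy word rather than a full protein sequence.
toy = "ABAB"
print(sorted(subwords(toy, 2)), subword_complexity(toy, 2))
```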

\(|P_{f(3ACS_1)}(2) \setminus P_{f(6SIR_1)}(2)|=177\) and \(|P_{f(6SIR_1)}(2) \setminus P_{f(3ACS_1)}(2)|=17\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), that is, an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11000000000011110100100101001100011001001011100110001101100011100100010100100000001110111001010011011010111101010010100111000101000110100111111100101001010000011001011011010110100000000001011010100011010110000000000000111110001011000000111100011010110110010011011011100011101011000011011101001000011001001100111011110110000101110110001001100001000000100110110000011010100010110011101111000000111110111001000110110001101010000011000100011011000111111111101000101111001111000110011110010001010100011011111101010001010010001100100000011111001111101110110010110001110110010001001011110011010111010000101110010101011101011111111111011001011110110010011101011110011000001011010001110000111100001111110011101101100000101100001000001010101011110011011010001101011101111000111101000111101011101100010011011000101000111110101010111101001101111001010101
Pair \(Z_2\) Length of longest common subsequence
3ACS_1,6SIR_1 194 4
3ACS_1,5SVW_1 215 3
6SIR_1,5SVW_1 159 3
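
The \(Z_2\) column is the sum of the two set differences; for the first pair, 177 + 17 = 194. A minimal sketch of \(Z_k\) as defined above follows, with hypothetical function names `subwords` and `z_distance`.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Return Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy words for illustration; pass the FASTA sequences above for the real values.
print(z_distance("ABAB", "ABCA", 2))
```

Since \(Z_k\) is the size of a symmetric difference, it is symmetric in its two arguments and zero when both words have the same set of length-\(k\) subwords.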

Newick tree

 
[
	3ACS_1:10.94,
	[
		6SIR_1:79.5,5SVW_1:79.5
	]:29.44
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1029 }{\log_{20} 1029}-\frac{187}{\log_{20}187})=229.\)
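
The arithmetic in this estimate can be checked with a few lines. This sketch only re-evaluates the expression as printed, with the constants 0.85, 0.8, 1029 and 187 copied from it; the helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n, alphabet_size=20):
    """Return n / log_alphabet_size(n), the scale used to normalise LZ-complexity."""
    return n / math.log(n, alphabet_size)

# Constants copied from the expression in the text.
estimate = 0.85 * 0.8 * (lz_scale(1029) - lz_scale(187))
print(round(estimate))
```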
Status Protein1 Protein2 d d1/2
Query variables 3ACS_1 6SIR_1 287 174.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
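
The case distinction implies \(d=\mathrm{max}\) and \(d_1=\mathrm{min}+\mathrm{max}\). The sketch below assumes that reading and recovers the two terms from the query row above (\(d=287\), \(d_1/2=174.5\)); the function name `delta` and the variable names are hypothetical.

```python
def delta(alpha, smaller, larger):
    """Return alpha * min + (1 - alpha) * max for the two given terms."""
    return alpha * smaller + (1 - alpha) * larger

# Values from the query row: d = 287 and d1/2 = 174.5.
d = 287
d1 = 2 * 174.5
larger = d             # alpha = 0 gives delta = max = d
smaller = d1 - larger  # assumes d1 = min + max, as the alpha = 1/2 case implies

print(delta(0, smaller, larger), delta(0.5, smaller, larger))
```

Values of \(\alpha\) between 0 and 1/2 then interpolate between the two tabulated distances.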