CoV2D Browser™

Parikh vectors
5ECQ_1 2KYE_1 3AVX_1 Letter Amino acid
28 0 41 N Asparagine
28 0 76 D Aspartic acid
20 0 29 Q Glutamine
22 0 36 Y Tyrosine
46 0 97 V Valine
26 0 75 R Arginine
10 0 32 H Histidine
37 0 85 I Isoleucine
30 0 75 K Lysine
11 3 20 C Cysteine
40 0 97 E Glutamic acid
36 6 113 G Glycine
49 0 104 L Leucine
12 0 28 M Methionine
32 0 52 F Phenylalanine
36 0 71 T Threonine
4 0 8 W Tryptophan
34 2 116 A Alanine
31 0 54 P Proline
43 0 80 S Serine
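As a rough illustration of how a column of this table could be obtained, the sketch below counts letters of a sequence in the row order of the table. The function name `parikh_vector` and the constant `AMINO_ORDER` are hypothetical names chosen here, not taken from the CoV2D code.

```python
from collections import Counter

# Letter order follows the rows of the table above (an assumption of this sketch).
AMINO_ORDER = "NDQYVRHIKCEGLMFTWAPS"

def parikh_vector(w: str, alphabet: str = AMINO_ORDER) -> list[int]:
    """Return the number of occurrences in w of each letter of alphabet, in order."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# 2KYE_1 is an RNA chain, so only C, G and A overlap with the amino acid letters;
# U is not in the table and is therefore not counted.
rna = "GAGAGUUGGGCUCUC"
vec = parikh_vector(rna)
print({a: v for a, v in zip(AMINO_ORDER, vec) if v})
```

For the 2KYE_1 sequence this gives the nonzero entries C 3, G 6 and A 2, in agreement with the 2KYE_1 column.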

>5ECQ_1|Chains A, D|Jasmonic acid-amido synthetase JAR1|Arabidopsis thaliana (3702)
>2KYE_1|Chain A|RNA (5'-R(*GP*AP*GP*AP*GP*(PSU)P*(PSU)P*GP*GP*GP*CP*(PSU)P*CP*(PSU)P*C)-3')|
>3AVX_1|Chain A|Elongation factor Ts, Elongation factor Tu, LINKER, Q beta replicase|Escherichia coli O157:H7 (83334)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5ECQ , Knot 234 575 0.86 40 274 538
MLEKVETFDMNRVIDEFDEMTRNAHQVQKQTLKEILLKNQSAIYLQNCGLNGNATDPEEAFKSMVPLVTDVELEPYIKRMVDGDTSPILTGHPVPAISLSSGTSQGRPKFIPFTDELMENTLQLFRTAFAFRNRDFPIDDNGKALQFIFSSKQYISTGGVPVGTATTNVYRNPNFKAGMKSITSPSCSPDEVIFSPDVHQALYCHLLSGILFRDQVQYVFAVFAHGLVHAFRTFEQVWEEIVTDIKDGVLSNRITVPSVRTAMSKLLTPNPELAETIRTKCMSLSNWYGLIPALFPNAKYVYGIMTGSMEPYVPKLRHYAGDLPLVSHDYGSSEGWIAANVTPRLSPEEATFAVIPNLGYFEFLPVSETGEGEEKPVGLTQVKIGEEYEVVITNYAGLYRYRLGDVVKVIGFYNNTPQLKFICRRNLILSINIDKNTERDLQLSVESAAKRLSEEKIEVIDFSSYIDVSTDPGHYAIFWEISGETNEDVLQDCCNCLDRAFIDAGYVSSRKCKTIGALELRVVAKGTFRKIQEHFLGLGSSAGQFKMPRCVKPSNAKVLQILCENVVSSYFSTAF
2KYE , Knot 7 15 0.42 8 9 11
GAGAGUUGGGCUCUC
3AVX , Knot 456 1289 0.84 40 336 1104
MAEITASLVKELRERTGAGMMDCKKALTEANGDIELAIENMRKSGAIKAAKKAGNVAADGVIKTKIDGNYGIILEVNCQTDFVAKDAGFQAFADKVLDAAVAGKITDVEVLKAQFEEERVALVAKIGENINIRRVAALEGDVLGSYQHGARIGVLVAAKGADEELVKHIAMHVAASKPEFIKPEDVSAEVVEKEYQVQLDIAMQSGKPKEIAEKMVEGRMKKFTGEVSLTGQPFVMEPSKTVGQLLKEHNAEVTGFIRFEVGEGIEKVETDFAAEVAAMSKQSHMSKEKFERTKPHVNVGTIGHVDHGKTTLTAAITTVLAKTYGGAARAFDQIDNAPEEKARGITINTSHVEYDTPTRHYAHVDCPGHADYVKNMITGAAQMDGAILVVAATDGPMPQTREHILLGRQVGVPYIIVFLNKCDMVDDEELLELVEMEVRELLSQYDFPGDDTPIVRGSALKALEGDAEWEAKILELAGFLDSYIPEPERAIDKPFLLPIEDVFSISGRGTVVTGRVERGIIKVGEEVEIVGIKETQKSTCTGVEMFRKLLDEGRAGENVGVLLRGIKREEIERGQVLAKPGTIKPHTKFESEVYILSKDEGGRHTPFFKGYRPQFYFRTTDVTGTIELPEGVEMVMPGDNIKMVVTLIHPIAMDDGLRFAIREGGRTVGAGVVAKVLSGASGAAGGGGSGGGGSMSKTASSRNSLSAQLRRAANTRIEVEGNLALSIANDLLLAYGQSPFNSEAECISFSPRFDGTPDDFRINYLKAEIMSKYDDFSLGIDTEAVAWEKFLAAEAECALTNARLYRPDYSEDFNFSLGESCIHMARRKIAKLIGDVPSVEGMLRHCRFSGGATTTNNRSYGHPSFKFALPQACTPRALKYVLALRASTHFDIRISDISPFNKAVTVPKNSKTDRCIAIEPGWNMFFQLGIGGILRDRLRCWGIDLNDQTINQRRAHEGSVTNNLATVDLSAASDSISLALCELLLPPGWFEVLMDLRSPKGRLPDGSVVTYEKISSMGNGYTFELESLIFASLARSVCEILDLDSSEVTVYGDDIILPSCAVPALREVFKYVGFTTNTKKTFSEGPFRESCGKHYYSGVDVTPFYIRHRIVSPADLILVLNNLYRWATIDGVWDPRAHSVYLKYRKLLPKQLQRNTIPDGYGDGALVGSVLINPFAKNRGWIRYVPVITDHTRDRERAELGSYLYDLFSRCLSESNDGLPLRGPSGCDSADLFAIDQLICRSNPTKISRSTGKFDIQYIACSSRVLAPYGVFQGTKVASLHEAHHHHHH
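The LZ-complexity and normalized columns can be sketched as follows, assuming \(\mathrm{LZ}(w)\) is the number of components in the Lempel-Ziv (1976) exhaustive-history parse. Whether the page uses exactly this variant is an assumption; the function names are hypothetical.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of components in an LZ76-style parse of w.

    Each component is the shortest prefix of the remaining text that does not
    occur starting at an earlier position (overlap with itself is allowed).
    """
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the candidate while a copy of it starts before position i.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(w: str, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_b(n), with b the alphabet size (20 in the table)."""
    n = len(w)
    return lz76_complexity(w) / (n / math.log(n, alphabet_size))

rna = "GAGAGUUGGGCUCUC"
print(lz76_complexity(rna), normalized_lz(rna))
```

On the 2KYE sequence this parse has 7 components (G, A, GAGU, UG, GGC, UC, UC), and the normalized value is about 0.42, matching the row above.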

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
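A minimal sketch of these definitions, reading the argument of \(P_w\) as the subword length; the helper names `subwords` and `subword_complexity` are chosen here for illustration.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Cardinality of the set of length-k subwords of w."""
    return len(subwords(w, k))

rna = "GAGAGUUGGGCUCUC"
print(subword_complexity(rna, 2), subword_complexity(rna, 3))
```

For the 2KYE sequence this gives 9 and 11, the values listed for \(p_w(2)\) and \(p_w(3)\).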

\(|P_{f(5ECQ_1)}(2) \setminus P_{f(2KYE_1)}(2)|=271\), \(|P_{f(2KYE_1)}(2) \setminus P_{f(5ECQ_1)}(2)|=6\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11001001010011001001000100100001001110000110100011010100100110011111001010101001101000111010111110100100010101111000110001011001111000011100010110111000001001111110100010001010111001001000100111010100110001101111000100111111011101100100110011001001110001011010011001101010110010000101001011111111010010111010101011010001101111000010001111101010101001011111011010111100010100011110010110000111000111000011011011110000101011000011101010000000101010011001000010110100010100011001111010100000110000001001110110100000001111010111010100100011111001101011001010010110110001100010011
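The hydrophobic-polar string can be sketched as a letter-by-letter recoding. The hydrophobic set used below (A, F, G, I, L, M, P, V, W) is an assumption: it agrees with the opening residues of Sequence 1, but the page's treatment of W in particular was not checked.

```python
# Assumed hydrophobic residues; every other letter is treated as polar.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(w: str) -> str:
    """Map each residue to '1' if assumed hydrophobic, else '0'."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)

# First 20 residues of the 5ECQ sequence.
print(hp_encode("MLEKVETFDMNRVIDEFDEM"))
```

Applied to the first 20 residues of 5ECQ, this yields 11001001010011001001, the first 20 symbols of the string above.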
Pair \(Z_2\) Length of longest common substring
5ECQ_1,2KYE_1 277 2
5ECQ_1,3AVX_1 108 5
2KYE_1,3AVX_1 337 3
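The \(Z_k\) column is the size of a symmetric difference of subword sets, which might be computed along these lines (function names are illustrative only):

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|, the symmetric difference size."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: {AA, AB} versus {AB, BB} differ in two subwords.
print(z_k("AAB", "ABB", 2))
```

With the full sequences, the two set differences for the pair 5ECQ_1, 2KYE_1 are reported as 271 and 6, whose sum is the tabulated 277.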

Newick tree

 
(2KYE_1:17.34,(5ECQ_1:54,3AVX_1:54):12.34);

Let \(d\) be the Otu-Sayood distance \(d\).
Let \(d_1\) be the Otu-Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{590 }{\log_{20} 590}-\frac{15}{\log_{20}15})\approx 177.\)
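The arithmetic in this estimate can be checked directly. The sketch below only evaluates the stated expression; reading 590 as the combined length 575 + 15 of the two query sequences is an interpretation, not something the page states.

```python
import math

def lz_scale(n: int, base: int = 20) -> float:
    """The quantity n / log_base(n) used to normalize LZ-complexity."""
    return n / math.log(n, base)

# Factors 0.85 and 0.8 are taken as given from the text.
expected = 0.85 * 0.8 * (lz_scale(590) - lz_scale(15))
print(round(expected, 1))
```

The result is close to 177, as stated.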
Status Protein1 Protein2 d d1/2
Query variables 5ECQ_1 2KYE_1 230 118

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
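A sketch of this interpolation, assuming the unnormalized Otu-Sayood definitions \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\), with \(c\) an LZ76-style complexity. The parse variant and all function names are assumptions made for illustration.

```python
def lz76_complexity(w: str) -> int:
    """Number of components in an LZ76-style exhaustive-history parse of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the candidate while a copy of it starts before position i.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood_terms(s: str, q: str) -> tuple[int, int]:
    """The two one-sided terms c(SQ) - c(S) and c(QS) - c(Q)."""
    c = lz76_complexity
    return c(s + q) - c(s), c(q + s) - c(q)

def delta(s: str, q: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two one-sided terms."""
    a, b = otu_sayood_terms(s, q)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

s, q = "GAGAGUUGGGCUCUC", "MLEKVETFDM"
a, b = otu_sayood_terms(s, q)
print(delta(s, q, 0), delta(s, q, 0.5))
```

Under these assumptions, \(\alpha=0\) returns the larger term, which is \(d\), and \(\alpha=1/2\) returns the average of the two terms, which is \(d_1/2\). The tabulated query values 230 and 118 would then correspond to one-sided terms 230 and 6.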