CoV2D Browser™

Parikh vectors
2QWM_1 5WRD_1 6SEK_1 Letter Amino acid
29 10 40 L Leucine
32 12 35 V Valine
37 5 28 A Alanine
19 4 17 N Asparagine
24 12 35 E Glutamic acid
6 3 12 H Histidine
22 9 35 S Serine
10 4 16 Y Tyrosine
30 5 27 D Aspartic acid
7 5 19 M Methionine
18 7 36 F Phenylalanine
11 6 26 P Proline
22 10 25 R Arginine
13 7 19 Q Glutamine
1 0 5 W Tryptophan
27 6 33 T Threonine
2 0 9 C Cysteine
29 4 39 G Glycine
26 8 36 I Isoleucine
29 8 41 K Lysine

>2QWM_1|Chains A, B|Heat shock cognate 71 kDa protein|Bos taurus (9913)
>5WRD_1|Chains A, B|Microtubule-associated proteins 1A/1B light chain 3B|Mus musculus (10090)
>6SEK_1|Chains A, B|Ancestral Flavin-containing monooxygenase 5|synthetic construct (32630)
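A column of the Parikh vector table above can be recomputed by counting letters. The following is a minimal sketch; the names `AMINO_ACIDS`, `parikh_vector` and `seq_5wrd` are illustrative and do not come from the page, and the letter order simply follows the rows of the table.

```python
from collections import Counter

# One-letter codes in the row order of the table above (illustrative constant).
AMINO_ACIDS = "LVANEHSYDMFPRQWTCGIK"

def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each letter of `alphabet` in `w`."""
    counts = Counter(w)
    return [counts.get(a, 0) for a in alphabet]

# The 5WRD_1 sequence as listed on this page.
seq_5wrd = (
    "MPSEKTFKQRRSFEQRVEDVRLIREQHPTKIPVIIERYKGEKQLPVLDKTKFLVPDHVNMSELIKIIRRR"
    "LQLNANQAFFLLVNGHSMVSVSTPISEVYESERDEDGFLYMVYASQETFGTAMAV"
)

vector = dict(zip(AMINO_ACIDS, parikh_vector(seq_5wrd)))
# The entries sum to the sequence length; W and C do not occur in 5WRD_1.
print(sum(vector.values()), vector["W"], vector["C"])
```

The entries of a Parikh vector always sum to the length of the word, which gives a quick consistency check against the Length column.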
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2QWM , Knot 168 394 0.85 40 214 374
MSKGPAVGIDLGTTYSCVGVFQHGKVEIIANDQGNRTTPSYVAFTDTERLIGDAAKNQVAMNPTNTVFDAKRLIGRRFDDAVVQSDMKHWPFMVVNDAGRPKVQVEYKGETKSFYPEEVSSMVLTKMKEIAEAYLGKTVTNAVVTVPAYFNDSQRQATKDAGTIAGLNVLRIINEPTAAAIAYGLDKKVGAERNVLIFDLGGGTFDVSILTIEDGIFEVKSTAGDTHLGGEDFDNRMVNHFIAEFKRKHKKDISENKRAVRRLRTACERAKRTLSSSTQASIEIDSLYEGIDFYTSITRARFEELNADLFRGTLDPVEKALRDAKLDKSQIHDIVLVGGSTRIPKIQKLLQDFFNGKELNKSINPDEAVAYGAAVQAAILSGDKSENVQDLLLL
5WRD , Knot 63 125 0.81 36 96 123
MPSEKTFKQRRSFEQRVEDVRLIREQHPTKIPVIIERYKGEKQLPVLDKTKFLVPDHVNMSELIKIIRRRLQLNANQAFFLLVNGHSMVSVSTPISEVYESERDEDGFLYMVYASQETFGTAMAV
6SEK , Knot 222 533 0.87 40 265 507
MTKKRIAVIGAGASGLTSIKCCLEEGLEPVCFERTDDIGGLWRFQENPEEGRASIYKSVIINTSKEMMCFSDYPIPDHYPNFMHNSQVLEYFRMYAKEFDLLKYIQFKTTVCSVKKQPDFSTSGQWEVVTECEGKKEVDVFDGVMVCTGHHTNAHLPLESFPGIEKFKGQYFHSRDYKNPEGFTGKRVIIIGIGNSGGDLAVEISHTAKQVFLSTRRGAWILNRVGDHGYPFDVLFSSRFTYFLSKICGQSLSNTFLEKKMNQRFDHEMFGLKPKHRALSQHPTVNDDLPNRIISGLVKVKGNVKEFTETAAIFEDGSREDDIDAVIFATGYSFAFPFLEDSVKVVKNKVSLYKKVFPPNLEKPTLAIIGLIQPLGAIMPISELQGRWATQVFKGLKTLPSQSEMMAEISKAQEEMAKRYVDSQRHTIQGDYIDTMEEIADLVGVRPNLLSLAFTDPKLALKLFFGPCTPVQYRLQGPGKWDGARKTILTTEDRIRKPLMTRVIEKSNSMTSTMTMGRFMLAVVFFAIIMAYF

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
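The quantities in the table can be sketched in a few lines. The code below assumes the exhaustive-history parsing of Lempel and Ziv (1976) for \(\mathrm{LZ}(w)\); the page does not state which variant or counting convention it uses, so the functions need not reproduce the tabulated values exactly. All function names are illustrative.

```python
import math

def subwords(w, k):
    """Set of distinct length-k subwords (contiguous factors) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

def lz76_complexity(w):
    """Number of phrases in the LZ76 exhaustive-history parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text
        # (the copy may overlap the phrase itself, hence the bound i+length-1).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_q n) for alphabet size q; requires n > 1."""
    return lz / (n / math.log(n, alphabet_size))

# Normalization applied to the (LZ, n) pair listed for 2QWM.
print(round(normalized_lz(168, 394), 2))
```

Applied to the tabulated pairs (168, 394), (63, 125) and (222, 533), the normalization rounds to 0.85, 0.81 and 0.87, in agreement with the fourth column.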

\(|P_{f(2QWM_1)}(2) \setminus P_{f(5WRD_1)}(2)|=148\), \(|P_{f(5WRD_1)}(2) \setminus P_{f(2QWM_1)}(2)|=30\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1001111110110000011110010101110001000010011100000111011000111010001101001110010011100010011111100110101010001000010100100111001001101011001001110111010000001000110111101101100101111101100011100011110111101010110100111010001100011100100011001110100000001000001100100100010001000001010100100110100010010100101011010101100110010100001001111110001101001100110100100010100111011110111101000001001111
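The hydrophobic-polar string can be reproduced by a letter-wise map. The page does not state which residues count as hydrophobic; the set below (A, F, G, I, L, M, P, V, W mapped to 1, everything else to 0) is an assumption inferred by comparing the start of the 2QWM sequence with the start of the binary string. The names `HYDROPHOBIC` and `hp_encode` are illustrative.

```python
# Assumed hydrophobic set, inferred from the displayed binary string;
# it is not given explicitly on the page.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in seq)

# First 40 residues of the 2QWM sequence as listed on this page.
prefix = "MSKGPAVGIDLGTTYSCVGVFQHGKVEIIANDQGNRTTPS"
print(hp_encode(prefix))
```

Under this assumed set, cysteine and tyrosine are treated as polar; other hydrophobicity scales classify them differently.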
Pair \(Z_2\) Length of longest common subword
2QWM_1,5WRD_1 178 4
2QWM_1,6SEK_1 145 4
5WRD_1,6SEK_1 207 4
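The \(Z_k\) column can be sketched directly from its definition as the size of the symmetric difference of the two subword sets. The helper names are illustrative, and the Jaccard distance shown alongside uses the standard denominator \(|P_x(k)\cup P_y(k)|\), which the page does not tabulate.

```python
def subwords(w, k):
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """Z_k(x, y) divided by |P_x(k) | P_y(k)| (0.0 if both sets are empty)."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0

# Toy example: "abab" has subwords {ab, ba}; "abba" has {ab, bb, ba}.
print(z_distance("abab", "abba", 2))
```

For the pair 2QWM_1, 5WRD_1 the two one-sided counts quoted above, 148 and 30, add up to the tabulated \(Z_2=178\).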

Newick tree

 
(
	5WRD_1:10.29,
	(
		2QWM_1:72.5,6SEK_1:72.5
	):30.79
);

Let \(d\) denote the Otu–Sayood distance of the same name.
Let \(d_1\) denote the Otu–Sayood distance of the same name. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{519}{\log_{20} 519}-\frac{125}{\log_{20} 125}\right)\approx 116.\)
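The arithmetic in this estimate can be checked numerically. The sketch below only evaluates the displayed expression; note that 519 = 394 + 125 is the combined length of the two sequences, and the helper name `lz_scale` is illustrative.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_q(n), the normalizing quantity used for LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)

# Evaluate (0.85)(0.8)(519/log_20 519 - 125/log_20 125).
expected = 0.85 * 0.8 * (lz_scale(519) - lz_scale(125))
print(round(expected))  # 116
```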
Status Protein1 Protein2 d d1/2
Query variables 2QWM_1 5WRD_1 150 96.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
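The interpolation can be sketched as follows. It assumes that min and max are taken over the two one-sided LZ-complexity increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), which is how Otu and Sayood's \(d\) (the maximum) and \(d_1\) (the sum) are usually defined; the page does not spell this out. The function name `delta` is illustrative, and the increment 43 is not listed on the page: it is derived from the tabulated values as \(2\cdot 96.5-150\).

```python
def delta(inc_a, inc_b, alpha):
    """alpha * min + (1 - alpha) * max of the two one-sided increments."""
    lo, hi = min(inc_a, inc_b), max(inc_a, inc_b)
    return alpha * lo + (1 - alpha) * hi

# Increments consistent with the table row for 2QWM_1 and 5WRD_1:
# d = 150 is the larger one; the smaller one, 43, is derived (assumption above).
print(delta(43, 150, 0))    # 150, the value listed as d
print(delta(43, 150, 0.5))  # 96.5, the value listed as d1/2
```

Setting \(\alpha=1\) would give the smaller increment alone, a case the page does not tabulate.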