CoV2D Browser™

Parikh vectors
4MWV_1 5NQN_1 7PUF_1 Letter Amino acid
18 19 19 A Alanine
29 2 14 N Asparagine
18 18 3 C Cysteine
31 6 15 G Glycine
15 3 23 K Lysine
9 1 10 F Phenylalanine
28 8 26 T Threonine
22 6 26 V Valine
24 11 12 R Arginine
18 13 20 E Glutamic acid
25 6 18 I Isoleucine
18 8 18 L Leucine
38 6 21 S Serine
14 0 7 W Tryptophan
21 7 17 D Aspartic acid
7 7 11 H Histidine
18 0 10 Y Tyrosine
9 5 12 Q Glutamine
5 6 4 M Methionine
21 1 9 P Proline
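The Parikh vector of a sequence counts how often each letter occurs. Below is a minimal Python sketch, assuming the 20 one-letter codes in the row order of the table. The names `parikh_vector` and `AMINO_ACIDS` are illustrative and do not come from the page. Letters outside the alphabet are ignored. This matches the table: the 7PUF_1 column sums to 295, while the sequence has length 296, and the difference is its leading X.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes, in the row order of the table above.
AMINO_ACIDS = "ANCGKFTVREILSWDHYQMP"


def parikh_vector(seq: str, alphabet: str = AMINO_ACIDS) -> list[int]:
    """Count the occurrences of each alphabet letter in seq, in alphabet order.

    Symbols outside the alphabet (for example X) are not counted.
    """
    counts = Counter(seq)
    return [counts[letter] for letter in alphabet]


# First residues of 5NQN_1; prints one count per table row.
print(parikh_vector("MHVEAMISK"))
```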

>4MWV_1|Chain A|Neuraminidase|Influenza A virus (11320)
>5NQN_1|Chain A|CSP3|Methylosinus trichosporium OB3b (595536)
>7PUF_1|Chains A, B|Uricase|Aspergillus flavus (5059)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4MWV , Knot 165 388 0.84 40 225 371
RNFNNLTKGLCTINSWHIYGKDNAVRIGESSDVLVTREPYVSCDPDECRFYALSQGTTIRGKHSNGTIHDRSQYRALISWPLSSPPTVYNSRVECIGWSSTSCHDGKSRMSICISGPNNNASAVVWYNRRPVAEINTWARNILRTQESECVCHNGVCPVVFTDGSATGPADTRIYYFKEGKILKWESLTGTAKHIEECSCYGERTGITCTCRDNWQGSNRPVIQIDPVAMTHTSQYICSPVLTDNPRPNDPNIGKCNDPYPGNNNNGVKGFSYLDGANTWLGRTISTASRSGYEMLKVPNALTDDRSKPIQGQTIVLNADWSGYSGSFMDYWAEGDCYRACFYVELIRGRPKEDKVWWTSNSIVSMCSSTEFLGQWNWPDGAKIEYFL
5NQN , Knot 64 133 0.78 36 95 124
MHVEAMISKHPQARGQTDRSLVQCVEMCFDCAQTCAACADACLGEDKVADLRHCIRLNLDCAEICVAAGSIASRAAGTEESILRTMLQTCAEMCRMCEEECRRHAGNHEHCRICADVCKECETACRSATGLTH
7PUF , Knot 134 296 0.85 42 202 287
XSAVKAARYGKDNVRVYKVHKDEKTGVQTVYEMTVCVLLEGEIETSYTKADNSVIVATDSIKNTIYITAKQNPVTPPELFGSILGTHFIEKYNHIHAAHVNIVCHRWTRMDIDGKPHPHSFIRDSEEKRNVQVDVVEGKGIDIKSSLSGLTVLKSTNSQFWGFLRDEYTTLKETWDRILSTDVDATWQWKNFSGLQEVRSHVPKFDATWATAREVTLKTFAEDNSASVQATMYKMAEQILARQQLIETVEYSLPNKHYFEIDLSWHKGLQNTGKNAEVFAPQSDPNGLIKCTVGRS
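The LZ column gives the Lempel–Ziv (1976) complexity, which is the number of phrases in the exhaustive LZ76 parsing. The sketch below computes it with one common method, the Kaspar–Schuster scanning algorithm. It also computes the normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) from the table header. Implementations differ in their conventions, for example whether an incomplete final phrase counts, so this sketch may not reproduce the page's LZ values exactly. The function names are hypothetical.

Plugging the table's own LZ and \(n\) values into `normalized_lz` gives 0.846, 0.786 and 0.860. The listed ratios (0.84, 0.78, 0.85) therefore appear to be truncated rather than rounded.

```python
import math


def lz76(s: str) -> int:
    """LZ76 complexity of s via the Kaspar-Schuster algorithm.

    Counts the phrases of the exhaustive Lempel-Ziv (1976) parsing, where
    each new phrase is the shortest word that cannot be copied from a
    start point earlier in s. A final incomplete phrase is counted.
    """
    n = len(s)
    if n <= 1:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1  # the current phrase still has an earlier copy; extend it
            if l + k > n:  # reached the end of s inside a phrase
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1  # try the next possible start of an earlier copy
            if i == l:  # no start works: the phrase ends after k_max letters
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def normalized_lz(lz: int, n: int, q: int = 20) -> float:
    """LZ(w) / (n / log_q n), as in the table header (q = 20 amino acids)."""
    return lz / (n / math.log(n, q))


print(lz76("0001101001000101"))  # 6 phrases: 0.001.10.100.1000.101
print(round(normalized_lz(165, 388), 3))
```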

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with the 4-character Protein Data Bank code \(c\).
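Here is a direct Python sketch of these definitions. The names `subword_set` and `subword_count` are illustrative.

Under this literal reading, \(p_w(1)\) is the number of distinct residues:

- 20 for 4MWV_1,
- 18 for 5NQN_1, which has no W or Y,
- 21 for 7PUF_1, which has the 20 standard residues plus X.

These are half of the 40, 36 and 42 listed in the table, so the page may use a different convention for \(n=1\).

```python
def subword_set(w: str, n: int) -> set[str]:
    """P_w(n): the distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_count(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subword_set(w, n))


print(sorted(subword_set("banana", 2)))  # ['an', 'ba', 'na']
print([subword_count("banana", n) for n in range(1, 7)])  # [3, 3, 3, 3, 2, 1]
```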

\(|P_{f(4MWV_1)}(2) \setminus P_{f(5NQN_1)}(2)|=160\) and \(|P_{f(5NQN_1)}(2) \setminus P_{f(4MWV_1)}(2)|=30\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). It is the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), so \(Z_2(f(4MWV_1),f(5NQN_1))=160+30=190\).

Hydrophobic-polar version of Sequence 1 (4MWV_1): 0010010011001001010100011011000011100010100010000101100100101000010100000001110111001101000010011100000001000101010110001011110000111010011001100000001000110111100101011100010010010110100101010010000001000110000000101000111010111100000010011100010100101100001011000011011001011001110010010001001101101100000011010011101010100101100110100001010101101010000111000011010000011101011011010011
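Below is a sketch of a binary hydrophobic-polar (HP) encoding. The hydrophobic set {A, V, L, I, M, F, W, P, G} is an assumption and is not stated on the page. It reproduces the first 48 characters of the string above, which correspond to residues RNFNN…DPDEC of 4MWV_1. Residues that do not occur in that prefix, such as M and Q, were not checked against the page's string.

```python
# Assumed hydrophobic residues; this set is not taken from the page.
HYDROPHOBIC = frozenset("AVLIMFWPG")


def hp_encode(seq: str, hydrophobic: frozenset = HYDROPHOBIC) -> str:
    """Replace each residue by '1' if it is in the hydrophobic set, else '0'."""
    return "".join("1" if aa in hydrophobic else "0" for aa in seq)


prefix = "RNFNNLTKGLCTINSWHIYGKDNAVRIGESSDVLVTREPYVSCDPDEC"  # start of 4MWV_1
print(hp_encode(prefix))
```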
Pair \(Z_2\) Length of longest common substring
4MWV_1,5NQN_1 190 4
4MWV_1,7PUF_1 173 5
5NQN_1,7PUF_1 193 3
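The sketch below computes three quantities. The function names are hypothetical.

- \(Z_k\).
- The corresponding normalized Jaccard distance \(Z_k/|P_x(k)\cup P_y(k)|\).
- The length of the longest common contiguous subword. This assumes that is what the last column counts; the small values 3 to 5 suggest contiguous matches.

As a consistency check on the first row, \(p(2)\) is 225 for 4MWV_1 and 95 for 5NQN_1. The differences 160 and 30 imply \(225-160=95-30=65\) shared pairs, so \(Z_2=225+95-2\cdot 65=190\).

```python
def subword_set(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)| (symmetric difference)."""
    return len(subword_set(x, k) ^ subword_set(y, k))


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by the size of the union of P_x(k) and P_y(k)."""
    union = subword_set(x, k) | subword_set(y, k)
    return z_k(x, y, k) / len(union) if union else 0.0


def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest word occurring contiguously in both x and y.

    Dynamic programming: cur[j] is the length of the longest common suffix
    of x[:i] and y[:j]. O(|x| |y|) time, O(|y|) memory.
    """
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


print(z_k("banana", "bandana", 2), longest_common_substring("banana", "bandana"))  # 2 3
```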

Newick tree

 
(
	5NQN_1:98.64,
	(
		4MWV_1:86.5,7PUF_1:86.5
	):12.14
);
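The following sketch reads the tree in standard Newick syntax, which uses parentheses, `name:length` and a terminating semicolon. It computes each leaf's depth and the path-length (patristic) distance between every pair of leaves. The parser is a minimal hand-written one that does not handle quoting or comments, and it is not a library API.

All three leaves lie at depth 98.64 (86.5 + 12.14 = 98.64), so the tree is ultrametric. The implied 4MWV_1–7PUF_1 distance is 2 × 86.5 = 173.

```python
import itertools


def parse_newick(s: str):
    """Parse a Newick string into nested (name, branch_length, children) tuples.

    Minimal: supports names, ':length' and nesting; no quoted labels or
    [comments]. The string must end with ';'.
    """
    s = "".join(s.split())  # whitespace is insignificant here
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1
            children.append(node())
            while s[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip the closing ')'
        start = pos
        while s[pos] not in ":,();":
            pos += 1
        name = s[start:pos]
        length = 0.0
        if s[pos] == ":":
            pos += 1
            start = pos
            while s[pos] not in ",();":
                pos += 1
            length = float(s[start:pos])
        return name, length, children

    return node()


def collect(tree, path=(), depth=0.0, depths=None, leaves=None):
    """Record each node's depth (keyed by its path of child indices) and each leaf's path."""
    if depths is None:
        depths, leaves = {}, {}
    name, length, children = tree
    depth += length
    depths[path] = depth
    if not children:
        leaves[name] = path
    for i, child in enumerate(children):
        collect(child, path + (i,), depth, depths, leaves)
    return depths, leaves


def patristic_distances(newick: str) -> dict:
    """Path length between leaf pairs: depth(a) + depth(b) - 2 * depth(lowest common ancestor)."""
    depths, leaves = collect(parse_newick(newick))
    result = {}
    for a, b in itertools.combinations(sorted(leaves), 2):
        pa, pb = leaves[a], leaves[b]
        k = 0
        while k < min(len(pa), len(pb)) and pa[k] == pb[k]:
            k += 1  # common prefix = path to the lowest common ancestor
        result[(a, b)] = depths[pa] + depths[pb] - 2 * depths[pa[:k]]
    return result


TREE = "(5NQN_1:98.64,(4MWV_1:86.5,7PUF_1:86.5):12.14);"
for pair, dist in patristic_distances(TREE).items():
    print(pair, round(dist, 2))
```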

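Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)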
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{521}{\log_{20} 521}-\frac{133}{\log_{20}133}\right)\approx 114\), where \(521=388+133\) is the combined length of the two sequences.
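The estimate can be reproduced numerically. The factors 0.85 and 0.8 are taken as given. The intermediate terms are about 249.5 and 81.5, which differ by about 168.0, and 0.68 × 168.0 ≈ 114.3.

```python
import math


def n_over_log(n: int, q: int = 20) -> float:
    """n / log_q n, the normalizing term used with the LZ complexity."""
    return n / math.log(n, q)


n_total = 388 + 133  # combined length of 4MWV_1 and 5NQN_1
estimate = 0.85 * 0.8 * (n_over_log(n_total) - n_over_log(133))
print(f"{estimate:.2f}")
```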
Status Protein1 Protein2 d d1/2
Query variables 4MWV_1 5NQN_1 147 96.5

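In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)], with \(\min\) and \(\max\) taken over the two increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\),
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]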
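Below is a sketch of \(d\), \(d_1\) and the interpolation \(\delta\). It makes two assumptions:

- The LZ complexity is computed with the Kaspar–Schuster LZ76 algorithm, which is included so the block runs on its own.
- The two increments are \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), as in Otu and Sayood's definitions.

The function names are hypothetical. For the pair in the table, \(d=147\) and \(d_1/2=96.5\), so the two increments are 147 and \(193-147=46\).

```python
def lz76(s: str) -> int:
    """LZ76 complexity via the Kaspar-Schuster algorithm (final partial phrase counted)."""
    n = len(s)
    if n <= 1:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def increments(x: str, y: str) -> tuple[int, int]:
    """LZ(xy) - LZ(x) and LZ(yx) - LZ(y): how much the complexity grows when
    one word is appended to the other."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)


def otu_sayood_d(x: str, y: str) -> int:
    return max(increments(x, y))


def otu_sayood_d1(x: str, y: str) -> int:
    return sum(increments(x, y))


def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


x, y = "aaaabbbb", "0001101001000101"
print(otu_sayood_d(x, y), otu_sayood_d1(x, y) / 2, delta(x, y, 0), delta(x, y, 0.5))
```

Because \(\min+\max\) equals the sum of the two increments, \(\delta\) at \(\alpha=1/2\) always equals \(d_1/2\), and at \(\alpha=0\) it equals \(d\).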