CoV2D Browser™

Parikh vectors
1MJT_1 1HEU_1 8COP_1 Letter Amino acid
29 25 14 L Leucine
14 26 10 S Serine
23 4 5 Y Tyrosine
17 18 3 F Phenylalanine
16 20 9 P Proline
3 14 1 C Cysteine
9 8 4 Q Glutamine
24 24 7 I Isoleucine
4 9 5 M Methionine
7 2 6 W Tryptophan
21 28 14 A Alanine
13 12 17 R Arginine
18 38 18 G Glycine
13 7 3 H Histidine
28 30 1 K Lysine
21 24 11 T Threonine
17 39 13 V Valine
26 8 1 N Asparagine
18 17 7 D Aspartic acid
26 21 12 E Glutamic acid
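
Each column of the table above is the Parikh vector of one sequence: the number of occurrences of each amino-acid letter, so each column sums to the sequence length (347, 374 and 161). A minimal Python sketch of that count follows. The names `parikh_vector` and `ALPHABET` (the letters in the table's row order) are illustrative and are not taken from the CoV2D code.

```python
from collections import Counter

# Amino-acid letters in the row order of the table above (illustrative constant).
ALPHABET = "LSYFPCQIMWARGHKTVNDE"

def parikh_vector(sequence, alphabet=ALPHABET):
    """Return the letter counts of `sequence`, one entry per letter of `alphabet`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# The first ten residues of 8COP_1, used here only as a short example input.
print(parikh_vector("MTMVGLIWAQ"))
```

Applied to a full FASTA sequence, the same function should yield one column of the table, provided the sequence contains only the 20 standard letters.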

>1MJT_1|Chains A, B|NITRIC-OXIDE SYNTHASE HOMOLOG|Staphylococcus aureus (1280)
>1HEU_1|Chains A, B|ALCOHOL DEHYDROGENASE E CHAIN|EQUUS CABALLUS (9796)
>8COP_1|Chains A, B|Dihydrofolate reductase|Mycobacterium tuberculosis (1773)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1MJT , Knot 151 347 0.84 40 210 336
HHLFKEAQAFIENMYKECHYETQIINKRLHDIELEIKETGTYTHTEEELIYGAKMAWRNSNRCIGRLFWDSLNVIDARDVTDEASFLSSITYHITQATNEGKLKPYITIYAPKDGPKIFNNQLIRYAGYDNCGDPAEKEVTRLANHLGWKGKGTNFDVLPLIYQLPNESVKFYEYPTSLIKEVPIEHNHYPKLRKLNLKWYAVPIISNMDLKIGGIVYPTAPFNGWYMVTEIGVRNFIDDYRYNLLEKVADAFEFDTLKNNSFNKDRALVELNYAVYHSFKKEGVSIVDHLTAAKQFELFERNEAQQGRQVTGKWSWLAPPLSPTLTSNYHHGYDNTVKDPNFFYKK
1HEU , Knot 158 374 0.83 40 210 358
STAGKVIKCKAAVLWEEKKPFSIEEVEVAPPKAHEVRIKMVATGICRSDDHVVSGTLVTPLPVIAGHEAAGIVESIGEGVTTVRPGDKVIPLFTPQCGKCRVCKHPEGNFCLKNDLSMPRGTMQDGTSRFTCRGKPIHHFLGTSTFSQYTVVDEISVAKIDAASPLEKVCLIGCGFSTGYGSAVKVAKVTQGSTCAVFGLGGVGLSVIMGCKAAGAARIIGVDINKDKFAKAKEVGATECVNPQDYKKPIQEVLTEMSNGGVDFSFEVIGRLDTMVTALSCCQEAYGVSVIVGVPPDSQNLSMNPMLLLSGRTWKGAIFGGFKSKDSVPKLVADFMAKKFALDPLITHVLPFEKINEGFDLLRSGESIRTILTF
8COP , Knot 80 161 0.84 40 111 155
MTMVGLIWAQATSGVIGRGGDIPWRLPEDQAHFREITMGHTIVMGRRTWDSLPAKVRPLPGRRNVVLSRQADFMASGAEVVGSLEEALTSPETWVIGGGQVYALALPYATRCEVTEVDIGLPREAGDALAPVLDETWRGETGEWRFSRSGLRYRLYSYHRS
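
The fourth column of the table divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below shows one way to compute both quantities in Python. It assumes that \(\mathrm{LZ}(w)\) counts the components of the Lempel–Ziv (1976) exhaustive parsing; the page does not say which variant it uses, so `lz76_complexity` is an assumption and may not reproduce the tabulated values exactly. Both function names are illustrative.

```python
import math

def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive parsing of w (assumed variant)."""
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the current component while it can be copied from earlier text
        # (the copy may overlap the component itself, but not its last symbol).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), with b the alphabet size (20 amino acids)."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example; the parsing is 0|001|10|100|1000|101.
print(lz76_complexity("0001101001000101"))
# Tabulated LZ value and length for 1MJT.
print(normalized_lz(151, 347))
```

For the tabulated pair (151, 347) the ratio evaluates to roughly 0.85, which is close to the 0.84 shown in the table; the small gap may be a matter of rounding versus truncation.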

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
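
The two definitions above translate directly into a set comprehension. The following Python sketch is illustrative only; the names `subwords` and `subword_complexity` are not from the CoV2D code, and the length parameter is called `k` to keep it apart from the sequence length.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k occurring in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))

print(sorted(subwords("abab", 2)))
print(subword_complexity("abab", 2))
```

If `k` exceeds the length of `w`, the range is empty and both functions report no subwords.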

\(|P_{f(1MJT_1)}(2) \setminus P_{f(1HEU_1)}(2)|=91\), \(|P_{f(1HEU_1)}(2) \setminus P_{f(1MJT_1)}(2)|=91\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
00110010111001000000000110001001010100010000000011011011100000011011100101101001000101100100010010001010101010110011011000110011000010110001001100111010100101111100110001010001001100111000001010010101011111001010111110101110110110011100110000001100110110100100001000011101001100010001101100101100101100001001001010101111110101000000100001001011000
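
The page does not state which residues count as hydrophobic. Comparing the leading residues of 1MJT_1 with the leading bits of the binary string suggests that A, F, G, I, L, M, P, V and W map to 1 and every other residue maps to 0. The Python sketch below uses that inferred set, so `HYDROPHOBIC` is an assumption rather than a documented CoV2D parameter, and `hp_encode` is an illustrative name.

```python
# Residues mapped to 1; inferred from the start of the 1MJT_1 string (assumption).
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence, hydrophobic=HYDROPHOBIC):
    """Encode a protein sequence as a binary hydrophobic (1) / polar (0) string."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)

# First 26 residues of 1MJT_1.
print(hp_encode("HHLFKEAQAFIENMYKECHYETQIIN"))
```

The output for this prefix agrees with the first 26 bits of the string shown for Sequence 1.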
Pair \(Z_2\) Length of longest common subword
1MJT_1,1HEU_1 182 4
1MJT_1,8COP_1 189 4
1HEU_1,8COP_1 179 5
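
Both columns of this table can be sketched in a few lines of Python. \(Z_k\) is the size of the symmetric difference of the two subword sets. For the last column, the reported lengths of 4 and 5 suggest contiguous matches, so the sketch computes the longest common contiguous subword; that reading is an assumption. The function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k occurring in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): number of length-k subwords occurring in exactly one of x, y."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword_length(x, y):
    """Length of the longest contiguous subword shared by x and y (row-wise DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common suffix ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("abab", "abba", 2))
print(longest_common_subword_length("abcde", "xbcdy"))
```

Because \(Z_k\) is a sum of two set differences, \(Z_2(f(\text{1MJT\_1}), f(\text{1HEU\_1}))=91+91=182\), in agreement with the first row.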

Newick tree

 
[
	1MJT_1:93.83,
	[
		1HEU_1:89.5,8COP_1:89.5
	]:4.33
]

Let \(d\) be the Otu–Sayood distance of the same name.
Let \(d_1\) be the Otu–Sayood distance of the same name. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{721}{\log_{20} 721}-\frac{347}{\log_{20}347})\approx 102\), where \(721=347+374\) is the combined length of the two sequences.
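
The arithmetic in this estimate can be checked with a short Python sketch. The factors 0.85 and 0.8 are copied from the formula as given; the page does not explain them, and the names `log20` and `expected_distance` are illustrative.

```python
import math

def log20(n):
    """Logarithm of n to base 20 (the amino-acid alphabet size)."""
    return math.log(n) / math.log(20)

def expected_distance(n_concat, n_first, factor_a=0.85, factor_b=0.8):
    """Evaluate the rough estimate; both factors are taken verbatim from the page."""
    return factor_a * factor_b * (n_concat / log20(n_concat) - n_first / log20(n_first))

# 721 = 347 + 374, the combined length of 1MJT_1 and 1HEU_1.
print(round(expected_distance(721, 347)))
```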
Status Protein1 Protein2 d d1/2
Query variables 1MJT_1 1HEU_1 130 126

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
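
The interpolation can be illustrated with a short Python sketch. It assumes that min and max are the smaller and larger of two directed quantities, such as the complexity increases \(c(xy)-c(x)\) and \(c(yx)-c(y)\) in Otu and Sayood's construction; that reading is an assumption. The inputs 122 and 130 are not reported on the page: they are back-computed from the table row \(d=130\), \(d_1/2=126\), since \(2\cdot 126-130=122\).

```python
def delta(alpha, a, b):
    """Interpolate between min(a, b) at alpha = 1 and max(a, b) at alpha = 0."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Hypothetical directed quantities, back-computed from d = 130 and d1/2 = 126.
a, b = 122, 130
print(delta(0, a, b))    # the max, corresponding to d
print(delta(0.5, a, b))  # the average, corresponding to d1/2
```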