CoV2D Browser™

Parikh vectors
2JQX_1 5BMX_1 7NCO_1 Letter Amino acid
51 14 16 G Glycine
34 18 16 S Serine
44 8 16 R Arginine
6 4 0 C Cysteine
17 5 12 H Histidine
44 19 11 I Isoleucine
32 22 15 K Lysine
19 13 8 F Phenylalanine
44 14 11 D Aspartic acid
40 12 6 Q Glutamine
69 19 26 L Leucine
23 2 7 M Methionine
31 4 14 P Proline
33 12 16 T Threonine
12 2 7 W Tryptophan
18 7 9 Y Tyrosine
42 18 8 N Asparagine
45 19 21 E Glutamic acid
46 22 16 V Valine
73 14 18 A Alanine
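
As a minimal sketch of how such a table can be produced: the Parikh vector of a sequence is its vector of letter counts. The helper name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative assumptions; the letters are listed in the row order of the table above.

```python
from collections import Counter

# One-letter amino acid codes, in the row order of the table above.
AMINO_ACIDS = "GSRCHIKFDQLMPTWYNEVA"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return the letter counts of seq, one entry per letter of alphabet."""
    counts = Counter(seq)
    return [counts[a] for a in alphabet]

# The first twelve residues of the 5BMX_1 sequence.
vec = parikh_vector("MARKYFVAANWK")
print(dict(zip(AMINO_ACIDS, vec)))
```

The entries of a Parikh vector sum to the sequence length. In the table above, the three columns sum to 723, 248 and 253 respectively.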

>2JQX_1|Chain A|Malate synthase G|Escherichia coli (562)
>5BMX_1|Chains A, B, C, D|Triosephosphate isomerase|Plasmodium falciparum (5833)
>7NCO_1|Chains A, B, C, D, E, F|Glutathione S-transferase GliG|Aspergillus fumigatus A1163 (451804)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2JQX , Knot 282 723 0.85 40 292 663
MAQTITQSRLRIDANFKRFVDEEVLPGTGLDAAAFWRNFDEIVHDLAPENRQLLAERDRIQAALDEWHRSNPGPVKDKAAYKSFLRELGYLVPQPERVTVETTGIDSEITSQAGPQLVVPAMNARYALNAANARWGSLYDALYGSDIIPQEGAMVSGYDPQRGEQVIAWVRRFLDESLPLENGSYQDVVAFKVVDKQLRIQLKNGKETTLRTPAQFVGYRGDAAAPTCILLKNNGLHIELQIDANGRIGKDDPAHINDVIVEAAISTILDCEDSVAAVDAEDKILLYRNLLGLMQGTLQEKMEKNGRQIVRKLNDDRHYTAADGSEISLHGRSLLFIRNVGHLMTIPVIWDSEGNEIPEGILDGVMTGAIALYDLKVQKNSRTGSVYIVKPKMHGPQEVAFANKLFTRIETMLGMAPNTLKMGIMDEERRTSLNLRSCIAQARNRVAFINTGFLDRTGDEMHSVMEAGPMLRKNQMKSTPWIKAYERNNVLSGLFCGLRGKAQIGKGMWAMPDLMADMYSQKGDQLRAGANTAWVPSPTAATLHALHYHQTNVQSVQANIAQTEFNAEFEPLLDDLLTIPVAENANWSAQEIQQELDNNVQGILGYVVRWVEQGIGCSKVPDIHNVALMEDRATLRISSQHIANWLRHGILTKEQVQASLENMAKVVDQQNAGDPAYRPMAGNFANSCAFKAASDLIFLGVKQPNGYTEPLLHAWRLREKESH
5BMX , Knot 115 248 0.85 40 166 238
MARKYFVAANWKCNGTLESIKSLTNSFNNLDFDPSKLDVVVFPVSVHYDHTRKLLQSKFSTGIQNVSKFGNGSYNGEVSAEIAKDLNIEYVIIGHFERRKYFHETDEDVREKLQASLKNNLKAVVCFGESLEQREQNKTIEVITKQVKAFVDLIDNFDNVILVYEPLWAIGTGKTATPEQAQLVHKEIRKIVKDTCGEKQANQIRILYGGSVNTENCSSLIQQEDIDGFLVGNASLKESFVDIIKSAM
7NCO , Knot 111 253 0.81 38 172 241
MSGSHHHHHHSGSMSERPSDLVVNRLVLFVVKGTATSTHNTVKPLILLEELGVPHDIYVVEKVSAPWFSEINPHKMVPAILDRSPDGRDTLRAWESTSTLMYIADAYDKDGTFGGRNVQERSEINNWLTLHTAALGPTARYWLYFYKLHPEKLPKTIEKLRSNITVQYDILERRLNEPGQQYLALKDRPTIADIATLPFAMKSTAELFGLEFEKWPKLQEWSVRMGEREAVKRAWQRVAGFGHGEKEYGMLEA
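
The normalized column can be recomputed from the LZ-complexity and length columns. The sketch below does only that arithmetic. The function name `normalized_lz` is an assumption, and the alphabet size 20 is taken from the base of the logarithm in the table header.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n) for a word of length n over b letters."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"2JQX": (282, 723), "5BMX": (115, 248), "7NCO": (111, 253)}
for code, (lz, n) in rows.items():
    print(code, round(normalized_lz(lz, n), 3))
```

The three ratios come out at about 0.857, 0.853 and 0.810, which agree with the tabulated 0.85, 0.85 and 0.81 if the table truncates to two decimals.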

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
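
A minimal sketch of these definitions, reading \(P_w(n)\) as the set of contiguous subwords of length \(n\). The function names `subwords` and `subword_count` are illustrative assumptions, and the sketch has not been checked against the tabulated \(p_w\) values.

```python
def subwords(w, n):
    """Return the set P_w(n) of distinct contiguous subwords of length n."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_count(w, n):
    """Return p_w(n), the number of distinct subwords of length n."""
    return len(subwords(w, n))

w = "MARKYFVAANWK"
print(subword_count(w, 1), subword_count(w, 2), subword_count(w, 3))
```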

\(|P_{f(2JQX_1)}(2) \setminus P_{f(5BMX_1)}(2)|=155\), \(|P_{f(5BMX_1)}(2) \setminus P_{f(2JQX_1)}(2)|=29\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
110010000101010100110001111011011111001001100111000011100001011100100001111000110001100110111010010100011000100011101111110100110110101101001101001110011110100100100111110011000111001000011110110001010100100001001101110010111100111000110101010101011000110100111011100110000011110100011100011111010100010001001100100000001101001010100111100110110111110001001101110111011111001010000001010110101011001111001100100111111001011110000000101000110100011110011100010010011011111000010001110100000110111011010101101111110111010000100101110011110101101011000000100101011000101010111001101111001010100100010001011110110110011100011010011110001010100001101100111000010101001101100001101100111101100011011001111110010100011101101000000
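
The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets. A minimal sketch follows; the function names `subwords` and `z_k` are illustrative assumptions.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

print(z_k("abab", "abba", 2))  # only "bb" is unshared
```

For 2JQX_1 and 5BMX_1 the two one-sided differences quoted above give \(Z_2 = 155 + 29 = 184\). They are also consistent with the tabulated \(p_w(2)\) values, since \(292 - 155 = 137 = 166 - 29\) is the number of shared 2-letter subwords.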
Pair \(Z_2\) Length of longest common subword
2JQX_1,5BMX_1 184 4
2JQX_1,7NCO_1 178 4
5BMX_1,7NCO_1 184 4

Newick tree

(5BMX_1:92.97,(2JQX_1:89,7NCO_1:89):3.97);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
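
A sketch of how these two distances might be computed, assuming the unnormalized definitions \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=(c(xy)-c(x))+(c(yx)-c(y))\), where \(c\) is an LZ76-style phrase count. The function names `lz76` and `otu_sayood` are illustrative, and the parsing variant used by this page is not stated, so its values need not be reproduced exactly.

```python
def lz76(w):
    """Count phrases in an LZ76-style parsing of w.

    Each phrase is the shortest prefix of the remaining text that does not
    occur starting at an earlier position (overlaps allowed). A final
    phrase that was already seen still counts as one phrase.
    """
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while w[i:i+length] also starts somewhere before i.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood(x, y, c=lz76):
    """Return (d, d1) for words x and y under complexity function c."""
    a = c(x + y) - c(x)  # new phrases needed for y once x is known
    b = c(y + x) - c(y)  # new phrases needed for x once y is known
    return max(a, b), a + b

print(lz76("0001101001000101"))   # parses as 0|001|10|100|1000|101
print(otu_sayood("abc", "aaa"))
```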
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{971}{\log_{20} 971}-\frac{248}{\log_{20}248}\right)\approx 195.9\), where \(971=723+248\) is the combined length of 2JQX_1 and 5BMX_1.
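
The arithmetic can be checked directly. In this sketch the factors 0.85 and 0.8 are copied from the expression above, and 971 = 723 + 248 is the combined length of 2JQX_1 and 5BMX_1. The function name `lz_scale` is an assumption.

```python
import math

def lz_scale(n, alphabet_size=20):
    """Return n / log_b n, the normalizing scale used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

# Lengths of 2JQX_1 and 5BMX_1, taken from the table of sequences.
n_long, n_short = 723, 248
expected = 0.85 * 0.8 * (lz_scale(n_long + n_short) - lz_scale(n_short))
print(round(expected, 2))  # about 195.94
```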
Status Protein1 Protein2 d d1/2
Query variables 2JQX_1 5BMX_1 246 163

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
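
A small numeric check of the two cases, under the assumption (not spelled out above) that min and max are the smaller and larger of the two directional quantities being combined. With the table values \(d=246\) and \(d_1/2=163\) for 2JQX_1 and 5BMX_1, the implied smaller term is \(2\cdot 163-246=80\). The function name `delta` is illustrative.

```python
def delta(alpha, smaller, larger):
    """Interpolate between the smaller and larger directional terms."""
    return alpha * smaller + (1 - alpha) * larger

larger = 246                 # d for the pair 2JQX_1, 5BMX_1
smaller = 2 * 163 - larger   # implied by d1/2 = 163

print(delta(0, smaller, larger))    # the alpha = 0 case, equal to d
print(delta(0.5, smaller, larger))  # the alpha = 1/2 case, equal to d1/2
```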