CoV2D Browser™

Parikh vectors
5VLQ_1 4CQG_1 9IPE_1 Letter Amino acid
14 9 50 C Cysteine
25 16 23 H Histidine
46 47 50 L Leucine
36 21 40 S Serine
13 5 6 W Tryptophan
26 8 45 N Asparagine
48 21 28 D Aspartic acid
34 16 34 I Isoleucine
13 12 10 M Methionine
19 18 16 Y Tyrosine
34 12 28 R Arginine
41 22 40 E Glutamic acid
40 26 36 K Lysine
25 12 32 P Proline
25 12 35 T Threonine
29 19 33 V Valine
43 17 28 A Alanine
23 14 24 Q Glutamine
31 16 51 G Glycine
24 8 18 F Phenylalanine
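Each column above is the Parikh vector of one sequence, meaning the number of occurrences of each amino-acid letter. Each column therefore sums to the length of its sequence (589, 331 and 627). The following is a minimal Python sketch. The alphabet string `AMINO_ACIDS` and the function name are illustrative, and the alphabet simply follows the row order of the table.

```python
from collections import Counter

# One-letter amino-acid codes in the row order of the Parikh-vector table
# (illustrative constant, not taken from the CoV2D code).
AMINO_ACIDS = "CHLSWNDIMYREKPTVAQGF"


def parikh_vector(seq: str, alphabet: str = AMINO_ACIDS) -> list[int]:
    """Return how often each letter of `alphabet` occurs in `seq`."""
    counts = Counter(seq)
    return [counts[letter] for letter in alphabet]


if __name__ == "__main__":
    # First ten residues of 5VLQ_1: one M, one S, two Y, six H.
    for letter, count in zip(AMINO_ACIDS, parikh_vector("MSYYHHHHHH")):
        if count:
            print(letter, count)
```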

>5VLQ_1|Chains A, B|LOC100158544 protein|Xenopus tropicalis (8364)
>4CQG_1|Chain A|Maternal embryonic leucine zipper kinase|Mus musculus (10090)
>9IPE_1|Chain A|Epidermal growth factor receptor|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5VLQ, Knot 237 589 0.85 40 289 548
MSYYHHHHHHDYDIPTTENLYFQGIAVNPDRLKHAKALVEKAIKQKKIFAIHGPYPVIRSCLRSRGWVEKKFPKSGKAKQKKEKASDEDMEDDDGDGSSNDDDDGENSDEEENGDPDGTCDLMSRLLRNEDPNFFWTTKRDAVDCRFLKKDQMLNHYAKAGSFTTKVGLCLNLRNLHWFDDADPDSFFPRCYRLGAEDEKQSFKEDFWHTAARSILKRVANRRDICSPAATGGAKASHREPGANNGAQLLAKRGSRKRAESVPVQIILTALEACERYLNSLEHNDIDMETEATPAMTDTQWEEFLHGYYQVIHDGATIEHSEYYVDQCSEVLHKLEAVNPQLDIEGGRNIWIVKPGAKSRGRGIICMDRLEEILKLVDCDPMIVKDGKWVVQKYIERPLLIFGTKFDVRQWFLVTDWNPLTIWFYKECYVRFSSQPFSLENLDTSIHLCNNSIQKHYENSQSRHPLVPTDNMWSSRQLQVHLHKLGAPHAWEAVIVPGMKAAIIHAMQSAQDIVEYRKSSFELYGADFMFGENFHPWLIQINASPTMAASTTVTSRLCAEVQEDTLRIVLDRKLDRNCDIGAFELIYKQ
4CQG, Knot 145 331 0.84 40 208 319
MKDYDELLKYYELYETIGTGGFAKVKLACHVLTGEMVAIKIMDKNALGSDLPRVKTEIDALKSLRHQHICQLYHVLETKNKIFMVLEYCPGGELFDYIISQDRLSEEETRVVFRQILSAVAYVHSQGYAHRDLKPENLLFDENHKLKLIDFGLCAKPKGNKDYHLQECCGSLAYAAPELIQGKSYLGSEADVWSMGILLYVLMCGFLPFDDDNVMALYKKIMRGKYEVPKWLSPSSILLLQQMLQVDPKKRISMRNLLNHPWVMQDYSCPVEWQSKTPLTHLDEDCVTELSVHHRSSRQTMEDLISSWQYDHLTATYLLLLAKKARLEHHH
9IPE, Knot 240 627 0.82 40 274 566
LEEKKVCQGTSNKLTQLGTFEDHFLSLQRMFNNCEVVLGNLEITYVQRNYDLSFLKTIQEVAGYVLIALNTVERIPLENLQIIRGNMYYENSYALAVLSNYDANKTGLKELPMRNLQEILHGAVRFSNNPALCNVESIQWRDIVSSDFLSNMSMDFQNHLGSCQKCDPSCPNGSCWGAGEENCQKLTKIICAQQCSGRCRGKSPSDCCHNQCAAGCTGPRESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGATCVKKCPRNYVVTDHGSCVRACGADSYEMEEDGVRKCKKCEGPCRKVCNGIGIGEFKDSLSINATNIKHFKNCTSISGDLHILPVAFRGDSFTHTPPLDPQELDILKTVKEITGFLLIQAWPENRTDLHAFENLEIIRGRTKQHGQFSLAVVSLNITSLGLRSLKEISDGDVIISGNKNLCYANTINWKKLFGTSGQKTKIISNRGENSCKATGQVCHALCSPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPREFVENSECIQCHPECLPQAMNITCTGRGPDNCIQCAHYIDGPHCVKTCPAGVMGENNTLVWKYADAGHVCHLCHPNCTYGCTGPGLEGCPTNGPKIPSHHHHHH
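The LZ-complexity column counts the components of a Lempel-Ziv parsing of \(w\). The page does not say which LZ variant it uses. The sketch below assumes the LZ76 exhaustive-history parsing, computed with the Kaspar-Schuster scan, so treat `lz76_complexity` as an assumption and not as the browser's code. `normalized_lz` computes the ratio column \(\mathrm{LZ}(w)/(n/\log_{20} n)\). Applied to the table's own numbers, it gives about 0.857, 0.848 and 0.823. The printed ratios therefore appear to be truncated, not rounded, to two decimals.

```python
import math


def lz76_complexity(s: str) -> int:
    """Number of components in the LZ76 exhaustive-history parsing of s.

    Kaspar-Schuster scanning algorithm; assumed (not confirmed) to match
    the LZ-complexity column of the table.
    """
    n = len(s)
    if n <= 1:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            # The current component still copies from an earlier start: extend it.
            k += 1
            if l + k > n:
                c += 1  # final component reaches the end of s
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:
                # No earlier start copies further: close the component.
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def normalized_lz(lz: int, n: int) -> float:
    """LZ(w) / (n / log_20 n), the ratio column of the table."""
    return lz / (n / math.log(n, 20))


if __name__ == "__main__":
    print(lz76_complexity("0001101001000101"))  # parsing 0|001|10|100|1000|101
    for lz, n in [(237, 589), (145, 331), (240, 627)]:
        print(lz, n, round(normalized_lz(lz, n), 3))
```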

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence, in FASTA format, of the Protein Data Bank entry with 4-symbol code \(c\).
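One direct way to compute \(P_w(n)\) and \(p_w(n)\) is to slide a window of length \(n\) along \(w\) and collect the windows in a set. A word has only \(|w|-n+1\) windows, so \(p_w(n)\le |w|-n+1\). For instance, \(p_w(3)=548\) for 5VLQ_1 is close to the bound \(589-3+1=587\). The function names below are illustrative.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


if __name__ == "__main__":
    w = "ABAB"
    print(sorted(subwords(w, 2)), p(w, 1), p(w, 3))  # ['AB', 'BA'] 2 2
```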

\(|P_{f(5VLQ_1)}(2) \setminus P_{f(4CQG_1)}(2)|=131\) and \(|P_{f(4CQG_1)}(2) \setminus P_{f(5VLQ_1)}(2)|=50\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(5VLQ_1),f(4CQG_1))=131+50=181\).

Hydrophobic-polar version of Sequence 1 (5VLQ_1):
1000000000000110000101011110100100101110011000011110110111000100011100011001010000001000010000101000000010000000010101000110011000010111000001100011000011000101101000111010100101100101001110000111000000100011001100110011000010011101110100001110011011100100001001110111011010000100100001010001011100001001101000110011010000001000001100101101010101100111101110001011101001001101100011110010111000100111111001010011110010110111000001010001101001000101000010000000000011110001100001010100111101101111111011110110010011000000101011011110010111101010101110001000101010000101110001000001111011000
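The 0/1 string can be reproduced with a residue-by-residue map. The hydrophobic set used here, {A, F, G, I, L, M, P, V, W}, is inferred and not stated on the page. It reproduces the first 90 characters of the string above, and those 90 residues of 5VLQ_1 already contain all 20 amino acids. `HYDROPHOBIC` and `hp_string` are hypothetical names.

```python
# Assumed hydrophobic set (inferred from the first 90 residues of 5VLQ_1).
HYDROPHOBIC = set("AFGILMPVW")


def hp_string(seq: str) -> str:
    """Map each residue to '1' if hydrophobic and '0' if polar."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)


if __name__ == "__main__":
    prefix = "MSYYHHHHHHDYDIPTTENL"  # first 20 residues of 5VLQ_1
    print(hp_string(prefix))  # 10000000000001100001
```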
Pair \(Z_2\) Length of longest common subword (contiguous)
5VLQ_1,4CQG_1 181 4
5VLQ_1,9IPE_1 159 6
4CQG_1,9IPE_1 188 4
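Both columns of the pair table can be sketched in a few lines. `z_k` counts the symmetric difference \(Z_k\), and `jaccard_distance` divides it by \(|P_x(k)\cup P_y(k)|\). Take the \(p_w(2)\) column at face value. Then \(P_{f(5VLQ_1)}(2)\) and \(P_{f(4CQG_1)}(2)\) share \(289-131=208-50=158\) subwords, so their union has \(289+50=339\) elements and the Jaccard distance is \(181/339\approx 0.53\). `longest_common_subword` assumes that the last column means the longest shared contiguous subword. It finds that subword with the standard dynamic program over common-suffix lengths. All names here are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct contiguous subwords of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by the size of the union of P_x(k) and P_y(k)."""
    union = subwords(x, k) | subwords(y, k)
    return z_k(x, y, k) / len(union) if union else 0.0


def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y, O(|x||y|) time."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common-suffix length of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(z_k("ABAB", "ABBA", 2), longest_common_subword("xyzabcd", "abcdxy"))
```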

Newick tree

(
	4CQG_1:96.14,
	(
		5VLQ_1:79.5,9IPE_1:79.5
	):16.64
);
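The tree is ultrametric. Because \(16.64+79.5=96.14\), every leaf lies at distance 96.14 from the root, and the path between 5VLQ_1 and 9IPE_1 has length \(2\times 79.5=159\), which equals the \(Z_2\) value of that pair. The sketch below parses the tree, written with standard Newick parentheses, and computes leaf-to-leaf path lengths. It handles only unquoted labels and numeric branch lengths, so it is not a general Newick parser. All names are hypothetical.

```python
import re

TREE = "(4CQG_1:96.14,(5VLQ_1:79.5,9IPE_1:79.5):16.64);"
PUNCT = {"(", ")", ",", ":", ";"}


def parse_newick(text: str):
    """Parse a small Newick string into nested (children, label, length) tuples.

    Sketch only: unquoted labels, numeric branch lengths, no comments.
    """
    tokens = re.findall(r"[(),:;]|[^(),:;\s]+", text)
    pos = 0

    def peek() -> str:
        return tokens[pos] if pos < len(tokens) else ";"

    def node():
        nonlocal pos
        children = []
        if peek() == "(":
            pos += 1  # consume "("
            children.append(node())
            while peek() == ",":
                pos += 1
                children.append(node())
            pos += 1  # consume ")"
        label = ""
        if peek() not in PUNCT:
            label = tokens[pos]
            pos += 1
        length = 0.0
        if peek() == ":":
            length = float(tokens[pos + 1])
            pos += 2
        return (children, label, length)

    return node()


def leaf_ancestry(tree):
    """Map each leaf label to its root-to-leaf list of (node id, depth)."""
    paths = {}

    def walk(node, depth, ancestors):
        children, label, length = node
        depth += length
        ancestors = ancestors + [(id(node), depth)]
        if not children:
            paths[label] = ancestors
        for child in children:
            walk(child, depth, ancestors)

    walk(tree, 0.0, [])
    return paths


def path_length(paths, a: str, b: str) -> float:
    """depth(a) + depth(b) - 2 * depth(lowest common ancestor)."""
    shared = 0.0
    for (id_a, depth), (id_b, _) in zip(paths[a], paths[b]):
        if id_a != id_b:
            break
        shared = depth
    return paths[a][-1][1] + paths[b][-1][1] - 2 * shared


if __name__ == "__main__":
    paths = leaf_ancestry(parse_newick(TREE))
    for leaf, anc in paths.items():
        print(leaf, round(anc[-1][1], 2))
    print(round(path_length(paths, "5VLQ_1", "9IPE_1"), 2))
```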

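Let \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\) be the Otu--Sayood distance \(d\).
Let \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\) be the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)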
A rough estimate of the expected distance, using the length \(589+331=920\) of the concatenated sequence, is \((0.85)(0.8)\left(\frac{920}{\log_{20} 920}-\frac{331}{\log_{20}331}\right)\approx 158.\)
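The arithmetic of this estimate can be checked directly. The constants 0.85 and 0.8 are taken as given on the page, and `n_over_log20` is a hypothetical helper name.

```python
import math


def n_over_log20(n: int) -> float:
    """n / log_20 n, the normalizing length from the ratio column."""
    return n / math.log(n, 20)


# 589 + 331 = 920 is the length of the concatenated sequence.
estimate = 0.85 * 0.8 * (n_over_log20(589 + 331) - n_over_log20(331))
print(round(estimate))  # 158
```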
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 5VLQ_1 4CQG_1 202 157
Not able to place for \(d\)
Not able to place for \(d_1\)

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)], with \(a=\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(b=\mathrm{LZ}(yx)-\mathrm{LZ}(y)\),
\[ \delta= \alpha \min\{a,b\} + (1-\alpha) \max\{a,b\}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
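The sketch below implements the family \(\delta\). It assumes that LZ is the LZ76 complexity (the page does not name its variant), and it writes \(a=\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(b=\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). At \(\alpha=0\) it returns \(\max\{a,b\}=d\), and at \(\alpha=1/2\) it returns \((a+b)/2=d_1/2\). The function names are hypothetical, and `lz76_complexity` is the Kaspar-Schuster scan.

```python
def lz76_complexity(s: str) -> int:
    """LZ76 exhaustive-history complexity (Kaspar-Schuster scan); assumed variant."""
    n = len(s)
    if n <= 1:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def one_sided_gains(x: str, y: str) -> tuple[int, int]:
    """a = LZ(xy) - LZ(x) and b = LZ(yx) - LZ(y)."""
    a = lz76_complexity(x + y) - lz76_complexity(x)
    b = lz76_complexity(y + x) - lz76_complexity(y)
    return a, b


def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    a, b = one_sided_gains(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


def d(x: str, y: str) -> int:
    """Otu--Sayood d: the larger one-sided gain."""
    return max(one_sided_gains(x, y))


def d1(x: str, y: str) -> int:
    """Otu--Sayood d_1: the sum of both one-sided gains."""
    return sum(one_sided_gains(x, y))


if __name__ == "__main__":
    print(d("ab", "ba"), d1("ab", "ba"), delta("ab", "ba", 0.5))
```

For example, with x = "ab" and y = "ba" we get LZ("abba") = LZ("baab") = 3 and LZ("ab") = LZ("ba") = 2. Hence a = b = 1, so d = 1 and d_1 = 2.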