CoV2D BrowserTM

Parikh vectors
5OMY_1 8PBA_1 8BLX_1 Letter Amino acid
27 41 32 D Aspartic acid
23 51 30 K Lysine
15 32 7 M Methionine
13 39 19 F Phenylalanine
5 13 12 W Tryptophan
26 46 21 R Arginine
24 52 24 P Proline
25 68 20 S Serine
2 13 9 C Cysteine
33 67 32 L Leucine
13 53 20 T Threonine
23 46 12 Y Tyrosine
22 70 22 E Glutamic acid
17 43 14 N Asparagine
12 36 13 Q Glutamine
22 60 36 G Glycine
16 33 12 H Histidine
21 55 22 I Isoleucine
30 61 25 V Valine
22 56 43 A Alanine
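
Each column of the table above is a letter count for one sequence. As a minimal sketch of how such a Parikh vector could be computed (the name `parikh_vector` and the `ALPHABET` constant are illustrative assumptions, not the site's own code):

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above (assumed ordering).
ALPHABET = "DKMFWRPSCLTYENQGHIVA"

def parikh_vector(seq: str, alphabet: str = ALPHABET) -> list[int]:
    """Return how many times each letter of `alphabet` occurs in `seq`."""
    counts = Counter(seq)
    # Counter returns 0 for letters that never occur.
    return [counts[letter] for letter in alphabet]

# Toy usage on the first ten residues of 5OMY_1.
vec = parikh_vector("MSGPVPSRAR")
```

When every letter of the sequence belongs to the alphabet, the entries of the vector sum to the sequence length.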

>5OMY_1|Chain A|Casein kinase II subunit alpha|Homo sapiens (9606)
>8PBA_1|Chains A, B|Dipeptidyl Peptidase Four (IV) family|Caenorhabditis elegans (6239)
>8BLX_1|Chain A|Fructosyl Peptide Oxidase mutant (X02A)|Parastagonospora nodorum SN15 (321614)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5OMY , Knot 170 391 0.86 40 217 371
MSGPVPSRARVYTDVNTHRPREYWDYESHVVEWGNQDDYQLVRKLGRGKYSEVFEAINITNNEKVVVKILKPVKKKKIKREIKILENLRGGPNIITLADIVKDPVSRTPALVFEHVNNTDFKQLYQTLTDYDIRFYMYEILKALDYCHSMGIMHRDVKPHNVMIDHEHRKLRLIDWGLAEFYHPGQEYNVRVASRYFKGPELLVDYQMYDYSLDMWSLGCMLASMIFRKEPFFHGHDNYDQLVRIAKVLGTEDLYDYIDKYNIELDPRFNDILGRHSRKRWERFVHSENQHLVSPEALDFLDKLLRYDHQSRLTAREAMEHPYFYTVVKDQARMGSSSMPGGSTPVSSANMMSGISSVPTPSPLGPLAGSPVIAAANPLGMPVPAAAGAQQ
8PBA , Knot 353 935 0.86 40 332 867
MMFNFYQFLYNLQNVSPFIDFSVLKQLTHTKMRENEPARFETRSFSQLIDHARSWKTEVRGMTTQGFTKISLMRAEKDRLNMYAISSVPGTNTQSIFSVTIPLELVEKAQVADRKFELKLKSGYNVDSYIRKTPPSAEFTLQCERQRSQVVTGISDYEIRNGKMILMAGDQLFRYNPLNEALAAIPIAVPDDQSSTEPMDISEGSITSGTKGSGSEAPQSSTVPPVTRIPIKKPTTSTEKPATAPPTNNFVSSAKVCPADSSLLAYVLNKQVYIEKNGKIIHRTSSNSKHITNGVPSYIVQEELERFEGIWWSESKTRLLYEHVNEEKVAESQFGVNGDPPVAPMKYPRAGTKNAYSTLRMVILENGKAYDVPLKDEVIYKHCPFYEYITRAGFFSDGTTVWVQVMSRDQAQCSLLLIPYTDFLLPEELGGSIKEDNLQLSTDLNMGVWDDKSHEETMEKPPRGKLRGTVQIHKARNDYWINTHNAIYPLKITDEEHPMYEFIYCLEKPNGSCLALISAELDQNGYCRHTEEKLLMAENFSINKSMGIVVDEVRELVYYVANESHPTEWNICVSHYRTGQHAQLTESGICFKSERANGKLALDLDHGFACYMTSVGSPAECRFYSFRWKENEVLPSTVYAANITVSGHPGQPDLHFDSPEMIEFQSKKTGLMHYAMILRPSNFDPYKKYPVFHYVYGGPGIQIVHNDFSWIQYIRFCRLGYVVVFIDNRGSAHRGIEFERHIHKKMGTVEVEDQVEGLQMLAERTGGFMDMSRVVVHGWSYGGYMALQMIAKHPNIYRAAIAGGAVSDWRLYDTAYTERYMGYPLEEHVYGASSITGLVEKLPDEPNRLMLVHGLMDENVHFAHLTHLVDECIKKGKWHELVIFPNERHGVRNNDASIYLDARMMYFAQQAIQGFGPTTAAPRQGPLWSHPQFEK
8BLX , Knot 183 425 0.86 40 241 403
APSRANTKVIVVGGGGTIGSSTALHLVRSGYTPSNVTVLDAYPIPSCQSAGNDLNKIMDADADPAADAARQMWNEDELFKKFFHNTGRLDCAHGEKDIADLKKRYQNLVDWGLDATVEWLDSEDEILKRMPQLTRDQIKGWKAIFSKDGGWLAAAKAIKAIGEYLRDQGVRFGFYGAGSFKQPLLAEGVCIGVETVDGTRYYADKVVLAAGAWSPTLVELQEQCVSKAWVYGHIQLTPEEAARYKNSPVVYNGDVGFFFEPNEHGIIKVCDEFPGFTRFKMHQPFGAKAPKRISVPRSHCKHPTDTIPDASIVRIRRAIATFMPQFKNKPLFNQAMCWCTDTADGHLLICEHPEWKNFYLATGDSGDSFKLLPIIGKYVVELLEGTLADELAHKWRWRPGSGDALKSRREAPAKDLADMPGWNHD

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
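
The subword counts and the normalized LZ ratio in the table can be sketched as follows. This is a hedged illustration: the function names are hypothetical, `k` stands for the subword length, and the LZ-complexity value is taken as an input rather than computed.

```python
import math

def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """p_w(k): the cardinality of P_w(k)."""
    return len(subwords(w, k))

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_A n) for a word of length n over an A-letter alphabet."""
    return lz / (n / math.log(n, alphabet_size))

# Values from the 5OMY row: LZ(w) = 170, n = 391.
ratio = normalized_lz(170, 391)
```

For the 5OMY row the ratio falls between 0.86 and 0.87, in line with the tabulated 0.86 if the site truncates rather than rounds (an assumption).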

\(|P_{f(5OMY_1)}(2) \setminus P_{f(8PBA_1)}(2)|=21\), \(|P_{f(8PBA_1)}(2) \setminus P_{f(5OMY_1)}(2)|=136\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For this pair, \(Z_2=21+136=157\).

Hydrophobic-polar version of Sequence 1:
1011110010100010000100010000011011000000110011010000110110100000111011011000010001011001011101101101100110001111100100001001000100001010100110110000011110001010011100000010110111101001100001011000101101110001000010110110111011100011101000000110110111000100010000101010100111000000100110000001101011011001100000001010011001010011000101100011110011001011011001101011111110111111011111111111100
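
A hydrophobic-polar encoding of this kind can be sketched in a few lines. The residue set below is an assumption inferred by comparing the bit string with the start of the 5OMY_1 sequence; it is not a stated definition from the page, and `hp_encode` is a hypothetical name.

```python
# Residues mapped to 1. Inferred from the bit string above (assumption),
# not taken from an authoritative hydrophobicity scale.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

# First 16 residues of 5OMY_1.
prefix = hp_encode("MSGPVPSRARVYTDVN")
```

Under this assumed residue set, the first 16 residues reproduce the first 16 bits of the string shown.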
Pair \(Z_2\) Length of longest common subword
5OMY_1,8PBA_1 157 5
5OMY_1,8BLX_1 174 4
8PBA_1,8BLX_1 147 4
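
The two columns of this table can be sketched as below. The function names are hypothetical. The last column is read here as the length of the longest common contiguous subword, which is an assumption: a longest common subsequence of proteins this long would be far larger than 4 or 5.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common block ending at x[i-2], y[j-2].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best
```

The dynamic program takes time proportional to the product of the two lengths, which is unproblematic for sequences of a few hundred residues.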

Newick tree

 
(
	5OMY_1:85.75,
	(
		8PBA_1:73.5,8BLX_1:73.5
	):12.25
);

Let \(d\) and \(d_1\) be the Otu--Sayood distances of the same names. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1326}{\log_{20} 1326}-\frac{391}{\log_{20}391})\approx 242\), where \(1326=391+935\) is the combined length of the two query sequences.
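
The arithmetic behind this estimate can be checked with a short sketch. The constants 0.85 and 0.8 are taken from the formula as given; their interpretation is not stated here, and the function name is hypothetical.

```python
import math

def expected_increment(n_first: int, n_second: int,
                       c1: float = 0.85, c2: float = 0.8,
                       alphabet_size: int = 20) -> float:
    """c1 * c2 * (g(n_first + n_second) - g(n_first)), with g(n) = n / log_A n."""
    def g(n: int) -> float:
        return n / math.log(n, alphabet_size)
    return c1 * c2 * (g(n_first + n_second) - g(n_first))

# Lengths of 5OMY_1 (391) and 8PBA_1 (935).
value = expected_increment(391, 935)
```

Rounded to the nearest integer, `value` is 242.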
Status Protein1 Protein2 d d1/2
Query variables 5OMY_1 8PBA_1 315 219.5
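
For orientation, the Otu--Sayood distances \(d\) and \(d_1\) can be sketched on top of a Lempel-Ziv complexity function. This assumes the LZ76 exhaustive-history parsing and the unnormalized definitions \(d=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1=(c(SQ)-c(S))+(c(QS)-c(Q))\). Whether the site computes \(\mathrm{LZ}(w)\) in exactly this way is not confirmed, and the function names are hypothetical.

```python
def lz76(s: str) -> int:
    """Number of components in the LZ76 exhaustive-history parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the block while it already occurs starting before position i
        # (the earlier occurrence may overlap the block itself).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood(s: str, q: str) -> tuple[int, int]:
    """Return (d, d1) for sequences s and q under the assumptions above."""
    a = lz76(s + q) - lz76(s)  # new components needed for q after seeing s
    b = lz76(q + s) - lz76(q)  # new components needed for s after seeing q
    return max(a, b), a + b

d, d1 = otu_sayood("0001101001", "1000101")
```

The naive substring search makes this quadratic or worse in the sequence length, which is adequate for a sketch but not for large-scale use.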

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
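
As a small numerical illustration of this interpolation, the query row above gives \(d=315\) and \(d_1/2=219.5\). Assuming \(d=\max\) and \(d_1=\min+\max\), this implies \(\max=315\) and \(\min=2(219.5)-315=124\); these two values are inferred, not reported by the page.

```python
def delta(lo: float, hi: float, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max, with lo = min and hi = max."""
    return alpha * lo + (1 - alpha) * hi

# Inferred values for the pair 5OMY_1, 8PBA_1 (assumption, see lead-in).
LO, HI = 124, 315
d_value = delta(LO, HI, 0.0)       # recovers d
half_d1 = delta(LO, HI, 0.5)       # recovers d1 / 2
```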