CoV2D BrowserTM

CoV2D project home | Random page
Parikh vectors
1QTV_1 1PMM_1 9BML_1 Letter Amino acid
10 36 256 D Aspartic acid
10 27 247 I Isoleucine
13 25 293 K Lycine
9 25 351 V Valine
12 16 189 N Asparagine
5 15 122 M Methionine
13 26 289 R Arginine
0 10 48 C Cysteine
5 20 257 Q Glutamine
1 9 92 H Histidine
5 23 178 F Phenylalanine
3 25 186 P Proline
6 19 273 S Serine
6 16 128 Y Tyrosine
16 40 307 A Alanine
11 36 214 G Glycine
16 40 518 L Leucine
11 19 238 T Threonine
3 11 77 W Tryptophan
9 28 383 E Glutamic acid

1QTV_1|Chain A|PROTEIN (T4 LYSOZYME)|Enterobacteria phage T4 (10665)
>1PMM_1|Chains A, B, C, D, E, F|Glutamate decarboxylase beta|Escherichia coli (562)
>9BML_1|Chain A|Cytoplasmic dynein 1 heavy chain 1|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1QTV , Knot 77 164 0.79 38 118 157
MNIFEMLRIDEGLRLKIYKDTEGYYEIGIGHLLTKSPSLNAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
1PMM , Knot 196 466 0.86 40 252 441
MDKKQVTDLRSELLDSRFGAKSISTIAESKRFPLHEMRDDVAFQIINDELYLDGNARQNLATFCQTWDDENVHKLMDLSINKNWIDKEEYPQSAAIDLRCVNMVADLWHAPAPKNGQAVGTNTIGSSEACMLGGMAMKWRWRKRMEAAGKPTDKPNLVCGPVQICWHKFARYWDVELREIPMRPGQLFMDPKRMIEACDENTIGVVPTFGVTYTGNYEFPQPLHDALDKFQADTGIDIDMHIDAASGGFLAPFVAPDIVWDFRLPRVKSISASGHKFGLAPLGCGWVIWRDEEALPQELVFNVDYLGGQIGTFAINFSRPAGQVIAQYYEFLRLGREGYTKVQNASYQVAAYLADEIAKLGPYEFICTGRPDEGIPAVCFKLKDGEDPGYTLYDLSERLRLRGWQVPAFTLGGEATDIVVMRIMCRRGFEMDFAELLLEDYKASLKYLSDHPKLQGIAQQNSFKHT
9BML , Knot 1424 4646 0.86 40 389 3017
MSEPGGGGGEDGSAGLEVSAVQNVADVSVLQKHLRKLVPLLLEDGGEAPAALEAALEEKSALEQMRKFLSDPQVHTVLVERSTLKEDVGDEGEEEKEFISYNINIDIHYGVKSNSLAFIKRTPVIDADKPVSSQLRVLTLSEDSPYETLHSFISNAVAPFFKSYIRESGKADRDGDKMAPSVEKKIAELEMGLLHLQQNIEIPEISLPIHPMITNVAKQCYERGEKPKVTDFGDKVEDPTFLNQLQSGVNRWIREIQKVTKLDRDPASGTALQEISFWLNLERALYRIQEKRESPEVLLTLDILKHGKRFHATVSFDTDTGLKQALETVNDYNPLMKDFPLNDLLSATELDKIRQALVAIFTHLRKIRNTKYPIQRALRLVEAISRDLSSQLLKVLGTRKLMHVAYEEFEKVMVACFEVFQTWDDEYEKLQVLLRDIVKRKREENLKMVWRINPAHRKLQARLDQMRKFRRQHEQLRAVIVRVLRPQVTAVAQQNQGEVPEPQDMKVAEVLFDAADANAIEEVNLAYENVKEVDGLDVSKEGTEAWEAAMKRYDERIDRVETRITARLRDQLGTAKNANEMFRIFSRFNALFVRPHIRGAIREYQTQLIQRVKDDIESLHDKFKVQYPQSQACKMSHVRDLPPVSGSIIWAKQIDRQLTAYMKRVEDVLGKGWENHVEGQKLKQDGDSFRMKLNTQEIFDDWARKVQQRNLGVSGRIFTIESTRVRGRTGNVLKLKVNFLPEIITLSKEVRNLKWLGFRVPLAIVNKAHQANQLYPFAISLIESVRTYERTCEKVEERNTISLLVAGLKKEVQALIAEGIALVWESYKLDPYVQRLAETVFNFQEKVDDLLIIEEKIDLEVRSLETCMYDHKTFSEILNRVQKAVDDLNLHSYSNLPIWVNKLDMEIERILGVRLQAGLRAWTQVLLGQAEDKAEVDMDTDAPQVSHKPGGEPKIKNVVHELRITNQVIYLNPPIEECRYKLYQEMFAWKMVVLSLPRIQSQRYQVGVHYELTEEEKFYRNALTRMPDGPVALEESYSAVMGIVSEVEQYVKVWLQYQCLWDMQAENIYNRLGEDLNKWQALLVQIRKARGTFDNAETKKEFGPVVIDYGKVQSKVNLKYDSWHKEVLSKFGQMLGSNMTEFHSQISKSRQELEQHSVDTASTSDAVTFITYVQSLKRKIKQFEKQVELYRNGQRLLEKQRFQFPPSWLYIDNIEGEWGAFNDIMRRKDSAIQQQVANLQMKIVQEDRAVESRTTDLLTDWEKTKPVTGNLRPEEALQALTIYEGKFGRLKDDREKCAKAKEALELTDTGLLSGSEERVQVALEELQDLKGVWSELSKVWEQIDQMKEQPWVSVQPRKLRQNLDALLNQLKSFPARLRQYASYEFVQRLLKGYMKINMLVIELKSEALKDRHWKQLMKRLHVNWVVSELTLGQIWDVDLQKNEAIVKDVLLVAQGEMALEEFLKQIREVWNTYELDLVNYQNKCRLIRGWDDLFNKVKEHINSVSAMKLSPYYKVFEEDALSWEDKLNRIMALFDVWIDVQRRWVYLEGIFTGSADIKHLLPVETQRFQSISTEFLALMKKVSKSPLVMDVLNIQGVQRSLERLADLLGKIQKALGEYLERERSSFPRFYFVGDEDLLEIIGNSKNVAKLQKHFKKMFAGVSSIILNEDNSVVLGISSREGEEVMFKTPVSITEHPKINEWLTLVEKEMRVTLAKLLAESVTEVEIFGKATSIDPNTYITWIDKYQAQLVVLSAQIAWSENVETALSSMGGGGDAAPLHSVLSNVEVTLNVLADSVLMEQPPLRRRKLEHLITELVHQRDVTRSLIKSKIDNAKSFEWLSQMRFYFDPKQTDVLQQLSIQMANAKFNYGFEYLGVQDKLVQTPLTDRCYLTMTQALEARLGGSPFGPAGTGKTESVKALGHQLGRFVLVFNCDETFDFQAMGRIFVGLCQVGAWGCFDEFNRLEERMLSAVSQQVQCIQEALREHSNPNYDKTSAPITCELLNKQVKVSPDMAIFITMNPGYAGRSNLPDNLKKLFRSLAMTKPDRQLIAQVMLYSQGFRTAEVLANKIVPFFKLCDEQLSSQSHYDFGLRALKSVLVSAGNVKRERIQKIKREKEERGEAVDEGEIAENLPEQEILIQSVCETMVPKLVAEDIPLLFSLLSDVFPGVQYHRGEMTALREELKKVCQEMYLTYGDGEEVGGMWVEKVLQLYQITQINHGLMMVGPSGSGKSMAWRVLLKALERLEGVEGVAHIIDPKAISKDHLYGTLDPNTREWTDGLFTHVLRKIIDSVRGELQKRQWIVFDGDVDPEWVENLNSVLDDNKLLTLPNGERLSLPPNVRIMFEVQDLKYATLATVSRCGMVWFSEDVLSTDMIFNNFLARLRSIPLDEGEDEAQRRRKGKEDEGEEAASPMLQIQRDAATIMQPYFTSNGLVTKALEHAFQLEHIMDLTRLRCLGSLFSMLHQACRNVAQYNANHPDFPMQIEQLERYIQRYLVYAILWSLSGDSRLKMRAELGEYIRRITTVPLPTAPNIPIIDYEVSISGEWSPWQAKVPQIEVETHKVAAPDVVVPTLDTVRHEALLYTWLAEHKPLVLCGPPGSGKTMTLFSALRALPDMEVVGLNFSSATTPELLLKTFDHYCEYRRTPNGVVLAPVQLGKWLVLFCDEINLPDMDKYGTQRVISFIRQMVEHGGFYRTSDQTWVKLERIQFVGACNPPTDPGRKPLSHRFLRHVPVVYVDYPGPASLTQIYGTFNRAMLRLIPSLRTYAEPLTAAMVEFYTMSQERFTQDTQPHYIYSPREMTRWVRGIFEALRPLETLPVEGLIRIWAHEALRLFQDRLVEDEERRWTDENIDTVALKHFPNIDREKAMSRPILYSNWLSKDYIPVDQEELRDYVKARLKVFYEEELDVPLVLFNEVLDHVLRIDRIFRQPQGHLLLIGVSGAGKTTLSRFVAWMNGLSVYQIKVHRKYTGEDFDEDLRTVLRRSGCKNEKIAFIMDESNVLDSGFLERMNTLLANGEVPGLFEGDEYATLMTQCKEGAQKEGLMLDSHEELYKWFTSQVIRNLHVVFTMNPSSEGLKDRAATSPALFNRCVLNWFGDWSTEALYQVGKEFTSKMDLEKPNYIVPDYMPVVYDKLPQPPSHREAIVNSCVFVHQTLHQANARLAKRGGRTMAITPRHYLDFINHYANLFHEKRSELEEQQMHLNVGLRKIKETVDQVEELRRDLRIKSQELEVKNAAANDKLKKMVKDQQEAEKKKVMSQEIQEQLHKQQEVIADKQMSVKEDLDKVEPAVIEAQNAVKSIKKQHLVEVRSMANPPAAVKLALESICLLLGESTTDWKQIRSIIMRENFIPTIVNFSAEEISDAIREKMKKNYMSNPSYNYEIVNRASLACGPMVKWAIAQLNYADMLKRVEPLRNELQKLEDDAKDNQQKANEVEQMIRDLEASIARYKEEYAVLISEAQAIKADLAAVEAKVNRSTALLKSLSAERERWEKTSETFKNQMSTIAGDCLLSAAFIAYAGYFDQQMRQNLFTTWSHHLQQANIQFRTDIARTEYLSNADERLRWQASSLPADDLCTENAIMLKRFNRYPLIIDPSGQATEFIMNEYKDRKITRTSFLDDAFRKNLESALRFGNPLLVQDVESYDPVLNPVLNREVRRTGGRVLITLGDQDIDLSPSFVIFLSTRDPTVEFPPDLCSRVTFVNFTVTRSSLQSQCLNEVLKAERPDVDEKRSDLLKLQGEFQLRLRQLEKSLLQALNEVKGRILDDDTIITTLENLKREAAEVTRKVEETDIVMQEVETVSQQYLPLSTACSSIYFTMESLKQIHFLYQYSLQFFLDIYHNVLYENPNLKGVTDHTQRLSIITKDLFQVAFNRVARGMLHQDHITFAMLLARIKLKGTVGEPTYDAEFQHFLRGNEIVLSAGSTPRIQGLTVEQAEAVVRLSCLPAFKDLIAKVQADEQFGIWLDSSSPEQTVPYLWSEETPATPIGQAIHRLLLIQAFRPDRLLAMAHMFVSTNLGESFMSIMEQPLDLTHIVGTEVKPNTPVLMCSVPGYDASGHVEDLAAEQNTQITSIAIGSAEGFNQADKAINTAVKSGRWVMLKNVHLAPGWLMQLEKKLHSLQPHACFRLFLTMEINPKVPVNLLRAGRIFVFEPPPGVKANMLRTFSSIPVSRICKSPNERARLYFLLAWFHAIIQERLRYAPLGWSKKYEFGESDLRSACDTVDTWLDDTAKGRQNISPDKIPWSALKTLMAQSIYGGRVDNEFDQRLLNTFLERLFTTRSFDSEFKLACKVDGHKDIQMPDGIRREEFVQWVELLPDTQTPSWLGLPNNAERVLLTTQGVDMISKMLKMQMLEDEDDLAYAETEKKTRTDSTSDGRPAWMRTLHTTASNWLHLIPQTLSHLKRTVENIKDPLFRFFEREVKMGAKLLQDVRQDLADVVQVCEGKKKQTNYLRTLINELVKGILPRSWSHYTVPAGMTVIQWVSDFSERIKQLQNISLAAASGGAKELKNIHVCLGGLFVPEAYITATRQYVAQANSWSLEELCLEVNVTTSQGATLDACSFGVTGLKLQGATCNNNKLSLSNAISTALPLTQLRWVKQTNTEKKASVVTLPVYLNFTRADLIFTVDFEIATKEDPRSFYERGVAVLCTE

Let \(P_w(n)\) be the set of distinct subwords (intervals) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(1QTV_1)}(2) \setminus P_{f(1PMM_1)}(2)|=31\), \(|P_{f(1PMM_1)}(2) \setminus P_{f(1QTV_1)}(2)|=165\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be a LZ76 style (set of subwords) Jaccard distance numerator for \(P(k)\).Hydrophobic-polar version of Sequence 1:10110110100110101000001000111101100010101100010011100001110000100110001011101110010101100010110011110111011001111100010110000100111011000100001001001100100101010001
Pair \(Z_2\) Length of longest common subsequence
1QTV_1,1PMM_1 196 3
1QTV_1,9BML_1 273 4
1PMM_1,9BML_1 141 6

Newick tree

 
[
	1QTV_1:13.02,
	[
		1PMM_1:70.5,9BML_1:70.5
	]:60.52
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
A roughly speaking expected distance is \((0.85)(0.8)(\frac{630 }{\log_{20} 630}-\frac{164}{\log_{20}164})=133.\)
Status Protein1 Protein2 d d1/2
Query variables 1QTV_1 1PMM_1 166 110
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]