CoV2D Browser™

Parikh vectors
8DGF_1 | 8CEO_1 | 6ZQN_1 | Letter | Amino acid
103 | 48 | 29 | D | Aspartic acid
32 | 14 | 5 | H | Histidine
126 | 57 | 39 | I | Isoleucine
69 | 49 | 32 | R | Arginine
95 | 27 | 13 | N | Asparagine
20 | 13 | 2 | C | Cysteine
63 | 33 | 49 | G | Glycine
165 | 81 | 48 | L | Leucine
137 | 50 | 31 | K | Lysine
107 | 54 | 33 | S | Serine
79 | 33 | 16 | Y | Tyrosine
62 | 38 | 49 | A | Alanine
45 | 32 | 17 | P | Proline
31 | 4 | 0 | W | Tryptophan
21 | 26 | 10 | M | Methionine
129 | 71 | 33 | E | Glutamic acid
95 | 33 | 14 | F | Phenylalanine
67 | 42 | 26 | T | Threonine
91 | 46 | 41 | V | Valine
50 | 27 | 23 | Q | Glutamine
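
A Parikh vector records how often each letter occurs in a word, ignoring order. The sketch below shows one way such counts could be computed; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative assumptions, not names used by the CoV2D Browser.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "DHIRNCGLKSYAPWMEFTVQ"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of `alphabet` to its count in `sequence`."""
    counts = Counter(sequence)
    # Letters that never occur are reported with count 0 (e.g. W in 6ZQN_1).
    return {letter: counts.get(letter, 0) for letter in alphabet}
```

Applied to a full FASTA sequence, each column of the table would correspond to one such dictionary.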

>8DGF_1|Chains A, B, C, D|ATP-binding protein Avs4|Escherichia coli (562)
>8CEO_1|Chain A[auth 0]|General transcription and DNA repair factor IIH helicase subunit XPD|Saccharomyces cerevisiae (4932)
>6ZQN_1|Chains A, B, C|ATP synthase subunit alpha, mitochondrial|Bos taurus (9913)
Protein code \(c\) | LZ-complexity \(\mathrm{LZ}(w)\) | Length \(n=\lvert w\rvert\) | \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) | \(p_w(1)\) | \(p_w(2)\) | \(p_w(3)\) | Sequence \(w=f(c)\)
8DGF, Knot | 549 | 1587 | 0.85 | 40 | 350 | 1316
MVKPNWDNFKAKFSENPQGNFEWFCYLLFCQEFKMPAGIFRYKNQSGIETNPITKDNEIIGWQSKFYDTKLSDNKADLIEMIEKSKKAYPGLSKIIFYTNQEWGQGRKSHEPEGDKNADNYLETVGNSNDPKIKIEVDQKAYESGIEIVWRVASFFESPFVIVENEKIAKHFFSLNESIFDLLEEKRKHTENVLYEIQTNIEFKDRSIEIDRRHCIELLHENLVQKKIVIVSGEGGVGKTAVIKKIYEAEKQYTPFYVFKASEFKKDSINELFGAHGLDDFSNAHQDELRKVIVVDSAEKLLELTNIDPFKEFLTVLIKDKWQVVFTTRNNYLADLNYAFIDIYKITPGNLVIKNLERGELIELSDNNGFSLPQDVRLLELIKNPFYLSEYLRFYTGESIDYVSFKEKLWNKIIVKNKPSREQCFLATAFQRASEGQFFVSPACDTGILDELVKDGIVGYEAAGYFITHDIYEEWALEKKISVDYIRKANNNEFFEKIGESLPVRRSFRNWISERLLLDDQSIKPFIAEIVCGEGISNFWKDELWVAVLLSDNSSIFFNYFKRYLLSSDQNLLKRLTFLLRLACKDVDYDLLKQLGVSNSDLLSIKYVLTKPKGTGWQSVIQFIYENLDEIGIRNINFILPVIQEWNQRNKVGETTRLSSLIALKYYQWTIDEDVYLSGRDNEKNILHTILHGAAMIKPEMEEVLVKVLKNRWKEHGTPYFDLMTLILTDLDSYPVWASLPEYVLQLADLFWYRPLKETGERYHSMDIEDEFGLFRSHHDYYPESPYQTPIYWLLQSQFKKTIDFILDFTNKTTICFAHSHFAKNEIEEVDVFIEEGKFIKQYICNRLWCSYRGTQVSTYLLSSIHMALEKFFLENFKNADSKVLESWLLFLLRNTKSASISAVVTSIVLAFPEKTFNVAKVLFQTKDFFRFDMNRMVLDRTHKSSLISLRDGFGGTDYRNSLHEEDRIKACDDVHRNTYLENLALHYQIFRSENVTEKDAIERQQVLWDIFDKYYNQLPDEAQETEADKTWRLCLARMDRRKMKITTKEKDEGIEISFNPEIDPKLKQYSEEAIKKNSEHMKYVTLKLWASYKREKDERYKNYGMYEDNPQIALQETKEIIKKLNEEGGEDFRLLNGNIPADVCSVLLLDYFNQLNNEEREYCKDIVLAYSKLPLKEGYNYQVQDGTTSAISALPVIYHNYPMERETIKTILLLTLFNDHSIGMAGGRYSVFPSMVIHKLWLDYFDDMQSLLFGFLILKPKYVILSRKIIHESYRQVDYDIKKININKVFLNNYKHCISNVIDNKISIDDLGSMDKVDLHILNTAFQLIPVDTVNIEHKKLVSLIVKRFSTSLLSSVREDRVDYALRQSFLERFAYFTLHAPVSDIPDYIKPFLDGFNGSEPISELFKKFILVEDRLNTYAKFWKVWDLFFDKVVTLCKDGDRYWYVDKIIKSYLFAESPWKENSNGWHTFKDSNSQFFCDVSRTMGHCPSTLYSLAKSLNNIASCYLNQGITWLSEILSVNKKLWEKKLENDTVYYLECLVRRYINNERERIRRTKQLKQEVLVILDFLVEKGSVVGYMSRENIL
8CEO, Knot | 301 | 778 | 0.85 | 40 | 297 | 718
MKFYIDDLPVLFPYPKIYPEQYNYMCDIKKTLDVGGNSILEMPSGTGKTVSLLSLTIAYQMHYPEHRKIIYCSRTMSEIEKALVELENLMDYRTKELGYQEDFRGLGLTSRKNLCLHPEVSKERKGTVVDEKCRRMTNGQAKRKLEEDPEANVELCEYHENLYNIEVEDYLPKGVFSFEKLLKYCEEKTLCPYFIVRRMISLCNIIIYSYHYLLDPKIAERVSNEVSKDSIVIFDEAHNIDNVCIESLSLDLTTDALRRATRGANALDERISEVRKVDSQKLQDEYEKLVQGLHSADILTDQEEPFVETPVLPQDLLTEAIPGNIRRAEHFVSFLKRLIEYLKTRMKVLHVISETPKSFLQHLKQLTFIERKPLRFCSERLSLLVRTLEVTEVEDFTALKDIATFATLISTYEEGFLLIIEPYEIENAAVPNPIMRFTCLDASIAIKPVFERFSSVIITSGTISPLDMYPRMLNFKTVLQKSYAMTLAKKSFLPMIITKGSDQVAISSRFEIRNDPSIVRNYGSMLVEFAKITPDGMVVFFPSYLYMESIVSMWQTMGILDEVWKHKLILVETPDAQETSLALETYRKACSNGRGAILLSVARGKVSEGIDFDHQYGRTVLMIGIPFQYTESRILKARLEFMRENYRIRENDFLSFDAMRHAAQCLGRVLRGKDDYGVMVLADRRFSRKRSQLPKWIAQGLSDADLNLSTDMAISNTKQFLRTMAQPTDPKDQEGVSVWSYEDLIKHQNSRKDQGGFIENENKEGEQDEDEDEDIEMQ
6ZQN, Knot | 204 | 510 | 0.83 | 38 | 229 | 476
EKTGTAEVSSILEERILGADTSVDLEETGRVLSIGDGIARVHGLRNVQAEEMVEFSSGLKGMSLNLEPDNVGVVVFGNDKLIKEGDIVKRTGAIVDVPVGEELLGRVVDALGNAIDGKGPIGSKARRRVGLKAPGIIPRISVREPMQTGIKAVDSLVPIGRGQRELIIGDRQTGKTSIAIDTIINQKRFNDGTDEKKKLYCIYVAIGQKRSTVAQLVKRLTDADAMKYTIVVSATASDAAPLQYLAPYSGCSMGEYFRDNGKHALIIYDDLSKQAVAYRQMSLLLRRPPGREAYPGDVFYLHSRLLERAAKMNDAFGGGSLTALPVIETQAGDVSAYIPTNVISITDGQIFLETELFYKGIRPAINVGLSVSRVGSAAQTRAMKQVAGTMKLELAQYREVAAFAQFGSDLDAATQQLLSRGVRLTELLKQGQYSPMAIEEQVAVIYAGVRGYLDKLEPSKITKFENAFLSHVISQHQALLGKIRTDGKISEESDAKLKEIVTNFLAGFEA
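
The normalized column divides the LZ-complexity by \(n/\log_{20} n\). A minimal sketch of both quantities follows, assuming the classical Lempel-Ziv (1976) exhaustive-history parsing; it is not confirmed that the page uses exactly this variant, and the names `lz76_complexity` and `normalized_lz` are hypothetical.

```python
import math

def lz76_complexity(word):
    """Count the phrases in an LZ76-style parsing of `word` (assumed variant)."""
    i, phrases, n = 0, 0, len(word)
    while i < n:
        length = 1
        # Extend the phrase while it can be copied from an earlier start position
        # (overlap with the phrase itself is allowed, as in LZ76).
        while i + length <= n and word[i:i + length] in word[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n) for alphabet size b (20 amino acids); needs n > 1."""
    return lz / (n / math.log(n, alphabet_size))
```

For instance, `normalized_lz(549, 1587)` is about 0.85 and `normalized_lz(204, 510)` is about 0.83, in line with the first and third rows of the table.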

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
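
As a sketch of the plain definition, the set of distinct subwords of a given length and its cardinality could be computed as follows. The function names are hypothetical, and it is not confirmed that the tabulated \(p_w\) values follow exactly this convention.

```python
def subwords(word, k):
    """Return the set of distinct contiguous subwords of length k in `word`."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def subword_complexity(word, k):
    """Return the number of distinct length-k subwords of `word`."""
    return len(subwords(word, k))
```

For the word `abab`, the length-2 subwords are `ab` and `ba`, so the complexity at length 2 is 2.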

\(|P_{f(8DGF_1)}(2) \setminus P_{f(8CEO_1)}(2)|=76\), \(|P_{f(8CEO_1)}(2) \setminus P_{f(8DGF_1)}(2)|=23\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
110101001010100010101011001110001011111100000011000110000011110001000010000101101100000101110011100000110100000101000100010011000010101010001000110111011011001111100001100110100011011000000000110010001010000101000001011000110001111010111100111001001000001101101001000010011110110010010000100111100100110100101100110111000101110000001101001110100101101110010010110100001101100101101100110100010100100100101000110011100010000011101100100101110110001110011001111001110110001000111000101001001000011001100111000100110001110000101111011010110011000111111100000111001000110000011001011101100010001100111000011010011001010110011011000100111001011111100100000110000100111100001010001010100000011001101111101010011101100010001010101101110010001111011001101101110011000100000101000111100000001001000110111000100010111010000010110001100010010111001011000100011000010010001100101110011100100100011001111110000010101110011111100010110111000011010100111000000011010011110000001000001010001000001001110001100001000011000011101100000011001000010001010110100001010000000110101010101010000001100000010010101110000000000000110000101110000011001000110010110101110100111100100100000000001111000111001000010010001101111100001100001001111011000011111100011101110011100100100111111110100111000110000001000100101001110000001001100010100110100101011001101111001010000110111001000110010000100110001100110101011100110010111011010011001100111100010001011011011100110100010001010011000111001100000110010000001100100011001001001100100110001001101100110100011000100001001001100010000001000001000111110111001011101000011
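
The hydrophobic-polar encoding replaces each residue by 1 (hydrophobic) or 0 (polar). The sketch below assumes the hydrophobic class is A, F, G, I, L, M, P, V, W; this set is an inference that agrees with the opening symbols of the string above, not a classification stated by the page, and the name `hydrophobic_polar` is hypothetical.

```python
# Assumed hydrophobic class, inferred from the start of the encoded string.
HYDROPHOBIC = set("AFGILMPVW")

def hydrophobic_polar(sequence, hydrophobic=HYDROPHOBIC):
    """Encode a protein sequence as a binary string: 1 = hydrophobic, 0 = polar."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)
```

Under this assumption, the first ten residues of 8DGF_1, `MVKPNWDNFK`, encode to `1101010010`.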
Pair | \(Z_2\) | Length of longest common subword (contiguous)
8DGF_1,8CEO_1 | 99 | 5
8DGF_1,6ZQN_1 | 141 | 5
8CEO_1,6ZQN_1 | 128 | 4
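
The quantity \(Z_k\) is the size of the symmetric difference of two sets of length-\(k\) subwords. A minimal sketch, with hypothetical function names:

```python
def subwords(word, k):
    """Return the set of distinct contiguous subwords of length k in `word`."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def z_distance(x, y, k):
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    # Set difference in each direction, then add the two counts.
    return len(px - py) + len(py - px)
```

For the first pair the two one-sided counts reported above are 76 and 23, and their sum 99 is the \(Z_2\) entry in the table.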

Newick tree

 
[
	6ZQN_1:72.30,
	[
		8DGF_1:49.5,8CEO_1:49.5
	]:22.80
]
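
The tree above is written with square brackets, whereas standard Newick syntax uses parentheses and a closing semicolon. The sketch below converts between the two and sums branch lengths from the root to each leaf; both function names are hypothetical, and the parser handles only simple trees of this shape.

```python
def bracket_to_newick(text):
    """Rewrite a bracketed tree as a one-line Newick string."""
    compact = "".join(text.split())  # drop all whitespace
    return compact.replace("[", "(").replace("]", ")") + ";"

def leaf_depths(newick):
    """Return {leaf name: total branch length from the root} for a Newick string."""
    s = newick.rstrip(";")
    pos = 0

    def read_length():
        nonlocal pos
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            return float(s[start:pos])
        return 0.0  # no branch length given (e.g. at the root)

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1
            leaves = {}
            while True:
                leaves.update(node())
                if s[pos] == ",":
                    pos += 1
                else:
                    break
            pos += 1  # skip the closing ")"
        else:
            start = pos
            while pos < len(s) and s[pos] not in ":,()":
                pos += 1
            leaves = {s[start:pos]: 0.0}
        length = read_length()
        return {name: depth + length for name, depth in leaves.items()}

    return node()
```

For this tree every leaf comes out at depth 72.3 (for example 49.5 + 22.80 for 8DGF_1), so the tree is ultrametric.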

Let d and d1 be the Otu--Sayood distances of the same names. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{2365}{\log_{20} 2365}-\frac{778}{\log_{20}778}\right)\approx 382.\)
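
The arithmetic in this estimate can be checked with a short sketch. Note that 2365 = 1587 + 778 is the combined length of the two query sequences. The parameter names `ratio` and `factor` are hypothetical labels for the constants 0.85 and 0.8, whose precise meaning the page does not spell out.

```python
import math

def expected_distance(n_joint, n_single, ratio=0.85, factor=0.8, alphabet_size=20):
    """Evaluate ratio * factor * (n_joint/log_b n_joint - n_single/log_b n_single)."""
    def scale(n):
        return n / math.log(n, alphabet_size)
    return ratio * factor * (scale(n_joint) - scale(n_single))

print(round(expected_distance(2365, 778)))
```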
Status Protein1 Protein2 d d1/2
Query variables 8DGF_1 8CEO_1 481 357.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
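
The interpolation can be sketched directly. The reading below assumes \(d\) is the larger and \(d_1\) the sum of the two one-sided quantities, which is what the case formula implies; the function and parameter names are hypothetical.

```python
def delta(alpha, smaller, larger):
    """Return alpha * min + (1 - alpha) * max for the two one-sided quantities."""
    return alpha * smaller + (1 - alpha) * larger
```

With the tabulated values d = 481 and d1/2 = 357.5, this reading gives a smaller quantity of 2(357.5) - 481 = 234, so that `delta(0, 234, 481)` returns 481 and `delta(0.5, 234, 481)` returns 357.5.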