CoV2D Browser™

Parikh vectors
6DNH_1 6HKI_1 4EIP_1 Letter Amino acid
83 43 39 T Threonine
48 33 6 Y Tyrosine
45 28 6 N Asparagine
39 12 30 H Histidine
61 45 17 I Isoleucine
90 34 55 R Arginine
84 56 34 P Proline
105 54 55 G Glycine
155 79 56 L Leucine
62 60 7 K Lysine
34 31 8 M Methionine
59 40 18 F Phenylalanine
103 67 62 A Alanine
67 51 33 D Aspartic acid
54 37 4 Q Glutamine
14 13 10 W Tryptophan
113 58 40 V Valine
22 19 4 C Cysteine
108 86 32 E Glutamic acid
97 70 33 S Serine
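A Parikh vector records how many times each letter occurs in a word, ignoring order. Below is a minimal Python sketch that lists the counts in the row order of the table above. It assumes sequences use only the 20 standard amino acid letters, and the helper name `parikh_vector` is hypothetical.

```python
from collections import Counter

# The 20 standard amino acid letters, in the row order of the table above.
AMINO_ACIDS = "TYNHIRPGLKMFADQWVCES"

def parikh_vector(w: str) -> list[int]:
    """Count occurrences of each amino acid letter in w (order is ignored)."""
    counts = Counter(w)
    return [counts[a] for a in AMINO_ACIDS]

example = "MYAVYKQAHPP"  # first 11 residues of 6DNH_1
vec = parikh_vector(example)
print(vec)
# The entries sum to |w| when w uses only these 20 letters.
print(sum(vec) == len(example))
```

As a check on the table, the three columns sum to 1443, 916 and 549, which are the lengths \(n\) listed for 6DNH, 6HKI and 4EIP.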

>6DNH_1|Chain A|Cleavage and polyadenylation specificity factor subunit 1|Homo sapiens (9606)
>6HKI_1|Chains A, B|Protein O-GlcNAcase|Homo sapiens (9606)
>4EIP_1|Chains A, B|Putative FAD-monooxygenase|Lechevalieria aerocolonigenes (68170)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6DNH , Knot 511 1443 0.85 40 346 1235
MYAVYKQAHPPTGLEFSMYCNFFNNSERNLVVAGTSQLYVYRLNRDAEALTKNDRSTEGKAHREKLELAASFSFFGNVMSMASVQLAGAKRDALLLSFKDAKLSVVEYDPGTHDLKTLSLHYFEEPELRDGFVQNVHTPRVRVDPDGRCAAMLVYGTRLVVLPFRRESLAEEHEGLVGEGQRSSFLPSYIIDVRALDEKLLNIIDLQFLHGYYEPTLLILFEPNQTWPGRVAVRQDTCSIVAISLNITQKVHPVIWSLTSLPFDCTQALAVPKPIGGVVVFAVNSLLYLNQSVPPYGVALNSLTTGTTAFPLRTQEGVRITLDCAQATFISYDKMVISLKGGEIYVLTLITDGMRSVRAFHFDKAAASVLTTSMVTMEPGYLFLGSRLGNSLLLKYTEKLQEPPASAVREAADKEEPPSKKKRVDATAGWSAAGKSVPQDEVDEIEVYGSEAQSGTQLATYSFEVCDSILNIGPCANAAVGEPAFLSEEFQNSPEPDLEIVVCSGHGKNGALSVLQKSIRPQVVTTFELPGCYDMWTVIAPVRKEEEDNPKGEGTEQEPSTTPEADDDGRRHGFLILSREDSTMILQTGQEIMELDTSGFATQGPTVFAGNIGDNRYIVQVSPLGIRLLEGVNQLHFIPVDLGAPIVQCAVADPYVVIMSAEGHVTMFLLKSDSYGGRHHRLALHKPPLHHQSKVITLCLYRDLSGMFTTESRLGGARDELGGRSGPEAEGLGSETSPTVDDEEEMLYGDSGSLFSPSKEEARRSSQPPADRDPAPFRAEPTHWCLLVRENGTMEIYQLPDWRLVFLVKNFPVGQRVLVDSSFGQPTTQGEARREEATRQGELPLVKEVLLVALGSRQSRPYLLVHVDQELLIYEAFPHDSQLGQGNLKVRFKKVPHNINFREKKPKPSKKKAEGGGAEEGAGARGRVARFRYFEDIYGYSGVFICGPSPHWLLVTGRGALRLHPMAIDGPVDSFAPFHNVNCPRGFLYFNRQGELRISVLPAYLSYDAPWPVRKIPLRCTAHYVAYHVESKVYAVATSTNTPCARIPRMTGEEKEFETIERDERYIHPQQEAFSIQLISPVSWEAIPNARIELQEWEHVTCMKTVSLRSEETVSGLKGYVAAGTCLMQGEEVTCRGRILIMDVIEVVPEPGQPLTKNKFKVLYEKEQKGPVTALCHCNGHLVSAIGQKIFLWSLRASELTGMAFIDTQLYIHQMISVKNFILAADVMKSISLLRYQEESKTLSLVSRDAKPLEVYSVDFMVDNAQLGFLVSDRDRNLMVYMYLPEAKESFGGMRLLRRADFHVGAHVNTFWRTPCRGATEGLSKKSVVWENKHITWFATLDGGIGLLLPMQEKTYRRLLMLQNALTTMLPHHAGLNPRAFRMLHVDRRTLQNAVRNVLDGELLNRYLYLSTMERSELAKKIGTTPDIILDDLLETDRVTAHF
6HKI , Knot 343 916 0.85 40 313 840
MVQKESQATLEERESELSSNPAASAGASLEPPAAPAPGEDNPAGAGGAAVAGAAGGARRFLCGVVEGFYGRPWVMEQRKELFRRLQKWELNTYLYAPKDDYKHRMFWREMYSVEEAEQLMTLISAAREYEIEFIYAISPGLDITFSNPKEVSTLKRKLDQVSQFGCRSFALLFDDIDHNMCAADKEVFSSFAHAQVSITNEIYQYLGEPETFLFCPTEYCGTFCYPNVSQSPYLRTVGEKLLPGIEVLWTGPKVVSKEIPVESIEEVSKIIKRAPVIWDNIHANDYDQKRLFLGPYKGRSTELIPRLKGVLTNPNCEFEANYVAIHTLATWYKSNMNGVRKDVVMTDSEDSTVSIQIKLENEGSDEDIETDVLYSPQMALKLALTEWLQEFGVPHQYSSRQVAHSGAKASVVDGTPLVAAPSLNATTVVTTVYQEPIMSQGAALSGEPTTLTKEEEKKQPDEEPMDMVVEKQEETDHKNDNQILSEIVEAKMAEELKPMDTDKESIAESKSPEMSMQEDCISDIAPMQTDEQTNKEQFVPGPNEKPLYTAEPVTLEDLQLLADLFYLPYEHGPKGAQMLREFQWLRANSSVVSVNCKGKDSAKIAEWRSRAAKFEEMCGLVMGMFTRLSNCANRTILYDMYSYVWDIKSIMSMVKSFVQWLGCRSHSSAQFLIGDQEPWAFRGGLAGEFQRLLPIDGANDLFFQPPPLTPTSKVYTIRPYFPKDEASVYKICREMYDDGVGLPFQSQPDLIGDKLVGGLLSLSLDYCFVLEDEDGICGYALGTVDVTPFIKKCKISWIPFMQEKYTKPNGDKELSEAEKIMLSFHEEQEVLPETFLANFPSLIKMDIHKKVTDPSVAKSMMACLLSSLKANGSRGAFCEVRPDDKRILEFYSKLGCFEIAKMEGFPKDVVILGRSL
4EIP , Knot 212 549 0.81 40 214 507
MGSSHHHHHHSSGLVPRGSHMNAPIETDVLILGGGPVGMALALDLAHRQVGHLVVDAGDGTITHPKVSTIGPRSMELFRRWGVAKQIRTAGWPGDHPLDAAWVTRVGGHEVYRIPLGTADTRATPEHTPEPDAICPAHWLAPLLAEAVGERLRTRSRLDSFEQRDDHVRATITDLRTGATRAVHARYLVACDGASSPTRKALGIDAPPRHRTQVFRNILFRAPELRSLLGERAALVFFLMLSSSLRFPLRSLDGRGLYNLVVGVDDASKSTMDSFELVRRAVAFDTEIEVLSDSEWHLTHRVADSFSAGRVFLTGDAAHTLSPSGGFGMNTGIGSAADLGWKLAATLRGWAGPGLLATYEEERRPVAITSLEEANVNLRRTMDRELPPGLHDDGPRGERIRAAVAEKLERSGARREFDAPGIHFGHTYRSSIVCGEPETEVATGGWRPSARPGARAPHAWLTPTTSTLDLFGRGFVLLSFGTTDGVEAVTRAFADRHVPLETVTCHAPEIHALYERAHVLVRPDGHVAWRGDHLPAELGGLVDKVRGAA
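The LZ-complexity column counts phrases in a Lempel–Ziv parsing of \(w\), and the next column divides it by \(n/\log_{20} n\). Below is a minimal Python sketch. It assumes the LZ76 exhaustive-history parsing computed with the Kaspar–Schuster algorithm. Whether this matches the browser's exact convention (for example, how a final incomplete phrase is counted) is an assumption, and the function names are hypothetical.

```python
import math

def lz76_complexity(s: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of s
    (Kaspar-Schuster algorithm); a final incomplete phrase counts as one."""
    n = len(s)
    if n <= 1:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1  # the first symbol is the first phrase
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1  # extend the current match into the history
            if l + k > n:  # reached the end while still matching
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1  # try the next starting point in the history
            if i == l:  # no longer match exists: close the current phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1

    return c

def normalized_lz(c: int, n: int) -> float:
    """LZ(w) / (n / log_20 n), the ratio column of the table above."""
    return c / (n / math.log(n, 20))

# Classic example, parsed as 0 . 001 . 10 . 100 . 1000 . 101
print(lz76_complexity("0001101001000101"))
for code, c, n in [("6DNH", 511, 1443), ("6HKI", 343, 916), ("4EIP", 212, 549)]:
    print(code, round(normalized_lz(c, n), 4))
```

With the tabulated \(\mathrm{LZ}(w)\) and \(n\), the ratio comes out at about 0.860, 0.852 and 0.813. The table's 0.85, 0.85 and 0.81 agree with truncating these to two decimals, not rounding them.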

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the amino acid sequence, as given in FASTA format, of the Protein Data Bank entry with 4-symbol code \(c\).
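Here \(P_w(n)\) collects every window of \(n\) consecutive letters. A word of length \(m\) has at most \(m-n+1\) such windows, so \(p_w(n) \le m-n+1\). Below is a short Python sketch of both definitions; the helper names are hypothetical.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct length-n subwords (intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))

w = "MYAVYKQAHPP"  # first 11 residues of 6DNH_1
print(sorted(subwords(w, 2)))
print(p(w, 1), p(w, 2), p(w, 3))
```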

\(|P_{f(6DNH_1)}(2) \setminus P_{f(6HKI_1)}(2)|=64\) and \(|P_{f(6HKI_1)}(2) \setminus P_{f(6DNH_1)}(2)|=31\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(6DNH_1),f(6HKI_1))=64+31=95\).

Hydrophobic-polar version of Sequence 1 (6DNH_1): 101100010110110101000110000001111100010100100010110000000010100001011101011101101101011110001111010010101100011000100101001001010011100100101010101001111101001111110000110000111101000011100110101100011011010110100010111110100011101110000001111010100010111101001110000111110111111111100110100011101111001001001111000011010100101011000011101011010110110011001011010011101100011010110111100110011100000100111011001100001100000101011101110011000100101010010010011000101000110111010111101111000100010101011100101001110110001010110010111000110111110000000101010000100010100010001111100000011100100110100011100110111101100001101011110110110010111101111110011101011110101010111100000110000111001110000011010100010111000001111000111001101011100001010000011010010110100001000001110001111010100101110001010100110101111100111100111000110100010100001000101111001111111000001011101000111001110000110101010100110010100001010000101111001111010110100100101001111011010111101011101011110111001111001001011101000101010111101000111110011100010011001000101110000010101101010000100100000010100011010110110101110101010010010010010100000101101011110011010010001011110110111011011000010110000001110110000101101110011110101001011111000101001101001111101100101100000000101100010110100101110010111110000001110101101000111101100101011101001100100110011000011100001011101011111111100000001111001100111001110101101101000010011001101011000101001000011001100101110011000010101
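The binary string replaces each residue with 1 (hydrophobic) or 0 (polar). The exact hydrophobic set the browser uses is not stated here. The sketch below assumes {A, F, G, I, L, M, P, V, W}, which reproduces the first fifty symbols of the string for 6DNH_1. That prefix contains no I or W, so including those two letters is an assumption. The name `hp_string` is hypothetical.

```python
# Assumed hydrophobic residues (hypothetical choice, see the lead-in).
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(w: str) -> str:
    """Map each residue to '1' if it is in the assumed hydrophobic set, else '0'."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)

# First 50 residues of 6DNH_1.
prefix = "MYAVYKQAHPPTGLEFSMYCNFFNNSERNLVVAGTSQLYVYRLNRDAEAL"
print(hp_string(prefix))
```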
Pair \(Z_2\) Length of longest common substring
6DNH_1,6HKI_1 95 5
6DNH_1,4EIP_1 164 4
6HKI_1,4EIP_1 169 6
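Both columns can be sketched in Python. \(Z_k\) is the size of the symmetric difference of the two subword sets. The longest common substring (a contiguous subword shared by both sequences) comes from the standard dynamic program. The function names are hypothetical, and this is not the browser's own code. As a consistency check on the first row, the \(p_w(2)\) values 346 and 313 mean the two sets share \(346-64=282\) pairs, and \(313-282=31\).

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y (O(|x||y|))."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common suffix ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("ABC", "BCD", 2), longest_common_substring("ABCDE", "XBCDY"))
```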

Newick tree

 
(
	4EIP_1:92.14,
	(
		6DNH_1:47.5,6HKI_1:47.5
	):44.64
);
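The tree is ultrametric: every leaf lies 92.14 from the root, because \(47.5 + 44.64 = 92.14\). Also, \(2 \times 47.5 = 95\) equals \(Z_2\) for 6DNH_1 and 6HKI_1 in the table above. Below is a minimal Python sketch that parses a tree in standard parenthesized Newick form and reports each leaf's distance from the root. It handles names and optional branch lengths only, with no internal labels or quoting. The parser and its names are illustrative, not the browser's code.

```python
import re

# Tokens: parentheses, commas, semicolon, leaf names, and ":length" fields.
TOKEN = re.compile(r"\s*([(),;]|[^(),;:\s]+|:[0-9.eE+-]+)")

def parse_newick(text: str) -> dict:
    """Parse a simple Newick string into nested dicts."""
    tokens = TOKEN.findall(text)
    pos = 0

    def subtree() -> dict:
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            children = [subtree()]
            while tokens[pos] == ",":
                pos += 1
                children.append(subtree())
            if tokens[pos] != ")":
                raise ValueError("expected ')'")
            pos += 1
            node = {"children": children}
        else:
            node = {"name": tokens[pos]}
            pos += 1
        if pos < len(tokens) and tokens[pos].startswith(":"):
            node["length"] = float(tokens[pos][1:])  # branch length
            pos += 1
        return node

    return subtree()

def leaf_depths(node: dict, depth: float = 0.0, out=None) -> dict:
    """Root-to-leaf path length for every named leaf."""
    if out is None:
        out = {}
    depth += node.get("length", 0.0)
    for child in node.get("children", []):
        leaf_depths(child, depth, out)
    if "name" in node:
        out[node["name"]] = depth
    return out

tree = parse_newick("(4EIP_1:92.14,(6DNH_1:47.5,6HKI_1:47.5):44.64);")
print(leaf_depths(tree))
```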

Let \(d\) and \(d_1\) be the Otu–Sayood distances of the same names. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{2359}{\log_{20} 2359}-\frac{916}{\log_{20} 916}\right)\approx 345\), where \(2359 = 1443 + 916\) is the combined length of 6DNH_1 and 6HKI_1.
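The estimate can be evaluated directly. The factor 0.85 matches the LZ ratio listed for both proteins. The role of 0.8 is not stated, so it is taken as given. The helper name is hypothetical.

```python
import math

def n_over_log20(n: int) -> float:
    """n / log_20 n, the normalizer used in the LZ ratio column."""
    return n / math.log(n, 20)

# 2359 = 1443 + 916, the length of 6DNH_1 and 6HKI_1 concatenated.
estimate = 0.85 * 0.8 * (n_over_log20(1443 + 916) - n_over_log20(916))
print(round(estimate, 2))
```

The value is about 345.2, so the stated 345 is a rounded figure.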
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 6DNH_1 6HKI_1 445 362.5
Was not able to put for d
Was not able to put for d1

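In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)], with the minimum and maximum taken over the two quantities \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\) for sequences \(x\) and \(y\),
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d, & \alpha=0,\\ d_1/2, & \alpha=1/2. \end{cases} \]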
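By the formula above, \(\alpha=0\) gives \(\max = d\), and \(\alpha=1/2\) gives \((\min+\max)/2 = d_1/2\). For the query pair, this determines both quantities being compared: \(\max = 445\) and \(\min = 2(362.5) - 445 = 280\). Below is a minimal Python sketch of the interpolation. The function name is hypothetical, and the two inputs are back-solved from the table rather than recomputed from the sequences.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Back-solved from the table for 6DNH_1 vs 6HKI_1: max = d = 445, min = 280.
a, b = 445, 280
print(delta(a, b, 0))    # recovers d
print(delta(a, b, 0.5))  # recovers d_1 / 2
```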