CoV2D Browser™

Parikh vectors
3WUE_1 6URG_1 2XYT_1 Letter Amino acid
17 83 17 T Threonine
7 14 4 W Tryptophan
14 90 11 R Arginine
24 105 10 G Glycine
20 155 16 L Leucine
5 34 5 M Methionine
14 84 12 P Proline
16 97 20 S Serine
36 103 10 A Alanine
16 61 9 I Isoleucine
13 62 8 K Lysine
8 39 2 H Histidine
26 113 19 V Valine
26 67 19 D Aspartic acid
2 22 4 C Cysteine
20 108 9 E Glutamic acid
13 48 11 Y Tyrosine
9 45 9 N Asparagine
16 54 11 Q Glutamine
11 59 11 F Phenylalanine
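A Parikh vector is the letter-count vector of a word. The minimal Python sketch below computes it with `collections.Counter` and checks a few entries of the 2XYT_1 column (W, C, H, M) against the sequence listed further down. The function name `parikh_vector` is hypothetical; `AMINO_ACIDS` simply copies the row order of the table.

```python
from collections import Counter

# Row order of the Parikh-vector table above.
AMINO_ACIDS = "TWRGLMPSAIKHVDCEYNQF"

# 2XYT_1 sequence, copied from the page in 50-residue pieces.
SEQ_2XYT = (
    "QANLMRLKSDLFNRSPMYPGPTKDDPLTVTLGFTLQDIVKVDSSTNEVDL"
    "VYYEQQRWKLNSLMWDPNEYGNITDFRTSAADIWTPDITAYSSTRPVQVL"
    "SPQIAVVTHDGSVMFIPAQRLSFMCDPTGVDSEEGVTCAVKFGSWVYSGF"
    "EIDLKTDTDQVDLSSYYASSKYEILSATQTRQVQHYSCCPEPYIDVNLVV"
    "KFRERRAGNGFFRNLFD"
)


def parikh_vector(seq: str) -> dict[str, int]:
    """Number of occurrences of each amino-acid letter in seq, in table order."""
    counts = Counter(seq)
    return {aa: counts[aa] for aa in AMINO_ACIDS}


pv = parikh_vector(SEQ_2XYT)
print(len(SEQ_2XYT), pv["W"], pv["C"], pv["H"])  # 217 4 4 2
```

Because every residue of 2XYT_1 is one of the 20 standard letters, the entries of its Parikh vector sum to the sequence length.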

>3WUE_1|Chain A|Endo-1,4-beta-xylanase A|Streptomyces sp. (1931)
>6URG_1|Chain A|Cleavage and polyadenylation specificity factor subunit 1|Homo sapiens (9606)
>2XYT_1|Chains A, B, C, D, E, F, G, H, I, J|SOLUBLE ACETYLCHOLINE RECEPTOR|APLYSIA CALIFORNICA (6500)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3WUE , Knot 132 313 0.80 40 192 294
DTATLGELAEAKGRYFGSATDNPELPDTQYTQILGSEFSQITVGNTMKWQYTEPSRGRFDYTAAEEIVDLAESNGQSVRGHTLVWHNQLPSWVDDVPAGELLGVMRDHITHEVDHFKGRLIHWDVVNEAFEEDGSRRQSVFQQKIGDSYIAEAFKAARAADPDVKLYYNDYNIEGIGPKSDAVYEMVKSFKAQGIPIDGVGMQAHLIAGQVPASLQENIRRFADLGVDVALTELDIRMTLPRTAAKDAQQATDYGAVVEACLVVSRCVGITVWDYTDKYSWVPSVFPGQGAALPWDEDFAKKPAYHAIAAALN
6URG , Knot 511 1443 0.85 40 346 1235
MYAVYKQAHPPTGLEFSMYCNFFNNSERNLVVAGTSQLYVYRLNRDAEALTKNDRSTEGKAHREKLELAASFSFFGNVMSMASVQLAGAKRDALLLSFKDAKLSVVEYDPGTHDLKTLSLHYFEEPELRDGFVQNVHTPRVRVDPDGRCAAMLVYGTRLVVLPFRRESLAEEHEGLVGEGQRSSFLPSYIIDVRALDEKLLNIIDLQFLHGYYEPTLLILFEPNQTWPGRVAVRQDTCSIVAISLNITQKVHPVIWSLTSLPFDCTQALAVPKPIGGVVVFAVNSLLYLNQSVPPYGVALNSLTTGTTAFPLRTQEGVRITLDCAQATFISYDKMVISLKGGEIYVLTLITDGMRSVRAFHFDKAAASVLTTSMVTMEPGYLFLGSRLGNSLLLKYTEKLQEPPASAVREAADKEEPPSKKKRVDATAGWSAAGKSVPQDEVDEIEVYGSEAQSGTQLATYSFEVCDSILNIGPCANAAVGEPAFLSEEFQNSPEPDLEIVVCSGHGKNGALSVLQKSIRPQVVTTFELPGCYDMWTVIAPVRKEEEDNPKGEGTEQEPSTTPEADDDGRRHGFLILSREDSTMILQTGQEIMELDTSGFATQGPTVFAGNIGDNRYIVQVSPLGIRLLEGVNQLHFIPVDLGAPIVQCAVADPYVVIMSAEGHVTMFLLKSDSYGGRHHRLALHKPPLHHQSKVITLCLYRDLSGMFTTESRLGGARDELGGRSGPEAEGLGSETSPTVDDEEEMLYGDSGSLFSPSKEEARRSSQPPADRDPAPFRAEPTHWCLLVRENGTMEIYQLPDWRLVFLVKNFPVGQRVLVDSSFGQPTTQGEARREEATRQGELPLVKEVLLVALGSRQSRPYLLVHVDQELLIYEAFPHDSQLGQGNLKVRFKKVPHNINFREKKPKPSKKKAEGGGAEEGAGARGRVARFRYFEDIYGYSGVFICGPSPHWLLVTGRGALRLHPMAIDGPVDSFAPFHNVNCPRGFLYFNRQGELRISVLPAYLSYDAPWPVRKIPLRCTAHYVAYHVESKVYAVATSTNTPCARIPRMTGEEKEFETIERDERYIHPQQEAFSIQLISPVSWEAIPNARIELQEWEHVTCMKTVSLRSEETVSGLKGYVAAGTCLMQGEEVTCRGRILIMDVIEVVPEPGQPLTKNKFKVLYEKEQKGPVTALCHCNGHLVSAIGQKIFLWSLRASELTGMAFIDTQLYIHQMISVKNFILAADVMKSISLLRYQEESKTLSLVSRDAKPLEVYSVDFMVDNAQLGFLVSDRDRNLMVYMYLPEAKESFGGMRLLRRADFHVGAHVNTFWRTPCRGATEGLSKKSVVWENKHITWFATLDGGIGLLLPMQEKTYRRLLMLQNALTTMLPHHAGLNPRAFRMLHVDRRTLQNAVRNVLDGELLNRYLYLSTMERSELAKKIGTTPDIILDDLLETDRVTAHF
2XYT , Knot 100 217 0.82 40 157 211
QANLMRLKSDLFNRSPMYPGPTKDDPLTVTLGFTLQDIVKVDSSTNEVDLVYYEQQRWKLNSLMWDPNEYGNITDFRTSAADIWTPDITAYSSTRPVQVLSPQIAVVTHDGSVMFIPAQRLSFMCDPTGVDSEEGVTCAVKFGSWVYSGFEIDLKTDTDQVDLSSYYASSKYEILSATQTRQVQHYSCCPEPYIDVNLVVKFRERRAGNGFFRNLFD
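The table does not spell out which Lempel-Ziv variant defines \(\mathrm{LZ}(w)\). The sketch below assumes the Lempel-Ziv (1976) exhaustive-history phrase count, with a final incomplete phrase counted, and computes the normalized column \(\mathrm{LZ}(w)/(n/\log_{20} n)\). Whether this convention reproduces the LZ values above exactly is an assumption; the function names are hypothetical.

```python
import math


def lz76_complexity(s: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) exhaustive history of s.

    Each phrase is the shortest extension from the current position that is
    not a copy of text starting earlier; a final incomplete phrase counts.
    """
    n, i, phrases = len(s), 0, 0
    while i < n:
        length = 1
        # Extend while s[i:i+length] occurs in s[:i+length-1] (overlap allowed).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_k n) with k = alphabet_size (20 amino acids)."""
    return lz / (n / math.log(n, alphabet_size))


print(lz76_complexity("0001101001000101"))  # 6: 0.001.10.100.1000.101
print(f"{normalized_lz(132, 313):.4f}")      # 3WUE row
```

With the table's LZ and \(n\) values this gives about 0.8089 for 3WUE and 0.8276 for 2XYT, so the ratio column appears to be truncated, rather than rounded, to two decimals.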

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-character Protein Data Bank code \(c\).
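A minimal Python sketch of these definitions: \(P_w(n)\) is the set of all length-\(n\) slices of \(w\), and \(p_w(n)\) is its size. The names `subwords` and `subword_complexity` are hypothetical, chosen for illustration.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


print(sorted(subwords("ABAB", 2)))               # ['AB', 'BA']
print(subword_complexity("DTATLGELAE", 1))       # 6
```

For \(n > |w|\) the range is empty, so \(p_w(n) = 0\).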

\(|P_{f(3WUE_1)}(2) \setminus P_{f(6URG_1)}(2)|=14\) and \(|P_{f(6URG_1)}(2) \setminus P_{f(3WUE_1)}(2)|=168\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \); this is an LZ76-style (set-of-subwords) numerator of the Jaccard distance between \(P_x(k)\) and \(P_y(k)\). For example, \(Z_2(f(3WUE_1),f(6URG_1))=14+168=182\).
Hydrophobic-polar version of Sequence 1 (3WUE_1): 0010110110101001101000101100000011100100101100101000010010100011001101100010010100111000110110011110111110001000100101011010110011000100000110001100011011011011010101000000101111000110011001010111101111010111101110100010011011101110010101011001100100100011110101110001110110000000111011110111111000110011001111110
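The 0/1 string can be reproduced with a simple lookup. The hydrophobic set below is an assumption: A, F, G, I, L, M, P, V and W map to 1 and every other letter (including C, H and Y) maps to 0, which agrees with the first 60 and last 58 positions of the 3WUE_1 string, but the page does not state its definition. `HYDROPHOBIC` and `hp_string` are hypothetical names.

```python
# Assumed hydrophobic letters (map to "1"); all other letters map to "0".
# Inferred from the 3WUE_1 string, not taken from a documented definition.
HYDROPHOBIC = frozenset("AFGILMPVW")


def hp_string(seq: str) -> str:
    """Hydrophobic-polar (1/0) encoding of an amino-acid sequence."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)


print(hp_string("DTATLGELAE"))  # 0010110110
```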
Pair \(Z_2\) Length of longest common substring (contiguous subword)
3WUE_1,6URG_1 182 4
3WUE_1,2XYT_1 173 3
6URG_1,2XYT_1 211 4
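A sketch of both columns: \(Z_k\) as the size of the symmetric difference of the two subword sets, the corresponding Jaccard distance, and the length of the longest common contiguous subword. Reading the last column as contiguous (a substring) is an assumption, since a common subsequence of sequences this long would be far longer than 4 (3WUE_1 and 2XYT_1 already share ten A's). All function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by the size of the union (0.0 if both sets are empty)."""
    union = subwords(x, k) | subwords(y, k)
    return z_k(x, y, k) / len(union) if union else 0.0


def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common-suffix length of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


print(z_k("ABAB", "ABBA", 2))                         # 1 (only "BB" is unshared)
print(longest_common_substring("XABCDY", "ZABCDW"))   # 4 ("ABCD")
```

The quadratic table is small enough here: the largest pair, 3WUE_1 against 6URG_1, needs about 450,000 cell updates.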

Newick tree

 
(6URG_1:10.20,(3WUE_1:86.5,2XYT_1:86.5):15.70);

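Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)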
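A minimal sketch of \(d\) and \(d_1\), assuming \(d\) is the larger and \(d_1\) the sum of the two one-sided increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), and assuming the LZ76 exhaustive-history phrase count for \(\mathrm{LZ}\). The helper names are hypothetical.

```python
def lz76_complexity(s: str) -> int:
    """Lempel-Ziv (1976) exhaustive-history phrase count of s."""
    n, i, phrases = len(s), 0, 0
    while i < n:
        length = 1
        # Extend while the candidate phrase is a copy of earlier text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1) built from the two one-sided LZ increments."""
    a = lz76_complexity(x + y) - lz76_complexity(x)  # extra phrases when y follows x
    b = lz76_complexity(y + x) - lz76_complexity(y)  # extra phrases when x follows y
    return max(a, b), a + b


d, d1 = otu_sayood("0001101001000101", "ABABAB")
print(d, d1 / 2)
```

Both quantities are symmetric in \(x\) and \(y\), and since each increment is nonnegative, \(d \le d_1 \le 2d\).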
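Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{1756}{\log_{20} 1756}-\frac{313}{\log_{20}313}\right)\approx 367.9\), where \(1756=313+1443\) is the length of the concatenation of the two sequences.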
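The arithmetic in this estimate can be checked directly. The sketch reuses the normalizing factor \(n/\log_{20} n\) from the LZ-ratio column; `lz_scale` is a hypothetical helper name, and the factors 0.85 and 0.8 are copied from the formula.

```python
import math


def lz_scale(n: int, k: int = 20) -> float:
    """n / log_k(n), the normalizing factor in the LZ ratio column."""
    return n / math.log(n, k)


n_3wue, n_6urg = 313, 1443
# 0.85 and 0.8 are the two constant factors in the estimate.
estimate = 0.85 * 0.8 * (lz_scale(n_3wue + n_6urg) - lz_scale(n_3wue))
print(round(estimate, 2))
```

The result is about 367.85, which the page reports as 367, apparently truncated like the ratio column.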
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 3WUE_1 6URG_1 470 282.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
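The interpolation can be checked numerically. In the sketch below, `a` and `b` stand for the two one-sided increments (hypothetical names). For the pair (3WUE_1, 6URG_1) they are inferred from the table as \(\max = d = 470\) and \(\min = d_1 - d = 2(282.5) - 470 = 95\).

```python
def delta(alpha: float, a: float, b: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# Hypothetical one-sided increments for (3WUE_1, 6URG_1), inferred from the
# table: max = d = 470 and min = d_1 - d = 2 * 282.5 - 470 = 95.
a, b = 470, 95
print(delta(0, a, b), delta(0.5, a, b))  # 470 282.5
```

At \(\alpha = 1\) the same formula returns the smaller increment, 95.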