CoV2D Browser™

Parikh vectors
8XCA_1 5SLI_1 5UBV_1 Letter Amino acid
50 29 23 G Glycine
17 20 7 H Histidine
44 31 8 T Threonine
13 10 1 W Tryptophan
33 25 3 Y Tyrosine
27 27 11 N Asparagine
11 23 1 C Cysteine
20 16 6 Q Glutamine
87 15 17 E Glutamic acid
49 29 14 S Serine
56 43 20 V Valine
32 26 9 P Proline
102 22 15 R Arginine
58 34 12 D Aspartic acid
14 14 4 M Methionine
57 30 18 K Lysine
41 30 12 F Phenylalanine
73 34 30 A Alanine
31 25 14 I Isoleucine
93 40 21 L Leucine
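
A Parikh vector simply counts how often each letter occurs in a sequence. The sketch below shows one way such counts could be produced; the function name `parikh_vector` and the constant `LETTERS` are illustrative and not taken from the CoV2D code.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
LETTERS = "GHTWYNCQESVPRDMKFAIL"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Count the occurrences of each amino-acid letter in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in LETTERS}

# Toy input, not one of the proteins above.
vector = parikh_vector("GAVLGG")
print(vector["G"], vector["A"], vector["W"])
```

The entries of a Parikh vector add up to the sequence length; for example, the 8XCA_1 column above sums to 908.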

>8XCA_1|Chain A|CasJ19|unidentified (32644)
>5SLI_1|Chain A[auth D]|Proofreading exoribonuclease nsp14|Severe acute respiratory syndrome coronavirus 2 (2697049)
>5UBV_1|Chains A, B|ATPase domain of i-AAA protease|Myceliophthora thermophila (strain ATCC 42464 / BCRC 31852 / DSM 1799) (573729)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8XCA , Knot 336 908 0.84 40 295 801
MPSYKSSRVLVRDVPEELVDHYERSHRVAAFFMRLLLAMRREPYSLRMRDGTEREVDLDETDDFLRSAGCEEPDAVSDDLRSFALAVLHQDNPKKRAFLESENCVSILCLEKSASGTRYYKRPGYQLLKKAIEEEWGWDKFEASLLDERTGEVAEKFAALSMEDWRRFFAARDPDDLGRELLKTDTREGMAAALRLRERGVFPVSVPEHLDLDSLKAAMASAAERLKSWLACNQRAVDEKSELRKRFEEALDGVDPEKYALFEKFAAELQQADYNVTKKLVLAVSAKFPATEPSEFKRGVEILKEDGYKPLWEDFRELGFVYLAERKWERRRGGAAVTLCDADDSPIKVRFGLTGRGRKFVLSAAGSRFLITVKLPCGDVGLTAVPSRYFWNPSVGRTTSNSFRIEFTKRTTENRRYVGEVKEIGLVRQRGRYYFFIDYNFDPEEVSDETKVGRAFFRAPLNESRPKPKDKLTVMGIDLGINPAFAFAVCTLGECQDGIRSPVAKMEDVSFDSTGLRGGIGSQKLHREMHNLSDRCFYGARYIRLSKKLRDRGALNDIEARLLEEKYIPGFRIVHIEDADERRRTVGRTVKEIKQEYKRIRHQFYLRYHTSKRDRTELISAEYFRMLFLVKNLRNLLKSWNRYHWTTGDRERRGGNPDELKSYVRYYNNLRMDTLKKLTCAIVRTAKEHGATLVAMENIQRVDRDDEVKRRKENSLLSLWAPGMVLERVEQELKNEGILAWEVDPRHTSQTSCITDEFGYRSLVAKDTFYFEQDRKIHRIDADVNAAINIARRFLTRYRSLTQLWASLLDDGRYLVNVTRQHERAYLELQTGAPAATLNPTAEASYELVGLSPEEEELAQTRIKRKKREPFYRHEGVWLTREKHREQVHELRNQVLALGNAKIPEIRT
5SLI , Knot 217 523 0.86 40 267 499
SMLFKDCSKVITGLHPTQAPTHLSVDTKFKTEGLCVDIPGIPKDMTYRRLISMMGFKMNYQVNGYPNMFITREEAIRHVRAWIGFDVEGCHATREAVGTNLPLQLGFSTGVNLVAVPTGYVDTPNNTDFSRVSAKPPPGDQFKHLIPLMYKGLPWNVVRIKIVQMLSDTLKNLSDRVVFVLWAHGFELTSMKYFVKIGPERTCCLCDRRATCFSTASDTYACWHHSIGFDYVYNPFMIDVQQWGFTGNLQSNHDLYCQVHGNAHVASCDAIMTRCLAVHECFVKRVDWTIEYPIIGDELKINAACRKVQHMVVKAALLADKFPVLHDIGNPKAIKCVPQADVEWKFYDAQPCSDKAYKIEELFYSYATHSDKFTDGVCLFWNCNVDRYPANSIVCRFDTRVLSNLNLPGCDGGSLYVNKHAFHTPAFDKSAFVNLKQLPFFYYSDSPCESHGKQVVSDIDYVPLKSATCITRCNLGGAVCRHHANEYRLYLDAYNMMISAGFSLWVYKQFDTYNLWNTFTRLQ
5UBV , Knot 111 246 0.82 40 158 239
SNARFSDVHGCDEAKEELQELVEFLRNPEKFSNLGGKLPKGVLLVGPPGTGKTLLARAVAGEAGVPFFYMSGSEFDEIYVGVGAKRVRELFNAAKAKAPSIVFIDELDAIGGRRNSRDATYVRQTLNQLLTEMDGFAQNSGVIILGATNFPESLDKALTRPGRFDRHVHVSLPDVRGRIAILKHHAKKIKIGSDVNIAAIAARTSGLSGAELENIVNQAAVHASKEKAKAVMQAHFEWAKDKVIMG
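
The LZ-complexity and normalized columns can be sketched as follows. The page does not say which Lempel-Ziv variant it uses, so the parsing below (each new phrase is the shortest prefix of the remainder that has not yet occurred, as in LZ76) is an assumption, and the names `lz76_complexity`, `normalized_lz` and `truncate2` are hypothetical. The normalized values in the table are consistent with truncation to two decimals rather than rounding, which `truncate2` mimics.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of phrases in an LZ76-style parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it still occurs starting at an earlier
        # position in w (overlap with the phrase itself is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_20(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

def truncate2(x: float) -> float:
    """Truncate (not round) to two decimals."""
    return math.floor(x * 100) / 100

# Normalized values recomputed from the LZ and length columns above.
for lz, n in [(336, 908), (217, 523), (111, 246)]:
    print(lz, n, truncate2(normalized_lz(lz, n)))
```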

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
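
As a minimal sketch of these definitions, the set of length-\(k\) subwords and its size can be computed by sliding a window over the word. The helper names `subwords` and `subword_complexity` are illustrative only.

```python
def subwords(w: str, k: int) -> set[str]:
    """Distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy word, not one of the proteins above.
word = "GAGAV"
print(sorted(subwords(word, 2)), subword_complexity(word, 2))
```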

\(|P_{f(8XCA_1)}(2) \setminus P_{f(5SLI_1)}(2)|=91\), \(|P_{f(5SLI_1)}(2) \setminus P_{f(8XCA_1)}(2)|=63\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), that is, the numerator of an LZ76-style (set of subwords) Jaccard distance for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11000000111001100110000000011111101111100010010100100001010000011001100010110001001111110000100011100000101101000101000000110011001100011100101011000010110011110100100111100100110011000000111111010001111101100101001011110110010011100001100000100010011011010001110011101001000100011111010111001001001101100010011100100111101100010000111110100100011010111010100111011100111010110101110111000110101100000010101000000000011010011110001000111000101001000001101110111000010100010111101110111111100110000110011101001010001101111000100010010000101100101000100011100101011000011110110100100000011001001000000100010100000000000110100101111100100110010000100100000110100100010000010100100100111001000110111100100100000100000001101111111100100010001111101010000000010001100011100010100000100101010111011001100000100111011001001101000000101010011111010101010001111010000110001000000110000111100000000100100011111010110100
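
The hydrophobic-polar string above can be reproduced by mapping each residue to 1 (hydrophobic) or 0 (polar). The page does not list which residues it treats as hydrophobic; the set `AFGILMPVW` below is an assumption inferred by comparing the opening residues of 8XCA_1 with the opening bits of the string, and `hydrophobic_polar` is a hypothetical helper name.

```python
# Assumed hydrophobic residues, inferred from the page's data (not stated there).
HYDROPHOBIC = set("AFGILMPVW")

def hydrophobic_polar(sequence: str) -> str:
    """Map each residue to '1' if hydrophobic (assumed set), else '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 20 residues of 8XCA_1, copied from the sequence table.
prefix = "MPSYKSSRVLVRDVPEELVD"
print(hydrophobic_polar(prefix))  # 11000000111001100110
```

The printed bits agree with the first 20 symbols of the hydrophobic-polar string shown above.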
Pair \(Z_2\) Length of longest common substring
8XCA_1,5SLI_1 154 5
8XCA_1,5UBV_1 179 3
5SLI_1,5UBV_1 191 4
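
The two columns of this table could be computed along the following lines. This is a sketch with illustrative names (`z_distance`, `longest_common_substring_length`); it treats the last column as the longest common contiguous block, an assumption based on how small the listed values (3 to 5) are for sequences of several hundred residues.

```python
def subwords(w: str, k: int) -> set[str]:
    """Distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of the k-subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_substring_length(x: str, y: str) -> int:
    """Length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length ending at y[j-1]
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy words, not the proteins above.
print(z_distance("GAVLKE", "AVLGKE", 2))
print(longest_common_substring_length("GAVLKE", "AVLGKE"))
```

On the page's own numbers, the first row is \(91+63=154\), and the two sequences share \(295-91=267-63=204\) distinct 2-letter subwords.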

Newick tree

 
[
	5UBV_1:97.18,
	[
		8XCA_1:77,5SLI_1:77
	]:20.18
]

Let \(d\) denote the Otu--Sayood distance \(d\).
Let \(d_1\) denote the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1431}{\log_{20} 1431}-\frac{523}{\log_{20}523})\approx 230\), where \(1431=908+523\) is the combined length of the two sequences.
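
The arithmetic behind this estimate can be checked directly. In the sketch below, the factors 0.85 and 0.8 are copied from the formula as given on the page, 1431 is read as 908 + 523 (the lengths of 8XCA_1 and 5SLI_1), and the function names are illustrative.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """n / log_20(n), the normalizing term used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

def expected_distance(n_x: int, n_y: int, factor: float = 0.85 * 0.8) -> float:
    """factor * (scale of the concatenation minus scale of the second word)."""
    return factor * (lz_scale(n_x + n_y) - lz_scale(n_y))

print(expected_distance(908, 523))
```

This evaluates to roughly 231, in line with the rounded value quoted above.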
Status Protein1 Protein2 d d1/2
Query variables 8XCA_1 5SLI_1 292 231.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
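
The interpolation can be illustrated numerically. The sketch assumes that min and max range over the two one-sided LZ increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), as in Otu and Sayood's definitions; the page does not spell this out. The value 171 is not shown on the page: it is derived from the table row as \(2\cdot 231.5-292\).

```python
def delta(alpha: float, a: float, b: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Assumed one-sided increments for the pair 8XCA_1, 5SLI_1:
# 292 is d from the table, 171 is derived (2 * 231.5 - 292).
small, large = 171, 292

print(delta(0, small, large))    # d
print(delta(0.5, small, large))  # d1 / 2
```

With these inputs, \(\alpha=0\) recovers \(d=292\) and \(\alpha=1/2\) recovers \(d_1/2=231.5\), matching the table row.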