CoV2D Browser™

Parikh vectors
3HXM_1 1QHA_1 2OHR_1 Letter Amino acid
17 21 7 H Histidine
115 91 29 L Leucine
19 39 20 F Phenylalanine
49 21 21 P Proline
19 60 24 T Threonine
80 59 18 R Arginine
65 78 35 G Glycine
6 34 8 M Methionine
24 49 30 S Serine
56 62 27 E Glutamic acid
21 59 17 K Lysine
15 21 19 Y Tyrosine
71 51 24 A Alanine
7 35 15 N Asparagine
17 27 16 Q Glutamine
10 53 23 I Isoleucine
16 4 7 W Tryptophan
48 71 34 V Valine
28 62 22 D Aspartic acid
2 20 6 C Cysteine
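A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch of that count, not the browser's own code; the function name `parikh_vector` and the 20-letter alphabet constant are assumptions made for illustration.

```python
from collections import Counter

# The 20 standard amino-acid letters (assumption: non-standard
# codes such as X or U are simply not reported).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return {letter: number of occurrences in seq} for each letter of alphabet."""
    counts = Counter(seq)
    # Counter returns 0 for letters that never occur.
    return {letter: counts[letter] for letter in alphabet}

# First ten residues of 3HXM_1, used only as a small illustration.
vec = parikh_vector("MNHLGKTEVF")
print({letter: k for letter, k in vec.items() if k})
```

As a sanity check on the table, the entries of each column add up to the length of the corresponding sequence (685 for 3HXM_1, 917 for 1QHA_1).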

>3HXM_1|Chain A|Argonaute|Thermus thermophilus (262724)
>1QHA_1|Chains A, B|PROTEIN (HEXOKINASE)|Homo sapiens (9606)
>2OHR_1|Chain A|Beta-secretase 1|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3HXM , Knot 244 685 0.77 40 212 556
MNHLGKTEVFLNRFALRPLNPEELRPWRLEVVLDPPPGREEVYPLLAQVARRAGGVTVRMGDGLASWSPPEVLVLEGTLARMGQTYAYRLYPKGRRPLDPKDPGERSVLSALARRLLQERLRRLEGVWVEGLAVYRREHARGPGWRVLGGAVLDLWVSDSGAFLLEVDPAYRILCEMSLEAWLAQGHPLPKRVRNAYDRRTWELLRLGEEDPKELPLPGGLSLLDYHASKGRLQGREGGRVAWVADPKDPRKPIPHLTGLLVPVLTLEDLHEEEGSLALSLPWEERRRRTREIASWIGRRLGLGTPEAVRAQAYRLSIPKLMGRRAVSKPADALRVGFYRAQETALALLRLDGAQGWPEFLRRALLRAFGASGASLRLHTLHAHPSQGLAFREALRKAKEEGVQAVLVLTPPMAWEDRNRLKALLLREGLPSQILNVPLREEERHRWENALLGLLAKAGLQVVALSGAYPAELAVGFDAGGRESFRFGGAACAVGGDGGHLLWTLPEAQAGERIPQEVVWDLLEETLWAFRRKAGRLPSRVLLLRDGRVPQDEFALALEALAREGIAYDLVSVRKSGGGRVYPVQGRLADGLYVPLEDKTFLLLTVHRDFRGTPRPLKLVHEAGDTPLEALAHQIFHLTRLYPASGFAFPRLPAPLHLADRLVKEVGRLGIRHLKEVDREKLFFV
1QHA , Knot 321 917 0.79 40 296 754
MIAAQLLAYYFTELKDDQVKKIDKYLYAMRLSDETLIDIMTRFRKEMKNGLSRDFNPTATVKMLPTFVRSIPDGSEKGDFIALDLGGSSFRILRVQVNHEKNQNVHMESEVYDTPENIVHGSGSQLFDHVAECLGDFMEKRKIKDKKLPVGFTFSFPCQQSKIDEAILITWTKRFKASGVEGADVVKLLNKAIKKRGDYDANIVAVVNDTVGTMMTCGYDDQHCEVGLIIGTGTNACYMEELRHIDLVEGDEGRMCINTEWGAFGDDGSLEDIRTEFDREIDRGSLNPGKQLFEKMVSGMYLGELVRLILVKMAKEGLLFEGRITPELLTRGKFNTSDVSAIEKNKEGLHNAKEILTRLGVEPSDDDCVSVQHVCTIVSFRSANLVAATLGAILNRLRDNKGTPRLRTTVGVDGSLYKTHPQYSRRFHKTLRRLVPDSDVRFLLSESGSGKGAAMVTAVAYRLAEQHRQIEETLAHFHLTKDMLLEVKKRMRAEMELGLRKQTHNNAVVKMLPSFVRRTPDGTENGDFLALDLGGTNFRVLLVKIRSGKKRTVEMHNKIYAIPIEIMQGTGEELFDHIVSCISDFLDYMGIKGPRMPLGFTFSFPCQQTSLDAGILITWTKGFKATDCVGHDVVTLLRDAIKRREEFDLDVVAVVNDTVGTMMTCAYEEPTCEVGLIVGTGSNACYMEEMKNVEMVEGDQGQMCINMEWGAFGDNGCLDDIRTHYDRLVNEYSLNAGKQRYEKMISGMYLGEIVRNILIDFTKKGFLFRGQISETLKTRGIFETKFLSQIESDRLALLQVRAILQQLGLNSTCDDSILVKTVCGVVSRRAAQLCGAGMAAVVDKIRENRGLDRLNVTVGVDGTLYKLHPHFSRIMHQTVKELSPKCNVSFLLSEDGSGKGAALITAVGVRLRTEASS
2OHR , Knot 172 402 0.85 40 230 385
RETDEEPEEPGKKGSFVEMVDNLRGKSGQGYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQRQLSSTYRDLRKGVYVPYTQGKWEGELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNLFSLQLCGAGFPLNQSEVLASVGGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIVRVEINGQDLKMDCKEYNYDKSIVDSGTTNLRLPKKVFEAAVKSIKAASSTEKFPDGFWLGEQLVCWQAGTTPWNIFPVISLYLMGEVTNQSFRITILPQQYLRPVEDVATSQDDCYKFAISQSSTGTVMGAVIMEGFYVVFDRARKRIGFAVSACHVHDEFRTAAVEGPFVTLDMEDCGYN
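The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below does only that arithmetic; the helper name `normalized_lz` is hypothetical, and the truncation to two decimals is an assumption about how the table values were produced.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"3HXM": (244, 685), "1QHA": (321, 917), "2OHR": (172, 402)}

for code, (lz, n) in rows.items():
    ratio = normalized_lz(lz, n)
    # Truncate (not round) to two decimals.
    print(code, math.floor(100 * ratio) / 100)
```

With truncation the three values come out as 0.77, 0.79 and 0.85, matching the table.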

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(3HXM_1)}(2) \setminus P_{f(1QHA_1)}(2)|=43\), \(|P_{f(1QHA_1)}(2) \setminus P_{f(3HXM_1)}(2)|=127\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \), the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (3HXM_1):
1001100011100111011010010110101110111100010111101100111101011011101011011110101101100010010101001101001100011011100110001001011110111100000101111011111110111000111110101100110010101111010111001001000001011011000100111111101100010010101001101111101001001110101111111010010000101110111000000000110111001111010110101001011011100110011011011100100011111010110111011001110111101101010010101001111001100100011011111011111000001011110011100110111000000010011111110111011110110110111110111000101111101111011011101101011001100111011000111100011011001111001011000111110111001110011010001110101101011011011100001111010001010101101100110011011100110100101101111101111101100110011011100100100001111
Pair \(Z_2\) Length of longest common subword
3HXM_1,1QHA_1 170 4
3HXM_1,2OHR_1 160 4
1QHA_1,2OHR_1 148 6
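The quantities \(p_w(k)\) and \(Z_k(x,y)\) are plain set operations on the length-\(k\) windows of each sequence. A minimal sketch, with the hypothetical helper names `subwords`, `p` and `z`:

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous windows) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def p(w, k):
    """Subword complexity: number of distinct length-k subwords of w."""
    return len(subwords(w, k))

def z(x, y, k):
    """Size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy strings, not the protein sequences above.
print(p("abcab", 2))      # distinct windows: ab, bc, ca
print(z("abc", "bcd", 2))  # ab and cd are unshared
```

For the first pair in the table, the two one-sided counts 43 and 127 add up to \(Z_2 = 170\).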

Newick tree

 
(
	3HXM_1:85.19,
	(
		2OHR_1:74,1QHA_1:74
	):11.19
);

Let \(d\) and \(d_1\) denote the Otu–Sayood distances of the same names. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1602}{\log_{20} 1602}-\frac{685}{\log_{20}685}\right)\approx 228\), where \(1602=685+917\) is the combined length of 3HXM_1 and 1QHA_1.
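The arithmetic behind the expected-distance estimate can be checked directly. In this sketch the factors 0.85 and 0.8 are taken from the text as given, and the helper name `lz_scale` is an assumption; 1602 is the sum of the two sequence lengths 685 and 917.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_{alphabet_size}(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

n1, n2 = 685, 917  # lengths of 3HXM_1 and 1QHA_1

# Factors 0.85 and 0.8 are copied from the text, not derived here.
expected = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n1))
print(int(expected))  # integer part of the estimate
```

The integer part is 228, to be compared with the reported \(d = 277\) and \(d_1/2 = 242.5\) for this pair.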
Status Protein1 Protein2 d d1/2
Query variables 3HXM_1 1QHA_1 277 242.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
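The interpolation can be illustrated as follows, assuming the usual Otu–Sayood definitions \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=(c(xy)-c(x))+(c(yx)-c(y))\). Taking \(c\) to be the LZ76 exhaustive-history phrase count is an assumption; the complexity measure actually used by this page may differ, and the function names are hypothetical.

```python
def lz76(s):
    """Number of phrases in the LZ76 exhaustive history of s (assumed complexity c)."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it already occurs in the text
        # preceding its own last character.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def increments(x, y, c=lz76):
    """The two one-sided quantities c(xy) - c(x) and c(yx) - c(y)."""
    return c(x + y) - c(x), c(y + x) - c(y)

def delta(x, y, alpha, c=lz76):
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y, c)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Toy sequences, not the proteins above.
x, y = "ACGTACGT", "AAAAAAAA"
a, b = increments(x, y)
print(delta(x, y, 0) == max(a, b))          # alpha = 0 gives d
print(delta(x, y, 0.5) == (a + b) / 2)      # alpha = 1/2 gives d_1 / 2
```

Both checks print `True`, since \(\alpha=0\) selects the maximum and \(\alpha=1/2\) averages the two increments.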