CoV2D Browser™

Parikh vectors
4APZ_1 4BHW_1 3VNV_1 Letter Amino acid
13 42 97 E Glutamic acid
6 15 28 M Methionine
5 39 54 P Proline
5 25 36 Y Tyrosine
7 34 75 R Arginine
11 37 76 D Aspartic acid
5 28 29 Q Glutamine
12 29 113 G Glycine
2 16 32 H Histidine
13 44 75 K Lysine
2 8 8 W Tryptophan
4 14 41 N Asparagine
11 26 85 I Isoleucine
9 48 104 L Leucine
6 32 52 F Phenylalanine
7 27 116 A Alanine
6 36 80 S Serine
9 26 71 T Threonine
10 32 97 V Valine
1 20 20 C Cysteine
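
The Parikh vector of a sequence counts how often each letter occurs in it. A minimal Python sketch of that count follows; the names `parikh_vector` and `AMINO_ACIDS` are illustrative assumptions and are not taken from the CoV2D site.

```python
from collections import Counter

# The 20 standard amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "EMPYRDQGHKWNILFASTVC"

def parikh_vector(seq: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter in seq."""
    counts = Counter(seq)
    # Letters that do not occur are reported with count 0.
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}

if __name__ == "__main__":
    # First six residues of the 4APZ_1 sequence listed on this page.
    print(parikh_vector("MTMQIK"))
```

The entries of a Parikh vector sum to the sequence length; for example, the 4APZ_1 column above sums to 144.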

>4APZ_1|Chains AA[auth 2], A[auth 1], BA[auth 4], B[auth 3], CA[auth 6], C[auth 5], DA[auth 8], D[auth 7], EA[auth A], E[auth 9], FA[auth C], F[auth B], GA[auth E], G[auth D], HA[auth G], H[auth F], IA[auth I], I[auth H], J, JA[auth K], KA[auth M], K[auth L], LA[auth O], L[auth N], MA[auth Q], M[auth P], NA[auth S], N[auth R], OA[auth U], O[auth T], PA[auth W], P[auth V], QA[auth Y], Q[auth X], RA[auth a], R[auth Z], SA[auth c], S[auth b], TA[auth e], T[auth d], UA[auth g], U[auth f], VA[auth i], V[auth h], W[auth j], X[auth k], Y[auth l], Z[auth m]|PROBABLE DEOXYURIDINE 5'-TRIPHOSPHATE NUCLEOTIDOHYDROLASE YNCF|BACILLUS SUBTILIS (1423)
>4BHW_1|Chains A, B|HISTONE ACETYLTRANSFERASE P300|HOMO SAPIENS (9606)
>3VNV_1|Chain A|Elongation factor Ts, Elongation factor Tu, LINKER, Q beta replicase|Escherichia coli O157:H7 (83334)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4APZ , Knot 68 144 0.78 40 112 139
MTMQIKIKYLDETQTRISKIEQGDWIDLRAAEDVTIKKDEFKLVPLGVAMELPEGYEAHVVPRSSTYKNFGVIQTNSMGVIDESYKGDNDFWFFPAYALRDTEIKKGDRICQFRIMKKMPAVELVEVEHLGNEDRGGLGSTGTK
4BHW , Knot 239 578 0.87 40 288 547
GAMAGKAVPMQSKKKIFKPEELRQALMPTLEALYRQDPESLPFRQPVDPQLLGIPDYFDIVKSPMDLSTIKRKLDTGQYQEPWQYVDDIWLMFNNAWLYNRKTSRVYKYCSKLSEVFEQEIDPVMQSLGYCCGRKLEFSPQTLCCYGKQLCTIPRDATYYSYQNRYHFCEKCFNEIQGESVSLGDDPSQPQTTINKEQFSKRKNDTLDPELFVECTECGRKMHQICVLHHEIIWPAGFVCDGCLKKSARTRKENKFSAKRLPSTRLGTFLENRVNDFLRRQNHPESGEVTVRVVHASDKTVEVKPGMKARFVDSGEMAESFPYRTKALFAFEEIDGVDLCFFGMHVQEYGSDCPPPNQRRVYISYLDSVHFFRPKCLRTAVYHEILIGYLEYVKKLGYTTGHIWACPPSEGDDYIFHCHPPDQKIPKPKRLQEWFKKMLDKAVSERIVHDYKDIFKQATEDRLTSAKELPYFEGDFWPNVLEESIKESGGSGSQKLYATMEKHKEVFFVIRLIAGPAANSLPPIVDPDPLIPCDLMDGRDAFLTLARDKHLEFSSLRRAQWSTMCMLVELHTQSQDRF
3VNV , Knot 456 1289 0.84 40 336 1104
MAEITASLVKELRERTGAGMMDCKKALTEANGDIELAIENMRKSGAIKAAKKAGNVAADGVIKTKIDGNYGIILEVNCQTDFVAKDAGFQAFADKVLDAAVAGKITDVEVLKAQFEEERVALVAKIGENINIRRVAALEGDVLGSYQHGARIGVLVAAKGADEELVKHIAMHVAASKPEFIKPEDVSAEVVEKEYQVQLDIAMQSGKPKEIAEKMVEGRMKKFTGEVSLTGQPFVMEPSKTVGQLLKEHNAEVTGFIRFEVGEGIEKVETDFAAEVAAMSKQSHMSKEKFERTKPHVNVGTIGHVDHGKTTLTAAITTVLAKTYGGAARAFDQIDNAPEEKARGITINTSHVEYDTPTRHYAHVDCPGHADYVKNMITGAAQMDGAILVVAATDGPMPQTREHILLGRQVGVPYIIVFLNKCDMVDDEELLELVEMEVRELLSQYDFPGDDTPIVRGSALKALEGDAEWEAKILELAGFLDSYIPEPERAIDKPFLLPIEDVFSISGRGTVVTGRVERGIIKVGEEVEIVGIKETQKSTCTGVEMFRKLLDEGRAGENVGVLLRGIKREEIERGQVLAKPGTIKPHTKFESEVYILSKDEGGRHTPFFKGYRPQFYFRTTDVTGTIELPEGVEMVMPGDNIKMVVTLIHPIAMDDGLRFAIREGGRTVGAGVVAKVLSGASGAAGGGGSGGGGSMSKTASSRNSLSAQLRRAANTRIEVEGNLALSIANDLLLAYGQSPFNSEAECISFSPRFDGTPDDFRINYLKAEIMSKYDDFSLGIDTEAVAWEKFLAAEAECALTNARLYRPDYSEDFNFSLGESCIHMARRKIAKLIGDVPSVEGMLRHCRFSGGATTTNNRSYGHPSFKFALPQACTPRALKYVLALRASTHFDIRISDISPFNKAVTVPKNSKTDRCIAIEPGWNMFFQLGIGGILRDRLRCWGIDLNDQTINQRRAHEGSVTNNLATVDLSAASDSISLALCELLLPPGWFEVLMDLRSPKGRLPDGSVVTYEKISSMGNGYTFELESLIFASLARSVCEILDLDSSEVTVYGDDIILPSCAVPALREVFKYVGFTTNTKKTFSEGPFRESCGKHYYSGVDVTPFYIRHRIVSPADLILVLNNLYRWATIDGVWDPRAHSVYLKYRKLLPKQLQRNTIPDGYGDGALVGSVLINPFAKNRGWIRYVPVITDHTRDRERAELGSYLYDLFSRCLSESNDGLPLRGPSGCDSADLFAIDQLICRSNPTKISRSTGKFDIQYIACSSRVLAPYGVFQGTKVASLHEAHHHHHH
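
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be reproduced from the LZ-complexity and length columns. The sketch below assumes that \(\mathrm{LZ}(w)\) is the number of phrases in the Lempel-Ziv (1976) exhaustive-history parsing; the page does not state which variant it uses, so `lz76_complexity` is an assumption, and both function names are illustrative.

```python
import math

def lz76_complexity(w: str) -> int:
    """Count phrases in an LZ76-style parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can be copied from an earlier start
        # position (overlap with the phrase itself is allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ / (n / log_a n) for alphabet size a (20 amino acids)."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    # Values from the 4APZ row of the table: LZ(w) = 68, n = 144.
    print(round(normalized_lz(68, 144), 2))
```

Under this parsing a constant word such as AAAAAA has only two phrases, which is the lowest possible value for a word of length at least 2.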

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
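
As an illustration of these definitions, here is a short Python sketch that builds \(P_w(n)\) as a set of length-\(n\) windows and takes its size; the function names are illustrative and not taken from the site.

```python
def subwords(w: str, n: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length n in w."""
    # The range is empty when n exceeds len(w), giving an empty set.
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w: str, n: int) -> int:
    """Return p_w(n), the number of distinct length-n subwords of w."""
    return len(subwords(w, n))

if __name__ == "__main__":
    print(sorted(subwords("ABAB", 2)))
    print(subword_complexity("AAAAAA", 2))
```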

\(|P_{f(4APZ_1)}(2) \setminus P_{f(4BHW_1)}(2)|=24\), \(|P_{f(4BHW_1)}(2) \setminus P_{f(4APZ_1)}(2)|=200\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
101010100100000010010010110101100101000010111111110110100101110000000111100001111000001000111111011000010010010010110011110110100110000111100100
Pair \(Z_2\) Length of longest common subsequence
4APZ_1,4BHW_1 224 3
4APZ_1,3VNV_1 242 5
4BHW_1,3VNV_1 120 4
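
The \(Z_k\) column is the size of the symmetric difference of the two sets of length-\(k\) subwords. A minimal Python sketch, with illustrative function names that are not taken from the site:

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

if __name__ == "__main__":
    # Toy words, not the protein sequences from this page.
    print(z_k("ABAB", "ABBA", 2))
```

For the pair 4APZ_1, 4BHW_1 the two one-sided differences reported on this page are 24 and 200, which add up to the tabulated \(Z_2 = 224\).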

Newick tree

 
[
	4APZ_1:13.08,
	[
		4BHW_1:60,3VNV_1:60
	]:70.08
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{722 }{\log_{20} 722}-\frac{144}{\log_{20}144})\approx 164.\)
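
The arithmetic in the expected-distance estimate can be checked with a few lines of Python. The constants 0.85 and 0.8 are taken from the formula as given; 722 appears to be the combined length 144 + 578 of the two query sequences, which is an inference from the lengths in the table and not something the page states. The helper name `lz_scale` is illustrative.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_a(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

# Factors and lengths exactly as they appear in the formula above.
expected = 0.85 * 0.8 * (lz_scale(722) - lz_scale(144))

if __name__ == "__main__":
    print(round(expected))
```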
Status Protein1 Protein2 d d1/2
Query variables 4APZ_1 4BHW_1 214 132
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
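
The interpolation between the two distances can be sketched in Python as follows. The arguments `a` and `b` are hypothetical stand-ins for the two quantities over which the minimum and maximum are taken; the page does not spell them out, so the numbers below are toy values and not data from the table.

```python
def delta(alpha: float, a: float, b: float) -> float:
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

if __name__ == "__main__":
    a, b = 3, 7  # toy values only
    print(delta(0, a, b))    # alpha = 0 gives the maximum, playing the role of d
    print(delta(0.5, a, b))  # alpha = 1/2 gives the average, playing the role of d_1/2
```

With this form the value at \(\alpha=1/2\) can never exceed the value at \(\alpha=0\), which agrees with the query row above, where \(d_1/2 = 132\) and \(d = 214\).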