CoV2D Browser™

Parikh vectors
8FWC_1 4KBB_1 3SGL_1 Letter Amino acid
8 34 14 Y Tyrosine
14 7 74 A Alanine
3 14 55 Q Glutamine
3 9 26 H Histidine
11 53 24 I Isoleucine
15 16 41 T Threonine
11 46 24 N Asparagine
7 4 14 C Cysteine
11 29 76 L Leucine
13 26 33 F Phenylalanine
2 10 16 W Tryptophan
11 27 34 D Aspartic acid
13 34 30 E Glutamic acid
2 7 13 M Methionine
21 37 34 S Serine
27 15 31 V Valine
10 19 43 R Arginine
25 22 52 G Glycine
5 42 15 K Lysine
18 8 40 P Proline
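The Parikh vector of a sequence is simply its per-letter count. As a minimal sketch (the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not taken from the CoV2D code), the counts in the table above could be reproduced along these lines:

```python
from collections import Counter

# The 20 standard one-letter amino acid codes (illustrative constant).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Count how often each amino acid letter occurs in the sequence."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# First 20 residues of 8FWC_1 as listed on this page.
v = parikh_vector("MYFFSVDPRNGASKSGDVCG")
print(v["G"], v["S"], v["F"])
```

The entries of a Parikh vector always sum to the sequence length; for example, the 8FWC_1 column above sums to 230.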

>8FWC_1|Chains AA[auth 0A], AB[auth 0B], A[auth 0], BA[auth 0D], BB[auth 0E], B[auth 0C], CA[auth 0G], CB[auth 1], C[auth 0F], DA[auth 3], DB[auth 4], D[auth 2], EA[auth 6], EB[auth 7], E[auth 5], FA[auth 9], FB[auth J], F[auth 8], GA[auth L], GB[auth M], G[auth K], HA[auth O], HB[auth P], H[auth N], IA[auth R], I[auth Q], JA[auth T], J[auth S], KA[auth V], K[auth U], LA[auth X], L[auth W], MA[auth Z], M[auth Y], NA[auth b], N[auth a], OA[auth d], O[auth c], PA[auth f], P[auth e], QA[auth h], Q[auth g], RA[auth j], R[auth i], SA[auth l], S[auth k], TA[auth n], T[auth m], UA[auth p], U[auth o], VA[auth r], V[auth q], WA[auth t], W[auth s], XA[auth v], X[auth u], YA[auth x], Y[auth w], ZA[auth z], Z[auth y]|Collar sheath protein, gp13|Agrobacterium phage Milano (2557550)
>4KBB_1|Chains A, B|Botulinum neurotoxin type B|Clostridium botulinum (1491)
>3SGL_1|Chain A|tRNA 5-methylaminomethyl-2-thiouridine biosynthesis bifunctional protein mnmC|Yersinia pestis (632)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8FWC , Knot 106 230 0.83 40 148 215
MYFFSVDPRNGASKSGDVCGSCCCESISARPGEVNGVMVSYAAWSAPLRGHGLTNKTTFEIDGVSVTPPKVSNAFGRTKVGVVFEGTLSDLFPNPEGEQVEYEISELNGPSNGVVELGANGAFTYTPGALFTGVDRFWFSINGNIGEYVISVDPTTSELPQPPFTTPVYVPAARRSVDPRTHVLKFVLGVSPAAIPGDVYRLTVRQVAIDCDGNEFVHISCYDISIGSCG
4KBB , Knot 180 459 0.80 40 215 421
MGSSHHHHHHSSGLVPRGSHMASMNSEILNNIILNLRYKDNNLIDLSGYGAKVEVYDGVELNDKNQFKLTSSANSKIRVTQNQNIIFNSVFLDFSVSFWIRIPKYKNDGIQNYIHNEYTIINCMKNNSGWKISIRGNRIIWTLIDINGKTKSVFFEYNIREDISEYINRWFFVTITNNLNNAKIYINGKLESNTDIKDIREVIANGEIIFKLDGDIDRTQFIWMKYFSIFNTELSQSNIEERYKIQSYSEYLKDFWGNPLMYNKEYYMFNAGNKNSYIKLKKDSPVGEILTRSKYNQNSKYINYRDLYIGEKFIIRRKSNSQSINDDIVRKEDYIYLDFFNLNQEWRVYTYKYFKKEEEKLFLAPISDSDEFYNTIQIKEYDEQPTYSCQLLFKKDEESTDEIGLIGIHRFYESGIVFEEYKDYFCISKWYLKEVKRKPYNLKLGCNWQFIPKDEGWTE
3SGL , Knot 266 689 0.84 40 283 631
MNQRPIQTATLSWNEQGTPVSEQFGDIYFSNEDGLEETHHVFLKGNGFPARFASHPQQSCIFAETGFGTGLNFLTLWRDFALFRQQSPNATLRRLHYISFEKYPLHVADLASAHARWPELASFAEQLRAQWPLPLAGCHRILLADGAITLDLWFGDVNTLLPTLDDSLNNQVDAWFLDGFAPAKNPDMWNEQLFNAMARMTRPGGTFSTFTAAGFVRRGLQQAGFNVTKVKGFGQKREMLTGTLPQQIHAPTAPWYHRPAATRCDDIAIIGGGIVSALTALALQRRGAVVTLYCADAQPAQGASGNRQGALYPLLNGKNDALETFFTSAFTFARRQYDQLLEQGIAFDHQWCGVSQLAFDDKSRGKIEKMLHTQWPVEFAEAMSREQLSELAGLDCAHDGIHYPAGGWLCPSDLTHALMMLAQQNGMTCHYQHELQRLKRIDSQWQLTFGQSQAAKHHATVILATGHRLPEWEQTHHLPLSAVRGQVSHIPTTPVLSQLQQVLCYDGYLTPVNPANQHHCIGASYQRGDIATDFRLTEQQENRERLLRCLPQVSWPQQVDVSDNQARCGVRCAIRDHLPMVGAVPDYAATLAQYQDLSRRIQHGGESEVNDIAVAPVWPELFMVGGLGSRGLCSAPLVAEILAAQMFGEPLPLDAKTLAALNPNRFWIRKLLKGRPVQTRSPATQESSR
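The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the other two columns. The sketch below takes the LZ-complexity values from the table as given; the function name `normalized_lz` is illustrative and not part of the page.

```python
import math

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
for code, lz, n in [("8FWC", 106, 230), ("4KBB", 180, 459), ("3SGL", 266, 689)]:
    print(code, round(normalized_lz(lz, n), 4))
```

The results agree with the tabulated 0.83, 0.80 and 0.84 to within the last displayed digit.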

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
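As a small illustration of these definitions (the function names are hypothetical), \(P_w(n)\) can be built as a set of slices and \(p_w(n)\) is its size:

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w: str, n: int) -> int:
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subwords(w, n))

# The first six residues of 8FWC_1 contain five distinct 2-letter subwords:
# MY, YF, FF, FS, SV.
print(subword_complexity("MYFFSV", 2))
```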

\(|P_{f(8FWC_1)}(2) \setminus P_{f(4KBB_1)}(2)|=56\), \(|P_{f(4KBB_1)}(2) \setminus P_{f(8FWC_1)}(2)|=123\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1: 10110101001100010101000000101011010111100111011101011000001010110101101001110001111101010011101010010001001011001110111011100011111011001110101011001101010000110111001101111000101000110111110111111010010100111000100110100001011001
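The page does not state which residues count as hydrophobic. The sketch below uses a mapping inferred by aligning the opening residues of 8FWC_1 (assumed here to be "Sequence 1") with the bit string above: A, F, G, I, L, M, P, V and W map to 1 and everything else to 0. Residues that were not checked in that alignment, such as Q, are assumed to be polar, so treat the set `HYDROPHOBIC` as an assumption.

```python
# Assumed hydrophobic residues, inferred from the start of the bit string
# on this page; this is not a documented CoV2D setting.
HYDROPHOBIC = frozenset("AFGILMPVW")

def hp_encode(sequence: str) -> str:
    """Encode a protein sequence as 1 (hydrophobic) / 0 (polar) per residue."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 20 residues of 8FWC_1; the output matches the first 20 bits above.
print(hp_encode("MYFFSVDPRNGASKSGDVCG"))
```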
Pair \(Z_2\) Length of longest common subsequence
8FWC_1,4KBB_1 179 3
8FWC_1,3SGL_1 199 4
4KBB_1,3SGL_1 190 4
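The first row can be checked from the two set differences quoted earlier: \(56+123=179\). A minimal sketch of \(Z_k\) as the size of a symmetric difference of subword sets follows; the function name `z_distance` is illustrative.

```python
def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px = {x[i:i + k] for i in range(len(x) - k + 1)}  # P_x(k)
    py = {y[i:i + k] for i in range(len(y) - k + 1)}  # P_y(k)
    return len(px - py) + len(py - px)

# Toy example: "abab" has 2-subwords {ab, ba}, "abba" has {ab, bb, ba},
# so only "bb" is unshared.
print(z_distance("abab", "abba", 2))
```

Because \(Z_k\) is the cardinality of a symmetric difference, it is symmetric in its arguments and vanishes when both sequences have the same set of length-\(k\) subwords.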

Newick tree

 
(3SGL_1:99.73,(8FWC_1:89.5,4KBB_1:89.5):10.23);

Let \(d\) and \(d_1\) be the Otu--Sayood distances of the same names. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{689 }{\log_{20} 689}-\frac{230}{\log_{20}230})\approx 128.\)
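Spelled out, with 689 and 230 being the lengths of 3SGL_1 and 8FWC_1 from the table, the arithmetic is:

\[ \frac{689}{\log_{20} 689}\approx\frac{689}{2.1815}\approx 315.8,\qquad \frac{230}{\log_{20} 230}\approx\frac{230}{1.8153}\approx 126.7, \]
\[ (0.85)(0.8)(315.8-126.7)\approx 0.68\times 189.1\approx 128.6, \]

which agrees with the quoted value of 128 up to rounding.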
Status Protein1 Protein2 d d1/2
Query variables 8FWC_1 4KBB_1 164 121.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
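The formula is consistent with reading \(\min\) and \(\max\) as the smaller and larger of \(c(xy)-c(x)\) and \(c(yx)-c(y)\), where \(c\) is an LZ-complexity. That reading assumes the usual Otu--Sayood definitions, \(d=\max\) and \(d_1=\) the sum of the two terms; the page itself does not spell them out. The sketch below uses a straightforward LZ76 phrase count for \(c\), which may differ from the variant CoV2D uses, and the function names are illustrative.

```python
def lz76(w: str) -> int:
    """Number of phrases in the LZ76 (exhaustive history) parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text;
        # the source may overlap the phrase itself, hence i + length - 1.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two directed complexity gains."""
    a = lz76(x + y) - lz76(x)  # new phrases needed for y after seeing x
    b = lz76(y + x) - lz76(y)  # new phrases needed for x after seeing y
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "MYFFSVDPRN", "MGSSHHHHHH"
print(delta(x, y, 0), delta(x, y, 0.5))  # d and d1/2 under the assumptions above
```

Under this reading, the query row above (\(d=164\), \(d_1/2=121.5\)) would correspond to \(\max=164\) and \(\min=2\cdot 121.5-164=79\).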