CoV2D Browser™

Parikh vectors
7FTA_1 6TCR_1 7ZUL_1 Letter Amino acid
13 16 70 L Leucine
7 1 20 M Methionine
12 18 58 T Threonine
11 13 70 A Alanine
3 5 0 C Cysteine
9 7 41 Q Glutamine
16 23 72 G Glycine
25 6 44 I Isoleucine
15 35 82 S Serine
1 6 8 W Tryptophan
12 22 47 V Valine
9 9 45 N Asparagine
8 6 21 F Phenylalanine
5 12 28 P Proline
8 6 31 R Arginine
9 6 45 E Glutamic acid
6 11 10 H Histidine
15 11 53 K Lysine
0 11 38 Y Tyrosine
11 6 38 D Aspartic acid
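
A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch of that count; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative names, not taken from the CoV2D project, and the letter order is copied from the table above.

```python
from collections import Counter

# Letter order as listed in the table above (L, M, T, A, ...).
AMINO_ACIDS = "LMTACQGISWVNFPREHKYD"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in seq."""
    counts = Counter(seq)
    return [counts.get(letter, 0) for letter in alphabet]

# Example on the first ten residues of 7FTA_1.
for letter, count in zip(AMINO_ACIDS, parikh_vector("SMAEIKQGIR")):
    if count:
        print(letter, count)
```

Applied to a full FASTA sequence, each column of the table corresponds to one call of this function.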

>7FTA_1|Chains A, B, C, D|Syntenin-1|Homo sapiens (9606)
>6TCR_1|Chain A[auth H]|Omalizumab Fab Ser81Arg, Gln83Arg and Leu158Pro light chain mutant|Homo sapiens (9606)
>7ZUL_1|Chain A[auth AAA]|Penicillin-binding protein 1b|Streptococcus pneumoniae R6 (171101)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7FTA , Knot 90 195 0.81 38 132 183
SMAEIKQGIREVILCKDQDGKIGLRLKSIDNGIFVQLVQANSPASLVGLRFGDQVLQINGENCAGWSSDKAHKVLKQAFGEKITMTIRDRPFERTITMHKDSTGHVGFIFKNGKITSIVKDSSAARNGLLTEHNICEINGQNVIGLKDSQIADILSTSGTVVTITIMPAFIFEHIIKRMAPSIMKSLMDHTIPEV
6TCR , Knot 98 230 0.77 40 144 212
EVQLVESGGGLVQPGGSLRLSCAVSGYSITSGYSWNWIRQAPGKGLEWVASITYDGSTNYNPSVKGRITISRDDSKNTFYLQMNSLRAEDTAVYYCARGSHYFGHWHFAVWGQGTLVTVSSASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKVEPKSCHHHHHH
7ZUL , Knot 302 821 0.82 38 277 738
MQNQLNELKRKMLEFFQQKQKNKKSARPGKKGSSTKKSKTLDKSAIFPAILLSIKALFNLLFVLGFLGGMLGAGIALGYGVALFDKVRVPQTEELVNQVKDISSISEITYSDGTVIASIESDLLRTSISSEQISENLKKAIIATEDEHFKEHKGVVPKAVIRATLGKFVGLGSSSGGSTLTQQLIKQQVVGDAPTLARKAAEIVDALALERAMNKDEILTTYLNVAPFGRNNKGQNIAGARQAAEGIFGVDASQLTVPQAAFLAGLPQSPITYSPYENTGELKSDEDLEIGLRRAKAVLYSMYRTGALSKDEYSQYKDYDLKQDFLPSGTVTGISRDYLYFTTLAEAQERMYDYLAQRDNVSAKELKNEATQKFYRDLAAKEIENGGYKITTTIDQKIHSAMQSAVADYGYLLDDGTGRVEVGNVLMDNQTGAILGFVGGRNYQENQNNHAFDTKRSPASTTKPLLAYGIAIDQGLMGSETILSNYPTNFANGNPIMYANSKGTGMMTLGEALNYSWNIPAYWTYRMLRENGVDVKGYMEKMGYEIPEYGIESLPMGGGIEVTVAQHTNGYQTLANNGVYHQKHVISKIEAADGRVVYEYQDKPVQVYSKATATIMQGLLREVLSSRVTTTFKSNLTSLNPTLANADWIGKTGTTGQDENMWLMLSTPRLTLGGWIGHDDNHSLSQQAGYSNNSNYMAHLVNAIQQASPSIWGNERFALDPSVVKSEVLKSTGQKPGKVSVEGKEVEVTGSTVTSYWANKSGAPATSYRFAIGGSDADYQNAWSSIVGSLPTPSSSSSSSSSSSDSSNSSTTRPSSSRARR
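
The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below shows one way to compute both quantities, assuming the exhaustive-history parsing of Lempel and Ziv (1976); the page may use a different LZ variant, so `lz76_complexity` need not reproduce the table's LZ values. Both function names are illustrative.

```python
import math

def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the component while it can be copied from an occurrence
        # that starts before position i (overlap with the copy is allowed).
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_alphabet_size(n)."""
    return lz / (n / math.log(n, alphabet_size))

# Normalization of the tabulated values for 7FTA (LZ = 90, n = 195).
print(round(normalized_lz(90, 195), 2))
```

The normalization step can be checked directly against the table: the pairs (90, 195), (98, 230) and (302, 821) give the listed ratios 0.81, 0.77 and 0.82.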

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
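
As an illustration of these definitions, the following sketch collects the distinct contiguous subwords of a given length and counts them. The names `subwords` and `subword_complexity` are hypothetical helpers, and the sketch is not claimed to be the code that produced the table.

```python
def subwords(w, k):
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """p_w(k): the number of distinct subwords of length k in w."""
    return len(subwords(w, k))

# Toy example: "abab" contains only the 2-letter subwords "ab" and "ba".
print(sorted(subwords("abab", 2)), subword_complexity("abab", 2))
```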

\(|P_{f(7FTA_1)}(2) \setminus P_{f(6TCR_1)}(2)|=78\), \(|P_{f(6TCR_1)}(2) \setminus P_{f(7FTA_1)}(2)|=90\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), that is, the numerator of an LZ76-style (set of subwords) Jaccard distance for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
011010011001110000010111010010011110110100110111101100110101000111000010011001110010101000110001010000010111110010100110000110011100001001010011110000110110001011010111111100110011101100110001101
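
The hydrophobic-polar string can be reproduced by mapping each residue to 1 or 0. The page does not state which residues count as hydrophobic; the set below is an assumption inferred by comparing the beginning of the 7FTA_1 sequence with the beginning of the binary string above. Tyrosine (Y) does not occur in 7FTA_1, so its class cannot be inferred and the sketch treats it as polar by default.

```python
# Assumption: hydrophobic set inferred from the displayed string, not documented.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq):
    """Map each residue to '1' (hydrophobic) or '0' (everything else)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

# First twenty residues of 7FTA_1.
print(hp_encode("SMAEIKQGIREVILCKDQDG"))
```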
Pair \(Z_2\) Length of longest common subword
7FTA_1,6TCR_1 168 4
7FTA_1,7ZUL_1 181 4
6TCR_1,7ZUL_1 191 4
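
Both columns of this table can be sketched in a few lines. \(Z_k\) follows directly from its definition as the size of a symmetric difference. For the last column, the sketch assumes that the value refers to the longest common contiguous subword, since a value of 4 would be implausibly small for a non-contiguous common subsequence of proteins of this length. All function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, 1):
            if a == b:
                # Extend the common suffix ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("abab", "abba", 2))
print(longest_common_subword_length("abcde", "xbcdy"))
```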

Newick tree

 
[
	7ZUL_1:95.85,
	[
		7FTA_1:84,6TCR_1:84
	]:11.85
]

Let \(d\) and \(d_1\) denote the Otu--Sayood distances of the same names. (The latter makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{425 }{\log_{20} 425}-\frac{195}{\log_{20}195})=67.7\)
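
The arithmetic in this estimate can be checked numerically. In the sketch below, 195 and 230 are the tabulated lengths of 7FTA_1 and 6TCR_1 (so 425 is the length of their concatenation), and the factors 0.85 and 0.8 are taken from the formula as given; `scaled_length` is an illustrative helper name.

```python
import math

def scaled_length(n, alphabet_size=20):
    """n / log_alphabet_size(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

n1, n2 = 195, 230  # lengths of 7FTA_1 and 6TCR_1 from the table
expected = 0.85 * 0.8 * (scaled_length(n1 + n2) - scaled_length(n1))
print(round(expected, 1))  # 67.7
```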
Status Protein1 Protein2 d d1/2
Query variables 7FTA_1 6TCR_1 83 78.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
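
The interpolation can be sketched as follows, assuming that the minimum and maximum are taken over the two LZ-complexity increments \(c(SQ)-c(S)\) and \(c(QS)-c(Q)\) that enter the Otu--Sayood distances, so that \(d\) is their maximum and \(d_1\) their sum. The example values 74 and 83 are not listed on the page; they are inferred from the tabulated \(d=83\) and \(d_1/2=78.5\) for the pair 7FTA_1, 6TCR_1.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Inferred increments for 7FTA_1 and 6TCR_1 (an assumption, see above).
a, b = 74, 83
print(delta(0, a, b))    # d
print(delta(0.5, a, b))  # d1 / 2
```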