CoV2D Browser™

Parikh vectors
5JPN_1 8VVO_1 6ECB_1 Letter Amino acid
53 22 34 G Glycine
72 17 34 L Leucine
45 12 21 P Proline
4 4 1 W Tryptophan
19 15 9 Y Tyrosine
19 8 5 N Asparagine
33 7 24 D Aspartic acid
33 9 8 Q Glutamine
56 21 19 V Valine
29 5 17 E Glutamic acid
16 5 18 H Histidine
68 30 12 S Serine
11 3 8 M Methionine
41 16 27 A Alanine
5 5 2 C Cysteine
33 11 9 K Lysine
29 18 11 T Threonine
34 6 18 R Arginine
27 4 14 I Isoleucine
29 6 12 F Phenylalanine
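
The Parikh vector of a sequence records how many times each letter occurs in it. A minimal Python sketch follows; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative choices rather than anything taken from the CoV2D code, and the input is a toy string rather than one of the three proteins.

```python
from collections import Counter

# One-letter codes in the same row order as the table above.
AMINO_ACIDS = "GLPWYNDQVEHSMACKTRIF"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return a dict mapping each alphabet letter to its count in seq."""
    counts = Counter(seq)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# Toy input, not a real protein.
toy = parikh_vector("GAVLGGW")
print(toy["G"], toy["A"], toy["K"])
```

Applying the same function to each full FASTA sequence would give one column of the table per protein.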

>5JPN_1|Chain A|Complement C4-A|Homo sapiens (9606)
>8VVO_1|Chains A, C[auth B]|S1CE2 VARIANT OF FAB-EPR-1 heavy chain|Homo sapiens (9606)
>6ECB_1|Chain A|Vlm2|Streptomyces tsusimaensis (285482)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5JPN , Knot 249 656 0.82 40 258 596
KPRLLLFSPSVVHLGVPLSVGVQLQDVPRGQVVKGSVFLRNPSRNNVPCSPKVDFTLSSERDFALLSLQVPLKDAKSCGLHQLLRGPEVQLVAHSPWLKDSLSRTTNIQGINLLFSSRRGHLFLQTDQPIYNPGQRVRYRVFALDQKMRPSTDTITVMVENSHGLRVRKKEVYMPSSIFQDDFVIPDISEPGTWKISARFSDGLESNSSTQFEVKKYVLPNFEVKITPGKPYILTVPGHLDEMQLDIQARYIYGKPVQGVAYVRFGLLDEDGKKTFFRGLESQTKLVNGQSHISLSKAEFQDALEKLNMGITDLQGLRLYVAAAIIESPGGEMEEAELTSWYFVSSPFSLDLSKTKRHLVPGAPFLLQALVREMSGSPASGIPVKVSATVSSPGSVPEVQDIQQNTDGSGQVSIPIIIPQTISELQLSVSAGSPHPAIARLTVAAPPSGGPGFLSIERPDSRPPRVGDTLNLNLRAVGSGATFSHYYYMILSRGQIVFMNREPKRTLTSVSVFVDHHLAPSFYFVAFYYHGDHPVANSLRVDVQAGACEGKLELSVDGAKQYRNGESVKLHLETDSLALVALGALDTALYAAGSKSHKPLNMGKVFEAMNSYDLGCGPGGGDSALQVFQAAGLAFSDGDQWTLSRKRLSCPKEKTT
8VVO , Knot 99 224 0.79 40 145 216
EVQLVESGGGLVQPGGSLRLSCAASGFNLRSYYMHWVRQAPGKGLEWVASISPYYSYTYYADSVKGRFTISADTSKNTAYLQMNSLRAEDTAVYYCARHGYGAMDYWGQGTLVTVFNQIQGPSVFPLAPSSKSTSGGTAALGCLVKDYFPGPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKVEPKSCDKTHT
6ECB , Knot 127 303 0.79 40 175 277
MHHHHHHHHENLYFQGGSSTAGDPTAKLVRLNPRGGDGPGIVFAPPAGGTVLGYIELARHLKGFGEIHGVEAPGLGAGETPVYPSFEEMVQFCSDSAAGVAGDGVYIGGHSLGGHIAFYLATMLLDRGIRPKGLIILDTPPRLGDIPVADADLTEEETKVFILAMGIGGMLDQDRDALKDLPYEEAKQLLLDRAKNDPRVSAFLSEDYLDRFLRLQMHQLMYSRDVVLPQRKLDIPIHVFRTKNHAPEVARLFSAWENYAAGEVTFVDIPGDHATMLRAPHVSEVAQLLDRHCGLPSDDGPRG
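
The LZ-complexity column and its normalization by \(n/\log_{20} n\) can be sketched as follows. This is a hedged illustration that assumes the Lempel-Ziv (1976) exhaustive-history parsing; the page does not state which variant it uses, so phrase counts from this sketch may differ from the table. The function names are hypothetical.

```python
import math

def lz76_complexity(w):
    """Count phrases in the Lempel-Ziv (1976) exhaustive parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it already occurs starting before position i
        # (the earlier copy may overlap the phrase itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# Textbook binary example, parsed as 0|001|10|100|1000|101.
print(lz76_complexity("0001101001000101"))
# Normalization applied to the 5JPN row (LZ = 249, n = 656).
print(round(normalized_lz(249, 656), 3))
```

The second call only reuses the numbers reported in the table; it does not recompute the LZ-complexity of the protein itself.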

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(5JPN_1)}(2) \setminus P_{f(8VVO_1)}(2)|=147\), \(|P_{f(8VVO_1)}(2) \setminus P_{f(5JPN_1)}(2)|=34\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
01011110101101111101110100110101101011100100001100101010100000111101011100100011001101101011100111000100000101101110000101110000110011001000111100010100001011100001101000010110011000111101001101010101001100000001010001110101010110101101110100101010100101011011101011110001000110110000011010001010010100110010111001011010111111001110100101001011001101010000001111111110111001010110111101010100110110100100000101010111111001001010101101011110101111101111110100100011011001010101110110100000111001011110001000100101110001110101111000100111001010101110010101010110000010010101000011111111100110111000001101101101100001101111100110110111111001001010000100100000
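
The binary string above is consistent with writing 1 for a hydrophobic residue and 0 for a polar one. The sketch below assumes the hydrophobic set is A, G, I, L, M, F, P, W, V. That set is an assumption inferred from the start of the string, not something stated on the page, and the name `hp_encode` is hypothetical.

```python
# Assumed hydrophobic residues; every other letter is treated as polar.
HYDROPHOBIC = set("AGILMFPWV")

def hp_encode(seq, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in hydrophobic else "0" for aa in seq)

# First 20 residues of the 5JPN_1 sequence.
prefix = "KPRLLLFSPSVVHLGVPLSV"
print(hp_encode(prefix))  # 01011110101101111101
```

The output matches the first 20 symbols of the string above, which supports the assumed classification for the residues that occur in this prefix only.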
Pair \(Z_2\) Length of longest common subword (contiguous)
5JPN_1,8VVO_1 181 5
5JPN_1,6ECB_1 157 4
8VVO_1,6ECB_1 174 3
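
As a check on the first row, \(Z_2 = 147 + 34 = 181\). The two columns can be sketched in Python as below. The function names are hypothetical, and the last column is read here as the length of the longest contiguous common block, which is an assumption based on how small the reported values are.

```python
def subwords(w, k):
    """Set P_w(k) of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x, y):
    """Length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy inputs, not the proteins from the table.
print(z_distance("ABAB", "ABCC", 2))
print(longest_common_subword("ABCDE", "XBCDY"))
```

The symmetric difference `^` counts both one-sided set differences at once, so it equals the sum in the definition of \(Z_k\).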

Newick tree

 
[
	8VVO_1:91.93,
	[
		5JPN_1:78.5,6ECB_1:78.5
	]:13.43
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{880}{\log_{20} 880}-\frac{224}{\log_{20}224}\right)\approx 180.\)
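
The arithmetic behind this estimate can be reproduced directly. In the sketch below, 880 is taken to be the combined length 656 + 224 of the two query sequences, which is an inference from the length column. The factors 0.85 and 0.8 are used exactly as given, and the helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The quantity n / log_b(n) used to scale LZ-complexity."""
    return n / math.log(n, alphabet_size)

# 880 = 656 + 224 (assumed: concatenation of 5JPN_1 and 8VVO_1).
expected = 0.85 * 0.8 * (lz_scale(880) - lz_scale(224))
print(expected)  # close to 180
```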
Status Protein1 Protein2 d d1/2
Query variables 5JPN_1 8VVO_1 226 150
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
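
The interpolation can be checked against the query row above, where \(d=226\) and \(d_1/2=150\). If the formula holds, the larger one-sided quantity is 226 and the smaller is \(2\cdot 150-226=74\). The value 74 is inferred from the table, not reported by the page, and the function name `delta` is hypothetical.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

smaller, larger = 74, 226  # 74 is inferred from the table, not reported

print(delta(0, smaller, larger))    # alpha = 0 recovers d
print(delta(0.5, smaller, larger))  # alpha = 1/2 recovers d_1 / 2
```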