CoV2D Browser™

Parikh vectors
8TZS_1 7GDD_1 7WCN_1 Letter Amino acid
31 17 28 A Alanine
19 21 24 N Asparagine
16 14 21 Q Glutamine
31 9 32 E Glutamic acid
25 16 17 S Serine
29 24 17 T Threonine
16 11 14 Y Tyrosine
18 17 31 D Aspartic acid
45 29 30 L Leucine
43 17 20 F Phenylalanine
22 13 14 P Proline
16 3 4 W Tryptophan
38 27 26 V Valine
21 11 28 R Arginine
17 7 7 H Histidine
52 11 23 I Isoleucine
27 11 26 K Lysine
26 10 7 M Methionine
12 12 8 C Cysteine
37 26 17 G Glycine
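
Each column of the table above is a Parikh vector: the number of times each letter occurs in one sequence. A minimal Python sketch, assuming the sequence is available as a plain string of one-letter amino-acid codes (the name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative and not taken from the site):

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "ANQESTYDLFPWVRHIKMCG"

def parikh_vector(w: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter in w."""
    counts = Counter(w)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# Toy input: the first nine residues of the 8TZS_1 sequence.
example = parikh_vector("MAGAIIENM")
print(example["A"], example["I"], example["M"])
```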

>8TZS_1|Chain A|Protein wntless homolog|Homo sapiens (9606)
>7GDD_1|Chain A|3C-like proteinase|Severe acute respiratory syndrome coronavirus 2 (2697049)
>7WCN_1|Chain A|Guanine nucleotide-binding protein G(s) subunit alpha isoforms short|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8TZS , Knot 217 541 0.84 40 272 522
MAGAIIENMSTKKLCIVGGILLVFQIIAFLVGGLIAPGPTTAVSYMSVKCVDARKNHHKTKWFVPWGPNHCDKIRDIEEAIPREIEANDIVFSVHIPLPHMEMSPWFQFMLFILQLDIAFKLNNQIRENAEVSMDVSLAYRDDAFAEWTEMAHERVPRKLKCTFTSPKTPEHEGRYYECDVLPFMEIGSVAHKFYLLNIRLPVNEKKKINVGIGEIKDIRLVGIHQNGGFTKVWFAMKTFLTPSIFIIMVWYWRRITMMSRPPVLLEKVIFALGISMTFINIPVEWFSIGFDWTWMLLFGDIRQGIFYAMLLSFWIIFCGEHMMDQHERNHIAGYWKQVGPIAVGSFCLFIFDMCERGVQLTNPFYSIWTTDIGTELAMAFIIVAGICLCLYFLFLCFMVFQVFRNISGKQSSLPAMSKVRRLHYEGLIFRFKFLMLITLACAAMTVIFFIVSQVTEGHWKWGGVTVQVNSAFFTGIYGMWNLYVFALMFLYAPSHKNYGEDQSNGDLGVHSGEELQLTTTITHVDGPTEIYKLTRKEAQE
7GDD , Knot 134 306 0.83 40 196 290
SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDVVYCPRHVICTSEDMLNPNYEDLLIRKSNHNFLVQAGNVQLRVIGHSMQNCVLKLKVDTANPKTPKYKFVRIQPGQTFSVLACYNGSPSGVYQCAMRPNFTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGNFYGPFVDRQTAQAAGTDTTITVNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDILGPLSAQTGIAVLDMCASLKELLQNGMNGRTILGSALLEDEFTPFDVVRQCSGVTFQ
7WCN , Knot 169 394 0.85 40 227 378
MGCLGNSKTEDQRNEEKAQREANKKIEKQLQKDKQVYRATHRLLLLGAGESGKNTIVKQMRILHVNGFNGEGGEEDPQAARSNSDGEKATKVQDIKNNLKEAIETIVAAMSNLVPPVELANPENQFRVDYILSVMNVPDFDFPPEFYEHAKALWEDEGVRACYERSNEYQLIDCAQYFLDKIDVIKQADYVPSDQDLLRCRVLTSGIFETKFQVDKVNFHMFDVGAQRDERRKWIQCFNDVTAIIFVVASSSYNMVIREDNQTNRLQAALKLFDSIWNNKWLRDTSVILFLNKQDLLAEKVLAGKSKIEDYFPEFARYTTPEDATPEPGEDPRVTRAKYFIRDEFLRISTASGDGRHYCYPHFTCAVDTENIRRVFNDCRDIIQRMHLRQYELL
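
The LZ-complexity and normalized columns above can be sketched as follows. This assumes that \(\mathrm{LZ}(w)\) is the number of components in the exhaustive-history (Lempel-Ziv 1976) parsing; the page does not state which variant it uses, so `lz76_complexity` and `normalized_lz` are illustrative names rather than the site's code.

```python
import math

def lz76_complexity(w: str) -> int:
    """Count the components of the exhaustive-history (LZ76-style) parsing of w."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the candidate while it already occurs starting at an earlier
        # position (the earlier occurrence may overlap the candidate itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1   # w[i:i+length] is the next component (possibly cut off at the end)
        i += length
    return count

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_20(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example; it parses as 0|001|10|100|1000|101, i.e. 6 components.
print(lz76_complexity("0001101001000101"))
# Values from the 8TZS row: LZ(w) = 217, n = 541.
print(normalized_lz(217, 541))
```

With the 8TZS values the ratio comes out close to the 0.84 shown in the table.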

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
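
As a small illustration of these definitions, the following Python sketch collects the distinct contiguous subwords of a given length and counts them. The function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """p_w(k): the number of distinct subwords of length k in w."""
    return len(subwords(w, k))

print(sorted(subwords("abab", 2)))
print(subword_complexity("abab", 2))
```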

\(|P_{f(8TZS_1)}(2) \setminus P_{f(7GDD_1)}(2)|=132\), \(|P_{f(7GDD_1)}(2) \setminus P_{f(8TZS_1)}(2)|=56\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1111110010000101111111110111111111111110011001010010100000000111111100000100100111001010011101011110101011101111110101110100010001010101011000011101001100011001000100100100010000001111101101100101101011100000101111010010111100011100111110011010111111101001011001111100111111101011011101101110101111110100111011110111110100110000000111010011111110101111010001101001100110001100111111111110101011110111101100101000011110010010001111010111110110111011111100100101011110101001110110111010111111101100000100000101110010010100010010110010010000100
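
The page does not say which residues it treats as hydrophobic. Comparing the start of the 0/1 string with the start of the 8TZS_1 sequence is consistent with mapping A, G, I, L, M, F, P, W and V to 1 and every other letter to 0. The sketch below encodes a sequence under that inferred assumption; `HYDROPHOBIC` and `hp_encode` are illustrative names.

```python
# Assumption inferred from the displayed string, not stated by the page:
# these residues map to 1 (hydrophobic), all others to 0 (polar).
HYDROPHOBIC = set("AGILMFPWV")

def hp_encode(w: str) -> str:
    """Map each residue to '1' if it is in HYDROPHOBIC, else '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in w)

# First 19 residues of the 8TZS_1 sequence.
print(hp_encode("MAGAIIENMSTKKLCIVGG"))
```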
Pair \(Z_2\) Length of longest common subword (contiguous)
8TZS_1,7GDD_1 188 4
8TZS_1,7WCN_1 189 3
7GDD_1,7WCN_1 181 3
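
The \(Z_2\) column is the sum of the two set differences, which equals the size of the symmetric difference of the two subword sets; for 8TZS_1 and 7GDD_1 that is \(132+56=188\). A minimal sketch with a hypothetical function name `z_k`:

```python
def z_k(x: str, y: str, k: int) -> int:
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)| for subwords of length k."""
    p_x = {x[i:i + k] for i in range(len(x) - k + 1)}
    p_y = {y[i:i + k] for i in range(len(y) - k + 1)}
    # The sum of the two differences is the size of the symmetric difference.
    return len(p_x ^ p_y)

# Toy example: "abba" has the subword "bb", which "abab" lacks.
print(z_k("abab", "abba", 2))
```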

Newick tree

 
[
	8TZS_1:95.46,
	[
		7GDD_1:90.5,7WCN_1:90.5
	]:4.96
]
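
The bracketed tree above can be written as a standard parenthesized Newick string. The sketch below assumes a simple nested-tuple representation (a leaf is `(name, length)`, an internal node is `(list_of_children, length)`); this representation and the name `to_newick` are illustrative and not taken from the site.

```python
def to_newick(node) -> str:
    """Serialize a nested (label_or_children, branch_length) tuple to Newick."""
    label, length = node
    if isinstance(label, list):
        # Internal node: serialize the children inside parentheses.
        text = "(" + ",".join(to_newick(child) for child in label) + ")"
    else:
        text = label
    return text if length is None else f"{text}:{length}"

# The tree shown above; the root has no branch length.
tree = ([("8TZS_1", 95.46), ([("7GDD_1", 90.5), ("7WCN_1", 90.5)], 4.96)], None)
print(to_newick(tree) + ";")
```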

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{847}{\log_{20} 847}-\frac{306}{\log_{20} 306}\right)\approx 147\), where \(847=541+306\) is the combined length of 8TZS_1 and 7GDD_1.
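
The arithmetic in this estimate can be checked directly. The factors 0.85 and 0.8 are copied from the text as given (the page does not explain them), and 847 is read here as the combined length 541 + 306 of the two sequences, which is an assumption.

```python
import math

def n_over_log20(n: int) -> float:
    """n / log_20(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, 20)

# Factors 0.85 and 0.8 are taken verbatim from the text.
estimate = 0.85 * 0.8 * (n_over_log20(541 + 306) - n_over_log20(306))
print(round(estimate))
```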
Status Protein1 Protein2 d d1/2
Query variables 8TZS_1 7GDD_1 191 146.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
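
A sketch of this interpolation, assuming (as in the usual Otu–Sayood definitions) that "min" and "max" are the smaller and larger of the two complexity increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). The function name `delta` and its parameters are hypothetical. Under this reading, the table values \(d=191\) and \(d_1/2=146.5\) correspond to increments 191 and \(2\cdot 146.5-191=102\).

```python
def delta(inc_xy: float, inc_yx: float, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two complexity increments."""
    smaller, larger = min(inc_xy, inc_yx), max(inc_xy, inc_yx)
    return alpha * smaller + (1 - alpha) * larger

# Increments implied by the table row for 8TZS_1 and 7GDD_1 (see lead-in).
print(delta(102, 191, 0))     # alpha = 0 gives d
print(delta(102, 191, 0.5))   # alpha = 1/2 gives d1 / 2
```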