CoV2D Browser™

Parikh vectors
6LWU_1 9EBQ_1 1WWX_1 Letter Amino acid
6 26 9 K Lysine
10 17 12 S Serine
13 17 2 T Threonine
4 4 4 W Tryptophan
7 24 3 N Asparagine
19 31 5 D Aspartic acid
29 30 11 L Leucine
14 14 2 P Proline
5 7 2 M Methionine
7 20 3 F Phenylalanine
23 28 4 A Alanine
0 8 1 C Cysteine
18 21 3 Q Glutamine
15 32 10 E Glutamic acid
19 17 11 G Glycine
16 23 3 I Isoleucine
5 14 5 Y Tyrosine
18 28 10 R Arginine
3 7 2 H Histidine
12 26 5 V Valine
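Each column above is a Parikh vector: the number of occurrences of each amino-acid letter in the sequence. The following is a minimal Python sketch that computes it with `collections.Counter`. The helper name `parikh_vector` and the row-order constant are illustrative choices, not part of the CoV2D site. The input is the 1WWX_1 sequence listed further down this page.

```python
from collections import Counter

# Row order of the table above (K, S, T, ..., V); an illustrative choice.
AMINO_ACIDS = "KSTWNDLPMFACQEGIYRHV"


def parikh_vector(seq: str) -> dict[str, int]:
    """Count each of the 20 standard amino-acid letters in seq."""
    counts = Counter(seq)
    return {aa: counts[aa] for aa in AMINO_ACIDS}  # missing letters count as 0


seq_1wwx = (
    "GSSGSSGSSHLWEFVRDLLLSPEENCGILEWEDREQGIFRVVKSEALAKMWGQRKKNDRMTYEKLSRALRYYYKTGILERVDRRLVYKFGKNAHGWQEDKLSGPSSG"
)
v = parikh_vector(seq_1wwx)
print(v["W"], v["C"], sum(v.values()))  # 4 1 107
```

The counts for W and C and the total of 107 agree with the 1WWX_1 column.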

>6LWU_1|Chain A|chitosanase|Gynuella sunshinyii YC6258 (1445510)
>9EBQ_1|Chain A|Guanine nucleotide-binding protein G(s) subunit alpha isoforms short|Homo sapiens (9606)
>1WWX_1|Chain A|E74-like factor 5 ESE-2b|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6LWU 111 243 0.83 38 159 235
AQLTAQQRLLADQIISIFANNTPELQYGYAEVLDDGRGITAGRAGFTSATGDMLEVIQRYSRLRPDNILVPFLPRLQQLAASEDGSIEGLQGLPQRWADASQNPVFRQVQDDVVDELYFQPAMERAAELGAQMPLTLLALYDAIIQHGEGDDGDGLPAMIARTTAKVNGIPAEGVDERRWLKTFLKIRKQVLRHPANLETEDEWSESTGRVDSLMKLLKQGNTDLHPPIRISTWGDVFILPIR
9EBQ 169 394 0.85 40 227 378
MGCLGNSKTEDQRNEEKAQREANKKIEKQLQKDKQVYRATHRLLLLGAGESGKNTIVKQMRILHVNGFNGEGGEEDPQAARSNSDGEKATKVQDIKNNLKEAIETIVAAMSNLVPPVELANPENQFRVDYILSVMNVPDFDFPPEFYEHAKALWEDEGVRACYERSNEYQLIDCAQYFLDKIDVIKQADYVPSDQDLLRCRVLTSGIFETKFQVDKVNFHMFDVGAQRDERRKWIQCFNDVTAIIFVVASSSYNMVIREDNQTNRLQAALKLFDSIWNNKWLRDTSVILFLNKQDLLAEKVLAGKSKIEDYFPEFARYTTPEDATPEPGEDPRVTRAKYFIRDEFLRISTASGDGRHYCYPHFTCAVDTENIRRVFNDCRDIIQRMHLRQYELL
1WWX 55 107 0.80 40 81 97
GSSGSSGSSHLWEFVRDLLLSPEENCGILEWEDREQGIFRVVKSEALAKMWGQRKKNDRMTYEKLSRALRYYYKTGILERVDRRLVYKFGKNAHGWQEDKLSGPSSG
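The LZ-complexity column counts the phrases in a Lempel–Ziv (1976) parsing, and the next column divides it by \(n/\log_{20} n\). This page does not say which LZ76 variant it uses. The Python sketch below assumes the common exhaustive-history parsing, in which each phrase is the shortest prefix of the remainder that cannot be copied from an earlier starting position. Counts from this sketch may therefore differ slightly from the site's. The function names are illustrative.

```python
import math


def lz76_complexity(w: str) -> int:
    """Number of phrases in the exhaustive-history LZ76 parsing of w."""
    n, i, phrases = len(w), 0, 0
    while i < n:
        j = i + 1
        # Extend w[i:j] while it also occurs starting before position i,
        # i.e. inside w[:j-1] (overlapping the current phrase is allowed).
        while j <= n and w[i:j] in w[: j - 1]:
            j += 1
        phrases += 1  # w[i:j] is a new phrase (or the copyable tail of w)
        i = j
    return phrases


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_k n) with k = alphabet_size."""
    return lz / (n / math.log(n, alphabet_size))


# Standard worked example: 0 | 001 | 10 | 100 | 1000 | 101
print(lz76_complexity("0001101001000101"))  # 6
print(normalized_lz(111, 243))  # about 0.8376
```

For 6LWU the ratio is about 0.8376, which is listed as 0.83 above. The table's ratios therefore appear to be truncated rather than rounded.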

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
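These two definitions translate directly into Python. In the sketch below the function names are illustrative. The input is GSSGSSG, the first seven residues of 1WWX_1.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct length-n subwords (contiguous intervals) of w."""
    return {w[i : i + n] for i in range(len(w) - n + 1)}


def subword_count(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))


print(sorted(subwords("GSSGSSG", 2)))  # ['GS', 'SG', 'SS']
print(subword_count("GSSGSSG", 3))  # 3
```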

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For example, \(|P_{f(6LWU_1)}(2) \setminus P_{f(9EBQ_1)}(2)|=52\) and \(|P_{f(9EBQ_1)}(2) \setminus P_{f(6LWU_1)}(2)|=120\), so \(Z_2=52+120=172\) for this pair.

Hydrophobic-polar version of sequence 1 (6LWU_1). Here 1 marks a hydrophobic residue and 0 marks a polar one:
101010001110011011100010100101011001011011011100101011011000001010011111110100111000101011011100110100011100100011001010111001101110111011110011100101001011111110001010111101100001100110100011001101000001000010100110110010001011101001101111110
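Below is a minimal Python sketch of \(Z_k\) and of the hydrophobic-polar mapping. The hydrophobic letter set is an assumption inferred from the 6LWU_1 string above, which assigns 1 to A, F, G, I, L, M, P, V and W and assigns 0 to residues such as Y, Q and T. Cysteine does not occur in 6LWU_1, so its class cannot be read off the string; the sketch treats it as polar, which is a guess. All names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the distinct length-k subwords of w."""
    return {w[i : i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# Assumed hydrophobic class (inferred from the 6LWU_1 string; C treated as polar).
HYDROPHOBIC = frozenset("AFGILMPVW")


def hp_string(seq: str) -> str:
    """Map each residue to "1" (hydrophobic) or "0" (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)


print(z_distance("abcab", "bcd", 2))  # 3: {ab, ca} on one side, {cd} on the other
print(hp_string("AQLTAQQRLL"))  # 1010100011
```

If the sequences are transcribed exactly, applying `z_distance` to the full sequences \(f(6LWU_1)\) and \(f(9EBQ_1)\) with \(k=2\) should reproduce the value 172 in the table below.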
Pair \(Z_2\) Length of longest common substring
6LWU_1,9EBQ_1 172 4
6LWU_1,1WWX_1 172 3
9EBQ_1,1WWX_1 204 3
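A length of 4 can only describe a contiguous match. 6LWU_1 and 9EBQ_1 contain 29 and 30 copies of L, so their longest common subsequence, where gaps are allowed, has length at least 29. The Python sketch below computes both quantities with the standard dynamic programs to make the distinction concrete. The function names are illustrative.

```python
def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: longest common suffix of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def longest_common_subsequence(x: str, y: str) -> int:
    """Length of the longest common subsequence (gaps allowed)."""
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


print(longest_common_substring("GSSGSSGSSH", "AGSSH"))  # 4 ("GSSH")
print(longest_common_subsequence("AXBXC", "ABC"))  # 3
```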

Newick tree

 
(
	1WWX_1:96.96,
	(
		6LWU_1:86,9EBQ_1:86
	):10.96
);
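Standard Newick syntax uses parentheses for clades and ends with a semicolon. In that syntax this tree is `(1WWX_1:96.96,(6LWU_1:86,9EBQ_1:86):10.96);`. The Python sketch below is a minimal parser for a small subset of Newick: labels and branch lengths only, with no quoted labels or bracketed comments. A second helper sums the branch lengths along the path between two leaves. Both helpers are illustrative and are not part of the CoV2D site.

```python
import re

NEWICK = "(1WWX_1:96.96,(6LWU_1:86,9EBQ_1:86):10.96);"
PUNCT = set("(),:;")


def parse_newick(s: str):
    """Parse labels and branch lengths into nested (name, length, children) tuples."""
    tokens = re.findall(r"[(),:;]|[^(),:;\s]+", s)
    if not tokens or tokens[-1] != ";":
        tokens.append(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1
            children.append(node())
            while tokens[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip the closing ")"
        name, length = "", 0.0
        if tokens[pos] not in PUNCT:
            name = tokens[pos]
            pos += 1
        if tokens[pos] == ":":
            length = float(tokens[pos + 1])
            pos += 2
        return (name, length, children)

    return node()


def leaf_paths(tree):
    """Map each leaf name to its root-to-leaf list of (node, cumulative length)."""
    paths = {}

    def walk(node, depth, path):
        name, length, children = node
        depth += length
        path = path + [(node, depth)]
        for child in children:
            walk(child, depth, path)
        if not children:
            paths[name] = path

    walk(tree, 0.0, [])
    return paths


def patristic(paths, a: str, b: str) -> float:
    """Sum of branch lengths on the path between leaves a and b."""
    pa, pb = paths[a], paths[b]
    lca_depth = 0.0  # depth of the deepest shared ancestor
    for (na, da), (nb, _) in zip(pa, pb):
        if na is not nb:
            break
        lca_depth = da
    return pa[-1][1] + pb[-1][1] - 2 * lca_depth


paths = leaf_paths(parse_newick(NEWICK))
print(round(patristic(paths, "6LWU_1", "9EBQ_1"), 2))  # 172.0
print(round(patristic(paths, "1WWX_1", "6LWU_1"), 2))  # 193.92
```

The path length between 6LWU_1 and 9EBQ_1 is 2 × 86 = 172, the same as their \(Z_2\) value.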

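Let \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\) be the Otu–Sayood distance \(d\).
Let \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)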
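Below is a minimal Python sketch of the two Otu–Sayood distances. It assumes the definitions \(d=\max\) and \(d_1=\) sum of the two one-sided increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), with LZ computed by the exhaustive-history LZ76 parsing. The site does not state its LZ variant, so results on real sequences may differ slightly. All names are illustrative.

```python
def lz76_complexity(w: str) -> int:
    """Number of phrases in the exhaustive-history LZ76 parsing of w."""
    n, i, phrases = len(w), 0, 0
    while i < n:
        j = i + 1
        # Extend w[i:j] while it can be copied from a start position before i.
        while j <= n and w[i:j] in w[: j - 1]:
            j += 1
        phrases += 1
        i = j
    return phrases


def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1) built from the two one-sided LZ increments."""
    extra_after_x = lz76_complexity(x + y) - lz76_complexity(x)  # LZ(xy) - LZ(x)
    extra_after_y = lz76_complexity(y + x) - lz76_complexity(y)  # LZ(yx) - LZ(y)
    d = max(extra_after_x, extra_after_y)
    d1 = extra_after_x + extra_after_y
    return d, d1


print(otu_sayood("a", "b"))  # (1, 2)
print(otu_sayood("AAAA", "AAAAAA"))  # (0, 0)
```

Both increments are nonnegative, so \(d \le d_1 \le 2d\) always holds.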
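A rough estimate of the expected distance, where \(637=243+394\) is the length of the concatenation of the two sequences, is \((0.85)(0.8)\left(\frac{637}{\log_{20} 637}-\frac{243}{\log_{20}243}\right)\approx 110.9.\)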
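The arithmetic can be checked in Python. The factors 0.85 and 0.8 are taken as given from the formula above, and `lz_scale` is an illustrative name for \(n/\log_{20} n\).

```python
import math


def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """n / log_k n, the normalizer used in the LZ ratio column."""
    return n / math.log(n, alphabet_size)


n_x, n_y = 243, 394  # lengths of f(6LWU_1) and f(9EBQ_1)
estimate = 0.85 * 0.8 * (lz_scale(n_x + n_y) - lz_scale(n_x))
print(f"{estimate:.1f}")  # 110.9
```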
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 6LWU_1 9EBQ_1 143 113.5
Was not able to put for d
Was not able to put for d1

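In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)], with the minimum and maximum taken over the two one-sided increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\),
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]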
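The table gives a worked check. Under the definitions \(d=\max\) and \(d_1=\) sum, the values \(d=143\) and \(d_1/2=113.5\) imply \(d_1=227\). The two one-sided increments for 6LWU_1 and 9EBQ_1 are therefore 143 and \(227-143=84\). The table does not show which increment is which, but \(\delta\) does not depend on that order. The function name below is illustrative.

```python
def delta(gain_xy: float, gain_yx: float, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two one-sided LZ increments."""
    lo, hi = min(gain_xy, gain_yx), max(gain_xy, gain_yx)
    return alpha * lo + (1 - alpha) * hi


print(delta(143, 84, 0))  # 143   (= d)
print(delta(143, 84, 0.5))  # 113.5 (= d1/2)
```

Setting \(\alpha=1\) instead gives the minimum, 84.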