CoV2D Browser™

Parikh vectors
7TYF_1 5XCT_1 6WOL_1 Letter Amino acid
21 12 10 Q Glutamine
30 12 15 L Leucine
18 12 22 S Serine
27 11 4 A Alanine
17 13 10 G Glycine
7 5 3 M Methionine
20 7 8 F Phenylalanine
14 7 15 P Proline
4 5 4 W Tryptophan
14 7 7 Y Tyrosine
26 9 23 V Valine
8 3 6 C Cysteine
23 10 4 I Isoleucine
17 11 12 T Threonine
7 1 4 H Histidine
24 4 10 N Asparagine
31 6 9 D Aspartic acid
32 13 18 E Glutamic acid
26 13 17 K Lysine
28 7 8 R Arginine
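
A Parikh vector simply records how often each letter occurs in a word, ignoring order. The following is a minimal sketch of that count, not the code used to build this page; the function name `parikh_vector` and the constant `ALPHABET` are hypothetical, and the letter order is copied from the table above.

```python
from collections import Counter

# Letter order copied from the table above (any fixed order would do).
ALPHABET = "QLSAGMFPWYVCITHNDEKR"

def parikh_vector(seq, alphabet=ALPHABET):
    """Return the number of occurrences of each letter, in alphabet order."""
    counts = Counter(seq)
    return [counts[letter] for letter in alphabet]

# Toy input: the first seven residues of the 5XCT_1 sequence.
vector = parikh_vector("MQIQLVQ")
print(dict(zip(ALPHABET, vector)))
```

Applied to a full sequence, the entries sum to the sequence length, which gives a quick sanity check on a column of the table.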

>7TYF_1|Chain A|Guanine nucleotide-binding protein G(s) subunit alpha isoforms short|Homo sapiens (9606)
>5XCT_1|Chain A|VH(S112C)-SARAH chimera|Mus musculus (10090)
>6WOL_1|Chain A[auth H]|Immunoglobulin heavy constant gamma 4|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7TYF , Knot 169 394 0.85 40 227 378
MGCLGNSKTEDQRNEEKAQREANKKIEKQLQKDKQVYRATHRLLLLGAGESGKNTIVKQMRILHVNGFNGEGGEEDPQAARSNSDGEKATKVQDIKNNLKEAIETIVAAMSNLVPPVELANPENQFRVDYILSVMNVPDFDFPPEFYEHAKALWEDEGVRACYERSNEYQLIDCAQYFLDKIDVIKQADYVPSDQDLLRCRVLTSGIFETKFQVDKVNFHMFDVGAQRDERRKWIQCFNDVTAIIFVVASSSYNMVIREDNQTNRLQAALKLFDSIWNNKWLRDTSVILFLNKQDLLAEKVLAGKSKIEDYFPEFARYTTPEDATPEPGEDPRVTRAKYFIRDEFLRISTASGDGRHYCYPHFTCSVDTENIRRVFNDCRDIIQRMHLRQYELL
5XCT , Knot 78 168 0.79 40 134 165
MQIQLVQSGPEVQKPGETVRISCKASGYTFTTAGMQWVQKMPGKSLKWIGWINTRSGVPKYAEDFKGRFAFSLETSASIAYLHINNLKNEDTATYFCAREGPGFVYWGQGTLVTVCSGSDYEFLKSWTVEDLQKRLLALDPMMEQEIEEIRQKYQSKRQPILDAIEAK
6WOL , Knot 96 209 0.81 40 146 198
GPSVFLFPPKPKDTLMISRTPEVTCVVVDVSQEDPEVQFNWYVDGVEVHNAKTKPREEQFNSTYRVVSVLTVLHQDWLNGKEYKCKVSNKGLPSSIEKTISKAKGQPREPQVYTFPPEQEEMTKNQVSLRCLVKGFYPSDIAVEWESNGQPENNYKTTKPVLDSDGSFRLESRLTVDKSRWQEGNVFSCSVMHEACSYHLCKSLSLSLG
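
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below does only that arithmetic; it does not compute \(\mathrm{LZ}(w)\) itself, and the helper name `normalized_lz` is hypothetical. The tabulated values look truncated to two decimals rather than rounded, which is an inference from the numbers, not something the page states.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"7TYF": (169, 394), "5XCT": (78, 168), "6WOL": (96, 209)}
for code, (lz, n) in rows.items():
    print(code, normalized_lz(lz, n))
```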

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
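
As an illustration of these definitions, a minimal sketch, reading a subword as a contiguous block of a given length, is shown here. The names `subwords` and `subword_complexity` are hypothetical, and the sketch follows the definition as worded, so it is not guaranteed to reproduce every column of the table if the page uses a different counting convention.

```python
def subwords(w, k):
    """Return P_w(k): the set of distinct contiguous length-k blocks of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Return p_w(k), the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy word: "abab" has the 2-subwords "ab" and "ba" only.
print(sorted(subwords("abab", 2)))
print([subword_complexity("abab", k) for k in (1, 2, 3)])
```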

\(|P_{f(7TYF_1)}(2) \setminus P_{f(5XCT_1)}(2)|=147\), \(|P_{f(5XCT_1)}(2) \setminus P_{f(7TYF_1)}(2)|=54\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1101100000000000010001000100010000010010001111111001000110010110101101011000101100000100100100100010011001111100111110110100010100110110110101110100010111000110100000000011001001100101100100110000110001100111000101001010110111000000011001001011111110000011100000000101110110011000110000111110000111001111000100011011000010010101100101001001100011010010101000001010001000010011000001100101000011
Pair \(Z_2\) Length of longest common subsequence
7TYF_1,5XCT_1 201 4
7TYF_1,6WOL_1 179 4
5XCT_1,6WOL_1 174 4
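
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets, so the first row of the table is \(147+54=201\). A minimal sketch of that computation, with hypothetical helper names `subwords` and `z_distance`, could look like this:

```python
def subwords(w, k):
    """Return the set of distinct contiguous length-k blocks of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Return Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy words: "abba" has the extra 2-subword "bb", so Z_2 is 1.
print(z_distance("abab", "abba", 2))
```

Dividing \(Z_k(x,y)\) by the size of the union of the two sets would give the Jaccard distance itself; the table reports only the numerator.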

Newick tree

 
[
	7TYF_1:97.72,
	[
		6WOL_1:87,5XCT_1:87
	]:10.72
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{562}{\log_{20} 562}-\frac{168}{\log_{20}168}\right)\approx 114\).
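
The arithmetic behind the value 114 can be checked directly. The sketch below assumes that 562 is the combined length \(394+168\) of the 7TYF_1 and 5XCT_1 sequences and that 168 is the length of 5XCT_1; the page does not say so explicitly. The helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n, alphabet_size=20):
    """Return n / log_b n, the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

n_concat = 394 + 168  # assumed: length of the concatenated pair
n_short = 168         # assumed: length of 5XCT_1
expected = 0.85 * 0.8 * (lz_scale(n_concat) - lz_scale(n_short))
print(round(expected))
```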
Status Protein1 Protein2 d d1/2
Query variables 7TYF_1 5XCT_1 148 105

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
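
To see how the two cases fit together, here is a minimal sketch of the interpolation with hypothetical names `delta`, `smaller` and `larger`. It assumes that the max term equals \(d\) and that \(d_1\) is the sum of the min and max terms, which is what the displayed cases imply. Under that reading, the query row values \(d=148\) and \(d_1/2=105\) give a min term of \(2\cdot 105-148=62\); this is an inferred value, not one reported on the page.

```python
def delta(alpha, smaller, larger):
    """Return alpha * min + (1 - alpha) * max for the two given terms."""
    return alpha * smaller + (1 - alpha) * larger

larger = 148                # d from the query row (alpha = 0 case)
smaller = 2 * 105 - larger  # inferred from d1/2 = 105; not stated on the page

print(delta(0, smaller, larger))    # recovers d
print(delta(0.5, smaller, larger))  # recovers d1/2
```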