CoV2D Browser™

Parikh vectors
2VWZ_1 8QXA_1 8TXO_1 Letter Amino acid
4 5 8 H Histidine
19 14 24 I Isoleucine
13 20 16 K Lysine
13 22 4 F Phenylalanine
24 41 17 S Serine
10 15 19 T Threonine
14 28 9 N Asparagine
21 55 20 G Glycine
7 8 5 Y Tyrosine
22 26 23 A Alanine
24 22 36 E Glutamic acid
5 6 4 C Cysteine
9 24 10 Q Glutamine
12 18 5 M Methionine
22 20 23 R Arginine
15 22 21 D Aspartic acid
5 6 1 W Tryptophan
21 25 30 V Valine
25 21 38 L Leucine
17 16 16 P Proline
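
A Parikh vector counts how often each letter occurs in a word. The following is a minimal Python sketch of that count, assuming the sequence is available as a plain string of one-letter amino acid codes. The names `AMINO_ACIDS` and `parikh_vector` and the toy input are hypothetical and not taken from the CoV2D code.

```python
from collections import Counter

# Letter order as listed in the table above.
AMINO_ACIDS = "HIKFSTNGYAECQMRDWVLP"

def parikh_vector(w):
    """Return the number of occurrences of each amino acid letter in w."""
    counts = Counter(w)
    return {a: counts.get(a, 0) for a in AMINO_ACIDS}

# Toy input: the first ten residues of the 2VWZ_1 sequence.
toy = "DPNEAVREFA"
pv = parikh_vector(toy)
print(pv["A"], pv["E"], sum(pv.values()))
```

The entries of a Parikh vector sum to the length of the word; the three columns above sum to 302, 414 and 329 respectively.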

>2VWZ_1|Chain A|EPHRIN TYPE-B RECEPTOR 4|HOMO SAPIENS (9606)
>8QXA_1|Chains A, B, C, D, E, F, G, H, I, J, K, L|TAR DNA-binding protein 43|Homo sapiens (9606)
>8TXO_1|Chains A, B[auth G]|DNA-directed RNA polymerase subunit alpha|Escherichia coli (562)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2VWZ , Knot 133 302 0.83 40 192 293
DPNEAVREFAKEIDVSYVKIEEVIGAGEFGEVCRGRLKAPGKKESCVAIKTLKGGYTERQRREFLSEASIMGQFEHPNIIRLEGVVTNSMPVMILTEFMENGALDSFLRLNDGQFTVIQLVGMLRGIASGMRYLAEMSYVHRDLAARNILVNSNLVCKVSDFGLSRFLEENSSDPTETSSLGGKIPIRWTAPEAIAFRKFTSASDAWSYGIVMWEVMSFGERPYWDMSNQDVINAIEQDYRLPPPPDCPTSLHQLMLDCWQKDRNARPRFPQVVSALDKMIRNPASLKIVARENGGASHPLL
8QXA , Knot 169 414 0.82 40 225 381
MSEYIRVTEDENDEPIEIPSEDDGTVLLSTVTAQFPGACGLRYRNPVSQCMRGVRLVEGILHAPDAGWGNLVYVVNYPKDNKRKMDETDASSAVKVKRAVQKTSDLIVLGLPWKTTEQDLKEYFSTFGEVLMVQVKKDLKTGHSKGFGFVRFTEYETQVKVMSQRHMIDGRWCDCKLPNSKQSQDEPLRSRKVFVGRCTEDMTEDELREFFSQYGDVMDVFIPKPFRAFAFVTFADDQIAQSLCGEDLIIKGISVHISNAEPKHNSNRQLERSGRFGGNPGGFGNQGGFGNSRGGGAGLGNNQGSNMGGGMNFGAFSINPAMMAAAQAALQSSWGMMGMLASQQNQSGPSGNNQNQGNMQREPNQAFGSGNNSYSGSNSGAAIGWGSASNAGSGSGFNGGFGSSMDSKSSGWGM
8TXO , Knot 146 329 0.85 40 180 315
MQGSVTEFLKPRLVDIEQVSSTHAKVTLEPLERGFGHTLGNALRRILLSSMPGCAVTEVEIDGVLHEYSTKEGVQEDILEILLNLKGLAVRVQGKDEVILTLNKSGIGPVTAADITHDGDVEIVKPQHVICHLTDENASISMRIKVQRGRGYVPASTRIHSEEDERPIGRLLVDACYSPVERIAYNVEAARVEQRTDLDKLVIEMETNGTIDPEEAIRRAATILAEQLEAFVDLRDVRQPEVKEEKPEFDPILLRPVDDLELTVRSANCLKAEAIHYIGDLVQRTEVELLKTPNLGKKSLTEIKDVLASRGLSLGMRLENWPPASIADE
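
The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below shows one common way to count phrases in an exhaustive-history (LZ76-style) parse, together with that normalization. It is an assumption that this parsing variant matches the one used to produce the table; `lz76` and `normalized_lz` are hypothetical names.

```python
import math

def lz76(w):
    """Count phrases in an exhaustive-history (LZ76-style) parse of w.

    Assumption: each phrase is the shortest block starting at the current
    position that does not occur in the text preceding its last symbol.
    """
    n, i, phrases = len(w), 0, 0
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text
        # (the copy may overlap the phrase itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_q(n), with q = 20 for amino acids."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76("0001101001000101"))           # classic binary test word
print(round(normalized_lz(169, 414), 2))  # LZ and length from the 8QXA row
```

For the 8QXA row (\(\mathrm{LZ}(w)=169\), \(n=414\)) the normalization gives about 0.82, in agreement with the table.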

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
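
A minimal Python sketch of these definitions, assuming a word is a plain string; the function names `subwords` and `subword_complexity` are hypothetical.

```python
def subwords(w, n):
    """P_w(n): the set of distinct length-n subwords (intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subwords(w, n))

# Toy word: ABAB has subwords {A, B}, {AB, BA} and {ABA, BAB}.
for n in (1, 2, 3):
    print(n, subword_complexity("ABAB", n))
```

A word of length \(|w|\) has at most \(|w|-n+1\) subwords of length \(n\), so \(p_w(n)\) never exceeds that bound.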

\(|P_{f(2VWZ_1)}(2) \setminus P_{f(8QXA_1)}(2)|=64\), \(|P_{f(8QXA_1)}(2) \setminus P_{f(2VWZ_1)}(2)|=97\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
01001100110010100101001111101101001010111000001110010110000000011001011101001011010111000111111001100111001101001010110111110111011001101001000111001110001100100111001100000010000011101110101101111001001001100111110110110010101000011011000001111100100100111001000001010110110110011001101011100011100111
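
The page does not state which residues count as hydrophobic. Comparing the binary string above with the 2VWZ_1 sequence suggests that A, F, G, I, L, M, P, V and W map to 1 and all other residues map to 0. The sketch below uses that inferred classification; it is an assumption, and `HYDROPHOBIC` and `hp_string` are hypothetical names.

```python
# Assumption: hydrophobic set inferred by aligning the displayed binary
# string with the 2VWZ_1 sequence, not taken from the CoV2D source.
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(seq):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in seq)

# First twenty residues of the 2VWZ_1 sequence.
print(hp_string("DPNEAVREFAKEIDVSYVKI"))
```

Applied to the first twenty residues, this reproduces the first twenty symbols of the displayed string.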
Pair \(Z_2\) Length of longest common subword
2VWZ_1,8QXA_1 161 4
2VWZ_1,8TXO_1 160 3
8QXA_1,8TXO_1 173 3
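
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets. A minimal Python sketch, assuming words are plain strings; the function names `subwords` and `z_distance` are hypothetical.

```python
def subwords(w, k):
    """P_w(k): the set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy words: P_x(2) = {AB, BC}, P_y(2) = {BC, CD}; only BC is shared.
print(z_distance("ABC", "BCD", 2))
```

For the pair 2VWZ_1, 8QXA_1 the two one-sided differences reported above are 64 and 97, whose sum is the tabulated \(Z_2 = 161\).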

Newick tree

 
[
	8QXA_1:84.70,
	[
		2VWZ_1:80,8TXO_1:80
	]:4.70
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{716}{\log_{20} 716}-\frac{302}{\log_{20}302}\right)\approx 114.\)
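
The arithmetic in this estimate can be checked directly. In the sketch below, `capacity` is a hypothetical helper name for \(n/\log_{20} n\); the factors 0.85 and 0.8 are taken from the formula as given. Note that 716 = 302 + 414, presumably the length of the concatenation of the two query sequences.

```python
import math

def capacity(n, alphabet_size=20):
    """n / log_q(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

# 716 = 302 + 414 (combined length), 302 = length of the 2VWZ_1 sequence.
expected = 0.85 * 0.8 * (capacity(716) - capacity(302))
print(round(expected))
```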
Status Protein1 Protein2 d d1/2
Query variables 2VWZ_1 8QXA_1 143 123.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
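
The interpolation can be sketched as follows. It is an assumption here that min and max range over the two directed LZ increments, \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), as in the usual Otu–Sayood definitions, so that \(d\) is their maximum and \(d_1\) their sum. The value 104 is not shown on the page; it is back-computed from the tabulated \(d=143\) and \(d_1/2=123.5\) under that assumption. The name `delta` is hypothetical.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Assumed directed increments for the pair 2VWZ_1, 8QXA_1:
# max = d = 143, and min = 2 * 123.5 - 143 = 104 (back-computed).
a, b = 143, 104
print(delta(0, a, b))    # alpha = 0 recovers d
print(delta(0.5, a, b))  # alpha = 1/2 recovers d1 / 2
```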