CoV2D Browser™

Parikh vectors
2ECP_1 3KPV_1 4GFX_1 Letter Amino acid
41 19 14 E Glutamic acid
30 9 7 F Phenylalanine
25 17 6 S Serine
82 32 7 A Alanine
23 12 0 H Histidine
13 4 3 M Methionine
30 20 8 P Proline
31 11 8 T Threonine
20 6 2 W Tryptophan
42 20 6 R Arginine
55 14 7 D Aspartic acid
7 6 4 C Cysteine
45 14 8 Q Glutamine
84 31 10 L Leucine
52 7 13 K Lysine
26 8 7 Y Tyrosine
56 20 20 V Valine
42 7 5 N Asparagine
49 24 12 G Glycine
43 8 4 I Isoleucine
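
A Parikh vector records how many times each letter occurs in a word. A minimal Python sketch of the table above, assuming each sequence is a plain string over the 20 standard one-letter amino-acid codes (the name `parikh_vector` is illustrative and not taken from the CoV2D site):

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes

def parikh_vector(sequence):
    """Return a dict mapping each amino-acid letter to its count in `sequence`."""
    counts = Counter(sequence)
    # Letters absent from the sequence get an explicit 0,
    # like the H (histidine) entry for 4GFX_1 in the table.
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}
```

The entries of the returned vector sum to the sequence length, which gives a quick consistency check against the Length column further down the page.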

>2ECP_1|Chains A, B|MALTODEXTRIN PHOSPHORYLASE|Escherichia coli (562)
>3KPV_1|Chains A, B|Phenylethanolamine N-methyltransferase|Homo sapiens (9606)
>4GFX_1|Chain A|Thioredoxin-interacting protein|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2ECP , Knot 305 796 0.85 40 304 719
SQPIFNDKQFQEALSRQWQRYGLNSAAEMTPRQWWLAVSEALAEMLRAQPFAKPVANQRHVNYISMEFLIGRLTGNNLLNLGWYQDVQDSLKAYDINLTDLLEEEIDPALGNGGLGRLAACFLDSMATVGQSATGYGLNYQYGLFRQSFVDGKQVEAPDDWHRSNYPWFRHNEALDVQVGIGGAVTKDGRWEPEFTITGQAWDLPVVGYRNGVAQPLRLWQATHAHPFDLTKFNDGDFLRAEQQGINAEKLTKVLYPNDNHTAGKKLRLMQQYFQCACSVADILRRHHLAGRELHELADYEVIQLNDTHPTIAIPELLRVLIDEHQMSWDDAWAITSKTFAYTNHTLMPEALERWDVKLVKGLLPRHMQIINEINTRFKTLVEKTWPGDEKVWAKLAVVHDKQVHMANLCVVGGFAVNGVAALHSDLVVKDLFPEYHQLWPNKFHNVTNGITPRRWIKQCNPALAALLDKSLQKEWANDLDQLINLVKLADDAKFRDLYRVIKQANKVRLAEFVKVRTGIDINPQAIFDIQIKRLHEYKRQHLNLLRILALYKEIRENPQADRVPRVFLFGAKAAPGYYLAKNIIFAINKVADVINNDPLVGDKLKVVFLPDYCVSAAEKLIPAADISEQISTAGKEASGTGNMKLALNGALTVGTLDGANVEIAEKVGEENIFIFGHTVKQVKAILAKGYDPVKWRKKDKVLDAVLKELESGKYSDGDKHAFDQMLHSIGKQGGDPYLVMADFAAYVEAQKQVDVLYRDQEAWTRAAILNTARCGMFSSDRSIRDYQARIWQAAR
3KPV , Knot 129 289 0.84 40 177 283
MSGADRSPNAGAAPDSAPGQAAVASAYQRFEPRAYLRNNYAPPRGDLCNPNGVGPWKLRCLAQTFATGEVSGRTLIDIGSGPTVYQLLSACSHFEDITMTDFLEVNRQELGRWLQEEPGAFNWSMYSQHACLIEGKGECWQDKERQLRARVKRVLPIDVHQPQPLGAGSPAPLPADALVSAFCLEAVSPDLASFQRALDHITTLLRPGGHLLLIGALEESWYLAGEARLTVVPVSEEEVREALVRSGYKVRDLRTYIMPAHLQTGVDDVKGVFFAWAQKVGLEHHHHHH
4GFX , Knot 74 151 0.82 38 118 144
VAAIKSFEVVFNDPEKVYGSGEKVAGRVIVEVCEVTRVKAVRILACGVAKVLWMQGSQQCKQTSEYLRYEDTLLLEDQPTGENEMVIMRPGNKYEYKFGFELPQGPLGTSFKGKYGCVDYWVKAFLDRPSQPTQETKKNFEVVDLVDVNTP
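
The normalised column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below reproduces that ratio from the tabulated values and also gives one possible phrase counter. The counter assumes an LZ76-style exhaustive-history parsing; the page does not say which LZ variant it uses, so `lz76_complexity` is not guaranteed to reproduce the tabulated complexities. Both function names are hypothetical.

```python
import math

def lz76_complexity(w):
    """Count phrases in an LZ76-style exhaustive-history parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it already occurs starting before position i
        # (the earlier occurrence may overlap the phrase itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalised_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_{alphabet_size} n)."""
    return lz / (n / math.log(n, alphabet_size))

# Values taken from the table above: (LZ-complexity, length).
for lz, n in [(305, 796), (129, 289), (74, 151)]:
    print(round(normalised_lz(lz, n), 2))
```

With the tabulated pairs this prints 0.85, 0.84 and 0.82, matching the fourth column.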

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
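
The set \(P_w\) and its cardinality \(p_w\) defined above can be computed directly by sliding a window over the word. A minimal sketch, with illustrative function names:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))
```

For instance, the word `ABAB` has the two length-2 subwords `AB` and `BA`, so `subword_complexity("ABAB", 2)` is 2.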

\(|P_{f(2ECP_1)}(2) \setminus P_{f(3KPV_1)}(2)|=158\), \(|P_{f(3KPV_1)}(2) \setminus P_{f(2ECP_1)}(2)|=31\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
0011100001001100010001100110101001111100111011010111011100001001010111101010011011100010001010010100110001011110111101110110011011001010110000111000110100101100100000111000011010111111100010101010101011011111000111011011010010110100100101101000110100100110100000110010110001001001101100001110010011000110100001011110110111000010100111100001100000111011001010110111100101100100010011000111000111011110000101101011111110111110001110011100001110010010011010011000011111110001000110010011011011001010010011001001011011010011010101110101001000000010110111100010001010011011111101111001100111110011011000111100101111100010110011111010001001100101010101110111011010110101100110001111100100101111010011010000011011100100100001000110011001100110101111011101010001011000001100111100100111000001000010110110
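
The hydrophobic-polar string above replaces each residue by 1 (hydrophobic) or 0 (polar). The page does not state its classification; the set used below is an assumption, inferred by comparing the start of sequence 1 with the start of the bit string, and other hydrophobicity scales assign some residues differently.

```python
# Assumed hydrophobic class, inferred from the page's own output for sequence 1.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' if it is in `hydrophobic`, else to '0'."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)

# First 40 residues of sequence 1 (2ECP_1).
print(hp_encode("SQPIFNDKQFQEALSRQWQRYGLNSAAEMTPRQWWLAVSE"))
```

This prints `0011100001001100010001100110101001111100`, which agrees with the first 40 symbols of the string above.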
Pair \(Z_2\) Length of longest common subsequence
2ECP_1,3KPV_1 189 3
2ECP_1,4GFX_1 230 4
3KPV_1,4GFX_1 181 3
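
The \(Z_k\) column is the size of the symmetric difference of the two subword sets. A short sketch of that computation, with illustrative names and repeating the subword helper so the block runs on its own:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)
```

For the first pair this is the sum of the two set differences quoted earlier, 158 + 31 = 189.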

Newick tree

 
[
	2ECP_1:10.72,
	[
		3KPV_1:90.5,4GFX_1:90.5
	]:19.22
]

Let \(d\) and \(d_1\) denote the Otu–Sayood distances of the same names. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1085 }{\log_{20} 1085}-\frac{289}{\log_{20}289})\approx 212.\)
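
The arithmetic in the estimate above can be checked numerically. In this sketch the parameter names `ratio` and `factor` are hypothetical labels for the constants 0.85 and 0.8, whose meaning the page does not spell out; the two lengths are taken as written in the formula.

```python
import math

def expected_distance(n1, n2, ratio=0.85, factor=0.8, alphabet_size=20):
    """Evaluate ratio * factor * (n1/log_A(n1) - n2/log_A(n2)) for alphabet size A."""
    def scale(n):
        return n / math.log(n, alphabet_size)
    return ratio * factor * (scale(n1) - scale(n2))

print(round(expected_distance(1085, 289)))
```

This prints 212, in agreement with the value quoted above.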
Status Protein1 Protein2 d d1/2
Query variables 2ECP_1 3KPV_1 273 183
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
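
The interpolation can be sketched as follows, assuming that min and max are taken over two one-sided quantities `a` and `b`, which the page does not define explicitly. The example values 273 and 93 are not given on the page as such: they are the pair implied by the query row (\(d=273\), \(d_1/2=183\), so the smaller quantity would be \(2\cdot 183-273=93\)).

```python
def delta(a, b, alpha):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

print(delta(273, 93, 0))    # alpha = 0 gives the maximum, i.e. d
print(delta(273, 93, 0.5))  # alpha = 1/2 gives the average, i.e. d_1 / 2
```

Under that assumption the two calls return 273 and 183.0, the values shown in the query row.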