CoV2D BrowserTM

Parikh vectors
7MSD_1 3NEZ_1 6KRN_1 Letter Amino acid
26 9 27 I Isoleucine
13 13 21 Y Tyrosine
19 11 51 A Alanine
20 7 16 R Arginine
21 7 31 N Asparagine
19 27 43 G Glycine
24 24 17 K Lysine
8 10 10 M Methionine
12 12 24 P Proline
30 12 58 S Serine
16 13 39 T Threonine
30 14 25 D Aspartic acid
11 7 21 Q Glutamine
15 24 38 E Glutamic acid
15 13 6 H Histidine
20 13 39 V Valine
13 1 6 C Cysteine
31 14 33 L Leucine
14 10 29 F Phenylalanine
11 3 11 W Tryptophan
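
A Parikh vector records how often each letter occurs in a sequence. The following is a minimal Python sketch of that count. The function name `parikh_vector` and the constant `LETTERS` are illustrative assumptions; the letter order is copied from the Letter column of the table and is not meant to reflect the browser's own code.

```python
from collections import Counter

# Letter order copied from the table's "Letter" column (display order only).
LETTERS = "IYARNGKMPSTDQEHVCLFW"

def parikh_vector(seq, letters=LETTERS):
    """Return the number of occurrences in `seq` of each letter in `letters`."""
    counts = Counter(seq)
    return [counts[a] for a in letters]

# Toy input: the first 12 residues of 3NEZ_1.
example = "MGHHHHHHGVSK"
vec = parikh_vector(example)
print(dict(zip(LETTERS, vec)))
```

When every residue belongs to `letters`, the entries of the vector sum to the sequence length, which is a quick consistency check against the Length column further down.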

>7MSD_1|Chain A|Polycomb protein EED|Homo sapiens (9606)
>3NEZ_1|Chains A, B, C, D|mRojoA|Discosoma Sp. (86600)
>6KRN_1|Chain A|Mating factor alpha,GH30 Xylanase B|Saccharomyces uvarum (230603)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7MSD , Knot 160 368 0.85 40 225 359
SNAKCKYSFKCVNSLKEDHNQPLFGVQFNWHSKEGDPLVFATVGSNRVTLYECHSQGEIRLLQSYVDADADENFYTCAWTYDSNTSHPLLAVAGSRGIIRIINPITMQCIKHYVGHGNAINELKFHPRDPNLLLSVSKDHALRLWNIQTDTLVAIFGGVEGHRDEVLSADYDLLGEKIMSCGMDHSLKLWRINSKRMMNAIKESYDYNPNKTNRPFISQKIHFPDFSTRDIHRNYVDCVRWLGDLILSKSCENAIVCWKPGKMEDDIDKIKPSESNVTILGRFDYSQCDIWYMRFSTDFWQKMLALGNQVGKLYVWDLEVEDPHKAKCTTLTHHKCGAAIRQTSFSRDSSILIAVCDDASIWRWDRLR
3NEZ , Knot 110 244 0.82 40 158 231
MGHHHHHHGVSKGEEDNMAIIKEFMRFKTHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKGGPLPFAWDILSPQFMYGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLHGTNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKLRLKLKDGGHYDAEVKTTYKAKKPVQLPGAYNANYKLDITSHNEDYTIVEQYERCEGRHSTGGMDELYK
6KRN , Knot 219 545 0.84 40 253 515
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFINTTIASIAAKEEGVSLEKREAEAYVEFQINVDLQARYQSVDGFGCSQAFQRAEDIFGKYGLSPKNQSYVLDLMYSEERGAGFTILRNGIGSSNSSTSNLMNSIEPFSPGSPSSTPNYTWDHYNSGQFPLSQQARARGLPYIYADAWSAPGYMKTNQDENWSGFLCGIEGETCPSGDWRQAYADYLVQYVKFYAESGVPVTHLGFLNEPQEVVSYASMGSNGTQAAEFVKILGQTLEREGIDIELTCCDGVGWSEQEAMIPGLQVVGPDGKSAEDYLSVVTGHGYSSAPTFPLSTKRRTWLTEWTDLSGAFTPYTFFADGGAGEGMTWANHIQTAFVNANVSAFIYWIGAENSTTNSGMINLINDEVIPSKRFWSMASFSKFVRPNAQRVKATSSDASVTVSAFENTNGVVAIQVINNGTSAASLTIDLGKTHKEVKKVVPWVTSNDYDLEEMSEIDVKHNSFLASVPARSLTSFVTECE
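
The normalised column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). As a rough sketch, the snippet below counts phrases in an LZ76-style exhaustive parsing and then forms that ratio. Which Lempel-Ziv variant the browser actually uses is not stated on this page, so `lz76` is an assumption; `normalised_lz` only restates the column header.

```python
import math

def lz76(s):
    """Number of phrases in an LZ76-style exhaustive parsing of s (assumed variant)."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it can be copied from a start position before i
        # (the copied occurrence may overlap the phrase itself).
        while i + length <= n and s.find(s[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalised_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_b n) with b = alphabet_size, as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76("0001101001000101"))   # a short binary toy input
print(normalised_lz(160, 368))    # LZ(w) and n taken from the 7MSD row
```

With the values in the 7MSD row (\(\mathrm{LZ}(w)=160\), \(n=368\)) the ratio comes out at about 0.857, which the table lists as 0.85.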

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
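
As an illustration of these definitions, here is a small Python sketch that collects the distinct contiguous subwords of a given length and counts them. The names `subwords` and `p` are chosen here for illustration only.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def p(w, k):
    """Subword complexity: the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy input: the first 12 residues of 3NEZ_1.
w = "MGHHHHHHGVSK"
print(sorted(subwords(w, 2)))
print(p(w, 2))
```

On this toy input the repeated pair HH is counted once, so there are 7 distinct subwords of length 2 among the 11 positions.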

\(|P_{f(7MSD_1)}(2) \setminus P_{f(3NEZ_1)}(2)|=124\), \(|P_{f(3NEZ_1)}(2) \setminus P_{f(7MSD_1)}(2)|=57\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
00100000100100100000011111010100001011111011000101000000101011000101010001000110000000011111110011101101101001000110101100101010010111010000110110100001111111101000011010001110011001100010110100001101100000001000001110001011010000100001001011101110000001110101101000100101000010111010000001101010001100111110011010110101001001000010000011110000100000111110001011010010
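
The 0/1 string above agrees with the start of Sequence 1 under a simple residue classification. The sketch below maps A, F, G, I, L, M, P, V and W to 1 and every other residue to 0. This set was inferred by comparing the beginning of the string with the beginning of 7MSD_1, so it is an assumption rather than a documented definition of the browser's hydrophobic-polar alphabet; the names `HYDROPHOBIC` and `hp_encode` are illustrative.

```python
# Residues mapped to 1; inferred from the page's output, hence an assumption.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq):
    """Map each residue to '1' if it is in the assumed hydrophobic set, else '0'."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in seq)

# First 30 residues of 7MSD_1.
prefix = "SNAKCKYSFKCVNSLKEDHNQPLFGVQFNW"
print(hp_encode(prefix))
```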
Pair \(Z_2\) Length of longest common subword
7MSD_1,3NEZ_1 181 3
7MSD_1,6KRN_1 186 4
3NEZ_1,6KRN_1 179 4
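
Since \(Z_k(x,y)\) is the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), it can be sketched with Python sets. The function names below are illustrative; `jaccard_distance` divides by the size of the union, which is the usual Jaccard denominator and is included only to show why \(Z_k\) is called a numerator.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    return len(subwords(x, k) ^ subwords(y, k))

def jaccard_distance(x, y, k):
    """Z_k(x, y) divided by |P_x(k) union P_y(k)| (0.0 if both sets are empty)."""
    px, py = subwords(x, k), subwords(y, k)
    union = px | py
    return len(px ^ py) / len(union) if union else 0.0

# Toy inputs; the protein sequences from the table can be passed in the same way.
print(z("abab", "abcb", 2))
print(jaccard_distance("abab", "abcb", 2))
```

For 7MSD_1 and 3NEZ_1 the two one-sided differences quoted above, 124 and 57, add up to the tabulated \(Z_2=181\).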

Newick tree

 
[
	7MSD_1:92.49,
	[
		3NEZ_1:89.5,6KRN_1:89.5
	]:2.99
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{612}{\log_{20} 612}-\frac{244}{\log_{20}244})=103.\)
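
The arithmetic in this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are simply copied from the formula; reading 612 as 368 + 244, the combined length of the two query sequences, is an assumption, as is the helper name `lz_scale`.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_b n with b = alphabet_size, the normalising term for LZ-complexity."""
    return n / math.log(n, alphabet_size)

# Factors 0.85 and 0.8 and the lengths 612 and 244 are taken from the formula above.
expected = 0.85 * 0.8 * (lz_scale(612) - lz_scale(244))
print(round(expected, 2))
```

The expression evaluates to roughly 103.9, which the page reports as 103.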
Status Protein1 Protein2 d d1/2
Query variables 7MSD_1 3NEZ_1 134 110.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
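
The two cases can be checked against the query row above, assuming (as the cases themselves imply) that \(\mathrm{max}=d\) and \(\mathrm{min}+\mathrm{max}=d_1\). With \(d=134\) and \(d_1/2=110.5\) this gives \(\mathrm{min}=87\). The function name `delta` and the variable names are illustrative.

```python
def delta(alpha, lo, hi):
    """alpha * min + (1 - alpha) * max."""
    return alpha * lo + (1 - alpha) * hi

d = 134           # d from the query row
d1 = 2 * 110.5    # the table lists d1/2
hi = d            # alpha = 0 case: delta = max = d
lo = d1 - hi      # because d1/2 = (min + max) / 2

print(lo)
print(delta(0, lo, hi), delta(0.5, lo, hi))
```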