CoV2D BrowserTM

Parikh vectors
7BPF_1 4YXP_1 6QHR_1 Letter Amino acid
0 8 13 H Histidine
0 14 18 I Isoleucine
0 5 12 M Methionine
0 3 3 W Tryptophan
0 14 19 E Glutamic acid
3 20 25 G Glycine
0 16 16 S Serine
0 14 18 T Threonine
0 27 26 L Leucine
0 6 30 K Lysine
0 7 11 Q Glutamine
0 7 10 Y Tyrosine
0 35 16 A Alanine
0 7 8 N Asparagine
2 10 8 C Cysteine
0 12 12 F Phenylalanine
0 16 15 P Proline
0 17 20 V Valine
0 23 18 R Arginine
0 14 20 D Aspartic acid
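
A Parikh vector simply counts how often each letter occurs in a sequence, so the columns above can be recomputed from the FASTA strings. The following is a minimal sketch, assuming the sequence is held as a plain Python string; the name `parikh_vector` and the alphabet ordering are illustrative and not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard one-letter amino acid codes (ordering is an arbitrary choice here).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of `alphabet` to its count in `w`."""
    counts = Counter(w)
    # Letters outside the alphabet (such as U in an RNA chain) are ignored.
    return {a: counts.get(a, 0) for a in alphabet}

v = parikh_vector("GCUGCUG")  # the 7BPF_1 sequence shown on this page
print(v["G"], v["C"])
```

For the RNA chain 7BPF_1 only G and C fall inside the amino acid alphabet, which is why its column in the table is zero everywhere else.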

>7BPF_1|Chains A, B|RNA (5'-R(*GP*CP*UP*GP*CP*(5BU)P*GP*C)-3')|synthetic construct (32630)
>4YXP_1|Chains A, B|mRNA export factor|Human herpesvirus 1 (strain 17) (10299)
>6QHR_1|Chain A|Dual specificity mitogen-activated protein kinase kinase 7|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7BPF 4 7 0.37 6 3 3 GCUGCUG
4YXP 121 275 0.82 40 173 262 GPGAADTIDATTRLVLRSISERAAVDRISESFGRSAQVMHDPFGGQPFPAANSPWAPVLAGQGGPFDAETRRVSWETLVAHGPSLYRTFAGNPRAASTAKAMRDCVLRQENFIEALASADETLAWCKMCIHHNLPLRPQDPIIGTTAAVLDNLATRLRPFLQCYLKARGLCGLDELCSRRRLADIKDIASFVFVILARLANRVERGVAEIDYATLGVGVGEKMHFYLPGACMAGLIEILDTHRQECSSRVCELTASHIVAPPYVHGKYFYCNSLF
6QHR 138 318 0.83 40 206 306 MGHHHHHHSAKQTGYLTIGGQRYQAEINDLENLGEMGSGTCGQVWKMRFRKTGHVIAVKQMRRSGNKEENKRILMDLDVVLKSHDCPYIVQCFGTFITNTDVFIAMELMGTCAEKLKKRMQGPIPERILGKMTVAIVKALYYLKEKHGVIHRDVKPSNILLDERGQIKLCDFGISGRLVDSKAKTRSAGCAAYMAPERIDPPDPTKPDYDIRADVWSLGISLVELATGQFPYKNCKTDFEVLTKVLQEEPPLLPGHMGFSGDFQSFVKDCLTKDHRKRPKYNKLLEHSFIKRYETLEVDVASWFKDVMAKTESPRTSG
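
The page does not state how \(\mathrm{LZ}(w)\) is computed. As a hedged sketch, assuming it is the number of phrases in the Lempel-Ziv (1976) exhaustive history, the function below reproduces the values 4 and 0.37 listed for GCUGCUG. The names `lz76` and `normalized_lz` are illustrative.

```python
import math

def lz76(w):
    """Number of phrases in the LZ76 exhaustive history of w (assumed definition)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from an occurrence
        # starting before position i (the copy may overlap the phrase itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(w, alphabet_size=20):
    """LZ(w) divided by n / log_20(n); requires len(w) > 1."""
    n = len(w)
    return lz76(w) / (n / math.log(n, alphabet_size))

print(lz76("GCUGCUG"), round(normalized_lz("GCUGCUG"), 2))
```

The substring search makes this roughly cubic in the worst case, which is acceptable for sequences of a few hundred residues but not for whole genomes.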

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
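
Reading \(P_w(n)\) as the set of distinct length-\(n\) subwords, both quantities can be sketched in a few lines. The helper names `subwords` and `subword_complexity` are illustrative, not part of the site.

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))

w = "GCUGCUG"  # the 7BPF_1 sequence
print(sorted(subwords(w, 2)), subword_complexity(w, 2), subword_complexity(w, 3))
```

For GCUGCUG this gives \(p_w(2)=p_w(3)=3\), the values listed in the table.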

\(|P_{f(7BPF_1)}(2) \setminus P_{f(4YXP_1)}(2)|=3\), \(|P_{f(4YXP_1)}(2) \setminus P_{f(7BPF_1)}(2)|=173\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).
Hydrophobic-polar version of Sequence 1: 1001001
Pair \(Z_2\) Length of longest common subword
7BPF_1,4YXP_1 176 1
7BPF_1,6QHR_1 207 2
4YXP_1,6QHR_1 169 3
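
Both columns can be sketched directly from the definitions. \(Z_k\) is the size of the symmetric difference of the two subword sets. The last column is computed here as the longest common contiguous subword (substring), which is an assumption: it is the reading consistent with the small values 1, 2, 3 in the table. All function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    return len(subwords(x, k) ^ subwords(y, k))  # ^ is symmetric difference

def longest_common_subword_length(x, y):
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common suffix ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy example: GCUGCUG against a short made-up fragment.
print(z("GCUGCUG", "AGCA", 2), longest_common_subword_length("GCUGCUG", "AGCA"))
```

In the toy example the two words share only GC among their length-2 subwords, so two subwords remain on each side and the longest common subword has length 2.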

Newick tree

 
[
	7BPF_1:99.61,
	[
		4YXP_1:84.5,6QHR_1:84.5
	]:15.11
]
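
Standard Newick notation uses parentheses and a terminating semicolon rather than square brackets. As a small sketch, the tree above can be serialized as follows, assuming a hypothetical nested-tuple representation `(label_or_children, branch_length)` chosen purely for illustration.

```python
def to_newick(node):
    """Serialize a (label_or_children, branch_length_or_None) tuple to Newick text."""
    content, length = node
    if isinstance(content, str):
        text = content  # leaf: just the label
    else:
        text = "(" + ",".join(to_newick(child) for child in content) + ")"
    return text if length is None else f"{text}:{length}"

# The tree shown above; the root carries no branch length.
tree = ([("7BPF_1", 99.61),
         ([("4YXP_1", 84.5), ("6QHR_1", 84.5)], 15.11)], None)
newick = to_newick(tree) + ";"
print(newick)
```

Note that 84.5 + 15.11 = 99.61, so every leaf sits at the same distance from the root.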

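Let d be the Otu–Sayood distance d, that is, \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let d1 be the Otu–Sayood distance d1, the sum of the same two terms. (This makes the 4TYN sequence AAAAAA a close match...)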
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{282 }{\log_{20} 282}-\frac{7}{\log_{20}7})=94.4\)
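
The arithmetic in this estimate can be checked numerically. The sketch below takes the constants 0.85 and 0.8 and the lengths 282 and 7 exactly as quoted; 282 equals 275 + 7, presumably the length of the concatenated pair. The helper name `n_over_log` is illustrative.

```python
import math

def n_over_log(n, base=20):
    """n / log_base(n), the normalizing term used for LZ-complexity on this page."""
    return n / math.log(n, base)

expected = 0.85 * 0.8 * (n_over_log(282) - n_over_log(7))
print(expected)
```

The result agrees with the quoted 94.4 to within rounding in the first decimal place.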
Status Protein1 Protein2 d d1/2
Query variables 7BPF_1 4YXP_1 120 61
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
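
The interpolation can be made concrete as follows. This is a hedged sketch: it assumes that min and max range over the two one-sided quantities \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), and that \(\mathrm{LZ}\) is the LZ76 phrase count. The function names are illustrative.

```python
def lz76(w):
    """Number of phrases in the LZ76 exhaustive history of w (assumed definition)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def one_sided(x, y):
    """The two terms LZ(xy) - LZ(x) and LZ(yx) - LZ(y)."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)

def d(x, y):
    """Otu-Sayood d: the larger of the two one-sided terms."""
    return max(one_sided(x, y))

def d1(x, y):
    """Otu-Sayood d1: the sum of the two one-sided terms."""
    return sum(one_sided(x, y))

def delta(x, y, alpha):
    """delta = alpha * min + (1 - alpha) * max."""
    a, b = one_sided(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "GCUGCUG", "AACGTACCATTG"  # y is an arbitrary toy sequence
print(delta(x, y, 0) == d(x, y), delta(x, y, 0.5) == d1(x, y) / 2)
```

Under these assumptions the table row with d = 120 and d1/2 = 61 corresponds to one-sided terms of 120 and 2.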
