CoV2D BrowserTM

Parikh vectors
6NJD_1 2YCF_1 6UVX_1 Letter Amino acid
10 15 21 D Aspartic acid
12 31 16 E Glutamic acid
19 37 32 L Leucine
2 11 6 Y Tyrosine
8 21 30 V Valine
4 22 35 A Alanine
12 9 10 Q Glutamine
12 18 41 G Glycine
2 6 9 H Histidine
2 6 7 M Methionine
4 12 7 F Phenylalanine
14 16 17 T Threonine
4 9 5 N Asparagine
7 14 19 P Proline
9 20 15 S Serine
14 29 9 K Lysine
0 8 4 C Cysteine
14 20 17 I Isoleucine
0 3 8 W Tryptophan
8 15 19 R Arginine
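
A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch of how one column of the table above could be recomputed; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative assumptions, not names used by the CoV2D project.

```python
from collections import Counter

# One-letter codes in the row order of the table above (an assumption
# made only so that the output lines up with the table).
AMINO_ACIDS = "DELYVAQGHMFTNPSKCIWR"


def parikh_vector(sequence: str, alphabet: str = AMINO_ACIDS) -> dict[str, int]:
    """Return the number of occurrences of each alphabet letter in `sequence`."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that never occur, e.g. C and W in 6NJD_1.
    return {letter: counts[letter] for letter in alphabet}


if __name__ == "__main__":
    # First six residues of 6NJD_1, used here as a short demonstration input.
    vector = parikh_vector("GPLGSM")
    print({letter: n for letter, n in vector.items() if n})
```

Passing a full FASTA sequence instead of the six-residue prefix yields the complete column for that protein.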

>6NJD_1|Chains A, C|Di-ubiquitin|Homo sapiens (9606)
>2YCF_1|Chain A|SERINE/THREONINE-PROTEIN KINASE CHK2|HOMO SAPIENS (9606)
>6UVX_1|Chains A, B|Phosphoenolpyruvate transferase|Mycolicibacterium smegmatis (strain ATCC 700084 / mc(2)155) (246196)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6NJD 44 157 0.47 36 68 82
GPLGSMQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRSSMQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG
2YCF 140 322 0.83 40 190 309
MSVYPKALRDEYIMSKTLGSGACGEVKLAFERKTCKKVAIKIISKRKFAIGSAREADPALNVETEIEILKKLNHPCIIKIKNFFDAEDYYIVLELMEGGELFDKVVGNKRLKEATCKLYFYQMLLAVQYLHENGIIHRDLKPENVLLSSQEEDCLIKITDFGHSKILGETSLMRTLCGTPTYLAPEVLVSVGTAGYNRAVDCWSLGVILFICLSGYPPFSEHRTQVSLKDQITSGKYNFIPEVWAEVSEKALDLVKKLLVVDPKARFTTEEALRHPWLQDEDMKRKFQDLLSEENESTALPQVLAQPSTSRKRPREGEAEGA
6UVX 139 327 0.82 40 188 310
MKITVLVGGVGGARFLLGVQNLLGLGSFADGPSKHELTAVVNIGDDAWMHGVRICPDLDTCMYTLGGGIDPDRGWGHRNETWNAKEELAAYGVQPDWFGLGDRDLATHLVRSQMLRAGYPLSQVTEALCKRWQPGARLLPASDERSETHVVITDPTDGERRAIHFQEWWVRYRAKVPTHSFAYVGADQATAGPGVVEAIGDADIVLLAPSNPVVSIGPILQIPGIRGALRSTSAPVIGYSPIIAGKPLRGMADECLKVIGVESTSQAVGEFFGARAGTGLLDGWLVHEGDHAQIEGVKVKAVPLLMTDPEATAAMVRAGLDLAGVSL
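
The LZ-complexity column and its normalization by \(n/\log_{20} n\) can be sketched as follows. This assumes the classical Lempel-Ziv (1976) exhaustive-history parsing, in which each new phrase is the shortest word that has not yet occurred starting at an earlier position (overlaps allowed); the page may use a different variant, so the counts it reports need not agree with this sketch. The names `lz76_complexity` and `normalized_lz` are illustrative.

```python
import math


def lz76_complexity(w: str) -> int:
    """Number of phrases in an LZ76-style parsing of `w` (assumed variant)."""
    n = len(w)
    i = 0        # start of the current phrase
    phrases = 0
    while i < n:
        length = 1
        # Grow the phrase while it also occurs starting at some position < i.
        # Searching inside w[0 : i + length - 1] permits overlapping copies.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_alphabet_size(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    print(lz76_complexity("0001101001000101"))   # textbook binary example
    print(normalized_lz(44, 157))                # values from the 6NJD row
```

The simple `str.find` search is cubic in the worst case, which is acceptable for sequences of a few hundred residues; a suffix-based structure would be the natural replacement for longer inputs.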

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
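
As an illustration of these definitions, the set of distinct length-\(k\) subwords and its cardinality can be computed with a sliding window. This is a sketch only; `subwords` and `subword_complexity` are hypothetical helper names.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w: str, k: int) -> int:
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))


if __name__ == "__main__":
    w = "GPLGSM"  # first six residues of 6NJD_1
    print(sorted(subwords(w, 2)))
    print(subword_complexity(w, 2))
```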

\(|P_{f(6NJD_1)}(2) \setminus P_{f(2YCF_1)}(2)|=26\), \(|P_{f(2YCF_1)}(2) \setminus P_{f(6NJD_1)}(2)|=148\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), that is, an LZ76-style (set of subwords) numerator of the Jaccard distance for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (6NJD_1):
1111010111001010010101010001001010100001111000011111001001001000010000010111010001011100101001010101000100101010000111100001111100100100100001000001011101011
Pair \(Z_2\) Length of longest common subword
6NJD_1,2YCF_1 174 3
6NJD_1,6UVX_1 176 3
2YCF_1,6UVX_1 168 3
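
The \(Z_k\) column is the size of the symmetric difference of two subword sets. For the first pair, the two one-sided differences given above are 26 and 148, so \(Z_2 = 26 + 148 = 174\). A minimal sketch of the computation, with hypothetical helper names `subwords` and `z_k`:

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """Size of the symmetric difference of the length-k subword sets of x and y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


if __name__ == "__main__":
    # Toy inputs: only GP and GS are unshared 2-letter subwords.
    print(z_k("GPLG", "PLGS", 2))
```

Dividing `z_k(x, y, k)` by the size of the union of the two sets would give the corresponding Jaccard distance.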

Newick tree

 
(6NJD_1:88.63,(2YCF_1:84,6UVX_1:84):4.63);

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{479}{\log_{20} 479}-\frac{157}{\log_{20}157}\right)\approx 94.8\), where \(479=157+322\) is the combined length of the two query sequences.
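
The arithmetic behind this estimate can be checked directly. In the sketch below, the factors 0.85 and 0.8 are taken verbatim from the formula (the page does not explain them), 157 and 322 are the sequence lengths from the table, and `size_term` is a hypothetical helper name.

```python
import math


def size_term(n: int, alphabet_size: int = 20) -> float:
    """n / log_alphabet_size(n), the normalizing quantity used on this page."""
    return n / math.log(n, alphabet_size)


n1, n2 = 157, 322  # lengths of 6NJD_1 and 2YCF_1

# 0.85 and 0.8 are copied from the page's formula without interpretation.
expected = 0.85 * 0.8 * (size_term(n1 + n2) - size_term(n1))

if __name__ == "__main__":
    print(f"{expected:.2f}")
```

The result agrees with the quoted 94.8 up to rounding in the last digit.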
Status Protein1 Protein2 d d1/2
Query variables 6NJD_1 2YCF_1 125 78

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
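
The interpolation can be illustrated numerically. The text does not say which two quantities the minimum and maximum are taken over, so the sketch below treats them as two abstract inputs `a` and `b`. The value 31 is not stated on the page: it is back-solved from the query row (\(d=125\), \(d_1/2=78\)) under the formula, since \(\max=125\) and \((\min+\max)/2=78\) force \(\min=31\).

```python
def delta(alpha: float, a: float, b: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


if __name__ == "__main__":
    a, b = 31, 125  # 31 is inferred from d = 125 and d1/2 = 78, not given on the page
    print(delta(0.0, a, b))   # the alpha = 0 case (d)
    print(delta(0.5, a, b))   # the alpha = 1/2 case (d1/2)
```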
