CoV2D Browser™

Parikh vectors
8QNJ_1 7MDN_1 8VZK_1 Letter Amino acid
3 9 40 V Valine
6 1 16 N Asparagine
6 5 32 D Aspartic acid
4 10 21 S Serine
8 4 36 I Isoleucine
0 7 21 F Phenylalanine
0 6 35 P Proline
2 3 22 T Threonine
0 2 2 C Cysteine
4 8 24 Q Glutamine
0 3 7 H Histidine
2 3 16 M Methionine
3 5 14 Y Tyrosine
4 13 76 A Alanine
6 12 46 L Leucine
5 17 21 K Lysine
0 5 8 W Tryptophan
3 5 33 R Arginine
6 12 34 E Glutamic acid
3 10 55 G Glycine
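
The 8QNJ_1 column of this table can be recomputed from the 8QNJ_1 sequence listed further down the page. The following is a minimal sketch, assuming that the Parikh vector is simply the per-letter count of the sequence; the name `parikh_vector` and the alphabetical letter ordering are choices made for this example and are not taken from the CoV2D code.

```python
from collections import Counter

# 8QNJ_1 sequence as listed on this page.
SEQ_8QNJ = "GSMKKEELIDKIKANNRLLNAVVQEMYLDNSLDIKTRDYYASNITSVRQNGDQIIQILDEEGIAE"

# The 20 standard amino-acid letters (ordering chosen for this example only).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of `alphabet` to its count in `seq`."""
    counts = Counter(seq)
    return {letter: counts.get(letter, 0) for letter in alphabet}

vec = parikh_vector(SEQ_8QNJ)
print(vec["I"], vec["V"], vec["W"])  # 8 3 0, as in the 8QNJ_1 column
```

The counts sum to the sequence length, 65.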

>8QNJ_1|Chain A|Lenovo Protein|Apilactobacillus kunkeei (148814)
>7MDN_1|Chains A, B, C, D, E, F, G, H|Histone-lysine N-methyltransferase NSD2|Homo sapiens (9606)
>8VZK_1|Chains A, B|Oxalyl-CoA decarboxylase|Chloroflexi bacterium HGW-Chloroflexi-9 (2013732)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8QNJ , Knot 37 65 0.79 30 59 63
GSMKKEELIDKIKANNRLLNAVVQEMYLDNSLDIKTRDYYASNITSVRQNGDQIIQILDEEGIAE
7MDN , Knot 70 140 0.82 40 107 136
GRDKDHLLKYNVGDLVWSKVSGYPWWPCMVSADPLLHSYTKLKGQKKSARQYHVQFFGDAPERAWIFEKSLVAFEGEGQFEKLCQESAKQAPTKAEKIKLLKPISGKLRAQWEMGIVQAEEAASMSVEERKAKFTFLYVG
8VZK , Knot 218 559 0.82 40 246 513
MPEGPVAEIDGQTIIARALKQQGVEAMFGVVGIPVTGIAAAAQREGIKYVGMRHEMPATYAAQAVSYLGGRLGTALAVSGPGVLNAVAAFANAWSNRWPMILIGGSYEQTGHLMGFFQEADQLSALKPYAKYAERVERLERIPIYVAEAVKKALHGVPGPAYLELPGDIITAKIDESKVEWAPRVPDPKRTLSDPADVEAAIAALKTAQQPLIIVGKGVAASRAEVEIRAFVEKTGIPYLAMPMAKGLIPDDHDQSAAAARSFVLQNADLIFLVGARLNWMLHFGLPPRFRPDVRVVQLDFNPEEIGINVPTEVGMIGDAKATLSQLLDVLDRDGWRFPDDSEWVTAVSAEARQNAEAVQAMMQEDTQPLGYYRALRSIDERLPKDAIFVAEGASTMDISRTVINQYLPRTRLDAGSFGSMGLGHGFAIGAATQFPGKRVICLQGDGAFGFAGTECEVAVRYNLPITWIVFNNGGIGGHRAELFERDQKPVGGMSLGARYDILMQGLGGAAFNATNSDELDAAIEAALKIDGPSLINVPLDPDAKRKPQKFGWLTRTNE
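
The LZ-complexity and normalized columns can be illustrated with a short sketch. It assumes that \(\mathrm{LZ}(w)\) is the number of components in the exhaustive parsing of Lempel and Ziv (1976); the page does not state which variant it uses, so `lz76_complexity` and `normalized_lz` are hypothetical names for this example only. Under that assumption the 8QNJ row is reproduced.

```python
import math

# 8QNJ_1 sequence as listed on this page.
SEQ_8QNJ = "GSMKKEELIDKIKANNRLLNAVVQEMYLDNSLDIKTRDYYASNITSVRQNGDQIIQILDEEGIAE"

def lz76_complexity(s):
    """Number of components in the exhaustive (LZ76-style) parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the candidate while it already occurs starting at an earlier
        # position (the earlier copy may overlap the candidate itself).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1   # the candidate (old part plus one new symbol) is a component
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

lz = lz76_complexity(SEQ_8QNJ)
print(lz, round(normalized_lz(lz, len(SEQ_8QNJ)), 2))  # 37 0.79
```

Applying `normalized_lz` to the tabulated pairs (70, 140) and (218, 559) gives 0.82 in both cases, in agreement with the other two rows.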

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
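
As an illustration of these definitions, the sketch below builds \(P_w(n)\) as a Python set of contiguous blocks of length \(n\) and takes its size. The function names are hypothetical. For 8QNJ_1 it gives 59 for \(n=2\) and 63 for \(n=3\), matching the table; the tabulated \(p_w(1)\) values appear to follow a different convention, which this sketch does not attempt to reproduce.

```python
# 8QNJ_1 sequence as listed on this page.
SEQ_8QNJ = "GSMKKEELIDKIKANNRLLNAVVQEMYLDNSLDIKTRDYYASNITSVRQNGDQIIQILDEEGIAE"

def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))

print(subword_complexity(SEQ_8QNJ, 2))  # 59
print(subword_complexity(SEQ_8QNJ, 3))  # 63
```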

\(|P_{f(8QNJ_1)}(2) \setminus P_{f(7MDN_1)}(2)|=39\), \(|P_{f(7MDN_1)}(2) \setminus P_{f(8QNJ_1)}(2)|=87\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (8QNJ_1):
10100001100101000110111001010001010000001001001000100110110001110
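
The hydrophobic-polar string can be regenerated from the sequence with a one-line encoder. The page does not say which residues it treats as hydrophobic, so the set below (A, C, F, G, I, L, M, P, V, W) is an assumption; it reproduces the string above for 8QNJ_1, which contains no F, P, C, H or W, so the treatment of those letters is not tested by this example.

```python
# 8QNJ_1 sequence as listed on this page.
SEQ_8QNJ = "GSMKKEELIDKIKANNRLLNAVVQEMYLDNSLDIKTRDYYASNITSVRQNGDQIIQILDEEGIAE"

# Assumed hydrophobic residues; every other letter is treated as polar.
HYDROPHOBIC = set("ACFGILMPVW")

def hp_encode(seq):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

print(hp_encode(SEQ_8QNJ))
```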
Pair \(Z_2\) Length of longest common subword
8QNJ_1,7MDN_1 126 3
8QNJ_1,8VZK_1 209 4
7MDN_1,8VZK_1 187 3
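
The first row of this table can be checked with the sketch below. It computes \(Z_k\) as the size of the symmetric difference of the two subword sets, and it reads the last column as the length of the longest common contiguous subword, an interpretation that reproduces the tabulated value of 3. The function names are hypothetical.

```python
# Sequences as listed on this page.
SEQ_8QNJ = "GSMKKEELIDKIKANNRLLNAVVQEMYLDNSLDIKTRDYYASNITSVRQNGDQIIQILDEEGIAE"
SEQ_7MDN = ("GRDKDHLLKYNVGDLVWSKVSGYPWWPCMVSADPLLHSYTKLKGQKKSARQYHVQFFGDAPERAWIFEK"
            "SLVAFEGEGQFEKLCQESAKQAPTKAEKIKLLKPISGKLRAQWEMGIVQAEEAASMSVEERKAKFTFLYVG")

def subwords(w, k):
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x, y):
    """Length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length ending at y[j-1]
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance(SEQ_8QNJ, SEQ_7MDN, 2))           # 126
print(longest_common_subword(SEQ_8QNJ, SEQ_7MDN))  # 3 (the shared block is KIK)
```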

Newick tree

 
(8VZK_1:10.56,(8QNJ_1:63,7MDN_1:63):45.56);

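Let \(d\) denote the Otu–Sayood distance of the same name.
Let \(d_1\) denote the Otu–Sayood distance of the same name. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{205 }{\log_{20} 205}-\frac{65}{\log_{20}65})\approx 46.7\), where \(205=65+140\) is the combined length of the two query sequences.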
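
The arithmetic behind the 46.7 figure can be checked numerically. This sketch only evaluates the expression as written; the factors 0.85 and 0.8 are taken from the page as given, and `lz_scale` is a hypothetical helper name.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_b(n), the normalizing scale used for LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)

# 205 = 65 + 140 (both query sequences), 65 = length of 8QNJ_1.
expected = 0.85 * 0.8 * (lz_scale(205) - lz_scale(65))
print(round(expected, 1))  # 46.7
```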
Status Protein1 Protein2 d d1/2
Query variables 8QNJ_1 7MDN_1 61 44
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
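
The interpolation can be made concrete with the query pair above. The sketch assumes that "min" and "max" refer to the two one-sided complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), so that \(d\) is their maximum and \(d_1\) their sum; the page does not spell this out. Under that assumption the tabulated values \(d=61\) and \(d_1/2=44\) imply increments of 61 and 27, which are inferred here rather than computed from the sequences.

```python
def delta(inc_a, inc_b, alpha):
    """alpha * min + (1 - alpha) * max of two one-sided increments."""
    return alpha * min(inc_a, inc_b) + (1 - alpha) * max(inc_a, inc_b)

# Increments inferred from the table (assumption): max = d = 61,
# sum = d1 = 2 * 44 = 88, hence min = 27.
INC_SMALL, INC_LARGE = 27, 61

print(delta(INC_SMALL, INC_LARGE, 0))    # 61   (alpha = 0 gives d)
print(delta(INC_SMALL, INC_LARGE, 0.5))  # 44.0 (alpha = 1/2 gives d1/2)
```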
