CoV2D Browser™

Parikh vectors
5JJZ_1 1DHJ_1 8ZHY_1 Letter Amino acid
6 11 40 L Leucine
3 5 4 F Phenylalanine
5 9 15 R Arginine
3 5 5 N Asparagine
10 12 17 E Glutamic acid
7 10 32 G Glycine
10 5 20 H Histidine
4 12 21 I Isoleucine
8 6 6 K Lysine
2 11 29 V Valine
1 4 18 Q Glutamine
3 5 1 W Tryptophan
1 10 17 P Proline
5 11 25 S Serine
2 6 14 T Threonine
4 4 3 Y Tyrosine
1 13 42 A Alanine
5 13 12 D Aspartic acid
1 2 5 C Cysteine
2 5 9 M Methionine
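The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch follows, using the 5JJZ_1 sequence listed on this page; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not part of the CoV2D code.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes (illustrative constant).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the number of occurrences of each amino-acid letter."""
    counts = Counter(sequence)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}

# Sequence of 5JJZ_1 as listed on this page.
SEQ_5JJZ = (
    "MHHHHHHSSGRENLYFQGASGDLYEVERIVDKRKNKKGKWEYLIRWKGYGSTEDTWEPEHH"
    "LLHCEEFIDEFNGLHMSKDKRI"
)

vec = parikh_vector(SEQ_5JJZ)
print(vec["L"], vec["F"], vec["R"])  # 6 3 5, matching the first column above
```

The entries of a Parikh vector always sum to the sequence length, which gives a quick sanity check against the length column of the complexity table.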

>5JJZ_1|Chain A|Chromodomain Y-like protein 2|Homo sapiens (9606)
>1DHJ_1|Chains A, B|DIHYDROFOLATE REDUCTASE|Escherichia coli (562)
>8ZHY_1|Chains A, B, C, D|4-hydroxythreonine-4-phosphate dehydrogenase|Comamonas testosteroni KF-1 (399795)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5JJZ , Knot 44 83 0.78 40 68 77
MHHHHHHSSGRENLYFQGASGDLYEVERIVDKRKNKKGKWEYLIRWKGYGSTEDTWEPEHHLLHCEEFIDEFNGLHMSKDKRI
1DHJ , Knot 79 159 0.84 40 123 153
MISLIAALAVDRVIGMENAMPWNLPASLAWFKRNTLDKPVIMGRHTWESIGRPLPGRKNIILSSQPGTDDRVTWVKSVDEAIAACGDVPEIMVIGGGRVYEQFLPKAQKLYLTHIDAEVEGDTHFPDYEPDDWESVSSEFHDADAQNSHSYCFEILERR
8ZHY , Knot 139 335 0.80 40 178 314
MGSSHHHHHHSSGLVPRGSHMTIVHRRLALAIGDPHGIGPEIALKALQQLSATERSLIKVYGPWSALEQAAQICQMESLLQDLIHEEAGSLAQPVQCGEITPQAGLSTVQSATAAIRACESGEVDAVIACPHHETAIHRAGIAFSGYPSLLANVLGMNEDEVFLMLVGAGLRIVHVTLHESVRSALERLSPQLVINAVDAAVQTCTLLGVPKPQVAVFGINPHASEGQLFGLEDSQITVPAVETLRKRGLTVDGPMGADMVLAQRKHDLYVAMLHDQGHIPIKLLAPNGASALSIGGRVVLSSVGHGSAMDIAGRGVADATALLRTIALLGAQPV

Let \(P_w(k)\) be the set of distinct subwords (contiguous intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
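As a rough illustration of these definitions, the sketch below computes the set of distinct subwords of a given length and the normalized ratio from the table header. The function names are hypothetical, and the LZ-complexity value 44 is taken from the table rather than recomputed.

```python
import math

def subwords(w: str, k: int) -> set[str]:
    """Set of distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_20 n), the ratio shown in the table header."""
    return lz / (n / math.log(n, alphabet_size))

# Sequence of 5JJZ_1 as listed on this page.
SEQ_5JJZ = (
    "MHHHHHHSSGRENLYFQGASGDLYEVERIVDKRKNKKGKWEYLIRWKGYGSTEDTWEPEHH"
    "LLHCEEFIDEFNGLHMSKDKRI"
)

print(subword_complexity(SEQ_5JJZ, 2))  # 68, as in the table
print(normalized_lz(44, len(SEQ_5JJZ)))
```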

\(|P_{f(5JJZ_1)}(2) \setminus P_{f(1DHJ_1)}(2)|=41\), \(|P_{f(1DHJ_1)}(2) \setminus P_{f(5JJZ_1)}(2)|=96\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for example, \(Z_2(f(5JJZ_1),f(1DHJ_1))=41+96=137\).

Hydrophobic-polar version of Sequence 1: 10000000010001010110101001001100000001010011010101000001010001100001100101101000001
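The page does not state which residues count as hydrophobic. The sketch below assumes the set A, F, G, I, L, M, P, V, W maps to 1 and everything else to 0; this assignment is inferred from the string shown above for Sequence 1 (5JJZ_1) and may differ from the rule the site actually uses. `HYDROPHOBIC` and `hp_encode` are illustrative names.

```python
# Assumption: hydrophobic set inferred from the displayed 0/1 string,
# not taken from any documentation of the CoV2D site.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in sequence)

# Sequence of 5JJZ_1 as listed on this page.
SEQ_5JJZ = (
    "MHHHHHHSSGRENLYFQGASGDLYEVERIVDKRKNKKGKWEYLIRWKGYGSTEDTWEPEHH"
    "LLHCEEFIDEFNGLHMSKDKRI"
)

hp = hp_encode(SEQ_5JJZ)
print(hp)
```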
Pair \(Z_2\) Length of longest common substring
5JJZ_1,1DHJ_1 137 3
5JJZ_1,8ZHY_1 174 9
1DHJ_1,8ZHY_1 169 4
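Both columns of this table can be sketched in a few lines. The second column is read here as the length of the longest contiguous common block, an assumption that is consistent with the value 9 for the pair 5JJZ_1, 8ZHY_1, which share the block HHHHHHSSG. The function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of the subword sets."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_substring_length(x: str, y: str) -> int:
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for cx in x:
        cur = [0] * (len(y) + 1)
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy usage: the subword sets are {ab, ba} and {ab, bb, ba}.
print(z_distance("abab", "abba", 2))
print(longest_common_substring_length("abcde", "xbcdy"))
```

Applied to the full FASTA sequences listed above, `z_distance(x, y, 2)` should reproduce the \(Z_2\) column if the page uses the same definition.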

Newick tree

 
(8ZHY_1:90.78,(5JJZ_1:68.5,1DHJ_1:68.5):22.28);

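Let d be the Otu–Sayood distance \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\), where \(c\) denotes LZ-complexity.
Let d1 be the Otu–Sayood sum distance \(d_1(x,y)=(c(xy)-c(x))+(c(yx)-c(y))\). (This makes the 4TYN sequence AAAAAA a close match...)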
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{242}{\log_{20} 242}-\frac{83}{\log_{20}83}\right)\approx 51.5\), where \(242=83+159\) is the combined length of the two query sequences.
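The arithmetic behind this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are taken from the formula as given, without further interpretation; `lz_scale` and `rough_expected_distance` are illustrative names.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """n / log_20(n), the normalizing scale used in the complexity table."""
    return n / math.log(n, alphabet_size)

def rough_expected_distance(n_short: int, n_concat: int,
                            factor: float = 0.85 * 0.8) -> float:
    """factor * (scale of the concatenation - scale of the shorter word)."""
    return factor * (lz_scale(n_concat) - lz_scale(n_short))

# 83 = |f(5JJZ_1)|, 159 = |f(1DHJ_1)|, so the concatenation has length 242.
value = rough_expected_distance(83, 83 + 159)
print(round(value, 1))
```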
Status Protein1 Protein2 d d1/2
Query variables 5JJZ_1 1DHJ_1 64 48

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
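A minimal sketch of this interpolation follows, assuming that min and max refer to the smaller and larger of \(c(xy)-c(x)\) and \(c(yx)-c(y)\), and that \(c\) is the number of components in the LZ76 exhaustive history. The site's own LZ variant may differ, so values computed this way need not match the table exactly. All function names are hypothetical.

```python
def lz76(s: str) -> int:
    """Number of components in the Lempel-Ziv (1976) exhaustive history."""
    i, components, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the component while it can be copied from earlier text
        # (the copy may overlap the component, but must start before i).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def otu_sayood_terms(x: str, y: str) -> tuple[int, int]:
    """Return (min, max) of c(xy) - c(x) and c(yx) - c(y)."""
    a = lz76(x + y) - lz76(x)
    b = lz76(y + x) - lz76(y)
    return min(a, b), max(a, b)

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max; alpha=0 gives d, alpha=1/2 gives d1/2."""
    lo, hi = otu_sayood_terms(x, y)
    return alpha * lo + (1 - alpha) * hi

# Classic example: 0 | 001 | 10 | 100 | 1000 | 101 has 6 components.
print(lz76("0001101001000101"))
print(delta("ACGTACGT", "AAAAAA", 0), delta("ACGTACGT", "AAAAAA", 0.5))
```

Under the same reading, the query row above (d = 64, d1/2 = 48) would correspond to max = 64 and min = 2·48 − 64 = 32.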
