CoV2D Browser™

Parikh vectors
4FCF_1 3JUC_1 5UOC_1 Letter Amino acid
16 7 18 D Aspartic acid
13 2 17 I Isoleucine
31 17 42 L Leucine
6 1 12 K Lysine
4 6 17 F Phenylalanine
16 11 26 V Valine
7 4 15 N Asparagine
15 12 27 E Glutamic acid
11 15 43 P Proline
16 6 28 S Serine
5 2 12 W Tryptophan
3 7 14 Y Tyrosine
33 14 39 A Alanine
24 16 29 R Arginine
2 2 10 C Cysteine
14 5 26 Q Glutamine
21 12 28 G Glycine
3 7 11 H Histidine
9 2 6 M Methionine
16 5 20 T Threonine
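A Parikh vector just counts how often each letter occurs, so each column above sums to its sequence length (265, 153 and 440). A minimal Python sketch using `collections.Counter`; the letter order follows the table rows, and the name `parikh_vector` is ours, not the site's.

```python
from collections import Counter

# One-letter amino-acid codes in the row order of the table above.
AMINO_ACIDS = "DILKFVNEPSWYARCQGHMT"

def parikh_vector(seq: str) -> list[int]:
    """Count occurrences of each amino acid in seq, in AMINO_ACIDS order."""
    counts = Counter(seq)
    return [counts[a] for a in AMINO_ACIDS]

if __name__ == "__main__":
    # Toy input; letters absent from it get a count of 0.
    for letter, n in zip(AMINO_ACIDS, parikh_vector("MALVFVYGTL")):
        if n:
            print(letter, n)
```

Applied to the full 3JUC_1 sequence, the counts should match the middle column (for example K = 1, Y = 7, H = 7).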

>4FCF_1|Chain A|Beta-lactamase SHV-1|Klebsiella pneumoniae (573)
>3JUC_1|Chain A|AIG2-like domain-containing protein 1|Homo sapiens (9606)
>5UOC_1|Chains A, B, C, D|Nitric oxide synthase, endothelial|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4FCF , Knot 115 265 0.80 40 155 252
SPQPLEQIKLSESQLSGRVGMIEMDLASGRTLTAWRADERFPMMSTFKVVLCGAVLARVDAGDEQLERKIHYRQQDLVDYSPVSEKHLADGMTVGELCAAAITMSDNSAANLLLATVGGPAGLTAFLRQIGDNVTRLDRWETELNEALPGDARDTTTPASMAATLRKLLTSQRLSARSQRQLLQWMVDDRVAGPLIRSVLPAGWFIADRTGAGERGARGIVALLGPNNKAERIVVIYLRDTPASMAERNQQIAGIGAALIEHWQR
3JUC , Knot 77 153 0.84 40 117 149
MALVFVYGTLKRGQPNHRVLRDGAHGSAAFRARGRTLEPYPLVIAGEHNIPWLLHLPGSGRLVEGEVYAVDERMLRFLDDFESCPALYQRTVLRVQLLEDRAPGAEEPPAPTAVQCFVYSRATFPPEWAQLPHHDSYDSEGPHGLRYNPRENR
5UOC , Knot 186 440 0.85 40 246 410
APASLLPPAPEHSPPSSPLTQPPEGPKFPRVKNWEVGSITYDTLSAQAQQDGPCTPRRCLGSLVFPRKLQGRPSPGPPAPEQLLSQARDFINQYYSSIKRSGSQAHEQRLQEVEAEVAATGTYQLRESELVFGAKQAWRNAPRCVGRIQWGKLQVFDARDCRSAQEMFTYICNHIKYATNRGNLRSAITVFPQRCPGRGDFRIWNSQLVRYAGYRQQDGSVRGDPANVEITELCIQHGWTPGNGRFDVLPLLLQAPDEPPELFLLPPELVLEVPLEHPTLEWFAALGLRWYALPAVSNMLLEIGGLEFPAAPFSGWYMSTEIGTRNLCDPHRYNILEDVAVCMDLDTRTTSSLWKDKAAVEINVAVLHSYQLAKVTIVDHHAATASFMKHLENEQKARGGCPADWAWIVPPISGSLTPVFHQEMVNYFLSPAFRYQPDPW
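The column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) divides the LZ-complexity by \(n/\log_{20} n\), the standard normalizing factor for LZ76 complexity of a length-\(n\) word over a 20-letter alphabet. The sketch below recomputes it from the table's LZ and length columns. The displayed ratios appear to be truncated rather than rounded to two decimals (e.g. 77/91.11 ≈ 0.845 is shown as 0.84); that is our inference from the numbers, not something the page states.

```python
import math

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_k(n), with k the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

def truncate2(x: float) -> float:
    """Truncate (not round) to two decimals, as the table appears to do."""
    return math.floor(x * 100) / 100

# (LZ(w), n) pairs copied from the table above.
TABLE = {"4FCF": (115, 265), "3JUC": (77, 153), "5UOC": (186, 440)}

if __name__ == "__main__":
    for code, (lz, n) in TABLE.items():
        r = normalized_lz(lz, n)
        print(code, f"{r:.4f}", truncate2(r))
```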

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
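These definitions translate directly into Python. In the sketch below, `subwords` and `subword_complexity` are illustrative names; the lookup \(f(c)\) of a sequence by its PDB code is not implemented.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct length-n factors (contiguous subwords) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))

if __name__ == "__main__":
    print(sorted(subwords("ABAB", 2)))                           # ['AB', 'BA']
    print([subword_complexity("ABAB", n) for n in range(1, 5)])  # [2, 2, 2, 1]
```

Collecting slices in a set is the simplest correct approach; for long sequences and large \(n\), a suffix automaton would avoid materializing every window.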

\(|P_{f(4FCF_1)}(2) \setminus P_{f(3JUC_1)}(2)|=99\) and \(|P_{f(3JUC_1)}(2) \setminus P_{f(4FCF_1)}(2)|=61\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(4FCF_1),f(3JUC_1))=99+61=160\).

Hydrophobic-polar version of Sequence 1 (4FCF_1): 0101100101000010101111010110100101101000111100101110111110101100010001000000110001100001101101101011110100001101111011111110111001100100100100010011110100000110111010011000010100000110111000111111001111111110001110011011111111000100111101000110110000011111111100100
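The page does not list which residues count as hydrophobic. Comparing the 0/1 string with the start of the 4FCF_1 sequence suggests 1 for A, F, G, I, L, M, P, V, W and 0 for the rest; the sketch below uses that set as an assumption, checked only against the leading residues.

```python
# Assumption: hydrophobic set inferred from the 0/1 string above, not stated on the page.
HYDROPHOBIC = frozenset("AFGILMPVW")

def hp_encode(seq: str) -> str:
    """Replace each residue by '1' if hydrophobic, else '0'."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

if __name__ == "__main__":
    print(hp_encode("SPQPLEQIKL"))  # 0101100101
```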
Pair \(Z_2\) Length of longest common substring
4FCF_1,3JUC_1 160 4
4FCF_1,5UOC_1 175 5
3JUC_1,5UOC_1 193 4
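The last column must be the longest common contiguous substring: a longest common subsequence would be far longer (4FCF_1 and 3JUC_1 share at least 14 A's, per the Parikh vectors). A Python sketch of both columns, computing \(Z_k\) as the size of a symmetric difference and the substring length with the standard dynamic program in \(O(|x|\,|y|)\) time and \(O(|y|)\) memory. The function names are illustrative.

```python
def kmers(w: str, k: int) -> set[str]:
    """Distinct length-k factors of w (the set P_w(k))."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(kmers(x, k) ^ kmers(y, k))

def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous string occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)          # prev[j]: common suffix length of x[:i-1], y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

if __name__ == "__main__":
    print(z_k("ABAB", "ABBA", 2))                       # 1
    print(longest_common_substring("ABCDEF", "ZBCDY"))  # 3
```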

Newick tree

 
(5UOC_1:95.80,(4FCF_1:80,3JUC_1:80):15.80);
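Leaf-to-leaf (patristic) distances are read off the tree by summing branch lengths along the path between two leaves. The sketch below handles only this restricted form of Newick (unquoted labels, numeric branch lengths, no comments) with a small hand-written parser; it is not a general Newick reader. Here every leaf sits 95.80 from the root, and the 4FCF_1 to 3JUC_1 distance is 80 + 80 = 160.

```python
import re
from itertools import combinations

TOKEN = re.compile(r"[(),:;]|[^(),:;\s]+")
PUNCT = {"(", ")", ",", ":", ";"}

def parse_newick(text: str) -> dict:
    """Parse unquoted, comment-free Newick into nested dicts {label, length, children}."""
    tokens = TOKEN.findall(text)
    pos = 0

    def node() -> dict:
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            children.append(node())
            while tokens[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
        label = ""
        if tokens[pos] not in PUNCT:
            label = tokens[pos]
            pos += 1
        length = 0.0
        if tokens[pos] == ":":
            length = float(tokens[pos + 1])
            pos += 2
        return {"label": label, "length": length, "children": children}

    return node()

def leaf_paths(tree: dict) -> dict:
    """Map each leaf label to its root-to-leaf list of (node, cumulative depth)."""
    paths = {}

    def walk(n: dict, depth: float, path: list) -> None:
        depth += n["length"]
        path = path + [(n, depth)]
        if not n["children"]:
            paths[n["label"]] = path
        for child in n["children"]:
            walk(child, depth, path)

    walk(tree, 0.0, [])
    return paths

def patristic(paths: dict, a: str, b: str) -> float:
    """Sum of branch lengths on the path between leaves a and b."""
    pa, pb = paths[a], paths[b]
    shared = 0
    while shared < min(len(pa), len(pb)) and pa[shared][0] is pb[shared][0]:
        shared += 1
    lca_depth = pa[shared - 1][1]         # depth of the deepest common ancestor
    return pa[-1][1] + pb[-1][1] - 2 * lca_depth

if __name__ == "__main__":
    tree = parse_newick("(5UOC_1:95.80,(4FCF_1:80,3JUC_1:80):15.80);")
    paths = leaf_paths(tree)
    for a, b in combinations(sorted(paths), 2):
        print(a, b, round(patristic(paths, a, b), 2))
```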

Let \(d\) and \(d_1\) denote the Otu–Sayood distances of the same names.
(The distance \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
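As we understand Otu and Sayood's definitions, \(c(\cdot)\) is the LZ76 complexity (number of phrases in the exhaustive history), \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=c(SQ)-c(S)+c(QS)-c(Q)\); this reading agrees with the formula for \(\delta\) at the end of this section. The sketch computes \(c\) with the Kaspar–Schuster scan. It is not the CoV2D Browser's code, and we have not checked that it reproduces the page's values.

```python
def lz76(s: str) -> int:
    """LZ76 complexity: phrases in the exhaustive parsing (Kaspar-Schuster scan)."""
    n = len(s)
    if n <= 1:
        return n
    i, k, l = 0, 1, 1      # i: candidate copy start, k: match length, l: current phrase start
    c, k_max = 1, 1        # c: phrases so far, k_max: longest match for the current phrase
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:          # copying reached the end: count the final phrase
                c += 1
                break
        else:
            k_max = max(k, k_max)
            i += 1
            if i == l:             # no earlier start extends further: close the phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def otu_sayood(s: str, q: str) -> tuple[int, int]:
    """Return (d, d1) from the LZ76 complexities of s, q and their concatenations."""
    gain_sq = lz76(s + q) - lz76(s)   # extra phrases needed for q after s
    gain_qs = lz76(q + s) - lz76(q)   # extra phrases needed for s after q
    return max(gain_sq, gain_qs), gain_sq + gain_qs

if __name__ == "__main__":
    print(lz76("1001111011000010"))   # 6: phrases 1.0.01.1110.1100.0010
    print(otu_sayood("01", "10"))     # (1, 2)
```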
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{418}{\log_{20} 418}-\frac{153}{\log_{20} 153}\right)\approx 79.1\), where \(418=265+153\) is the combined length of 4FCF_1 and 3JUC_1.
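The arithmetic of this estimate can be checked directly; the constants below are copied exactly as written, and `n_over_log` is an illustrative helper name.

```python
import math

def n_over_log(n: int, k: int = 20) -> float:
    """n / log_k(n), the normalizing factor used in the LZ table."""
    return n / math.log(n, k)

# Constants as they appear in the estimate; 418 = 265 + 153.
estimate = 0.85 * 0.8 * (n_over_log(418) - n_over_log(153))
print(round(estimate, 1))  # 79.1
```

The result lands close to the \(d_1/2=79\) reported for 4FCF_1 and 3JUC_1 in the table that follows.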
Status Protein1 Protein2 d d1/2
Query variables 4FCF_1 3JUC_1 101 79

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
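Reading min and max as the smaller and larger of the two one-sided terms \(c(SQ)-c(S)\) and \(c(QS)-c(Q)\) (an assumption based on Otu and Sayood's definitions), \(\delta\) interpolates between them. The table's \(d=101\) and \(d_1/2=79\) for 4FCF_1 and 3JUC_1 imply one-sided terms 101 and \(2\cdot 79-101=57\), which the sketch plugs back in.

```python
def delta(a: int, b: int, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

if __name__ == "__main__":
    print(delta(57, 101, 0.0), delta(57, 101, 0.5))  # 101.0 79.0
```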