CoV2D Browser™

Parikh vectors
4IZZ_1 2EQR_1 4GAJ_1 Letter Amino acid
24 1 10 A Alanine
5 4 8 N Asparagine
18 4 6 D Aspartic acid
32 4 17 L Leucine
18 7 9 K Lysine
18 2 14 P Proline
15 6 34 S Serine
12 3 19 V Valine
23 2 5 R Arginine
26 4 6 E Glutamic acid
10 2 2 H Histidine
15 3 7 I Isoleucine
16 1 3 M Methionine
14 2 30 T Threonine
11 4 15 Y Tyrosine
1 1 4 C Cysteine
15 2 8 Q Glutamine
13 4 13 G Glycine
11 4 3 F Phenylalanine
5 1 5 W Tryptophan
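A Parikh vector records how many times each letter occurs in a sequence, ignoring order. The sketch below counts residues with `collections.Counter` and lists them in the same row order as the table; the names `parikh_vector` and `ALPHABET` are illustrative choices, not part of the browser.

```python
from collections import Counter

# One-letter amino acid codes in the row order of the table above.
ALPHABET = "ANDLKPSVREHIMTYCQGFW"

def parikh_vector(seq: str) -> list[int]:
    """Return the number of occurrences of each letter of ALPHABET in seq."""
    counts = Counter(seq)
    return [counts[letter] for letter in ALPHABET]

seq_2eqr = "GSSGSSGDRQFMNVWTDHEKEIFKDKFIQHPKNFGLIASYLERKSVPDCVLYYYLTKKNEN"
print(dict(zip(ALPHABET, parikh_vector(seq_2eqr))))
```

For 2EQR_1 this reproduces the middle column of the table, and the entries sum to the sequence length 61.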

>4IZZ_1|Chains A, B|Transcription Factor HetR|Fischerella thermalis (98439)
>2EQR_1|Chain A|Nuclear receptor corepressor 1|Homo sapiens (9606)
>4GAJ_1|Chain A[auth H]|NEUTRALIZING ANTIBODY AP33 HEAVY CHAIN|Mus musculus (10090)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4IZZ , Knot 133 302 0.83 40 189 291
SNAMSNDVDLIKRLGPSAMDQIMLYLAFSAMRTSGHRHGAFLDAAATAAKCAIYMTYLEQGQNLRMTGHLHHLEPKRVKAIVEEVRQALTEGKLLKMLGSQEPRYLIQFPYVWMEKYPWRPGRSRIPGTSLTSEEKRQIEQKLPSNLPDAHLITSFEFLELIEFLHKRSQEDLPKEHQMPLSEALAEHIKRRLLYSGTVTRIDSPWGMPFYALTRPFYAPADDQERTYIMVEDTARFFRMMRDWAEKRPNTMRVLEELDILPEKMQQAKDELDEIIRAWADKYHQDDGVPVVLQMVFGKKED
2EQR , Knot 37 61 0.83 40 54 57
GSSGSSGDRQFMNVWTDHEKEIFKDKFIQHPKNFGLIASYLERKSVPDCVLYYYLTKKNEN
4GAJ , Knot 98 218 0.80 40 143 201
EVQLQESGPSLVKPSQTLSLTCSVTGDSITSGYWNWIRKFPGNKLEYMGYISYSGSTYYNLSLRSRISITRDTSKNQYYLQLNSVTTEDTATYYCALITTTTYAMDYWGQGTSVTVSSAKTTPPSVYPLAPGSAAQTNSMVTLGCLVKGYFPEPVTVTWNSGSLSSGVHTFPAVLQSDLYTLSSSVTVPSSTWPSETVTCNVAHPASSTKVDKKIVPR
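The LZ-complexity column counts the phrases in a Lempel–Ziv (1976) parsing of \(w\), and the ratio column divides that count by \(n/\log_{20} n\). The sketch below counts LZ76 phrases with the Kaspar–Schuster algorithm. That the browser uses exactly this variant is an assumption, and `lz76` and `normalized_lz` are hypothetical names. The listed ratios appear to be truncated rather than rounded. For example, \(133/(302/\log_{20}302)\approx 0.8395\) is shown as 0.83.

```python
import math

def lz76(s: str) -> int:
    """Number of phrases in the LZ76 exhaustive parsing of s (Kaspar-Schuster algorithm)."""
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            # The current phrase still copies from an earlier start point: extend it.
            k += 1
            if l + k > n:
                c += 1  # the last phrase runs to the end of s
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:
                # No earlier start point extends further: close the phrase.
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def normalized_lz(lz: int, n: int) -> float:
    """LZ(w) / (n / log_20 n), the ratio column of the table."""
    return lz / (n / math.log(n, 20))

print(lz76("0001101001000101"))  # 6 phrases: 0|001|10|100|1000|101
print(normalized_lz(37, 61))
```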

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
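These definitions translate directly into a set comprehension over sliding windows. The sketch below is a minimal illustration; the function names `subwords` and `p_w` are hypothetical.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p_w(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))

seq_2eqr = "GSSGSSGDRQFMNVWTDHEKEIFKDKFIQHPKNFGLIASYLERKSVPDCVLYYYLTKKNEN"
print(p_w(seq_2eqr, 2), p_w(seq_2eqr, 3))  # 54 57
```

For 2EQR_1 this gives \(p_w(2)=54\) and \(p_w(3)=57\), the values listed in the table. Of the 60 length-2 windows, six repeat an earlier one (GS, SS and SG each recur in the prefix GSSGSSG, plus YY, YL and KN).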

\(|P_{f(4IZZ_1)}(2) \setminus P_{f(2EQR_1)}(2)|=166\) and \(|P_{f(2EQR_1)}(2) \setminus P_{f(4IZZ_1)}(2)|=31\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(4IZZ_1),f(2EQR_1))=166+31=197\).

Hydrophobic-polar version of Sequence 1 (4IZZ_1):
00110001011001110110011101110110001000111101110110011010010010010101010010100101110010011001011011100010011011011100011011000111001000000010001100110101100101101101100000001100001110011100100011001010010011111101100110111000000011100010110110011000100101100101110010010001001101110000000111111011110000
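Reading the hydrophobic-polar (HP) string against the first 120 residues of 4IZZ_1 shows that A, F, G, I, L, M, P, V and W are coded 1 and the other eleven residues (C, D, E, H, K, N, Q, R, S, T, Y) are coded 0. The sketch below applies that mapping. The set is inferred from this one string, and the helper name `to_hp` is hypothetical.

```python
# Residues coded 1 in the HP string above (read off from 4IZZ_1); all others are coded 0.
HYDROPHOBIC = frozenset("AFGILMPVW")

def to_hp(seq: str) -> str:
    """Map a protein sequence to its hydrophobic (1) / polar (0) string."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in seq)

print(to_hp("SNAMSNDVDL"))  # 0011000101
```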
Pair \(Z_2\) Length of longest common substring
4IZZ_1,2EQR_1 197 3
4IZZ_1,4GAJ_1 182 4
2EQR_1,4GAJ_1 151 3
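The last column must count contiguous subwords: a common subsequence of 4IZZ_1 and 2EQR_1 could already use six S residues (2EQR_1 has 6, 4IZZ_1 has 15), which exceeds 3. The sketch below computes \(Z_k\) as a symmetric difference of subword sets and the longest common substring by the standard dynamic program. The function names are hypothetical.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest contiguous subword shared by x and y (O(|x||y|) DP)."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_k("ABAB", "ABC", 2), longest_common_substring("ABAB", "ABC"))  # 2 2
```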

Newick tree

 
(
	4IZZ_1:10.44,
	(
		4GAJ_1:75.5,2EQR_1:75.5
	):24.94
);

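Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)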
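A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{363}{\log_{20} 363}-\frac{61}{\log_{20}61}\right)\approx 95.2\), where \(363=302+61\) is the length of the concatenation of the two sequences.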
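The arithmetic of this estimate can be checked directly. The factors 0.85 and 0.8 are taken as given from the text, and `max_complexity` is a hypothetical name for the normalizing term \(n/\log_{20} n\).

```python
import math

def max_complexity(n: int) -> float:
    """n / log_20 n, the normalizing term used in the ratio column."""
    return n / math.log(n, 20)

# 363 = 302 + 61 residues in the concatenation of 4IZZ_1 and 2EQR_1.
estimate = 0.85 * 0.8 * (max_complexity(363) - max_complexity(61))
print(round(estimate, 1))  # 95.2
```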
Status Protein1 Protein2 \(d\) \(d_1/2\)
Query variables 4IZZ_1 2EQR_1 122 72.5
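Both Otu–Sayood distances need only LZ complexities of the two sequences and of their two concatenations. The sketch below assumes the Kaspar–Schuster LZ76 phrase count. The browser's exact LZ variant is not stated, so values may differ from the table, and `lz76` and `otu_sayood` are hypothetical names.

```python
def lz76(s: str) -> int:
    """Number of phrases in the LZ76 exhaustive parsing of s (Kaspar-Schuster algorithm)."""
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1  # keep copying from the earlier start point i
            if l + k > n:
                c += 1  # last phrase reaches the end of s
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:  # every earlier start point tried: close the phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """Return (d, d1): d = max(a, b) and d1 = a + b,
    where a = LZ(xy) - LZ(x) and b = LZ(yx) - LZ(y)."""
    a = lz76(x + y) - lz76(x)
    b = lz76(y + x) - lz76(y)
    return max(a, b), a + b

print(otu_sayood("AB", "AB"))  # (1, 2)
```

Because \(d_1\) is the sum and \(d\) the maximum of the same two nonnegative differences, \(d_1/2 \le d \le d_1\) always holds.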

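In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)], with \(\min\) and \(\max\) taken over the two differences \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\),
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]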
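The interpolation can be checked against the query above. From \(d=122\) and \(d_1/2=72.5\) we get \(d_1=145\), so the two differences are \(122\) and \(145-122=23\); which difference is which is not stated, and the function below is symmetric in its first two arguments. The name `delta` is hypothetical.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Differences for the pair 4IZZ_1, 2EQR_1, derived from d = 122 and d1/2 = 72.5.
a, b = 23, 122
print(delta(a, b, 0.0))  # 122.0 = d
print(delta(a, b, 0.5))  # 72.5 = d1/2
```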