CoV2D Browser™

Parikh vectors
9CNV_1 5FFV_1 5BNX_1 Letter Amino acid
13 6 1 N Asparagine
9 1 1 H Histidine
20 5 2 P Proline
5 0 0 W Tryptophan
12 8 4 D Aspartic acid
8 5 1 M Methionine
4 8 4 F Phenylalanine
16 5 3 V Valine
21 4 3 G Glycine
14 5 7 I Isoleucine
18 14 10 L Leucine
11 8 4 K Lysine
11 10 6 E Glutamic acid
8 4 3 S Serine
12 5 4 T Threonine
7 5 1 Y Tyrosine
15 7 10 A Alanine
11 8 9 R Arginine
5 1 1 C Cysteine
20 7 5 Q Glutamine
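
As a minimal sketch of how a Parikh vector like the ones tabulated above can be computed: count the occurrences of each amino-acid letter in a sequence. The names `AMINO_ACIDS` and `parikh_vector` are illustrative and not part of the page; the sequence is the 5BNX_1 sequence as listed on this page.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids in alphabetical order
# (the table on this page lists them in a different order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(sequence: str) -> dict[str, int]:
    """Count occurrences of each amino-acid letter, including zero counts."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}


# Sequence of 5BNX_1 as listed on this page.
SEQ_5BNX = (
    "STELLIRKLPFQRLVREIAQDFKTDLRFQSAAIGALQEASEAYLVGLFEDTNLCAIHAKRVTIMPKDIQLARRIRGERA"
)

vector = parikh_vector(SEQ_5BNX)
print(vector["L"], vector["W"], vector["P"])
```

The counts for L, W and P can be compared against the 5BNX_1 column of the table, and the entries sum to the sequence length.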

>9CNV_1|Chain A|Capsid protein p24|Human immunodeficiency virus 2 (11709)
>5FFV_1|Chains A, B|Peregrin|Homo sapiens (9606)
>5BNX_1|Chain A|Histone H3.3|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9CNV , Knot 110 240 0.83 40 175 233
PVQQTGGGNYIHVPLSPRTLNAWVKLVEDKKFGAEVVPGFQALSEGCTPYDINQMLNCVGDHQAAMQIIREIINDEAADWDAQHPIPGPLPAGQLRDPRGSDIAGTTSTVEEQIQWMYRPQNPVPVGNIYRRWIQIGLQKCVRMYNPTNILDVKQGPKEPFQSYVDRFYKSLRAEQTDPAVKNWMTQTLLIQNANPDCKLVLKGLGMNPTLEEMLTACQGVGGPGQKARLMGSSHHHHHH
5FFV , Knot 62 116 0.84 38 95 112
SMEMQLTPFLILLRKTLEQLQEKDTGNIFSEPVPLSEVPDYLDHIKKPMDFFTMKQNLEAYRYLNFDDFEEDFNLIVSNCLKYNAKDTIFYRAAVRLREQGGAVLRQARRQAEKMG
5BNX , Knot 43 79 0.79 38 73 77
STELLIRKLPFQRLVREIAQDFKTDLRFQSAAIGALQEASEAYLVGLFEDTNLCAIHAKRVTIMPKDIQLARRIRGERA
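
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below does only that arithmetic; it does not compute \(\mathrm{LZ}(w)\) itself, and the function name `normalized_lz` is illustrative.

```python
import math


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n) for a word of length n over b letters."""
    return lz / (n / math.log(n, alphabet_size))


# (LZ-complexity, length) pairs copied from the table above.
rows = {"9CNV": (110, 240), "5FFV": (62, 116), "5BNX": (43, 79)}
ratios = {code: normalized_lz(lz, n) for code, (lz, n) in rows.items()}

for code, ratio in ratios.items():
    print(code, f"{ratio:.4f}")
```

The computed ratios agree with the tabulated 0.83, 0.84 and 0.79 if the page truncates to two decimals rather than rounding; that reading is an assumption.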

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
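
A minimal sketch of these definitions, reading the argument of \(P_w\) as the subword length: collect every contiguous window of that length into a set, then take its size. The function names and the toy word are illustrative only.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous subwords of w of length k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w: str, k: int) -> int:
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))


example = "ABABC"  # toy word, not a real protein sequence
print(sorted(subwords(example, 2)))
print(subword_complexity(example, 2))
```

The page may apply its own conventions or scaling in some table columns, so this sketch is not guaranteed to reproduce every tabulated value.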

\(|P_{f(9CNV_1)}(2) \setminus P_{f(5FFV_1)}(2)|=120\), \(|P_{f(5FFV_1)}(2) \setminus P_{f(9CNV_1)}(2)|=40\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
110001110010111010010111011000011101111101100100100100110011000111011001100011010100111111111010010100111000010001011001001111101000110111000101001001101001100110001001000101000011100110001110010100011101111010100110100111111001011100000000
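
The binary string above appears to mark hydrophobic residues with 1 and polar residues with 0. The page does not state its classification; the set `HYDROPHOBIC` below is an assumption inferred by comparing the start and end of the 9CNV_1 sequence with the string, and other hydrophobic-polar schemes exist.

```python
# ASSUMPTION: this hydrophobic set is inferred from the page's own output,
# not taken from a stated definition.
HYDROPHOBIC = set("AFGILMPVW")


def hp_encode(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


prefix = "PVQQTGGGNYIHVPLSPRTL"  # first 20 residues of 9CNV_1
suffix = "KARLMGSSHHHHHH"        # last 14 residues of 9CNV_1
print(hp_encode(prefix))
print(hp_encode(suffix))
```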
Pair \(Z_2\) Length of longest common subword
9CNV_1,5FFV_1 160 4
9CNV_1,5BNX_1 170 3
5FFV_1,5BNX_1 110 3
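
The \(Z_k\) column can be sketched as the size of the symmetric difference of two subword sets; for the pair 9CNV_1, 5FFV_1 the tabulated 160 is the sum 120 + 40 of the two one-sided differences given earlier. The code below uses toy words, and the helper names (`subwords`, `z_distance`, `jaccard_distance`) are illustrative. Dividing by the size of the union gives the usual Jaccard distance, which is included only to show where the "numerator" fits.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct contiguous length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x: str, y: str, k: int) -> int:
    """Size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Symmetric difference size divided by union size (0.0 if both empty)."""
    union = subwords(x, k) | subwords(y, k)
    return z_distance(x, y, k) / len(union) if union else 0.0


x, y = "ABABC", "BABD"  # toy words, not real protein sequences
print(z_distance(x, y, 2), jaccard_distance(x, y, 2))
```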

Newick tree

 
(9CNV_1:89.86,(5FFV_1:55,5BNX_1:55):34.86);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{356 }{\log_{20} 356}-\frac{116}{\log_{20}116})=73.7\)
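
The arithmetic in the expected-distance estimate can be checked directly. In this sketch the factors 0.85 and 0.8 are taken from the formula as given; reading 356 as 240 + 116, the combined length of the 9CNV_1 and 5FFV_1 sequences, is an assumption. The helper name `lz_scale` is illustrative.

```python
import math


def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_b n, the normalizing scale used on this page."""
    return n / math.log(n, alphabet_size)


n_x, n_y = 240, 116  # lengths of 9CNV_1 and 5FFV_1 from the table
# ASSUMPTION: 356 in the formula is the combined length n_x + n_y.
expected = 0.85 * 0.8 * (lz_scale(n_x + n_y) - lz_scale(n_y))
print(f"{expected:.1f}")
```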
Status Protein1 Protein2 d d1/2
Query variables 9CNV_1 5FFV_1 96 69
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
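
A small numeric check of this interpolation, using the query row above (\(d=96\), \(d_1/2=69\)). It assumes that "min" and "max" stand for the smaller and larger of two underlying directional quantities. The smaller value 42 is not stated on the page; it is inferred here from \(d_1/2 = (\min+\max)/2\).

```python
def delta(alpha: float, a: float, b: float) -> float:
    """Interpolate between the smaller and larger of a and b."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


d, half_d1 = 96, 69         # values from the query row on this page
larger = d                  # alpha = 0 selects the max, so max = d
smaller = 2 * half_d1 - d   # INFERRED: (min + max) / 2 = 69 gives min = 42

print(delta(0, smaller, larger), delta(0.5, smaller, larger))
```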