CoV2D BrowserTM

Parikh vectors
6UNG_1 1XVR_1 2QKT_1 Letter Amino acid
18 0 2 M Methionine
30 0 5 S Serine
18 0 4 N Asparagine
7 2 2 C Cysteine
15 0 4 Q Glutamine
30 0 6 E Glutamic acid
26 2 9 G Glycine
12 0 0 H Histidine
26 1 3 T Threonine
3 0 0 W Tryptophan
29 0 6 I Isoleucine
32 0 4 F Phenylalanine
34 0 5 P Proline
16 0 2 Y Tyrosine
37 0 4 V Valine
25 0 5 D Aspartic acid
51 0 10 L Leucine
38 0 9 K Lysine
18 1 8 A Alanine
22 0 2 R Arginine
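A Parikh vector simply counts how often each letter occurs in a sequence. The sketch below is one minimal way to compute it; the names `ALPHABET` and `parikh_vector` are illustrative and not taken from the CoV2D code base. The letter order follows the rows of the table above.

```python
from collections import Counter

# One-letter codes in the row order of the table above (illustrative constant).
ALPHABET = "MSNCQEGHTWIFPYVDLKAR"

def parikh_vector(sequence, alphabet=ALPHABET):
    """Return the number of occurrences of each alphabet letter in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# The 1XVR_1 sequence shown on this page.
vector = parikh_vector("CGTACG")
print({letter: count for letter, count in vector.items() if count})
# {'C': 2, 'G': 2, 'T': 1, 'A': 1}
```

The nonzero entries agree with the 1XVR_1 column of the table.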

>6UNG_1|Chain A|Cytochrome P450 3A4|Homo sapiens (9606)
>1XVR_1|Chains A, B[auth C]|5'-D(*CP*GP*TP*AP*CP*G)-3'|
>2QKT_1|Chains A, B|Inactivation-no-after-potential D protein|Drosophila melanogaster (7227)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6UNG 203 487 0.86 40 248 467
MAYLYGTHSHGLFKKLGIPGPTPLPFLGNILSYHKGFCMFDMECHKKYGKVWGFYDGQQPVLAITDPDMIKTVLVKECYSVFTNRRPFGPVGFMKSAISIAEDEEWKRLRSLLSPTFTSGKLKEMVPIIAQYGDVLVRNLRREAETGKPVTLKDVFGAYSMDVITSTSFGVNIDSLNNPQDPFVENTKKLLRFDFLDPFFLSITVFPFLIPILEVLNICVFPREVTNFLRKSVKRMKESRLEDTQKHRVDFLQLMIDSQNSKETESHKALSDLELVAQSIIFIFAGYETTSSVLSFIMYELATHPDVQQKLQEEIDAVLPNKAPPTYDTVLQMEYLDMVVNETLRLFPIAMRLERVCKKDVEINGMFIPKGVVVMIPSYALHRDPKYWTEPEKFLPERFSKKNKDNIDPYIYTPFGSGPRNCIGMRFALMNMKLALIRVLQNFSFKPCKETQIPLKLSLGGLLQPEKPVVLKVESRDGTVSGAHHHH
1XVR 5 6 0.49 8 4 4
CGTACG
2QKT 49 90 0.81 36 77 87
LEKFNVDLMKKAGKELGLSLSPNEIGCTIADLIQGQYPEIDSKLQRGDIITKFNGDALEGLPFQVCYALFKGANGKVSMEVTRPKPAAAS
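The LZ-complexity and normalized ratio columns can be sketched as follows. This assumes that \(\mathrm{LZ}(w)\) is the number of components in the Lempel-Ziv (1976) exhaustive parsing, an assumption that reproduces the value 5 listed for 1XVR; the page does not state which variant it uses. The function names are hypothetical.

```python
import math

def lz76_complexity(w):
    """Count the components of the Lempel-Ziv (1976) exhaustive parsing of w."""
    i, components, n = 0, 0, len(w)
    while i < n:
        k = 1
        # Grow the candidate w[i:i+k] while it can be copied from an
        # occurrence that starts before position i (overlap is allowed).
        while i + k <= n and w.find(w[i:i + k], 0, i + k - 1) != -1:
            k += 1
        components += 1
        i += k
    return components

def normalized_lz(w, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), with b the alphabet size; needs len(w) >= 2."""
    n = len(w)
    return lz76_complexity(w) / (n / math.log(n, alphabet_size))

print(lz76_complexity("CGTACG"))  # 5, parsed as C|G|T|A|CG
print(normalized_lz("CGTACG"))    # close to the 0.49 shown in the table
```

The quadratic substring search is adequate for sequences of a few hundred residues; a suffix-based method would be preferable for much longer inputs.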

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
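As a small illustration of these definitions, the following sketch computes the set of distinct length-\(n\) subwords and its cardinality. The function names `subwords` and `subword_complexity` are chosen here for illustration only.

```python
def subwords(w, n):
    """Set of distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """Cardinality of the set of distinct length-n subwords of w."""
    return len(subwords(w, n))

w = "CGTACG"  # the 1XVR_1 sequence
print(sorted(subwords(w, 2)))    # ['AC', 'CG', 'GT', 'TA']
print(subword_complexity(w, 3))  # 4
```

For CGTACG this gives 4 for both lengths 2 and 3, matching the corresponding table entries for 1XVR.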

\(|P_{f(6UNG_1)}(2) \setminus P_{f(1XVR_1)}(2)|=247\), \(|P_{f(1XVR_1)}(2) \setminus P_{f(6UNG_1)}(2)|=3\).

Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), that is, an LZ76-style (set of subwords) numerator of a Jaccard distance for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1101010000111001111110111111011000011011010000001011110010011111001011001110000011000011111111001101100001001001101010010100111111001011100100010010110100111100101100001110100100100111000001101011011110101111111110110101110010011000100100001000000010110111000000000000110010111001111111000000110111001100101000100010111100111000011010010111000101111110100100001010111110111111100110001001001001110010000000101010011101100011101111010111101100101010000011101011111010011110100001010110000
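The hydrophobic-polar string can be reproduced with a simple two-class encoding. The page does not state which residues count as hydrophobic; the set below is an assumption inferred by comparing the binary string with the 6UNG sequence, and the names `HYDROPHOBIC` and `hp_encode` are illustrative.

```python
# Assumption: residues written as 1, inferred from the string shown on this page.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First ten residues of the 6UNG sequence.
print(hp_encode("MAYLYGTHSH"))  # 1101010000
```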
Pair \(Z_2\) Length of longest common subword
6UNG_1,1XVR_1 250 2
6UNG_1,2QKT_1 211 3
1XVR_1,2QKT_1 81 1
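Both columns of this table can be derived from subword sets. The sketch below is one straightforward way to do so, not necessarily how the browser computes them; the function names are hypothetical, and the second input is a short fragment of the 6UNG sequence used only to keep the example small.

```python
def subwords(w, k):
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Size of the symmetric difference of the length-k subword sets of x and y."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword_length(x, y):
    """Largest k such that x and y share a subword of length k (0 if none)."""
    k = 0
    # A common subword of length k+1 contains one of length k,
    # so the search can stop at the first length with no match.
    while k < min(len(x), len(y)) and subwords(x, k + 1) & subwords(y, k + 1):
        k += 1
    return k

x, y = "CGTACG", "LYGTHS"
print(z_distance(x, y, 2))                  # 7
print(longest_common_subword_length(x, y))  # 2 (the shared subword is GT)
```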

Newick tree

 
[
	6UNG_1:13.49,
	[
		2QKT_1:40.5,1XVR_1:40.5
	]:90.99
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{493}{\log_{20} 493}-\frac{6}{\log_{20}6}\right)=155.\)
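The arithmetic in this estimate can be checked directly. The factors 0.85 and 0.8 and the lengths 493 and 6 are copied from the expression above as given; the helper `log20` is introduced only for this check.

```python
import math

def log20(n):
    """Base-20 logarithm."""
    return math.log(n, 20)

estimate = 0.85 * 0.8 * (493 / log20(493) - 6 / log20(6))
print(round(estimate))  # 155
```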
Status Protein1 Protein2 d d1/2
Query variables 6UNG_1 1XVR_1 199 100.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \text{if } \alpha=0,\\ d_1/2 & \text{if } \alpha=1/2. \end{cases} \]
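The following sketch shows how \(d\), \(d_1\) and \(\delta\) relate. It rests on two assumptions not spelled out on this page: that \(d\) and \(d_1\) are the maximum and the sum of the two one-sided complexity differences \(c(xy)-c(x)\) and \(c(yx)-c(y)\), as recalled from Otu and Sayood's definitions, and that \(c\) is the LZ76 exhaustive-parsing complexity. All function names are illustrative.

```python
def lz76_complexity(w):
    """Count the components of the Lempel-Ziv (1976) exhaustive parsing of w."""
    i, components, n = 0, 0, len(w)
    while i < n:
        k = 1
        # Extend while w[i:i+k] occurs starting before position i.
        while i + k <= n and w.find(w[i:i + k], 0, i + k - 1) != -1:
            k += 1
        components += 1
        i += k
    return components

def one_sided(x, y, c=lz76_complexity):
    """Return (c(xy) - c(x), c(yx) - c(y))."""
    return c(x + y) - c(x), c(y + x) - c(y)

def otu_sayood(x, y, c=lz76_complexity):
    """Return (d, d1): the maximum and the sum of the one-sided differences."""
    forward, backward = one_sided(x, y, c)
    return max(forward, backward), forward + backward

def delta(x, y, alpha, c=lz76_complexity):
    """alpha * min + (1 - alpha) * max of the one-sided differences."""
    forward, backward = one_sided(x, y, c)
    return alpha * min(forward, backward) + (1 - alpha) * max(forward, backward)

d, d1 = otu_sayood("CGTACG", "AAAAAA")
print(d, d1)                               # 3 4
print(delta("CGTACG", "AAAAAA", 0.5))      # 2.0, equal to d1 / 2
```

Under these assumptions, \(\alpha=0\) recovers \(d\) and \(\alpha=1/2\) recovers \(d_1/2\), because the average of the minimum and the maximum of two numbers is half their sum.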