CoV2D Browser™

Parikh vectors
8JNC_1 5APX_1 3VDU_1 Letter Amino acid
13 2 19 I Isoleucine
44 13 16 L Leucine
14 5 5 M Methionine
19 3 9 T Threonine
36 3 10 R Arginine
4 0 5 C Cysteine
14 9 11 H Histidine
17 3 15 S Serine
4 0 0 W Tryptophan
39 4 12 A Alanine
6 12 15 K Lysine
32 2 10 P Proline
7 4 4 N Asparagine
27 2 15 G Glycine
29 7 21 V Valine
18 1 4 F Phenylalanine
9 4 7 Y Tyrosine
25 6 12 D Aspartic acid
18 3 3 Q Glutamine
33 12 19 E Glutamic acid
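
The columns of the table above are letter counts, one per amino acid. A minimal sketch of how such a Parikh vector could be computed follows; the names `ALPHABET` and `parikh_vector` are illustrative assumptions, not part of the CoV2D code, and the input is only the first ten residues of 5APX_1.

```python
from collections import Counter

# The 20 amino-acid letters in the row order of the table above
# (assumed ordering, chosen only to match the table layout).
ALPHABET = "ILMTRCHSWAKPNGVFYDQE"

def parikh_vector(w: str) -> list[int]:
    """Return the number of occurrences of each letter of ALPHABET in w."""
    counts = Counter(w)
    return [counts[a] for a in ALPHABET]

# Illustrative input: the first ten residues of 5APX_1.
prefix = "MKHHHHHHPM"
vec = parikh_vector(prefix)
print({a: n for a, n in zip(ALPHABET, vec) if n})
```

The entries of a Parikh vector sum to the length of the word, which gives a quick consistency check: the 8JNC_1 column sums to 408 and the 5APX_1 column to 95.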

>8JNC_1|Chains A, B|Cytochrome P450|Streptomyces sp. ZJ306 (1469403)
>5APX_1|Chains A, B, C|GENERAL CONTROL PROTEIN GCN4|SACCHAROMYCES CEREVISIAE (4932)
>3VDU_1|Chain A|Recombination protein recR|Thermoanaerobacter tengcongensis (273068)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8JNC , Knot 169 408 0.83 40 220 389
MPGQQEQQAPSEHPEQQELLTFPFASTGLEFPPVYHELYQQRLTKVRLPYGDDAYLAIRYADVKTVLSDSRFSIVASLGQDQPRTRAGARVGNGLFSLDPPQHSRLRSVLGRDFTPRRVEKLRERVRELTDQCLDRMEAAGSPADLVAHLAVPMPTAVVCEMMGVPEPDHHLFWGWAETILSNDTTPDDLIRRYQEFTAYMGAMVEERRARPTDDMFGMLVRACDEEGRITEIEMHALASDLLSAGFVSTAHQIANFTAMLLARPERLQPLVDKPEQIPAAVEELMRHVPILSGFSFPRYATEDLEMSGVTVRRGEAVIPVIAAANRDPDVYPDAGRLDLERNGLPHLGFGQGPHFCIGAHLARVELQVVLEALTERFPDLRFGIPENALKWKRGHFMNGLHELPVAW
5APX , Knot 38 95 0.60 36 58 64
MKHHHHHHPMSDYDIPTTENLYFQGHMKQLEDKVEELLSKVYHLENEVARLKKLMATKDDIANMKQLEDKVEELLSKVYHLENEVARLKKLVGER
3VDU , Knot 95 212 0.80 38 141 201
GSSHHHHHHSQDPMSYYSTSVAKLIEELSKLPGIGPGTAQRLAFFIINMPLDEVRSLSQAIIEAKEKLRYCKICFNITDKEVCDICSDENRDHSTICVVSHPMDVVAMEKVKEYKGVYHVLHGVISPIEGVGPEDIRIKELLERVRDGSVKEVILATNPDIEGEATAMYIAKLLKPFGVKVTRIAHGIPVGGDLEYTDVVTLSKALEGRREV
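
The LZ-complexity column and the normalized ratio \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be illustrated as follows. This is a sketch under assumptions: it uses the standard Lempel-Ziv (1976) exhaustive-history parsing, which may differ in detail from the variant used to produce the table, and the function names `lz76` and `normalized` are hypothetical.

```python
import math

def lz76(w: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of w.

    Each phrase is the shortest word starting at the current position
    that does not already occur in the text preceding its last letter.
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_k(n), with k the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# Small textbook example: parses as 0|001|10|100|1000|101.
print(lz76("0001101001000101"))        # 6
# Normalization applied to the 8JNC values from the table.
print(round(normalized(169, 408), 2))  # 0.83
```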

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
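
A short sketch of these definitions, with hypothetical helper names `subwords` and `subword_complexity`, and the first ten residues of 5APX_1 as a toy input:

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the set of distinct length-n subwords (contiguous) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w: str, n: int) -> int:
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subwords(w, n))

w = "MKHHHHHHPM"
print(sorted(subwords(w, 2)))
print([subword_complexity(w, n) for n in (1, 2, 3)])
```

For a word over the 20-letter amino-acid alphabet, \(p_w(n)\) is at most \(\min(20^n, |w|-n+1)\).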

\(|P_{f(8JNC_1)}(2) \setminus P_{f(5APX_1)}(2)|=180\), \(|P_{f(5APX_1)}(2) \setminus P_{f(8JNC_1)}(2)|=18\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \), the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\), be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
111000001100010000110111100110111100010000100101101001011100101001100001011101100010001110110111010110000100111001010010010001001000010010111011011101111110111001111101000111111001100000100110000010101111100001010001111110100001010010101110011011110010011010111110100101110010011111001100111101101100100010101101001011111111100010101011010100011101111011010111011010101110110001101011110011010010110110011111
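
The hydrophobic-polar string can be reproduced with a one-line mapping. The residue classes below are an assumption, inferred by aligning the printed 0/1 string with the 8JNC sequence rather than taken from any stated definition; `HYDROPHOBIC` and `hp_encode` are hypothetical names.

```python
# Residues written as 1 (hydrophobic); all other letters are written as 0.
# Assumed classification, inferred from the printed string for 8JNC.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(w: str) -> str:
    """Map an amino-acid word to its hydrophobic (1) / polar (0) word."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)

# First 25 residues of 8JNC_1.
print(hp_encode("MPGQQEQQAPSEHPEQQELLTFPFA"))
```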
Pair \(Z_2\) Length of longest common substring
8JNC_1,5APX_1 198 4
8JNC_1,3VDU_1 169 4
5APX_1,3VDU_1 139 6
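
Both columns of this table can be sketched in a few lines. The last column is read here as the length of the longest common contiguous subword, an assumption suggested by values as small as 4 and 6 (the run HHHHHH occurs in both 5APX_1 and 3VDU_1). The function names are hypothetical and the inputs are short prefixes of those two sequences.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_substring(x: str, y: str) -> str:
    """Longest word occurring contiguously in both x and y (first found)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common suffix ending at x[i-1], y[j-1].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return x[best_end - best_len:best_end]

x = "MKHHHHHHPM"   # prefix of 5APX_1
y = "GSSHHHHHHSQ"  # prefix of 3VDU_1
print(z(x, y, 2), longest_common_substring(x, y))
```

As a check on the first row, \(Z_2 = 180 + 18 = 198\) for the pair 8JNC_1, 5APX_1.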

Newick tree

 
(8JNC_1:98.40,(3VDU_1:69.5,5APX_1:69.5):28.90);

Let \(d\) and \(d_1\) be the distances of those names defined by Otu and Sayood. (Using \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{503 }{\log_{20} 503}-\frac{95}{\log_{20}95})\approx 122\), where \(503=408+95\) is the combined length of the two sequences.
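
The arithmetic in this estimate can be checked directly. The sketch below only evaluates the expression as written; the helper name `capacity` is hypothetical, and the factors 0.85 and 0.8 are taken from the text without further interpretation.

```python
import math

def capacity(n: int, alphabet_size: int = 20) -> float:
    """n / log_k(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

# 503 = 408 + 95 (lengths of 8JNC_1 and 5APX_1); 95 is the length of 5APX_1.
expected = 0.85 * 0.8 * (capacity(503) - capacity(95))
print(round(expected))  # 122
```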
Status Protein1 Protein2 d d1/2
Query variables 8JNC_1 5APX_1 159 91.5
Was not able to put for d
Was not able to put for d1
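
For orientation, here is a sketch of how \(d\) and \(d_1\) could be computed. It assumes the unnormalized definitions usually attributed to Otu and Sayood, \(d=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1=c(SQ)-c(S)+c(QS)-c(Q)\), with \(c\) an LZ76 phrase count; both the definitions and the helper names should be checked against the original paper and the CoV2D code.

```python
def lz76(w: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def otu_sayood(s: str, q: str, c=lz76) -> tuple[int, int]:
    """Return (d, d1) for sequences s and q under complexity measure c."""
    a = c(s + q) - c(s)  # new phrases needed for q once s is known
    b = c(q + s) - c(q)  # new phrases needed for s once q is known
    return max(a, b), a + b

# Toy inputs only; real use would pass two FASTA sequences.
print(otu_sayood("AAAA", "ABAB"))
```

Because neither quantity is normalized by sequence complexity, a very low-complexity word such as AAAAAA adds few phrases in one direction, which is one way such a sequence can end up close to many others.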

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
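
As a worked check against the query row above (\(d=159\), \(d_1/2=91.5\)), assume that \(\min\) and \(\max\) denote the smaller and larger of the two quantities whose maximum is \(d\) and whose sum is \(d_1\); this reading is an assumption, though it is the one the two displayed cases imply. Then
\[ \max = 159,\qquad \min = 2(91.5)-159 = 24,\qquad \delta\big|_{\alpha=1/2}=\tfrac{1}{2}(24)+\tfrac{1}{2}(159)=91.5 . \]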