CoV2D Browser™

Parikh vectors
3ZIW_1 6OFG_1 1XRJ_1 Letter Amino acid
11 5 20 Q Glutamine
24 3 17 G Glycine
1 0 3 C Cysteine
15 6 16 K Lysine
9 3 13 F Phenylalanine
33 7 20 S Serine
4 1 0 W Tryptophan
21 5 10 N Asparagine
15 8 17 D Aspartic acid
3 1 5 H Histidine
25 8 23 L Leucine
2 1 2 M Methionine
18 4 8 Y Tyrosine
18 4 21 V Valine
16 12 14 A Alanine
14 2 16 E Glutamic acid
7 3 11 P Proline
20 5 14 T Threonine
7 2 15 R Arginine
23 3 16 I Isoleucine
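
As a minimal sketch, one column of the Parikh-vector table can be recomputed by counting letters in the corresponding sequence. The function name `parikh_vector` and the alphabet ordering are illustrative assumptions and are not taken from the site's own code; the sequence is the 6OFG_1 entry as listed on this page.

```python
from collections import Counter

# The 20 standard amino-acid letters (the ordering here is arbitrary).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Return the count of each amino-acid letter in `sequence` (0 if absent)."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# 6OFG_1 sequence as listed on this page.
SEQ_6OFG = (
    "GSHMATPWSGYLDDVSAKFDTGVDNLQTQVTEALDKLAAKPSDPALLAAYQSKLSEYNLYRNAQSNTVKAFKDIDAAIIQNFR"
)

if __name__ == "__main__":
    for letter, count in parikh_vector(SEQ_6OFG).items():
        print(letter, count)
```

The counts sum to the sequence length, which gives a quick consistency check against the Length column reported further down the page.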

>3ZIW_1|Chains A, B, C, D, E, F|HEAT-LABILE ENTEROTOXIN B CHAIN|CLOSTRIDIUM PERFRINGENS (1502)
>6OFG_1|Chains A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R|Protein PrgI|Salmonella typhimurium (strain SL1344) (216597)
>1XRJ_1|Chains A, B|Uridine-cytidine kinase 2|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3ZIW , Knot 126 286 0.83 40 168 278
GAMGSDGLYVIDKGAGWILGEPSVVSSQILNPNETGTFSQSLTKSKEVSINVNFSVGFTSEFIQASVEYGFGITIGEQNTIERSVSTTAGPNEYVYYKVYATYRKYQAIRISHGNISDDGSIYKLTGIWLSKTSADSLGNIDQGSLIETGERCVLTVPSTDIEKEILDLAAATERLNLTDALNSNPAGNLYDWRSSNSYPWTQKLNLHLTITATGQKYRILASKIVDFNIYSNNFNNLVKLEQSLGDGVKDHYVDISLDAGQYVLVMKANSSYSGNYPYSILFQKF
6OFG , Knot 47 83 0.83 38 73 80
GSHMATPWSGYLDDVSAKFDTGVDNLQTQVTEALDKLAAKPSDPALLAAYQSKLSEYNLYRNAQSNTVKAFKDIDAAIIQNFR
1XRJ , Knot 116 261 0.82 38 168 248
MAGDSEQTLQNHQQPNGGEPFLIGVSGGTASGKSSVCAKIVQLLGQNEVDYRQKQVVILSQDSFYRVLTSEQKAKALKGQFNFDHPDAFDNELILKTLKEITEGKTVQIPVYDFVSHSRKEETVTVYPADVVLFEGILAFYSQEVRDLFQMKLFVDTDADTRLSRRVLRDISERGRDLEQILSQYITFVKPAFEEFCLPTKKYADVIIPRGADNLVAINLIVQHIQDILNGGPSKRQTNGCLNGYTPSRKRQASESSSRPH
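
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below only does that arithmetic; it does not compute LZ-complexity itself, and the helper name `normalized_lz` is an illustrative assumption.

```python
import math

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
ROWS = {"3ZIW": (126, 286), "6OFG": (47, 83), "1XRJ": (116, 261)}

if __name__ == "__main__":
    for code, (lz, n) in ROWS.items():
        print(code, f"{normalized_lz(lz, n):.4f}")
```

The three ratios come out at roughly 0.832, 0.835 and 0.826, which is consistent with the tabulated 0.83, 0.83 and 0.82 if the page truncates to two decimals rather than rounding.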

For a word \(w\) and an integer \(n\ge 1\), let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
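
A minimal sketch of these definitions, reading "subword" as a contiguous substring of the given length. The function names are illustrative assumptions, and the toy word is not taken from the data on this page.

```python
def subwords(w: str, n: int) -> set[str]:
    """Return P_w(n): the set of distinct length-n contiguous subwords of w."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Slide a window of width n across w; the set removes duplicates.
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w: str, n: int) -> int:
    """Return p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))

if __name__ == "__main__":
    w = "ABAAB"
    print(sorted(subwords(w, 2)))
    print([subword_complexity(w, n) for n in (1, 2, 3)])
```

A word shorter than \(n\) has no subwords of length \(n\), so the function returns the empty set in that case.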

\(|P_{f(3ZIW_1)}(2) \setminus P_{f(6OFG_1)}(2)|=123\), \(|P_{f(6OFG_1)}(2) \setminus P_{f(3ZIW_1)}(2)|=28\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1111001101100111111101011000110100010100010000010101010111000110101001111011000010001000111000100010100000011010010100010100101111000010011010010110010001101100010001101111000101001100011101001000000110001010101010100001110011010100001001101000110110000101010110011110100000100100111001

Pair \(Z_2\) Length of longest common subword
3ZIW_1,6OFG_1 151 3
3ZIW_1,1XRJ_1 152 3
6OFG_1,1XRJ_1 159 3
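
The \(Z_k\) column can be sketched as the size of the symmetric difference of the two subword sets. The names `subwords` and `z_distance` are illustrative assumptions, and the example words are toy inputs rather than the protein sequences.

```python
def subwords(w: str, k: int) -> set[str]:
    """Return the set of distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Return Z_k(x, y): subwords of x missing from y, plus subwords of y missing from x."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

if __name__ == "__main__":
    # "BB" occurs only in the second word, so the two sets differ by one element.
    print(z_distance("ABAB", "BABB", 2))
```

For the pair 3ZIW_1, 6OFG_1 the page reports the two one-sided counts 123 and 28, whose sum is the tabulated \(Z_2=151\).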

Newick tree

 
[
	1XRJ_1:78.51,
	[
		3ZIW_1:75.5,6OFG_1:75.5
	]:3.01
]

Let \(d\) be the Otu--Sayood distance \(d\).
Let \(d_1\) be the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{369 }{\log_{20} 369}-\frac{83}{\log_{20}83})=88.9\).
Status Protein1 Protein2 d d1/2
Query variables 3ZIW_1 6OFG_1 110 70.5
Was not able to put for d
Was not able to put for d1
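
The arithmetic behind the expected-distance estimate above can be checked with a short sketch. The helper names are illustrative assumptions. Reading 369 as the combined length 286 + 83 of the two query sequences is also an assumption; the page does not state where 369 comes from.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_b(n), the normalizing scale used for LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)

def expected_distance(n_long: int, n_short: int, c1: float = 0.85, c2: float = 0.8) -> float:
    """Return c1 * c2 * (n_long / log_20 n_long - n_short / log_20 n_short)."""
    return c1 * c2 * (lz_scale(n_long) - lz_scale(n_short))

if __name__ == "__main__":
    print(round(expected_distance(369, 83), 1))
```

With the constants 0.85 and 0.8 taken from the formula, the result agrees with the quoted 88.9 to one decimal place.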

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d, &\alpha=0,\\ d_1/2, &\alpha=1/2. \end{cases} \]
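
A small sketch of this interpolation, assuming that min and max denote the smaller and larger of two one-directional quantities whose maximum is \(d\) and whose sum is \(d_1\); that reading is an assumption, since the page does not define them. The two input values are recovered from the tabulated \(d=110\) and \(d_1/2=70.5\), not computed from the sequences.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Recover the two one-directional values from the table row for 3ZIW_1, 6OFG_1:
# max = d = 110, and (min + max) / 2 = d1/2 = 70.5, hence min = 2 * 70.5 - 110.
LARGER = 110.0
SMALLER = 2 * 70.5 - LARGER

if __name__ == "__main__":
    print(delta(SMALLER, LARGER, 0.0))  # alpha = 0 selects the maximum, i.e. d
    print(delta(SMALLER, LARGER, 0.5))  # alpha = 1/2 gives the average, i.e. d1/2
```

Setting \(\alpha=1\) would select the minimum alone, which this page does not tabulate.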