CoV2D Browser™

Parikh vectors
4JWK_1 3QWH_1 6XTS_1 Letter Amino acid
11 12 10 H Histidine
12 22 14 S Serine
6 3 5 M Methionine
4 14 21 T Threonine
1 2 14 W Tryptophan
7 14 16 E Glutamic acid
25 27 30 G Glycine
17 16 7 L Leucine
15 12 7 I Isoleucine
12 27 38 A Alanine
9 12 22 R Arginine
9 7 4 Q Glutamine
10 16 5 K Lysine
6 12 13 F Phenylalanine
12 8 28 P Proline
5 6 11 Y Tyrosine
12 33 23 V Valine
12 11 11 N Asparagine
8 15 18 D Aspartic acid
0 1 6 C Cysteine
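
A Parikh vector simply records how often each letter occurs in a word. A minimal sketch of how one column of the table above could be computed, assuming the 20 standard one-letter amino-acid codes (the names `AMINO_ACIDS` and `parikh_vector` are illustrative, not taken from the CoV2D code):

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Count the occurrences of each amino-acid letter in the sequence."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that never occur, e.g. C in 4JWK_1.
    return {letter: counts[letter] for letter in AMINO_ACIDS}

# First 12 residues of 4JWK_1, used here only as a small example.
vector = parikh_vector("MSNISLIVGLGN")
print(vector["S"], vector["C"])
```

Applying the same function to a full FASTA sequence would give one column of the table, one entry per letter.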

>4JWK_1|Chain A|Peptidyl-tRNA hydrolase|Acinetobacter baumannii ATCC 19606 = CIP 70.34 (575584)
>3QWH_1|Chains A, B, C, D|17beta-hydroxysteroid dehydrogenase|Cochliobolus lunatus (5503)
>6XTS_1|Chain A|Formylglycine-generating enzyme|Thermomonospora curvata (strain ATCC 19995 / DSM 43183 / JCM 3096 / NBRC 15933 / NCIMB 10081 / Henssen B9) (471852)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4JWK , Knot 95 193 0.86 38 142 188
MSNISLIVGLGNPGSEYAQTRHNAGFWFVEQLADKYGITLKNDPKFHGISGRGNIEGHDVRLLLPMTYMNRSGQSVVPFSKFYQIAPEAILIAHDELDMNPGVIRLKTGGGHGGHNGLRDIVPHIGPNFHRLRIGIGHPGSKERVSGHVLGKAPSNEQSLMDGAIDHALSKVKLLVQGQVPQAMNQINAYKPA
3QWH , Knot 120 270 0.83 40 167 254
MPHVENASETYIPGRLDGKVALVTGSGRGIGAAVAVHLGRLGAKVVVNYANSTKDAEKVVSEIKALGSDAIAIKADIRQVPEIVKLFDQAVAHFGHLDIAVSNSGVVSFGHLKDVTEEEFDRVFSLNTRGQFFVAREAYRHLTEGGRIVLTSSNTSKDFSVPKHSLYSGSKGAVDSFVRIFSKDCGDKKITVNAVAPGGTVTDMFHEVSHHYIPNGTSYTAEQRQQMAAHASPLHRNGWPQDVANVVGFLVSKEGEWVNGKVLTLDGGAA
6XTS , Knot 128 303 0.80 40 174 291
HMPSFDFDIPRRSPQEIAKGMVAIPGGTFRMGGEDPDAFPEDGEGPVRTVRLSPFLIDRYAVSNRQFAAFVKATGYVTDAERYGWSFVFHAHVAPGTPVMDAVVPEAPWWVAVPGAYWKAPEGPGSSITDRPNHPVVHVSWNDAVAYATWAGKRLPTEAEWEMAARGGLDQARYPWGNELTPRGRHRCNIWQGTFPVHDTGEDGYTGTAPVNAFAPNGYGLYNVAGNVWEWCADWWSADWHATESPATRIDPRGPETGTARVTKGGSFLCHESYCNRYRVAARTCNTPDSSAAHTGFRCAADP
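
The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below shows one way to compute both quantities, assuming that \(\mathrm{LZ}(w)\) is the number of components in the exhaustive history of Lempel and Ziv (1976); the page does not state which variant it uses, so `lz76` is an assumption and may not reproduce the tabulated values exactly.

```python
import math

def lz76(w: str) -> int:
    """Number of components in the LZ76 exhaustive-history parse of w (assumed variant)."""
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the component while it can still be copied from earlier text
        # (the copy source may overlap the component itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_b(n), with b the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

print(lz76("0001101001000101"))          # classic 16-bit example
print(round(normalized_lz(95, 193), 2))  # 4JWK row: LZ = 95, n = 193
```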

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
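
As an illustration of these definitions, the set of distinct subwords of a given length and its cardinality can be sketched as follows (the function names are hypothetical):

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Number of distinct subwords of length k in w."""
    return len(subwords(w, k))

print(sorted(subwords("ABAB", 2)))
print(subword_complexity("ABAB", 1))
```

For the word ABAB, the subwords of length 2 are AB and BA, so the count is 2 even though the word has three positions where a subword of length 2 starts.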

\(|P_{f(4JWK_1)}(2) \setminus P_{f(3QWH_1)}(2)|=65\), \(|P_{f(3QWH_1)}(2) \setminus P_{f(4JWK_1)}(2)|=90\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of sequence 1 (4JWK_1):
1001011111101100010000011111100110001101000101011010101010010111110010001001111001001110111110001010111101001110110011001110111010010111101100001010111011000001101110011001011101011011001010011
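
The hydrophobic-polar string can be reproduced by mapping each residue to 1 or 0. In the sketch below, the hydrophobic set is inferred by comparing the binary string with the start of the 4JWK_1 sequence; it is an assumption, not a classification stated by the page. Cysteine does not occur in 4JWK_1, so treating it as polar here is a guess.

```python
# Residues mapped to 1, inferred from the 4JWK_1 example (assumption).
# Cysteine (C) is absent from that sequence; it is treated as polar (0) here.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence: str) -> str:
    """Encode a protein sequence as a binary hydrophobic (1) / polar (0) string."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 20 residues of 4JWK_1.
print(hp_encode("MSNISLIVGLGNPGSEYAQT"))
```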
Pair \(Z_2\) Length of longest common subword (contiguous)
4JWK_1,3QWH_1 155 4
4JWK_1,6XTS_1 178 3
3QWH_1,6XTS_1 159 4
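
For the first pair, \(Z_2 = 65 + 90 = 155\). A minimal sketch of both columns follows, assuming that the last column counts contiguous matches (a common subword), which the small values 3 and 4 suggest; the function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Size of the symmetric difference of the length-k subword sets of x and y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common suffix ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("ABAB", "ABBA", 2))
print(longest_common_subword("ABCDE", "XBCDY"))
```

Dividing \(Z_k(x,y)\) by the size of the union \(|P_x(k)\cup P_y(k)|\) would give the full Jaccard distance.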

Newick tree

 
(
	6XTS_1:86.55,
	(
		4JWK_1:77.5,3QWH_1:77.5
	):9.05
);
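
In this tree every leaf lies at the same distance from the root, since \(77.5 + 9.05 = 86.55\). The sketch below checks this for the tree written in standard Newick notation with parentheses; the small parser is illustrative only and handles just named leaves and branch lengths, not internal node labels or comments.

```python
def leaf_depths(newick: str) -> dict[str, float]:
    """Root-to-leaf path lengths for a Newick string with branch lengths."""
    text = "".join(newick.split()).rstrip(";")

    def parse(pos: int) -> tuple[dict[str, float], int]:
        if text[pos] == "(":
            leaves: dict[str, float] = {}
            pos += 1
            while True:
                child, pos = parse(pos)
                leaves.update(child)
                if text[pos] == ",":
                    pos += 1
                else:
                    break  # closing parenthesis of this node
            pos += 1
        else:
            start = pos
            while pos < len(text) and text[pos] not in ":,)":
                pos += 1
            leaves = {text[start:pos]: 0.0}
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            start = pos + 1
            pos = start
            while pos < len(text) and text[pos] not in ",)":
                pos += 1
            length = float(text[start:pos])
        # Add this node's branch length to every leaf below it.
        return {name: depth + length for name, depth in leaves.items()}, pos

    depths, _ = parse(0)
    return depths

tree = "(6XTS_1:86.55,(4JWK_1:77.5,3QWH_1:77.5):9.05);"
for name, depth in leaf_depths(tree).items():
    print(name, round(depth, 2))
```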

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{463 }{\log_{20} 463}-\frac{193}{\log_{20}193})\approx 78.9\), where \(463=193+270\) is the combined length of the two sequences.
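
The arithmetic in this estimate can be checked with a short sketch. The factors 0.85 and 0.8 are taken from the text as given; the function names and the default alphabet size of 20 are illustrative assumptions.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """The normalizing quantity n / log_b(n) used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

def expected_distance(len_x: int, len_y: int,
                      factor_a: float = 0.85, factor_b: float = 0.8) -> float:
    """Rough expected increase in LZ-complexity when y is appended to x."""
    return factor_a * factor_b * (lz_scale(len_x + len_y) - lz_scale(len_x))

# 4JWK_1 has length 193 and 3QWH_1 has length 270.
print(round(expected_distance(193, 270), 2))
```

The result is close to 79, in line with the value of about 78.9 quoted above.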
Status Protein1 Protein2 d d1/2
Query variables 4JWK_1 3QWH_1 99 85.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
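
The interpolation can be illustrated with the query values \(d=99\) and \(d_1/2=85.5\). The sketch assumes that "min" and "max" denote the smaller and larger of the two one-sided complexity increments, so that \(d\) is their maximum and \(d_1\) their sum, as in Otu and Sayood's definitions; the value 72 for the minimum is derived from the two reported numbers, not stated on the page.

```python
def delta(alpha: float, smaller: float, larger: float) -> float:
    """Convex combination alpha * min + (1 - alpha) * max."""
    return alpha * smaller + (1 - alpha) * larger

d = 99.0        # reported d for 4JWK_1 vs 3QWH_1 (the max, under the assumption above)
half_d1 = 85.5  # reported d1 / 2
larger = d
smaller = 2 * half_d1 - larger  # derived: d1 = min + max, so min = 171 - 99 = 72

print(delta(0.0, smaller, larger))  # alpha = 0 recovers d
print(delta(0.5, smaller, larger))  # alpha = 1/2 recovers d1 / 2
```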