CoV2D BrowserTM

Parikh vectors
7VWO_1 1YFY_1 2HXR_1 Letter Amino acid
7 15 11 P Proline
4 6 16 S Serine
7 5 3 N Asparagine
14 13 10 D Aspartic acid
18 16 34 L Leucine
17 15 15 R Arginine
2 5 13 I Isoleucine
4 4 2 W Tryptophan
5 10 10 H Histidine
0 4 8 K Lysine
5 9 6 F Phenylalanine
4 4 12 T Threonine
3 7 11 Q Glutamine
6 11 17 E Glutamic acid
6 13 10 G Glycine
3 6 5 Y Tyrosine
10 13 16 V Valine
23 11 32 A Alanine
1 5 3 C Cysteine
3 2 4 M Methionine
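
Each column of the table above is a Parikh vector: the number of occurrences of each letter in a sequence. A minimal sketch of the computation follows; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and do not come from the CoV2D code, and the letter order simply follows the rows of the table.

```python
from collections import Counter

# Letters in the same row order as the table (an arbitrary but fixed order).
AMINO_ACIDS = "PSNDLRIWHKFTQEGYVACM"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the letter counts of `sequence`, one entry per alphabet letter."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# First ten residues of the 7VWO_1 sequence, used here as a short example.
prefix = "MLCVDVNVLV"
for letter, count in zip(AMINO_ACIDS, parikh_vector(prefix)):
    if count:
        print(letter, count)
```

Applied to a full sequence, the entries of the vector sum to the sequence length.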

>7VWO_1|Chains A, B, C[auth E], D[auth G], E[auth I], F[auth K]|Ribonuclease VapC43|Mycobacterium tuberculosis H37Rv (83332)
>1YFY_1|Chain A|3-hydroxyanthranilate-3,4-dioxygenase|Cupriavidus metallidurans (119219)
>2HXR_1|Chains A, B|HTH-type transcriptional regulator cynR|Escherichia coli (83333)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7VWO , Knot 69 142 0.80 38 104 137
MLCVDVNVLVYAHRADLREHADYRGLLERLANDDEPLGLPDSVLAGFIRVVTNRRVFTEPTSPQDAWQAVDALLAAPAAMRLRPGERHWMAFRQLASDVDANGNDIADAHLAAYALENNATWLSADRGFARFRRLRWRHPLD
1YFY , Knot 88 174 0.87 40 139 169
MLTYGAPFNFPRWIDEHAHLLKPPVGNRQVWQDSDFIVTVVGGPNHRTDYHDDPLEEFFYQLRGNAYLNLWVDGRRERADLKEGDIFLLPPHVRHSPQRPEAGSACLVIERQRPAGMLDGFEWYCDACGHLVHRVEVQLKSIVTDLPPLFESFYASEDKRRCPHCGQVHPGRAA
2HXR , Knot 106 238 0.81 40 149 231
GVWRQYASRALQELGAGKRAIHDVADLTRGSLRIAVTPTFTSYFIGPLMADFYARYPSITLQLQEMSQEKIEDMLCRDELDVGIAFAPVHSPELEAIPLLTESLALVVAQHHPLAVHEQVALSRLHDEKLVLLSAEFATREQIDHYCEKAGLHPQVVIEANSISAVLELIRRTSLSTLLPAAIATQHDGLKAISLAPPLLERTAVLLRRKNSWQTAAAKAFLHMALDKCAVVGGNESR
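
The page does not say which Lempel-Ziv variant produces the LZ-complexity column. The sketch below assumes the exhaustive-history parsing of Lempel and Ziv (1976), so it may not reproduce the tabulated complexities exactly; the function names are illustrative. The normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) is taken directly from the table header and is applied to the tabulated values.

```python
import math

def lz76_complexity(w):
    """Number of components in an LZ76-style exhaustive-history parsing of w.

    Assumption: each component is the shortest block starting at position i
    that has not occurred earlier (overlap with the block itself allowed).
    """
    i, components, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the block while it can still be copied from earlier text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_20(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

# Recompute the ratio column from the tabulated LZ(w) and n values.
for code, lz, n in [("7VWO", 69, 142), ("1YFY", 88, 174), ("2HXR", 106, 238)]:
    print(code, round(normalized_lz(lz, n), 2))
```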

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
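
As an illustration of these definitions, here is a minimal sketch that enumerates the distinct length-\(n\) subwords of a word. The names `subwords` and `subword_count` are illustrative only, and the example uses a short prefix of a sequence rather than a full one.

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_count(w, n):
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))

# First ten residues of the 7VWO_1 sequence.
w = "MLCVDVNVLV"
print(sorted(subwords(w, 2)))
print(subword_count(w, 1), subword_count(w, 2))
```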

\(|P_{f(7VWO_1)}(2) \setminus P_{f(1YFY_1)}(2)|=49\), \(|P_{f(1YFY_1)}(2) \setminus P_{f(7VWO_1)}(2)|=84\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1101010111010010100010001110011000011111001111110110000110010010011011011111111101011000111100110010101001101011101100010110100111010010100110
Pair \(Z_2\) Length of longest common subword (contiguous)
7VWO_1,1YFY_1 133 4
7VWO_1,2HXR_1 141 4
1YFY_1,2HXR_1 160 4
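
The two columns of this table can be sketched as follows. \(Z_k\) is the size of the symmetric difference of the two subword sets, so for the first pair \(Z_2 = 49 + 84 = 133\). The last column is assumed here to count contiguous common subwords, since a value of 4 would be implausibly small for gapped subsequences of sequences this long; that reading is an interpretation, and the function names are illustrative.

```python
def subwords(w, k):
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("ABAB", "ABBA", 2), longest_common_subword_length("ABAB", "ABBA"))
```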

Newick tree

 
(
	2HXR_1:78.14,
	(
		7VWO_1:66.5,1YFY_1:66.5
	):11.64
);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{316}{\log_{20} 316}-\frac{142}{\log_{20}142}\right)=53.4\)
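
The arithmetic of this estimate can be checked with a few lines. The factors 0.85 and 0.8 are taken as given from the formula; 316 is the combined length \(142 + 174\) of the two query sequences. The helper name `lz_scale` is illustrative.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_20(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

# Lengths from the table: 7VWO_1 has 142 residues, 1YFY_1 has 174.
n_first, n_second = 142, 174
expected = 0.85 * 0.8 * (lz_scale(n_first + n_second) - lz_scale(n_first))
print(f"{expected:.2f}")
```

The result falls between 53.4 and 53.5, in line with the value quoted above.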
Status Protein1 Protein2 d d1/2
Query variables 7VWO_1 1YFY_1 68 60.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
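
A small sketch of this interpolation, under an assumption not stated on the page: that \(\min\) and \(\max\) refer to the two directed complexity gaps \(c(xy)-c(x)\) and \(c(yx)-c(y)\) of Otu and Sayood, so that \(d\) is their maximum and \(d_1\) their sum. Under that reading, the query row above (\(d=68\), \(d_1/2=60.5\)) would correspond to gaps of 68 and 53. The function name `delta` is illustrative.

```python
def delta(gap_a, gap_b, alpha):
    """alpha * min + (1 - alpha) * max of the two directed gaps.

    Assumption: gap_a = c(xy) - c(x) and gap_b = c(yx) - c(y) for some
    complexity measure c; alpha = 0 gives d, alpha = 1/2 gives d_1 / 2.
    """
    low, high = min(gap_a, gap_b), max(gap_a, gap_b)
    return alpha * low + (1 - alpha) * high

# Gaps inferred from the query row under the assumption above.
print(delta(53, 68, 0))    # the max, i.e. d
print(delta(53, 68, 0.5))  # the mean, i.e. d_1 / 2
```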