CoV2D Browser™

Parikh vectors
3RYS_1 2MNZ_1 4PVS_1 Letter Amino acid
10 1 11 N Asparagine
4 7 6 C Cysteine
11 1 7 Q Glutamine
12 1 5 F Phenylalanine
35 2 18 E Glutamic acid
18 4 41 G Glycine
9 2 8 H Histidine
39 9 22 L Leucine
1 1 3 W Tryptophan
28 3 30 V Valine
43 2 34 A Alanine
14 1 14 I Isoleucine
7 2 23 K Lysine
9 0 9 M Methionine
16 4 11 P Proline
18 2 10 R Arginine
23 8 20 D Aspartic acid
18 2 15 S Serine
19 1 19 T Threonine
9 2 4 Y Tyrosine
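
The Parikh vector of a sequence simply records how often each letter occurs. A minimal sketch of how one column of the table could be recomputed follows; the function name `parikh_vector` and the constant names are illustrative and not part of the CoV2D code base. The 2MNZ_1 sequence is the one listed in the LZ-complexity table on this page.

```python
from collections import Counter

# Letter order copied from the rows of the Parikh-vector table.
ALPHABET = "NCQFEGHLWVAIKMPRDSTY"

def parikh_vector(w: str) -> dict[str, int]:
    """Return the number of occurrences in w of each amino-acid letter."""
    counts = Counter(w)  # a Counter returns 0 for letters that never occur
    return {a: counts[a] for a in ALPHABET}

# Sequence of 2MNZ_1 (55 residues).
SEQ_2MNZ = "AVDLYVCLLCGSGNDEDRLLLCDGCDDSYHTFCLIPPLHDVPKGDWRCPKCLAQE"

vec = parikh_vector(SEQ_2MNZ)
print(vec["C"], vec["L"], vec["M"], sum(vec.values()))  # 7 9 0 55
```

The entries sum to the sequence length, which gives a quick consistency check on any column of the table.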

>3RYS_1|Chains A, B|Adenosine deaminase 1|Arthrobacter aurescens (290340)
>2MNZ_1|Chain A|Lysine-specific demethylase 5B|Homo sapiens (9606)
>4PVS_1|Chains A, B|Isoaspartyl peptidase/L-asparaginase|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3RYS 144 343 0.81 40 188 331
METFGEKTTSTAPPVAELHLHIEGTLQPELIFALAERNGIELPYEDIEELREKYEFTDLQSFLDLYYANMAVLQTEQDFTDMTRAYLERAAAGGVRHAEIMMDPQAHTSRGVALETCVNGVANALATSEEDFGVSTLLIAAFLRDMSEDSALEVLDQLLAMHAPIAGIGLDSAEVGNPPSKFERLYQRAAEAGLRRIAHAGEEGPASYITEALDVLHVERIDHGIRCMEDTDVVQRLVAEQVPLTVCPLSNVRLRAVDKLADHPLPEMLAIGLNVCVNSDDPAYFGGYVDDNFEQLVKVLEFSVPEQATLAANSIRSSFASDARKAVLLDEVTEWVKASVTPA
2MNZ 33 55 0.80 38 47 52
AVDLYVCLLCGSGNDEDRLLLCDGCDDSYHTFCLIPPLHDVPKGDWRCPKCLAQE
4PVS 133 310 0.82 40 183 292
GHMNPIVVVHGGGAGPISKDRKERVHQGMVRAATVGYGILREGGSAVDAVEGAVVALEDDPEFNAGCGSVLNTNGEVEMDASIMDGKDLSAGAVSAVQCIANPIKLARLVMEKTPHCFLTDQGAAQFAAAMGVPEIPGEKLVTERNKKRLEKEKHEKGAQKTDCQKNLGTVGAVALDCKGNVAYATSTGGIVNKMVGRVGDSPCLGAGGYADNDIGAVSTTGHGESILKVNLARLTLFHIEQGKTVEEAADLSLGYMKSRVKGLGGLIVVSKTGDWVAKWTSTSMPWAAAKDGKLHFGIDPDDTTITDLP
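
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns alone. The sketch below does only that arithmetic; it does not compute \(\mathrm{LZ}(w)\) itself, and the function name `normalized_lz` is an illustrative choice.

```python
import math

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ(w), n) copied from the table rows.
rows = [("3RYS", 144, 343), ("2MNZ", 33, 55), ("4PVS", 133, 310)]
ratios = {code: normalized_lz(lz, n) for code, lz, n in rows}
for code, value in ratios.items():
    print(code, round(value, 3))
```

The three values come out near 0.818, 0.803 and 0.822, which agrees with the tabulated 0.81, 0.80 and 0.82 if the table truncates to two decimals rather than rounding.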

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
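
As an illustration of these definitions, here is a small sketch that enumerates subwords with a sliding window. The names `subwords` and `subword_complexity` are hypothetical helpers, not the page's own implementation.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """p_w(k): the cardinality of P_w(k)."""
    return len(subwords(w, k))

# Sequence of 2MNZ_1 as listed on this page.
SEQ_2MNZ = "AVDLYVCLLCGSGNDEDRLLLCDGCDDSYHTFCLIPPLHDVPKGDWRCPKCLAQE"

print(subword_complexity(SEQ_2MNZ, 2), subword_complexity(SEQ_2MNZ, 3))  # 47 52
```

These match the \(p_w(2)\) and \(p_w(3)\) entries for 2MNZ. For length 1 the same function returns the number of distinct letters (19 for 2MNZ_1, which contains no M), so the tabulated \(p_w(1)\) values presumably follow a different convention.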

\(|P_{f(3RYS_1)}(2) \setminus P_{f(2MNZ_1)}(2)|=163\), \(|P_{f(2MNZ_1)}(2) \setminus P_{f(3RYS_1)}(2)|=22\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1001100000011111010101010101011111100011011000100100000100100110100101111000001001001010011111100101110101000011110001011101110000011100111111100100001101100111101111111100101101100100100011011100110110011100100110110100100110010000110011100111010110010101100110011101111110101000011011101000100110110101100101110010001100100111100100110101011
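
A hydrophobic-polar encoding replaces each residue by 1 (hydrophobic) or 0 (polar). The page does not state which letters count as hydrophobic; the set used in the sketch below is an assumption, inferred by comparing the binary string above with the 3RYS_1 sequence at its start and end, and the name `hp_string` is illustrative.

```python
# Assumption: hydrophobic letters inferred from the page's own binary string,
# not taken from a documented classification.
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(w: str) -> str:
    """Encode w as a binary string: 1 for hydrophobic residues, 0 otherwise."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)

# First 20 residues of 3RYS_1.
print(hp_string("METFGEKTTSTAPPVAELHL"))  # 10011000000111110101
```

The output reproduces the first 20 symbols of the hydrophobic-polar string for Sequence 1.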
Pair \(Z_2\) Length of longest common subword
3RYS_1,2MNZ_1 185 3
3RYS_1,4PVS_1 159 4
2MNZ_1,4PVS_1 182 3
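
Both columns of this table are simple set computations on subwords. For example, the first row's \(Z_2=185\) is the sum \(163+22\) of the two set differences reported earlier. The sketch below uses hypothetical helper names and toy inputs; it reads the last column as the length of the longest shared contiguous subword, which is an interpretation of the page rather than something it states.

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x: str, y: str) -> int:
    """Largest k such that x and y share a subword of length k (0 if none)."""
    k = 0
    # A shared subword of length k + 1 contains one of length k, so we can
    # grow k until the two subword sets no longer intersect.
    while subwords(x, k + 1) & subwords(y, k + 1):
        k += 1
    return k

# Toy inputs: P_x(2) = {AA, AB}, P_y(2) = {AB, BB}.
print(z("AAB", "ABB", 2), longest_common_subword_length("AAB", "ABB"))  # 2 2
```

The loop is quadratic in memory for long inputs; a suffix-automaton or dynamic-programming approach would scale better, but for sequences of a few hundred residues the set-based version is adequate.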

Newick tree

 
[
	2MNZ_1:95.48,
	[
		3RYS_1:79.5,4PVS_1:79.5
	]:15.98
]
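
The bracketed structure above can be written in standard Newick notation, which uses parentheses and a terminating semicolon. The sketch below is one way to serialize it; the nested-tuple representation and the name `to_newick` are illustrative assumptions, not the page's own format.

```python
def to_newick(node, is_root: bool = True) -> str:
    """Serialize a (payload, branch_length) node.

    payload is a leaf name (str) or a list of child nodes;
    branch_length is None for the root.
    """
    payload, length = node
    if isinstance(payload, str):
        text = payload
    else:
        text = "(" + ",".join(to_newick(child, False) for child in payload) + ")"
    if length is not None:
        text += f":{length}"
    return text + (";" if is_root else "")

# Branch lengths copied from the tree above.
tree = ([("2MNZ_1", 95.48),
         ([("3RYS_1", 79.5), ("4PVS_1", 79.5)], 15.98)], None)

print(to_newick(tree))  # (2MNZ_1:95.48,(3RYS_1:79.5,4PVS_1:79.5):15.98);
```

Every leaf is at the same depth from the root, since \(79.5+15.98=95.48\).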

Let \(d\) and \(d_1\) be the Otu--Sayood distances of the same names. (The latter makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{398}{\log_{20} 398}-\frac{55}{\log_{20}55}\right)\approx 107\), where \(398=343+55\) is the combined length of the two query sequences.
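
The arithmetic in this estimate can be checked directly. The sketch below only evaluates the displayed expression; the factors 0.85 and 0.8 are taken from the page as given, and the helper name `scaled_length` is illustrative.

```python
import math

def scaled_length(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_b n, where b is the alphabet size."""
    return n / math.log(n, alphabet_size)

estimate = 0.85 * 0.8 * (scaled_length(398) - scaled_length(55))
print(round(estimate, 1))  # 107.5
```

The value is about 107.47, consistent with the figure of 107 quoted above.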
Status Protein1 Protein2 d d1/2
Query variables 3RYS_1 2MNZ_1 134 77.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
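
As a worked instance of this interpolation, the values \(d=134\) and \(d_1/2=77.5\) reported for the pair 3RYS_1, 2MNZ_1 determine the two interpolated quantities: the larger is \(d=134\) and the smaller is \(2\cdot 77.5-134=21\). The sketch below assumes that reading of min and max; the names `delta`, `smaller` and `larger` are hypothetical.

```python
def delta(alpha: float, smaller: float, larger: float) -> float:
    """Return alpha * min + (1 - alpha) * max."""
    return alpha * smaller + (1 - alpha) * larger

d = 134          # reported d for 3RYS_1 vs 2MNZ_1
half_d1 = 77.5   # reported d1/2 for the same pair

larger = d                     # alpha = 0 gives delta = max = d
smaller = 2 * half_d1 - d      # derived: min = d1 - max = 21.0

print(delta(0, smaller, larger), delta(0.5, smaller, larger))  # 134.0 77.5
```

Setting \(\alpha=0\) recovers \(d\) and \(\alpha=1/2\) recovers \(d_1/2\), as in the case distinction above.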