CoV2D Browser™

Parikh vectors
4IPZ_1 7XOH_1 4NKJ_1 Letter Amino acid
4 13 9 H Histidine
10 20 10 I Isoleucine
14 17 10 K Lysine
5 12 2 M Methionine
9 6 12 N Asparagine
8 35 13 S Serine
11 19 8 T Threonine
2 15 0 Y Tyrosine
6 29 2 P Proline
7 27 12 D Aspartic acid
12 32 14 E Glutamic acid
23 58 8 G Glycine
7 35 22 L Leucine
1 5 0 W Tryptophan
9 55 14 A Alanine
4 4 2 C Cysteine
3 6 4 Q Glutamine
15 8 4 F Phenylalanine
9 53 7 V Valine
6 29 4 R Arginine
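
Each column of the table above is a Parikh vector: the number of occurrences of each letter in a sequence. A minimal Python sketch follows. The helper name `parikh_vector` and the alphabetical ordering of the alphabet are assumptions made for illustration (the table lists the letters in a different order).

```python
from collections import Counter

# Standard one-letter amino-acid codes; alphabetical order is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(word, alphabet=AMINO_ACIDS):
    """Return {letter: number of occurrences in word} for each alphabet letter."""
    counts = Counter(word)
    return {letter: counts[letter] for letter in alphabet}

# First 20 residues of the 4IPZ_1 sequence, used as a short example.
prefix = "MVNPTVFFDIAVDGEPLGRV"
vector = parikh_vector(prefix)
print({letter: count for letter, count in vector.items() if count})
```

Applied to a full sequence, the entries of the vector sum to the sequence length; for example, the 4IPZ_1 column sums to 165.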

>4IPZ_1|Chain A|Peptidyl-prolyl cis-trans isomerase A|Homo sapiens (9606)
>7XOH_1|Chains A, B, C, D|Probable cystathionine beta-synthase Rv1077|Mycobacterium tuberculosis H37Rv (83332)
>4NKJ_1|Chain A|Hemagglutinin HA2|Influenza B virus (1354485)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4IPZ , Knot 82 165 0.84 40 129 161
MVNPTVFFDIAVDGEPLGRVSFELFADKVPKTAENFRALSTGEKGFGYKGSCFHRIIPGFMCQGGDFTRHNGTGGKSIYGEKFEDENFILKHTGPGILSMANAGPNTNGSQFFICTAKTEWLDGKHVVFGKVKEGMNIVEAMERFGSRNGKTSKKITIADCGQLE
7XOH , Knot 188 478 0.80 40 211 428
MARIAQHISELIGGTPLVRLNSVVPDGAGTVAAKVEYLNPGGSSKDRIAVKMIEAAEASGQLKPGGTIVEPTSGNTGVGLALVAQRRGYKCVFVCPDKVSEDKRNVLIAYGAEVVVCPTAVPPHDPASYYSVSDRLVRDIDGAWKPDQYANPEGPASHYVTTGPEIWADTEGKVTHFVAGIGTGGTITGAGRYLKEVSGGRVRIVGADPEGSVYSGGAGRPYLVEGVGEDFWPAAYDPSVPDEIIAVSDSDSFDMTRRLAREEAMLVGGSCGMAVVAALKVAEEAGPDALIVVLLPDGGRGYMSKIFNDAWMSSYGFLRSRLDGSTEQSTVGDVLRRKSGALPALVHTHPSETVRDAIGILREYGVSQMPVVGAEPPVMAGEVAGSVSERELLSAVFEGRAKLADAVSAHMSPPLRMIGAGELVSAAGKALRDWDALMVVEEGKPVGVITRYDLLGFLSEGAGRRKLAAALEHHHHHH
4NKJ , Knot 74 157 0.79 36 115 150
HHHHHHVGVAVAADLKSTQEAINKITKNLNSLSELEVKNLQRLSGAMDELHNEILELDEKVDDLRADTISSQIELAVLLSNEGIINSEDEHLLALERKLKKMLGPSAVDIGNGSFETKHKCNQTCLDRIAAGTFNAGEFSLPTFDSLNITAASLNDD
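
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below does only that arithmetic; it does not compute \(\mathrm{LZ}(w)\) itself, and the function name `normalized_lz` is hypothetical. The tabulated values appear to be truncated, not rounded, to two decimals.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_a(n)) for a word of length n over a letters."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ-complexity, length) pairs copied from the table above.
rows = {"4IPZ": (82, 165), "7XOH": (188, 478), "4NKJ": (74, 157)}
for code, (lz, n) in rows.items():
    ratio = normalized_lz(lz, n)
    # Truncating to two decimals reproduces the tabulated 0.84, 0.80, 0.79.
    print(code, math.floor(ratio * 100) / 100)
```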

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
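
As an illustration of these definitions, here is a short Python sketch that collects the distinct contiguous subwords of a fixed length and counts them. The function names `subwords` and `subword_complexity` are hypothetical, and a toy word is used in place of a protein sequence.

```python
def subwords(word, k):
    """Set of distinct contiguous subwords of length k in word."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def subword_complexity(word, k):
    """Number of distinct contiguous subwords of length k in word."""
    return len(subwords(word, k))

w = "abab"
print(sorted(subwords(w, 2)))     # ['ab', 'ba']
print(subword_complexity(w, 1))   # 2
print(subword_complexity(w, 3))   # 2
```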

\(|P_{f(4IPZ_1)}(2) \setminus P_{f(7XOH_1)}(2)|=46\), \(|P_{f(7XOH_1)}(2) \setminus P_{f(4IPZ_1)}(2)|=128\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
110101110111010111010101110011001001011001001110010010011111100110100001011001010010000111000111110110111000100111001000110100111101001101101100110001000001011001010
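
The hydrophobic-polar string can be reproduced by mapping each residue to 1 or 0. In the sketch below, the set of residues mapped to 1 is an assumption, inferred by comparing the 0/1 string with the 4IPZ_1 sequence; the page does not state the classification explicitly. The name `hp_encode` is hypothetical.

```python
# Assumption: residues encoded as 1 (hydrophobic), inferred from the
# 0/1 string shown for 4IPZ_1; all other residues are encoded as 0.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Map each residue to '1' if in HYDROPHOBIC, else '0'."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 20 residues of the 4IPZ_1 sequence.
print(hp_encode("MVNPTVFFDIAVDGEPLGRV"))  # 11010111011101011101
```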
Pair \(Z_2\) Length of longest common subsequence
4IPZ_1,7XOH_1 174 4
4IPZ_1,4NKJ_1 146 3
7XOH_1,4NKJ_1 164 6
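
The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets. A minimal Python sketch, with hypothetical function names and a toy pair of words, is given here. The Jaccard distance shown divides by the size of the union, which is the standard definition and is assumed to be what the page means by "numerator".

```python
def subwords(word, k):
    """Set of distinct contiguous subwords of length k in word."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

def z(x, y, k):
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """Z_k(x, y) divided by the number of subwords in either word."""
    union = subwords(x, k) | subwords(y, k)
    return z(x, y, k) / len(union) if union else 0.0

print(z("abab", "abba", 2))  # 1, since only "bb" is not shared
```

For the first row of the table, the two one-sided differences are 46 and 128, and 46 + 128 = 174.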

Newick tree

 
(
	7XOH_1:88.04,
	(
		4IPZ_1:73,4NKJ_1:73
	):15.04
);

Let \(d\) and \(d_1\) be the Otu--Sayood distances of the same names. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{643 }{\log_{20} 643}-\frac{165}{\log_{20}165})\approx 136.\)
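
The arithmetic in this estimate can be checked directly. In the sketch below, 643 is taken to be 165 + 478, the combined length of the two query sequences; that reading is an assumption, as is the helper name `scale`. The factors 0.85 and 0.8 are copied from the formula as given.

```python
import math

def scale(n, alphabet_size=20):
    """Return n / log_a(n), the normalizing term used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

n1, n2 = 165, 478  # lengths of 4IPZ_1 and 7XOH_1
expected = 0.85 * 0.8 * (scale(n1 + n2) - scale(n1))
print(round(expected, 2))
```

The result is about 136.7, which truncates to the 136 quoted above.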
Status Protein1 Protein2 d d1/2
Query variables 4IPZ_1 7XOH_1 164 112
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
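
The interpolation can be sketched in Python. This assumes the usual Otu--Sayood setup, in which \(d\) is the maximum and \(d_1\) the sum of two one-sided complexity increments; the names `delta`, `a` and `b` are hypothetical. The increments 164 and 60 are not listed on the page; they are the values implied by the table row \(d=164\), \(d_1/2=112\) under that assumption.

```python
def delta(alpha, a, b):
    """Interpolate between min(a, b) and max(a, b) with weight alpha on the min."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Assumed one-sided increments: max = d = 164, and (min + max) / 2 = d1/2 = 112.
a, b = 164, 60

print(delta(0, a, b))    # equals d
print(delta(0.5, a, b))  # equals d1 / 2
```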