CoV2D Browser™

Parikh vectors
4RSH_1 5XKF_1 4ZQO_1 Letter Amino acid
16 32 36 L Leucine
13 19 14 K Lysine
3 10 17 M Methionine
7 20 5 F Phenylalanine
8 20 20 P Proline
9 27 25 D Aspartic acid
15 37 15 E Glutamic acid
11 27 12 I Isoleucine
2 13 7 H Histidine
9 29 22 T Threonine
15 34 49 V Valine
15 36 56 A Alanine
0 12 2 C Cysteine
7 16 10 Q Glutamine
12 23 28 S Serine
4 21 22 R Arginine
11 16 6 N Asparagine
12 36 51 G Glycine
2 4 1 W Tryptophan
8 19 9 Y Tyrosine
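The counts in the table above can be reproduced with a short script. The following is a minimal sketch, not the site's own code: the function name `parikh_vector` and the constant `AMINO_ACIDS` are hypothetical, and the letter order simply follows the rows of the table.

```python
from collections import Counter

# One-letter codes in the row order of the table (an illustrative choice).
AMINO_ACIDS = "LKMFPDEIHTVACQSRNGWY"


def parikh_vector(sequence):
    """Return the number of occurrences of each amino-acid letter."""
    counts = Counter(sequence)
    # Letters that never occur get an explicit 0, as in the table.
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}


if __name__ == "__main__":
    # First 20 residues of the 4RSH_1 sequence, used only as a small input.
    print(parikh_vector("SNANTKVVAIGDSFTFGYPG"))
```

Applying the function to each full FASTA sequence gives one column of the table per protein.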

>4RSH_1|Chains A, B, C|Lipolytic protein G-D-S-L family|Desulfitobacterium hafniense DCB-2 (272564)
>5XKF_1|Chains A, C|Tubulin alpha-1B chain|Sus scrofa (9823)
>4ZQO_1|Chain A|Inosine-5'-monophosphate dehydrogenase,Inosine-5'-monophosphate dehydrogenase|Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv) (83332)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4RSH , Knot 87 179 0.84 38 130 173
SNANTKVVAIGDSFTFGYPGNTENSWPAVLGQTSQIEVVNKGLKSQTAQDLYSRFDADVLAEKPGRVIIFVGNGDAIKEVPLETFQQHIKAMVEKAESNHIIPILALPLPYTGVQNTIKEFREWESSYAKEKNILVLDFATVLMDADNVYLEGLLSKEANYPSKEGYKLMGEYASRVLD
5XKF , Knot 192 451 0.86 40 258 426
MRECISIHVGQAGVQIGNACWELYCLEHGIQPDGQMPSDKTIGGGDDSFNTFFSETGAGKHVPRAVFVDLEPTVIDEVRTGTYRQLFHPEQLITGKEDAANNYARGHYTIGKEIIDLVLDRIRKLADQCTGLQGFLVFHSFGGGTGSGFTSLLMERLSVDYGKKSKLEFSIYPAPQVSTAVVEPYNSILTTHTTLEHSDCAFMVDNEAIYDICRRNLDIERPTYTNLNRLISQIVSSITASLRFDGALNVDLTEFQTNLVPYPRIHFPLATYAPVISAEKAYHEQLSVAEITNACFEPANQMVKCDPRHGKYMACCLLYRGDVVPKDVNAAIATIKTKRSIQFVDWCPTGFKVGINYQPPTVVPGGDLAKVQRAVCMLSNTTAIAEAWARLDHKFDLMYAKRAFVHWYVGEGMEEGEFSEAREDMAALEKDYEEVGVDSVEGEGEEEGEEY
4ZQO , Knot 164 407 0.80 40 191 376
SNAMSRGMSGLEDSSDLVVSPYVRMGGLTTDPVPTGGDDPHKVAMLGLTFDDVLLLPAASDVVPATADTSSQLTKKIRLKVPLVSSAMDTVTESRMAIAMARAGGMGVLHRNLPVAEQAGQVEMVKRSGGLLVGAAVGVGGDAWVRAMMLVDAGVDVLVVDTAHAHNRLVLDMVGKLKSEVGDRVEVVGGNVATRSAAAALVDAGADAVKVGVGPGSICTTRVVAGVGAPQITAILEAVAACRPAGVPVIADGGLQYSGDIAKALAAGASTAMLGSLLAGTAEAPGELIFVNGKQYKSYRGMGSLGAMRGRGGATSYSKDRYFADDALSEDKLVPEGIEGRVPFRGPLSSVIHQLTGGLRAAMGYTGSPTIEVLQQAQFVRITPAGLKESHPHDVAMTVEAPNYYAR
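The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below assumes the exhaustive-history parsing of Lempel and Ziv (1976) for `lz76`; the exact LZ variant used by the site is not stated here, so the phrase counts it produces may differ from the table. Both function names are hypothetical.

```python
import math


def lz76(w):
    """Number of phrases in an LZ76-style parsing of w (assumed variant).

    Each phrase is the shortest prefix of the remaining text that does not
    occur starting at an earlier position (overlap with itself is allowed).
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while the candidate phrase already occurs earlier in w.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_20(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # Table values for 4RSH: LZ(w) = 87, n = 179.
    print(round(normalized_lz(87, 179), 2))
```

The table entries appear to be truncated rather than rounded to two decimals, so small differences in the last digit are to be expected for the other rows.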

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
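As an illustration of these definitions, here is a minimal sketch that enumerates \(P_w(n)\) and computes \(p_w(n)\). The names `subwords` and `subword_complexity` are hypothetical.

```python
def subwords(w, n):
    """P_w(n): the set of distinct length-n subwords (intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w, n):
    """p_w(n): the number of distinct length-n subwords of w."""
    return len(subwords(w, n))


if __name__ == "__main__":
    print(sorted(subwords("abab", 2)))    # ['ab', 'ba']
    print(subword_complexity("abab", 2))  # 2
```

A word of length \(m\) has at most \(m-n+1\) subwords of length \(n\), which bounds \(p_w(n)\) from above.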

\(|P_{f(4RSH_1)}(2) \setminus P_{f(5XKF_1)}(2)|=39\), \(|P_{f(5XKF_1)}(2) \setminus P_{f(4RSH_1)}(2)|=167\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1: 00100011111001011011000001111110000101100110000100100010101110011011111101011001110010001011100100001111111111001100010010010000100001111011011101001010111000100100010011100100110
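The hydrophobic-polar string can be reproduced by mapping each residue to 1 (hydrophobic) or 0 (polar). The sketch below uses a hydrophobic set inferred from comparing Sequence 1 with its binary string (A, F, G, I, L, M, P, V, W map to 1). Sequence 1 contains no cysteine, so classing C as hydrophobic is an assumption, and the name `hp_encode` is hypothetical.

```python
# Residues encoded as 1. C is an assumption: Sequence 1 has no cysteine,
# so its class cannot be read off from the string above.
HYDROPHOBIC = set("AFGILMPVW") | {"C"}


def hp_encode(sequence):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in sequence)


if __name__ == "__main__":
    # First 20 residues of 4RSH_1.
    print(hp_encode("SNANTKVVAIGDSFTFGYPG"))  # 00100011111001011011
```

The printed prefix agrees with the first 20 symbols of the hydrophobic-polar string for Sequence 1.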
Pair \(Z_2\) Length of longest common subword
4RSH_1,5XKF_1 206 4
4RSH_1,4ZQO_1 147 4
5XKF_1,4ZQO_1 167 4
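The \(Z_k\) column is the size of the symmetric difference of the two subword sets; for the first pair, \(39+167=206\). A minimal sketch of that computation follows, with hypothetical function names and a toy input rather than the protein sequences.

```python
def subwords(w, k):
    """P_w(k): the set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


if __name__ == "__main__":
    # P_x(2) = {ab, ba}, P_y(2) = {ab, bb, ba}; only "bb" is unshared.
    print(z_distance("abab", "abba", 2))  # 1
```

Dividing \(Z_k(x,y)\) by \(|P_x(k)\cup P_y(k)|\) would give the usual Jaccard distance; the table reports only the numerator.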

Newick tree

 
(5XKF_1:99.59,(4RSH_1:73.5,4ZQO_1:73.5):26.09);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{630 }{\log_{20} 630}-\frac{179}{\log_{20}179})\approx 128.\)
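The arithmetic in this estimate can be checked directly. In the sketch below, 630 is taken to be the combined length \(179+451\) of the two query sequences, which is an inference from the lengths listed earlier; the helper name `lz_scale` is hypothetical.

```python
import math


def lz_scale(n, alphabet_size=20):
    """n / log_20(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, alphabet_size)


# Lengths of 4RSH_1 and 5XKF_1; 630 is assumed to be their sum.
n_x, n_y = 179, 451
expected = 0.85 * 0.8 * (lz_scale(n_x + n_y) - lz_scale(n_x))

if __name__ == "__main__":
    print(expected)
```

The result lies between 128 and 129, consistent with the quoted value of 128 once the fraction is dropped.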
Status Protein1 Protein2 d d1/2
Query variables 4RSH_1 5XKF_1 167 115.5
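For reference, the two distances can be sketched as follows, assuming the usual definitions \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=c(xy)-c(x)+c(yx)-c(y)\), with \(c\) an LZ76-style phrase count. The parsing variant and all function names here are assumptions, not the site's implementation.

```python
def lz76(w):
    """Phrase count of an LZ76-style exhaustive parsing (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while the candidate phrase already occurs earlier in w.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def otu_sayood(x, y, c=lz76):
    """Return (d, d1) for the words x and y under complexity measure c."""
    gain_xy = c(x + y) - c(x)  # new phrases needed for y after seeing x
    gain_yx = c(y + x) - c(y)  # new phrases needed for x after seeing y
    return max(gain_xy, gain_yx), gain_xy + gain_yx


if __name__ == "__main__":
    print(otu_sayood("aaaa", "bbbb"))  # (1, 2)
    print(otu_sayood("aaaa", "aaaa"))  # (0, 0)
```

Because a constant word adds very few phrases when appended to another word, distances to sequences such as AAAAAA come out small under these definitions.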

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
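A small numerical check of this interpolation against the query row above, as a sketch. It assumes that \(\max\) and \(\min\) are the larger and smaller of the two one-sided complexity gains, so that \(d=\max\) and \(d_1=\min+\max\). The value 64 for the smaller gain is not listed on the page; it is derived here as \(2(115.5)-167\).

```python
def delta(lo, hi, alpha):
    """alpha * min + (1 - alpha) * max for the two one-sided gains."""
    return alpha * lo + (1 - alpha) * hi


# From the query row: d = 167 and d1/2 = 115.5.
d, half_d1 = 167, 115.5
hi = d                       # alpha = 0 gives the max, i.e. d
lo = 2 * half_d1 - d         # derived smaller gain (64.0), an inference

if __name__ == "__main__":
    print(delta(lo, hi, 0))    # 167.0
    print(delta(lo, hi, 0.5))  # 115.5
```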