CoV2D Browser™

Parikh vectors
9IKJ_1 8VJN_1 1WDS_1 Letter Amino acid
16 7 48 L Leucine
4 4 11 M Methionine
23 4 25 S Serine
11 1 24 Y Tyrosine
7 1 27 I Isoleucine
3 2 29 K Lysine
13 2 30 P Proline
22 10 33 A Alanine
15 2 30 N Asparagine
2 0 6 C Cysteine
9 5 29 E Glutamic acid
20 2 39 G Glycine
19 1 17 T Threonine
9 6 19 R Arginine
21 2 32 V Valine
19 5 33 D Aspartic acid
9 2 21 Q Glutamine
0 6 11 H Histidine
12 3 20 F Phenylalanine
2 1 11 W Tryptophan
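
A Parikh vector records how often each letter occurs in a word, ignoring order. As a minimal sketch (the function name `parikh_vector` and the `ALPHABET` constant are illustrative, not taken from the page), the 8VJN_1 column can be recomputed from the sequence listed on this page:

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
ALPHABET = "LMSYIKPANCEGTRVDQHFW"

def parikh_vector(word, alphabet=ALPHABET):
    """Return the letter counts of `word`, one entry per letter of `alphabet`."""
    counts = Counter(word)
    return [counts[a] for a in alphabet]

# Sequence of 8VJN_1 as listed on this page.
seq_8vjn = "MHHHHHHMAKNSNPSAFDRDFGYLMPFLDRVAAAASDLEDASARAELTRLMVEEKARWQRIQELLG"

vec = parikh_vector(seq_8vjn)
for letter, count in zip(ALPHABET, vec):
    print(letter, count)
```

The entries sum to the sequence length (66 for 8VJN_1), which is a quick sanity check on any column of the table.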

>9IKJ_1|Chains A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P|TLP-1|algae metagenome (1300146)
>8VJN_1|Chains A, B, C, D|Encapsulin nanocompartment cargo protein EncD|Myxococcus xanthus (34)
>1WDS_1|Chain A|Beta-amylase|Glycine max (3847)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9IKJ 107 236 0.82 38 154 229
NLISEQNVTVTMDLQPVLQLGMQGSETVSFVFSQISEYIGGLTQYGAVDLSVSSTVDWCLYAAAFSSDAADAELNWTNMVTFGDSNPNSITNLPITVLQLFQSKPNPDTNSTRDSPSFKTAFDTGRAALGENNVYASRDPFDRPSADARYIAGGNAPAEVAGGSYLVDDGASGSNGAFYFTISFRVVPALPGTYPRATSEDQGNTDETDDLVVRGDGRYAYPGVYTLNVKFVMVEC
8VJN 37 66 0.78 38 53 60
MHHHHHHMAKNSNPSAFDRDFGYLMPFLDRVAAAASDLEDASARAELTRLMVEEKARWQRIQELLG
1WDS 204 495 0.85 40 255 476
ATSDSNMLLNYVPVYVMLPLGVVNVDNVFEDPDGLKEQLLQLRAAGVDGVMVDVWWGIIELKGPKQYDWRAYRSLLQLVQECGLTLQAIMSFHQCGGNVGDIVNIPIPQWVLDIGESNHDIFYTNRSGTRNKEYLTVGVDNEPIFHGRTAIEIYSDYMKSFRENMSDFLESGLIIDIEVGLGPAGELRYPSYPQSQGWEFPGIGEFQCYDKYLKADFKAAVARAGHPEWELPDDAGKYNDVPESTGFFKSNGTYVTEKGKFFLTWYSNKLLNHGDQILDEANKAFLGCKVKLAIKVSGIHWWYKVENHAAELTAGYYNLNDRDGYRPIARMLSRHHAILNFACLEMRDSEQPSDAKSGPQELVQQVLSGGWREDIRVAGENALPRYDATAYNQIILNARPQGVNNNGPPKLSMFGVTYLRLSDDLLQKSNFNIFKKFVLKMHADQDYCANPQKYNHAITPLKPSAPKIPIEVLLEATKPTLPFPWLPETDMKVDG
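
The fourth column of the table normalizes the LZ-complexity by \(n/\log_{20} n\), with 20 the size of the amino-acid alphabet. A small sketch of that arithmetic, using the tabulated \(\mathrm{LZ}(w)\) and \(n\) as inputs (the function name `normalized_lz` is hypothetical):

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ(w), n) taken from the table above.
rows = [("9IKJ", 107, 236), ("8VJN", 37, 66), ("1WDS", 204, 495)]
ratios = {code: normalized_lz(lz, n) for code, lz, n in rows}
for code, r in ratios.items():
    print(code, round(r, 3))
```

The results agree with the tabulated two-decimal values up to rounding or truncation in the last digit.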

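Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\); here \(n\) is the subword length, not the word length \(n=|w|\) used in the table above. Let \(p_w(n)=|P_w(n)|\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).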
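
The subword counts can be sketched directly from this definition. The helper names `subwords` and `p` below are illustrative, and the code simply collects all contiguous substrings of a given length into a set:

```python
def subwords(w, k):
    """Set P_w(k): the distinct length-k subwords (contiguous substrings) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def p(w, k):
    """Cardinality p_w(k) of P_w(k)."""
    return len(subwords(w, k))

# Sequence of 8VJN_1 as listed on this page.
seq_8vjn = "MHHHHHHMAKNSNPSAFDRDFGYLMPFLDRVAAAASDLEDASARAELTRLMVEEKARWQRIQELLG"

print(p(seq_8vjn, 2), p(seq_8vjn, 3))
```

For 8VJN_1 this reproduces the tabulated \(p_w(2)=53\) and \(p_w(3)=60\). For length 1 the same count gives the number of distinct letters (19 here), so the tabulated \(p_w(1)\) column may follow a different convention.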

\(|P_{f(9IKJ_1)}(2) \setminus P_{f(8VJN_1)}(2)|=125\), \(|P_{f(8VJN_1)}(2) \setminus P_{f(9IKJ_1)}(2)|=24\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for this pair, \(Z_2=125+24=149\).

Hydrophobic-polar version of Sequence 1 (9IKJ_1):
01100001010101011101110100010111001000111100011101010001010101111000110101010011011000100100111011011000101000000001010011001011110001010001100101010011110111011110011001101001110101010111111100101000001000000011101010010111001010111100
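
The page does not state which residues count as hydrophobic. Comparing the binary string above with the 9IKJ_1 sequence suggests that the nonpolar residues G, A, V, L, I, M, F, W, P map to 1 and all others to 0. The sketch below encodes that inferred rule; the set `HYDROPHOBIC` and the function `hp_encode` are assumptions, and since H does not occur in 9IKJ_1, treating it as polar is a guess.

```python
# Assumption inferred from the listed string: nonpolar residues are "1".
HYDROPHOBIC = set("GAVLIMFWP")

def hp_encode(seq):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

# First ten and last six residues of 9IKJ_1.
head = hp_encode("NLISEQNVTV")
tail = hp_encode("FVMVEC")
print(head, tail)
```

These match the first ten and last six digits of the string above.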
Pair \(Z_2\) Length of longest common subword
9IKJ_1,8VJN_1 149 3
9IKJ_1,1WDS_1 173 3
8VJN_1,1WDS_1 230 4
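
The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets, and dividing it by the size of their union gives a Jaccard distance. A minimal sketch with toy inputs (the names `subwords`, `z` and `jaccard_distance` are illustrative):

```python
def subwords(w, k):
    """Distinct length-k subwords (contiguous substrings) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """Z_k(x, y) divided by the number of distinct length-k subwords in either word."""
    union = subwords(x, k) | subwords(y, k)
    return z(x, y, k) / len(union) if union else 0.0

# Toy example: P_x(2) = {AB, BC, CA}, P_y(2) = {BC, CD}.
print(z("ABCAB", "BCD", 2), jaccard_distance("ABCAB", "BCD", 2))
```

Equivalently, \(Z_k(x,y)=p_x(k)+p_y(k)-2\,|P_x(k)\cap P_y(k)|\); for 9IKJ_1 and 8VJN_1 the tabulated values give \(154+53-2\cdot 29=149\).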

Newick tree

 
(1WDS_1:10.33,(9IKJ_1:74.5,8VJN_1:74.5):34.83);

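Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\,\mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\), where \(xy\) denotes concatenation.
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=(\mathrm{LZ}(xy)-\mathrm{LZ}(x))+(\mathrm{LZ}(yx)-\mathrm{LZ}(y))\). (This makes the 4TYN sequence AAAAAA a close match...)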
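
The page does not say how these distances are computed. The sketch below assumes the usual Otu–Sayood form, in which \(d\) is the larger and \(d_1\) the sum of the two complexity increments obtained by appending one sequence to the other, and it uses a simple quadratic-time LZ76 parser. The names `lz76` and `otu_sayood` are hypothetical, and the values it produces need not match those reported by the site.

```python
def lz76(s):
    """Number of phrases in a Lempel-Ziv (1976) exhaustive parsing of s."""
    i, phrases = 0, 0
    while i < len(s):
        length = 1
        # Grow the phrase while it can still be copied from earlier text
        # (the copy source is allowed to overlap the phrase itself).
        while i + length <= len(s) and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def otu_sayood(x, y):
    """Return (d, d1) for the pair (x, y) under the assumed definitions."""
    a = lz76(x + y) - lz76(x)  # extra phrases needed for y after seeing x
    b = lz76(y + x) - lz76(y)  # extra phrases needed for x after seeing y
    return max(a, b), a + b

print(lz76("0001101001000101"))  # parsed as 0|001|10|100|1000|101
print(otu_sayood("AAAA", "BBBB"))
```

A sequence compared with itself gets distance 0 under both measures, because the second copy is absorbed into the last phrase of the first.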
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{302}{\log_{20} 302}-\frac{66}{\log_{20}66}\right)\approx 75.6\), where \(302=236+66\) is the combined length of 9IKJ_1 and 8VJN_1.
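
The arithmetic in this estimate can be checked in a few lines. The factors 0.85 and 0.8 are taken from the page as given, and the helper name `n_over_log` is illustrative:

```python
import math

def n_over_log(n, base=20):
    """n / log_base(n), the normalizing term used for LZ-complexity."""
    return n / math.log(n, base)

# 302 and 66 are the lengths appearing in the estimate above.
expected = 0.85 * 0.8 * (n_over_log(302) - n_over_log(66))
print(round(expected, 1))
```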
Status Protein1 Protein2 d d1/2
Query variables 9IKJ_1 8VJN_1 92 57.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
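
As a small numerical illustration of this interpolation, take the tabulated \(d=92\) and \(d_1/2=57.5\) for 9IKJ_1 and 8VJN_1. If \(d\) is the larger of the two underlying quantities and \(d_1\) their sum, the smaller one is \(2(57.5)-92=23\); that value is derived here, not reported by the page. The function name `delta` is illustrative.

```python
def delta(alpha, a, b):
    """Convex combination alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# 92 is the tabulated d; 23 is derived from d and d1/2 as described above.
larger, smaller = 92, 23

print(delta(0, smaller, larger))    # alpha = 0 recovers d
print(delta(0.5, smaller, larger))  # alpha = 1/2 recovers d1/2
```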