CoV2D Browser™

Parikh vectors
8XRA_1 1XTZ_1 4JLM_1 Letter Amino acid
10 1 7 W Tryptophan
14 11 12 H Histidine
22 21 13 I Isoleucine
43 21 28 L Leucine
19 10 11 P Proline
32 17 28 S Serine
17 11 12 F Phenylalanine
36 7 15 T Threonine
17 21 11 A Alanine
19 11 13 R Arginine
43 22 13 G Glycine
36 21 17 K Lysine
12 1 7 M Methionine
25 20 11 D Aspartic acid
12 9 13 Q Glutamine
40 12 13 N Asparagine
12 2 2 C Cysteine
41 17 29 E Glutamic acid
19 7 12 Y Tyrosine
28 22 13 V Valine
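
A Parikh vector simply records how often each letter occurs in a word. The following is a minimal sketch of that count, not the site's own code; the names `AMINO_ACIDS` and `parikh_vector` are hypothetical, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
AMINO_ACIDS = "WHILPSFTARGKMDQNCEYV"

def parikh_vector(sequence: str) -> dict:
    """Count how many times each amino-acid letter occurs in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# The first 13 residues of the 4JLM_1 sequence listed on this page.
prefix = "MGSSHHHHHHSSG"
vector = parikh_vector(prefix)

# Show only the letters that actually occur in the prefix.
print({letter: n for letter, n in vector.items() if n})
```

Applying `parikh_vector` to a full sequence from this page should give one column of the table.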

>8XRA_1|Chains A, B, C|Hemagglutinin|Influenza A virus (11320)
>1XTZ_1|Chain A|Ribose-5-phosphate isomerase|Saccharomyces cerevisiae (4932)
>4JLM_1|Chains A, B|Deoxycytidine kinase|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8XRA , Knot 199 497 0.82 40 248 466
DQICIGYHANNSTETVDTILERNVTVTHAKDILEKTHNGKLCKLNGIPPLELGDCSIAGWLLGNPECDRLLSVPEWSYIMEKENPRDGLCYPGSFNDYEELKHLLSSVKHFEKVKILPKDRWTQHTTTGGSRACAVSGNPSFFRNMVWLTKKGSNYPVAKGSYNNTSGEQMLIIWGVHHPNDETEQRTLYQNVGTYVSVGTSTLNKRSTPDIATRPKVNGLGSRMEFSWTLLDMWDTINFESTGNLIAPEYGFKISKRGSSGIMKTEGTLENCETKCQTPLGAINTTLPFHNVHPLTIGECPKYVKSEKLVLATGLRNVPQIESRGLFGAIAGFIEGGWQGMVDGWYGYHHSNDQGSGYAADKESTQKAFDGITNKVNSVIEKMNTQFEAVGKEFSNLERRLENLNKKMEDGFLDVWTYNAELLVLMENERTLDFHDSNVKNLYDKVRMQLRDNVKELGNGCFEFYHKCDDECMNSVKNGTYDYPKYEEESKLNRNE
1XTZ , Knot 113 264 0.79 40 155 250
MAAGVPKIDALESLGNPLEDAKRAAAYRAVDENLKFDDHKIIGIGSGSTVVYVAERIGQYLHDPKFYEVASKFICIPTGFQSRNLILDNKLQLGSIEQYPRIDIAFDGADEVDENLQLIKGGGACLFQEKLVSTSAKTFIVVADSRKKSPKHLGKNWRQGVPIEIVPSSYVRVKNDLLEQLHAEKVDIRQGGSAKAGPVVTDNNNFIIDADFGEISDPRKLHREIKLLVGVVETGLFIDNASKAYFGNSDGSVEVTEKHHHHHH
4JLM , Knot 126 280 0.84 40 186 268
MGSSHHHHHHSSGLVPRGSHMATPPKRSSPSFSASSEGTRIKKISIEGNIAAGKSTFVNILKQLSEDWEVVPEPVARWSNVQSTQDEFEELTMEQKNGGNVLQMMYEKPERWSFTFQTYACLSRIRAQLASLNGKLKDAEKPVLFFERSVYSDRYIFASNLYESESMNETEWTIYQDWHDWMNNQFGQSLELDGIIYLQATPETCLHRIYLRGRNEEQGIPLEYLEKLHYKHESWLLHRTLKTNFDYLQEVPILTLDVNEDFKDKYESLVEKVKEFLSTL
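
The LZ-complexity and normalized columns can be sketched as follows. This assumes the exhaustive-history parsing of Lempel and Ziv (1976), in which a copied phrase may overlap itself; the page does not say which variant it uses, so `lz76_complexity` is an illustration only and may not reproduce the tabulated counts. The normalization follows the column header \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\); the tabulated ratios appear to be truncated, not rounded, to two decimals.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of phrases in an LZ76-style (exhaustive history) parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from text that starts
        # earlier; the copy is allowed to overlap the phrase itself.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

# Classic example from the LZ76 literature: 0|001|10|100|1000|101.
print(lz76_complexity("0001101001000101"))

# Ratios recomputed from the LZ and length columns of the table above.
for code, lz, n in [("8XRA", 199, 497), ("1XTZ", 113, 264), ("4JLM", 126, 280)]:
    print(code, normalized(lz, n))
```

The inner substring search makes this sketch far slower than a suffix-based implementation, which is acceptable for sequences of a few hundred residues.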

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
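
As a small illustration of these definitions, the sketch below reads \(P_w(k)\) as the set of distinct length-\(k\) subwords of \(w\), an assumption that matches how \(p_w(1)\), \(p_w(2)\), \(p_w(3)\) are tabulated. The function names are hypothetical.

```python
def subwords(w: str, k: int) -> set:
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """p_w(k): the number of distinct subwords of length k in w."""
    return len(subwords(w, k))

print(subwords("ABAB", 2))            # the two subwords AB and BA
print(subword_complexity("HHHHHH", 3))
```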

\(|P_{f(8XRA_1)}(2) \setminus P_{f(1XTZ_1)}(2)|=139\), \(|P_{f(1XTZ_1)}(2) \setminus P_{f(8XRA_1)}(2)|=46\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for this pair, \(Z_2=139+46=185\).

Hydrophobic-polar version of Sequence 1:
00101100100000010011000101001001100000101001011111011000111111101000011011010011000010011001101000001001100100100101110001000000110010110101011001111000100011101000000100111111100100000000100011001011000100000101100101011100101010110110010100010111100110100010011100010100000000011111000111001011011001001000011110110011010001111111111011101110110100000001010110000000110110001001100100010111001001000100100010011101100010111110000010100001001000101010001001101010100000000100100100001000000010000
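
The hydrophobic-polar string can be reproduced with a simple letter-to-bit map. The page does not state which residues it treats as hydrophobic; the set used below (A, F, G, I, L, M, P, V, W) is an assumption inferred by aligning the start and end of the displayed bit string with Sequence 1, so treat it as a sketch, not the site's definition.

```python
# Assumption: residues encoded as 1, inferred from the displayed string.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 16 and last 15 residues of the 8XRA sequence listed on this page.
print(hp_encode("DQICIGYHANNSTETV"))
print(hp_encode("DYPKYEEESKLNRNE"))
```

Both fragments agree with the corresponding ends of the bit string shown above.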
Pair \(Z_2\) Length of longest common subword (contiguous)
8XRA_1,1XTZ_1 185 4
8XRA_1,4JLM_1 178 4
1XTZ_1,4JLM_1 157 6
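
Both columns of this table can be sketched in a few lines. The code assumes that \(Z_k\) is the size of the symmetric difference of the two length-\(k\) subword sets, as defined above, and that the second column counts the longest contiguous common run; the function names are hypothetical.

```python
def subwords(w: str, k: int) -> set:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous run shared by x and y (row-wise DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common run that ended just before both positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("ABAB", "ABC", 2))
# Fragments of 4JLM_1 and 1XTZ_1 that both contain the run HHHHHH.
print(longest_common_subword("GSSHHHHHHSSG", "TEKHHHHHH"))
```

The second call returns 6 for these fragments, which is consistent with the value listed for the pair 1XTZ_1,4JLM_1.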

Newick tree

 
[
	8XRA_1:94.50,
	[
		4JLM_1:78.5,1XTZ_1:78.5
	]:16.00
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{761}{\log_{20} 761}-\frac{264}{\log_{20} 264}\right)\approx 137.\)
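
The arithmetic in this estimate can be checked directly. The sketch below takes the factors 0.85 and 0.8 and the lengths 761 and 264 exactly as written in the text, without interpreting them; `capacity` is a hypothetical helper name for \(n/\log_{20} n\).

```python
import math

def capacity(n: int, alphabet_size: int = 20) -> float:
    """n / log_{alphabet_size}(n), the normalizing term used on this page."""
    return n / math.log(n, alphabet_size)

# Factors and lengths copied verbatim from the formula above.
estimate = 0.85 * 0.8 * (capacity(761) - capacity(264))
print(round(estimate))
```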
Status Protein1 Protein2 d d1/2
Query variables 8XRA_1 1XTZ_1 178 133

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
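
A minimal numeric sketch of this interpolation follows. It assumes that min and max denote the smaller and larger of two one-sided quantities, with \(d\) equal to the larger and \(d_1\) equal to their sum. Under that assumption the smaller value for the pair 8XRA_1, 1XTZ_1 is inferred from the table row (\(d=178\), \(d_1/2=133\)) as \(2\cdot 133-178=88\); it is not listed on the page.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Assumption: 88 is inferred from d = 178 and d1/2 = 133, not read from the page.
smaller, larger = 88, 178

print(delta(smaller, larger, 0))    # alpha = 0 gives d
print(delta(smaller, larger, 0.5))  # alpha = 1/2 gives d1/2
```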