CoV2D Browser™

Parikh vectors
1PKX_1 5FPI_1 6LPS_1 Letter Amino acid
30 38 18 I Isoleucine
25 17 14 P Proline
28 27 19 R Arginine
33 18 31 D Aspartic acid
13 7 10 H Histidine
37 39 16 K Lysine
32 19 16 T Threonine
5 5 11 W Tryptophan
52 37 24 V Valine
20 21 17 Q Glutamine
37 31 34 E Glutamic acid
19 21 14 F Phenylalanine
9 6 0 C Cysteine
57 31 25 L Leucine
38 26 20 G Glycine
10 14 5 M Methionine
39 33 18 S Serine
18 14 18 Y Tyrosine
66 22 22 A Alanine
24 20 19 N Asparagine
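
A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch of how such a table could be computed; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative names and not part of the CoV2D site. The letter order is copied from the rows of the table above.

```python
from collections import Counter

# Letter order taken from the rows of the table above (assumed, for illustration).
AMINO_ACIDS = "IPRDHKTWVQEFCLGMSYAN"


def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the letter counts of `sequence`, listed in the order of `alphabet`."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that never occur in the sequence.
    return [counts[letter] for letter in alphabet]


if __name__ == "__main__":
    # Toy example, not one of the proteins listed on this page.
    for letter, count in zip(AMINO_ACIDS, parikh_vector("MAPGQLALFS")):
        if count:
            print(letter, count)
```

A letter that is absent from a sequence gets count 0, as in the Cysteine row for 6LPS_1.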

>1PKX_1|Chains A, B, C, D|Bifunctional purine biosynthesis protein PURH|Homo sapiens (9606)
>5FPI_1|Chain A|AP-2 COMPLEX SUBUNIT MU|RATTUS NORVEGICUS (10116)
>6LPS_1|Chain A|Beta-xylanase|Bacillus halodurans (86665)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1PKX , Knot 235 592 0.84 40 263 555
MAPGQLALFSVSDKTGLVEFARNLTALGLNLVASGGTAKALRDAGLAVRDVSELTGFPEMLGGRVKTLHPAVHAGILARNIPEDNADMARLDFNLIRVVACNLYPFVKTVASPGVTVEEAVEQIDIGGVTLLRAAAKNHARVTVVCEPEDYVVVSTEMQSSESKDTSLETRRQLALKAFTHTAQYDEAISDYFRKQYSKGVSQMPLRYGMNPHQTPAQLYTLQPKLPITVLNGAPGFINLCDALNAWQLVKELKEALGIPAAASFKHVSPAGAAVGIPLSEDEAKVCMVYDLYKTLTPISAAYARARGADRMSSFGDFVALSDVCDVPTAKIISREVSDGIIAPGYEEEALTILSKKKNGNYCVLQMDQSYKPDENEVRTLFGLHLSQKRNNGVVDKSLFSNVVTKNKDLPESALRDLIVATIAVKYTQSNSVCYAKNGQVIGIGAGQQSRIHCTRLAGDKANYWWLRHHPQVLSMKFKTGVKRAEISNAIDQYVTGTIGEDEDLIKWKALFEEVPELLTEAEKKEWVEKLTEVSISSDAFFPFRDNVDRAKRSGVAYIAAPSGSAADKVVIEACDELGIILAHTNLRLFHH
5FPI , Knot 188 446 0.85 40 236 428
MIGGLFIYNHKGEVLISRVYRDDIGRNAVDAFRVNVIHARQQVRSPVTNIARTSFFHVKRSNIWLAAVTKQNVNAAMVFEFLYKMCDVMAAYFGKISEENIKNNFVLIYELLDEILDFGYPQNSETGALKTFITQQGIKSQHQTKEEQSQITSQVTGQIGWRREGIKYRRNELFLDVLESVNLLMSPQGQVLSAHVSGRVVMKSYLSGMPECKFGMNDKIVIEKQGKGTADETSKSMEQKLISEEDLGKQSIAIDDCTFHQCVRLSKFDSERSISFIPPDGEFELMRYRTTKDIILPFRVIPLVREVGRTKLEVKVVIKSNFKPSLLAQKIEVRIPTPLNTSGVQVICMKGKAKYKASENAIVWKIKRMAGMKESQISAEIELLPTNDKKKWARPPISMNFEVPFAPSGLKVRYLKVFEPKLNYSDHDVIKWVRYIGRSGIYETRC
6LPS , Knot 153 351 0.85 38 217 343
NDQPFAWQVASLSERYQEQFDIGAAVEPYQLEGRQAQILKHHYNSLVAENAMKPVSLQPREGEWNWEGADKIVEFARKHNMELRFHTLVWHSQVPEWFFIDENGNRMVDETDPEKRKANKQLLLERMENHIKTVVERYKDDVTSWDVVNEVIDDDGGLRESEWYQITGTDYIKVAFETARKYGGEEAKLYINDYNTENPSKRDDLYNLVKDLLEQGVPIDGVGHQSHISIGRPSIEDTRASFEKFTSLGLDNQVTELDMSLYGWPPTGAYTSYDDIPEELFQAQADRYDQLFELYEELSATISSVTFWGIADNHTWLDDRAREYNNGVGVDAPFVFDHNYRVKPAYWRIID
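
The fourth column of the table above divides the LZ-complexity by \(n/\log_{20} n\). A small sketch of that normalization is given here; the function name `normalized_lz` is hypothetical, and the inputs are the LZ-complexity and length values read off the table. The table values appear to be truncated to two decimals, which is an observation from these three rows rather than a documented rule.

```python
import math


def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b(n)) for a word of length n over b letters."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # (LZ-complexity, length) pairs copied from the table above.
    for code, lz, n in [("1PKX", 235, 592), ("5FPI", 188, 446), ("6LPS", 153, 351)]:
        print(code, round(normalized_lz(lz, n), 3))
```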

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(1PKX_1)}(2) \setminus P_{f(5FPI_1)}(2)|=74\), \(|P_{f(5FPI_1)}(2) \setminus P_{f(1PKX_1)}(2)|=47\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1111011110100001110110010111101110110101100111110010010111011110100101110111110011000101101010110111001011100110111010011001011110110111000101011001000111000100000000010000011101100010000110001000000110011100110100011010010101110110111111010011011011001001111111101001011111111110000101011001000101101101010110010011011110010011010110001001111110000110110000010001101000001000010011110100000011100011001100000110011001111011100000001001001011111110000100001110010011100010110101001100101001100010101100001101011100110110010000110010010100011111000100100011101111010110011101000111111000101100
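
The hydrophobic-polar string above replaces each residue of Sequence 1 by 1 or 0. The page does not state which residues count as hydrophobic; the set used in the sketch below was inferred by comparing the first 250 residues of Sequence 1 with the binary string, so it is an assumption rather than a documented rule. The names `HYDROPHOBIC` and `hp_string` are illustrative.

```python
# Residues mapped to 1; inferred from the string above, not documented by the site.
HYDROPHOBIC = set("AFGILMPVW")


def hp_string(sequence):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)


if __name__ == "__main__":
    # First 20 residues of Sequence 1 (1PKX_1).
    print(hp_string("MAPGQLALFSVSDKTGLVEF"))
```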
Pair \(Z_2\) Length of longest common subword
1PKX_1,5FPI_1 121 4
1PKX_1,6LPS_1 152 5
5FPI_1,6LPS_1 161 4
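
The quantity \(Z_k\) is the size of the symmetric difference of two sets of length-\(k\) subwords. The sketch below shows one way to compute it; the function names `subwords`, `subword_complexity` and `z_distance` are illustrative and not taken from the site.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))


def z_distance(x, y, k):
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


if __name__ == "__main__":
    # Toy words, not protein sequences from this page.
    print(subword_complexity("abab", 2), z_distance("abab", "abcb", 2))
```

For the pair 1PKX_1, 5FPI_1 the two one-sided differences reported above are 74 and 47, which sum to the tabulated \(Z_2=121\).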

Newick tree

 
(
	6LPS_1:83.37,
	(
		1PKX_1:60.5,5FPI_1:60.5
	):22.87
);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{1038}{\log_{20} 1038}-\frac{446}{\log_{20}446}\right)\approx 155\), where \(1038=592+446\) is the combined length of the two sequences.
Status Protein1 Protein2 d d1/2
Query variables 1PKX_1 5FPI_1 197 170
Was not able to put for d
Was not able to put for d1
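
The arithmetic behind the rough expected distance can be checked with a few lines. This is only a sketch of the formula as stated on this page: the factors 0.85 and 0.8 are taken from the text as given, and the function names `lz_scale` and `expected_distance` are hypothetical.

```python
import math


def lz_scale(n, alphabet_size=20):
    """Return n / log_b(n), the scale used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)


def expected_distance(n_x, n_y, ratio=0.85, factor=0.8):
    """Evaluate ratio * factor * (scale(n_x + n_y) - scale(n_y))."""
    return ratio * factor * (lz_scale(n_x + n_y) - lz_scale(n_y))


if __name__ == "__main__":
    # Lengths of 1PKX_1 (592) and 5FPI_1 (446) from the table on this page.
    print(round(expected_distance(592, 446), 1))
```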

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
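
The following sketch illustrates how \(d\), \(d_1\) and \(\delta\) fit together. It assumes the usual Otu–Sayood definitions \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=(c(SQ)-c(S))+(c(QS)-c(Q))\), where \(c\) is the number of components in the LZ76 exhaustive history. Whether the site uses exactly this parsing variant is not stated, so the numbers it produces need not match the table. All function names are illustrative.

```python
def lz76(w):
    """Count components of the LZ76 exhaustive history of w (assumed variant)."""
    i, count, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend while the candidate can be copied from earlier text (overlap allowed).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def increments(s, q):
    """Return (c(SQ) - c(S), c(QS) - c(Q))."""
    return lz76(s + q) - lz76(s), lz76(q + s) - lz76(q)


def delta(s, q, alpha):
    """Return alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(s, q)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


def d(s, q):
    """Otu-Sayood d: the larger increment (alpha = 0)."""
    return max(increments(s, q))


def d1(s, q):
    """Otu-Sayood d1: the sum of both increments, so d1 / 2 is alpha = 1/2."""
    return sum(increments(s, q))


if __name__ == "__main__":
    # Toy binary words, not protein sequences from this page.
    s, q = "0001101001000101", "1111"
    print(d(s, q), d1(s, q) / 2, delta(s, q, 0), delta(s, q, 0.5))
```

Since an average never exceeds a maximum, \(d_1/2 \le d\) always holds, which agrees with the values 170 and 197 reported above.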