CoV2D BrowserTM

Parikh vectors
5MXF_1 1OFD_1 5JWE_1 Letter Amino acid
11 100 28 E Glutamic acid
11 36 8 H Histidine
24 159 26 L Leucine
6 72 11 K Lysine
15 71 15 P Proline
36 139 20 A Alanine
1 20 4 C Cysteine
12 64 14 Q Glutamine
32 74 17 T Threonine
27 111 13 V Valine
15 71 21 R Arginine
43 81 13 S Serine
3 49 16 Y Tyrosine
7 42 7 F Phenylalanine
26 15 11 W Tryptophan
25 73 9 N Asparagine
30 145 22 G Glycine
2 36 5 M Methionine
21 79 12 D Aspartic acid
22 83 4 I Isoleucine
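
As a minimal sketch of how one column of the table above could be computed: the Parikh vector of a word records how many times each letter occurs in it. The function name `parikh_vector` and the alphabet ordering are illustrative assumptions, not the browser's actual code. The example word is the first ten residues of 5MXF_1.

```python
from collections import Counter

# The 20 standard amino-acid letters (ordering chosen for illustration only).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of the alphabet to its count in w."""
    counts = Counter(w)
    return {a: counts.get(a, 0) for a in alphabet}

# First ten residues of the 5MXF_1 sequence.
prefix = "MQPINTSNPD"
vector = parikh_vector(prefix)
for letter, count in vector.items():
    if count:
        print(letter, count)
```

Applied to a full FASTA sequence, the counts sum to the sequence length, which gives a quick consistency check on a table column.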

>5MXF_1|Chain A|Photorhabdus asymbiotica lectin PHL|Photorhabdus asymbiotica subsp. asymbiotica (strain ATCC 43949 / 3105-77) (553480)
>1OFD_1|Chains A, B|FERREDOXIN-DEPENDENT GLUTAMATE SYNTHASE 2|SYNECHOCYSTIS SP. (1148)
>5JWE_1|Chains A, C, E, G|H-2 class I histocompatibility antigen, D-B alpha chain|Mus musculus (10090)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5MXF , Knot 109 369 0.58 40 141 220
MQPINTSNPDNTASYVKDEVEITSSTIALSEIVSVVNTSDGRLEVFGVGTDKAVWHNRQMAPHTGSPWSGWSSLKGQVTSKPVVYINTDGRLEVFARGTDNALWHIWQTATNAGWSNWQSLGGVITSNPAIYANTDGRLEVFARGADNALWHISQTTAHSGPWSSWASLNGVITSNPTVHINSDGRLEVFARGTDNALWHIWQTAPDSNLWSSWESLNGIITSDPVVIDTADGRLEVFARGADNALWHIWQTISHSGPWSGWQSLNGVITSAPAVAKNCDNRLEAFARGTDNALWHTWQTVSHSGPWSSWQSLNGVITSAPTAVRDADGRLEVFARGTDNALWLTWQTASSWSPWISLGGVLIDASAIK
1OFD , Knot 535 1520 0.86 40 345 1287
CGVGFIANLRGKPDHTLVEQALKALGCMEHRGGCSADNDSGDGAGVMTAIPRELLAQWFNTRNLPMPDGDRLGVGMVFLPQEPSAREVARAYVEEVVRLEKLTVLGWREVPVNSDVLGIQAKNNQPHIEQILVTCPEGCAGDELDRRLYIARSIIGKKLAEDFYVCSFSCRTIVYKGMVRSIILGEFYLDLKNPGYTSNFAVYHRRFSTNTMPKWPLAQPMRLLGHNGEINTLLGNINWMAAREKELEVSGWTKAELEALTPIVNQANSDSYNLDSALELLVRTGRSPLEAAMILVPEAYKNQPALKDYPEISDFHDYYSGLQEPWDGPALLVFSDGKIVGAGLDRNGLRPARYCITKDDYIVLGSEAGVVDLPEVDIVEKGRLAPGQMIAVDLAEQKILKNYQIKQQAAQKYPYGEWIKIQRQTVASDSFAEKTLFNDAQTVLQQQAAFGYTAEDVEMVVVPMASQGKEPTFCMGDDTPLAVLSHKPRLLYDYFKQRFAQVTNPPIDPLRENLVMSLAMFLGKRGNLLEPKAESARTIKLRSPLVNEVELQAIKTGQLQVAEVSTLYDLDGVNSLEDALTNLVKTAIATVQAGAEILVLTDRPNGAILTENQSFIPPLLAVGAVHHHLIRAGLRLKASLIVDTAQCWSTHHFACLVGYGASAICPYLALESVRQWWLDEKTQKLMENGRLDRIDLPTALKNYRQSVEAGLFKILSKMGISLLASYHGAQIFEAIGLGAELVEYAFAGTTSRVGGLTIADVAGEVMVFHGMAFPEMAKKLENFGFVNYRPGGEYHMNSPEMSKSLHKAVAAYKVGGNGNNGEAYDHYELYRQYLKDRPVTALRDLLDFNADQPAISLEEVESVESIVKRFCTGGMSLGALSREAHETLAIAMNRLGAKSNSGEGGEDVVRYLTLDDVDSEGNSPTLPHLHGLQNGDTANSAIKQIASGRFGVTPEYLMSGKQLEIKMAQGAKPGEGGQLPGKKVSEYIAMLRRSKPGVTLISPPPHHDIYSIEDLAQLIYDLHQINPEAQVSVKLVAEIGIGTIAAGVAKANADIIQISGHDGGTGASPLSSIKHAGSPWELGVTEVHRVLMENQLRDRVLLRADGGLKTGWDVVMAALMGAEEYGFGSIAMIAEGCIMARVCHTNNCPVGVATQQERLRQRFKGVPGQVVNFFYFIAEEVRSLLAHLGYRSLDDIIGRTDLLKVRSDVQLSKTQNLTLDCLLNLPDTKQNRQWLNHEPVHSNGPVLDDDILADPDIQEAINHQTTATKTYRLVNTDRTVGTRLSGAIAKKYGNNGFEGNITLNFQGAAGQSFGAFNLDGMTLHLQGEANDYVGKGMNGGEIVIVPHPQASFAPEDNVIIGNTCLYGATGGNLYANGRAGERFAVRNSVGKAVIEGAGDHCCEYMTGGVIVVLGPVGRNVGAGMTGGLAYFLDEVGDLPEKINPEIITLQRITASKGEEQLKSLITAHVEHTGSPKGKAILANWSDYLGKFWQAVPPSEKDSPEANNDVSLTGEKTLTSV
5JWE , Knot 126 276 0.85 40 186 270
GPHSMRYFETAVSRPGLEEPRYISVGYVDNKEFVRFDSDAENPRYEPRAPWMEQEGPEYWERETQKAKGQEQWFRVSLRNLLGYYNQSAGGSHTLQQMSGCDLGSDWRLLRGYLQFAYEGRDYIALNEDLKTWTAADMAAQITRRKWEQSGAAEHYKAYLEGECVEWLHRYLKNGNATLLRTDSPKAHVTHHPRSKGEVTLRCWALGFYPADITLTWQLNGEELTQDMELVETRPAGDGTFQKWASVVVPLGKEQNYTCRVYHEGLPEPLTLRWEP

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\), and let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(5MXF_1)}(2) \setminus P_{f(1OFD_1)}(2)|=10\), \(|P_{f(1OFD_1)}(2) \setminus P_{f(5MXF_1)}(2)|=214\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1 (5MXF_1):
101100001000100100010100001110011011000010101111100011100001110010110110010101000111010001010111010001110110010011100100111110001110100010101110110011101000010011100110101110001010100010101110100011101100110001100100101110001111001010101110110011101100100011101100101110011111000000101110100011100100100011100100101110011011001010101110100011110100100101110111111010110

Pair \(Z_2\) Length of longest common subword (contiguous)
5MXF_1,1OFD_1 224 5
5MXF_1,5JWE_1 181 4
1OFD_1,5JWE_1 193 4
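
The quantities in this table can be sketched directly from the definitions of \(P_w(k)\) and \(Z_k\). The function names below are hypothetical. The last column is computed here as the longest common contiguous subword, which is an assumption about what the page reports (the small values in the table suggest contiguity). The two short words are made-up examples, not database entries.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous word occurring in both x and y.

    Row-by-row dynamic programme: cur[j] is the length of the longest
    common suffix of x[:i] and y[:j].
    """
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# Toy words for illustration.
x, y = "MQPINT", "QPIMNT"
print(z(x, y, 2), longest_common_subword_length(x, y))
```

For the toy pair, the 2-subwords MQ and IN occur only in the first word and IM and MN only in the second, and the longest shared subword is QPI.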

Newick tree

 
(1OFD_1:10.81,(5MXF_1:90.5,5JWE_1:90.5):18.31);

Let \(d\) and \(d_1\) denote the distances called \(d\) and \(d_1\) by Otu and Sayood. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1889}{\log_{20} 1889}-\frac{369}{\log_{20}369})\approx 382.\)
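
The arithmetic in this estimate can be checked with a few lines of Python. The factors 0.85 and 0.8 are taken verbatim from the expression above. Reading 1889 as the combined length 369 + 1520 of the two query sequences is an observation about the numbers, not something the page states.

```python
import math

def log20(n):
    """Logarithm to base 20, the size of the amino-acid alphabet."""
    return math.log(n) / math.log(20)

n1, n2 = 369, 1520      # lengths of 5MXF_1 and 1OFD_1
combined = n1 + n2      # 1889, the length of the concatenation

# Difference of the n / log_20(n) terms, scaled by the two factors.
estimate = 0.85 * 0.8 * (combined / log20(combined) - n1 / log20(n1))
print(f"{estimate:.1f}")
```

The result is close to the 382 quoted above and differs from it only by rounding.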
Status Protein1 Protein2 d d1/2
Query variables 5MXF_1 1OFD_1 503 291

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
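
The interpolation can be sketched as follows. Two assumptions are made. First, \(\mathrm{LZ}\) is taken to be the number of components in the LZ76 exhaustive-history parsing. Second, \(\min\) and \(\max\) are taken over the two complexity increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), so that \(d\) is their maximum and \(d_1\) their sum. The function names are hypothetical and the parser is a simple, slow version meant for short words.

```python
def lz76(w):
    """Number of components in the LZ76 exhaustive-history parsing of w."""
    i = count = 0
    n = len(w)
    while i < n:
        length = 1
        # Extend the component while it can be copied from earlier text
        # (the copy source may overlap the component, except its last letter).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def increments(x, y):
    """(min, max) of LZ(xy) - LZ(x) and LZ(yx) - LZ(y)."""
    a = lz76(x + y) - lz76(x)
    b = lz76(y + x) - lz76(y)
    return min(a, b), max(a, b)

def delta(x, y, alpha):
    """alpha * min + (1 - alpha) * max of the two increments."""
    lo, hi = increments(x, y)
    return alpha * lo + (1 - alpha) * hi

def d(x, y):
    return increments(x, y)[1]      # assumed: maximum increment

def d1(x, y):
    return sum(increments(x, y))    # assumed: sum of both increments

x, y = "MQPINTSNPD", "GPHSMRYFET"   # ten-residue prefixes of 5MXF_1 and 5JWE_1
print(d(x, y), d1(x, y) / 2)
```

Under these assumptions, `delta(x, y, 0)` equals `d(x, y)` and `delta(x, y, 0.5)` equals `d1(x, y) / 2`, matching the two cases in the displayed formula.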