CoV2D Browser™

Parikh vectors
5PJH_1 5FAX_1 3JTX_1 Letter Amino acid
20 34 13 T Threonine
26 7 23 E Glutamic acid
27 48 27 G Glycine
24 29 44 L Leucine
23 15 20 K Lysine
18 23 27 P Proline
18 34 24 S Serine
19 14 14 Y Tyrosine
25 51 36 A Alanine
20 38 15 N Asparagine
18 8 7 H Histidine
20 17 21 I Isoleucine
11 7 5 M Methionine
19 15 18 F Phenylalanine
18 16 21 R Arginine
15 15 17 Q Glutamine
14 24 24 D Aspartic acid
7 0 6 C Cysteine
8 5 7 W Tryptophan
14 33 27 V Valine
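Each column of the table above is the Parikh vector of one sequence: the number of occurrences of each amino-acid letter. A minimal sketch of that count follows. The alphabet order simply copies the row order of the table, and the function name `parikh_vector` is a hypothetical illustration, not part of the CoV2D Browser.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order used by the table above.
ALPHABET = "TEGLKPSYANHIMFRQDCWV"


def parikh_vector(seq: str, alphabet: str = ALPHABET) -> list[int]:
    """Count how often each letter of `alphabet` occurs in `seq`.

    Letters of `seq` that are not in `alphabet` (e.g. an ambiguity code X)
    are ignored by this sketch.
    """
    counts = Counter(seq)
    return [counts[a] for a in alphabet]
```

For a sequence over the 20 standard letters the entries sum to the sequence length; the three columns above sum to 364, 433 and 396 respectively.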

>5PJH_1|Chain A|Lysine-specific demethylase 4D|Homo sapiens (9606)
>5FAX_1|Chains A, B|Subtilase SubHal from Bacillus halmapalus|Bacillus halmapalus (79882)
>3JTX_1|Chains A, B|aminotransferase|Neisseria meningitidis Z2491 (122587)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
5PJH , Knot 160 364 0.86 40 229 347
MHHHHHHSSGVDLGTENLYFQSMETMKSKANCAQNPNCNIMIFHPTKEEFNDFDKYIAYMESQGAHRAGLAKIIPPKEWKARETYDNISEILIATPLQQVASGRAGVFTQYHKKKKAMTVGEYRHLANSKKYQTPPHQNFEDLERKYWKNRIYNSPIYGADISGSLFDENTKQWNLGHLGTIQDLLEKECGVVIEGVNTPYLYFGMWKTTFAWHTEDMDLYSINYLHLGEPKTWYVVPPEHGQRLERLARELFPGSSRGCGAFLRHKVALISPTVLKENGIPFNRITQEAGEFMVTFPYGYHAGFNHGFNCAEAINFATPRWIDYGKMASQCSCGEARVTFSMDAFVRILQPERYDLWKRGQDR
5FAX , Knot 170 433 0.79 38 209 399
NDVARGIVKADVAQNNFGLYGQGQIVAVADTGLDTGRNDSSMHEAFRGKITALYALGRTNNANDPNGHGTHVAGSVLGNATNKGMAPQANLVFQSIMDSGGGLGGLPANLQTLFSQAYSAGARIHTNSWGAPVNGAYTTDSRNVDDYVRKNDMTILFAAGNEGPGSGTISAPGTAKNAITVGATENLRPSFGSYADNINHVAQFSSRGPTRDGRIKPDVMAPGTYILSARSSLAPDSSFWANHDSKYAYMGGTSMATPIVAGNVAQLREHFVKNRGVTPKPSLLKAALIAGAADVGLGFPNGNQGWGRVTLDKSLNVAFVNETSPLSTSQKATYSFTAQAGKPLKISLVWSDAPGSTTASLTLVNDLDLVITAPNGTKYVGNDFTAPYDNNWDGRNNVENVFINAPQSGTYTVEVQAYNVPVGPQTFSLAIVH
3JTX , Knot 169 396 0.85 40 223 383
GMNTLLKQLKPYPFARLHEAMQGISAPEGMEAVPLHIGEPKHPTPKVITDALTASLHELEKYPLTAGLPELRQACANWLKRRYDGLTVDADNEILPVLGSREALFSFVQTVLNPVSDGIKPAIVSPNPFYQIYEGATLLGGGEIHFANCPAPSFNPDWRSISEEVWKRTKLVFVCSPNNPSGSVLDLDGWKEVFDLQDKYGFIIASDECYSEIYFDGNKPLGCLQAAAQLGRSRQKLLMFTSLSKRSNVPGLRSGFVAGDAELLKNFLLYRTYHGSAMSIPVQRASIAAWDDEQHVIDNRRLYQEKFERVIPILQQVFDVKLPDASFYIWLKVPDGDDLAFARNLWQKAAIQVLPGRFLARDTEQGNPGEGYVRIALVADVATCVKAAEDIVSLYR
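The LZ-complexity column counts phrases in a Lempel–Ziv (1976) parsing, and the ratio column divides it by \(n/\log_{20} n\). The sketch below assumes the common exhaustive-history variant of LZ76; the page does not state which variant it uses, so applying `lz76` (a hypothetical helper name) to the sequences above may not reproduce 160, 170 and 169 exactly.

```python
import math


def lz76(w: str) -> int:
    """Number of phrases in the LZ76 exhaustive-history parsing of w.

    Each phrase w[i:i+l] is extended while it still occurs in w[:i+l-1]
    (the text before its last letter, overlaps allowed), so every phrase
    except possibly the last one is new.
    """
    n, i, phrases = len(w), 0, 0
    while i < n:
        l = 1
        # Grow the current phrase while it can still be copied from earlier text.
        while i + l <= n and w[i:i + l] in w[:i + l - 1]:
            l += 1
        phrases += 1
        i += l
    return phrases


def normalized_lz(c: int, n: int) -> float:
    """LZ(w) divided by n / log_20(n), as in the ratio column of the table."""
    return c / (n / math.log(n, 20))
```

With the tabulated values, `normalized_lz(160, 364)` is about 0.865 and `normalized_lz(170, 433)` about 0.796, so the two-decimal ratios in the table appear to be truncated rather than rounded.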

Let \(P_w(n)\) be the set of distinct subwords (contiguous intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the FASTA sequence of the entry with four-character Protein Data Bank code \(c\).
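The definitions of \(P_w(n)\) and \(p_w(n)\) translate directly into code. This is a sketch; the names `subwords` and `p` are hypothetical.

```python
def subwords(w: str, n: int) -> set[str]:
    """P_w(n): the distinct length-n subwords (contiguous intervals) of w."""
    # range() is empty when n > len(w), so the result is then the empty set.
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def p(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subwords(w, n))
```

For example, `p("abab", 2)` is 2, since \(P_{abab}(2)=\{ab, ba\}\). In general \(p_w(n)\) is at most \(|w|-n+1\), the number of length-\(n\) windows.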

\(|P_{f(5PJH_1)}(2) \setminus P_{f(5FAX_1)}(2)|=102\) and \(|P_{f(5FAX_1)}(2) \setminus P_{f(5PJH_1)}(2)|=82\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\); for example, \(Z_2(f(5PJH_1), f(5FAX_1))=102+82=184\).

Hydrophobic-polar version of Sequence 1 (5PJH_1): 1000000001101100010100100100010010010001111010000100100011010001100111101111001010000001001111011001101011110000000011011000011000000011000100100001000100011011010101100000010110110100110000111101100101011110001110000101001001011010010111100100100110011110001011110001111010110001111001000110111011010011100110010110110101100101100000101010101011101101000011001000
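\(Z_k\) is the size of the symmetric difference of the two subword sets, which is the numerator of the Jaccard distance \(|A \triangle B|/|A \cup B|\). A minimal sketch, with hypothetical function names:

```python
def subwords(w: str, k: int) -> set[str]:
    """P_w(k): the distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z(x: str, y: str, k: int) -> int:
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    a, b = subwords(x, k), subwords(y, k)
    return len(a - b) + len(b - a)  # same as len(a ^ b)


def jaccard_distance(x: str, y: str, k: int) -> float:
    """Z_k(x, y) divided by |P_x(k) union P_y(k)| (0.0 if both sets are empty)."""
    a, b = subwords(x, k), subwords(y, k)
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0
```

With the counts above, the two sets share \(229-102=127\) subwords of length 2, so their union has \(229+209-127=311\) elements and the corresponding Jaccard distance would be \(184/311\approx 0.59\).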
Pair \(Z_2\) Length of longest common substring
5PJH_1,5FAX_1 184 4
5PJH_1,3JTX_1 180 4
5FAX_1,3JTX_1 162 4
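The last column is 4 for every pair. By the Parikh vectors, 5PJH_1, 5FAX_1 and 3JTX_1 contain 25, 51 and 36 alanines, so any two of them share a common subsequence A…A of length at least 25. The value 4 is therefore presumably the length of the longest common contiguous substring. The standard dynamic-programming sketch below computes that quantity; the function name is hypothetical.

```python
def longest_common_substring_length(x: str, y: str) -> int:
    """Length of the longest contiguous string occurring in both x and y.

    cur[j] is the length of the longest common suffix of x[:i] and y[:j];
    only the previous row is kept, so memory is O(|y|) and time O(|x||y|).
    """
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best
```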

Newick tree

(5PJH_1:94.10,(3JTX_1:81,5FAX_1:81):13.10);

Let \(d\) and \(d_1\) be the two Otu–Sayood distances of those names. (The distance \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
A rough estimate of the expected distance is \((0.85)(0.8)\left(\frac{797}{\log_{20} 797}-\frac{364}{\log_{20} 364}\right)\approx 117.\)
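The estimate can be reproduced numerically. The factors 0.85 and 0.8 are copied as given; reading 797 as \(364+433\), the combined length of 5PJH_1 and 5FAX_1, is an inference from the table, and `lz_scale` is a hypothetical helper name.

```python
import math


def lz_scale(n: int) -> float:
    """n / log_20(n), the normalizing term from the complexity table."""
    return n / math.log(n, 20)


# 797 = 364 + 433 (assumed: length of 5PJH_1 followed by 5FAX_1).
# The factors 0.85 and 0.8 are taken verbatim from the text.
estimate = 0.85 * 0.8 * (lz_scale(797) - lz_scale(364))
print(round(estimate))
```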
Status Protein1 Protein2 d d1/2
Query variables 5PJH_1 5FAX_1 147 137.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
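The interpolation \(\delta\) can be sketched as follows. This assumes the usual Otu–Sayood definitions \(d=\max\{c(xy)-c(x),\, c(yx)-c(y)\}\) and \(d_1=(c(xy)-c(x))+(c(yx)-c(y))\), with \(c\) an LZ76 complexity; these are consistent with \(\delta\) reducing to \(d\) at \(\alpha=0\) and to \(d_1/2\) at \(\alpha=1/2\). The LZ variant and all function names are assumptions, so the sketch need not reproduce the tabulated 147 and 137.5 exactly.

```python
def lz76(w: str) -> int:
    """Phrase count of the LZ76 exhaustive-history parsing (assumed variant)."""
    n, i, phrases = len(w), 0, 0
    while i < n:
        l = 1
        while i + l <= n and w[i:i + l] in w[:i + l - 1]:
            l += 1
        phrases += 1
        i += l
    return phrases


def increments(x: str, y: str) -> tuple[int, int]:
    """The one-sided increments c(xy) - c(x) and c(yx) - c(y)."""
    return lz76(x + y) - lz76(x), lz76(y + x) - lz76(y)


def otu_sayood(x: str, y: str) -> tuple[int, int]:
    """(d, d1) under the assumed definitions: max and sum of the increments."""
    a, b = increments(x, y)
    return max(a, b), a + b


def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)
```

Under these definitions, the reported \(d=147\) and \(d_1/2=137.5\) for 5PJH_1 and 5FAX_1 correspond to one-sided increments of 147 and \(275-147=128\).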