CoV2D Browser™

Parikh vectors
8HGE_1 7VTA_1 6EQG_1 Letter Amino acid
25 12 7 K Lysine
26 16 10 F Phenylalanine
8 4 5 W Tryptophan
8 13 8 Y Tyrosine
30 31 38 A Alanine
1 6 4 C Cysteine
29 24 26 G Glycine
12 6 8 H Histidine
24 14 12 I Isoleucine
41 23 15 R Arginine
21 12 20 N Asparagine
33 23 9 D Aspartic acid
36 28 5 E Glutamic acid
31 8 18 P Proline
19 14 21 T Threonine
17 12 10 Q Glutamine
42 34 15 L Leucine
21 9 10 M Methionine
28 23 40 S Serine
28 16 17 V Valine
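
The counts above can be reproduced with a few lines of Python. This is a minimal sketch, not the site's own code: the names `ALPHABET` and `parikh_vector` are made up for illustration, and the letter order is simply copied from the rows of the table.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above
# (an illustrative choice; any fixed order works).
ALPHABET = "KFWYACGHIRNDEPTQLMSV"

def parikh_vector(w, alphabet=ALPHABET):
    """Return the number of occurrences in w of each letter of alphabet."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# Toy input, not one of the three proteins on this page.
vector = parikh_vector("MKKLH")
print(dict(zip(ALPHABET, vector)))
```

Since every residue is counted exactly once, the entries of a Parikh vector sum to the sequence length; for the three columns above the sums are 480, 328 and 298.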

>8HGE_1|Chain A[auth B]|Cytochrome P450|Marinobacter nauticus (2743)
>7VTA_1|Chain A|TvTS cyclase domain|Talaromyces verruculosus (198730)
>6EQG_1|Chains A, B, C|Poly(ethylene terephthalate) hydrolase|Ideonella sakaiensis (strain 201-F6) (1547922)
Columns: protein code \(c\); LZ-complexity \(\mathrm{LZ}(w)\); length \(n=|w|\); \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\); \(p_w(1)\); \(p_w(2)\); \(p_w(3)\). The sequence \(w=f(c)\) follows each row on its own line.
8HGE , Knot 193 480 0.82 40 244 458
MGMPTLPRTFDDIQSRLINATSRVVPMQRQIQGLKFLMSAKRKTFGPRRPMPEFVETPIPDVNTLALEDIDVSNPFLYRQGQWRAYFKRLRDEAPVHYQKNSPFGPFWSVTRFEDILFVDKSHDLFSAEPQIILGDPPEGLSVEMFIAMDPPKHDVQRSSVQGVVAPKNLKEMEGLIRSRTGDVLDSLPTDKPFNWVPAVSKELTGRMLATLLDFPYEERHKLVEWSDRMAGAASATGGEFADENAMFDDAADMARSFSRLWRDKEARRAAGEEPGFDLISLLQSNKETKDLINRPMEFIGNLTLLIVGGNDTTRNSMSGGLVAMNEFPREFEKLKAKPELIPNMVSEIIRWQTPLAYMRRIAKQDVELGGQTIKKGDRVVMWYASGNRDERKFDNPDQFIIDRKDARNHMSFGYGVHRCMGNRLAELQLRILWEEILKRFDNIEVVEEPERVQSNFARGYSRLMVKLTPNSLEHHHHHH
7VTA , Knot 146 328 0.86 40 200 320
MDFKYSRELKLESLDALNLTEGIPLRVNENIDLEFRGIERAHSDWERYVGKLNGFHGGRGPQFGFVSACIPECLPERMETVSYANEFAFLHDDMTDAASKDQVNGLNDDLLGGLDFTTEARSSASGKQQMQAKLLLEMLSIDRERTMVTIKAWADFMRGAAGRDHHRGFSSLDEYIPYRCADCGEKFWFGLVTFAMALSIPEQELELVQRLAQNAYLAAGLTNDLYSYEKEQLVAERSGTGQVFNAIAVIMQEHSVSISEAEDICRGRIREYAAKYVRDVADLRAKNELSRDSLAYLETGLYGISGSTAWNLDCPRYQVSTFVDFKTP
6EQG , Knot 124 298 0.79 40 172 279
MNFPRASRLMQAAVLGGLMAVSAAATAQTNPYARGPNPTAASLEASAGPFTVRSFTVSRPSGYGAGTVYYPTNAGGTVGAIAIVPGYTARQSSIKWWGPRLASHGFVVITIDTNSTLDQPSSRSSQQMAALRQVASLNGTSSSPIYGKVDTARMGVMGWSMGGGGSLISAANNPSLKAAAPQAPWDSSTNFSSVTVPTLIFACENDSIAPVNSSALPIYDSMSRNAKQFLEINGGSHSCANSGNSNQALIGKKGVAWMKRFMDNDTRYSTFACENPNSTRVSDFRTANCSLEHHHHHH
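
The LZ-complexity and the ratio column can be sketched as follows. The page does not say which Lempel-Ziv variant it uses, so the exhaustive-history parsing of Lempel and Ziv (1976) below is an assumption, and the function names are hypothetical. Only the normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) is taken directly from the table header.

```python
import math

def lz76_complexity(w):
    """Number of phrases in the LZ76 exhaustive-history parsing of w.

    Each phrase is the shortest prefix of the remaining text that does not
    occur starting at an earlier position (overlap with the phrase allowed).
    """
    n = len(w)
    i = 0          # start of the current phrase
    phrases = 0
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from a position < i.
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """The ratio LZ(w) / (n / log_alphabet_size(n)) from the table header."""
    return lz / (n / math.log(n, alphabet_size))

# Ratio for the 8HGE row, using the tabulated values LZ = 193 and n = 480.
print(normalized_lz(193, 480))
```

The computed ratios agree with the tabulated 0.82, 0.86 and 0.79 up to rounding in the last digit.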

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
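
As a minimal sketch of these definitions, reading \(P_w(n)\) as the set of contiguous subwords of length \(n\) (the function names `subwords` and `p` are illustrative, not from the page):

```python
def subwords(w, k):
    """P_w(k): the set of distinct contiguous subwords of w of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def p(w, k):
    """p_w(k): the number of distinct subwords of w of length k."""
    return len(subwords(w, k))

# Toy word: its length-2 subwords are AB and BA.
print(sorted(subwords("ABAB", 2)), p("ABAB", 2))
```

For a word of length \(n\) over 20 letters, \(p_w(k)\) is at most \(\min(20^k, n-k+1)\).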

\(|P_{f(8HGE_1)}(2) \setminus P_{f(7VTA_1)}(2)|=105\), \(|P_{f(7VTA_1)}(2) \setminus P_{f(8HGE_1)}(2)|=61\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For the pair above, \(Z_2 = 105+61 = 166\).

Hydrophobic-polar version of Sequence 1:
111101100100100011010001111000101101110100001110011101100111010011100101001110001010101001000111000000111111010010011110000011010101111011011010111110110001000010111110010010111000010110011000110111110001010111011011000000110100011111010110110001110011011001001100001001110011101101100000000110011011101011111100000001011111100110010010101011101100110100111010011000101110010010011110101000000100100111000010001011011000110011010101110011001001011001001000110100011101010010000000

Pair \(Z_2\) Length of longest common subword
8HGE_1,7VTA_1 166 3
8HGE_1,6EQG_1 180 9
7VTA_1,6EQG_1 176 4
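
Both columns of this table can be sketched in Python. The last column is read here as the length of the longest common contiguous subword; that reading is an assumption, supported by the shared tail SLEHHHHHH (length 9) of 8HGE_1 and 6EQG_1. The function names are hypothetical.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of w of length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def Z(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x, y):
    """Length of the longest contiguous subword shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)   # prev[j]: common suffix length of x[:i-1], y[:j]
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# The last 14 residues of 8HGE_1 and of 6EQG_1, copied from the sequences above.
print(longest_common_subword("KLTPNSLEHHHHHH", "RTANCSLEHHHHHH"))
```

The dynamic program takes time proportional to the product of the two lengths, which is unproblematic for sequences of a few hundred residues.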

Newick tree

 
(
	6EQG_1:90.91,
	(
		8HGE_1:83,7VTA_1:83
	):7.91
);

Let d denote the distance that Otu and Sayood call d.
Let d1 denote the distance that Otu and Sayood call d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{808}{\log_{20} 808}-\frac{328}{\log_{20}328}\right)\approx 130\), where \(808=480+328\) is the combined length of the two query sequences.
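
The arithmetic of this estimate can be checked directly. The sketch below only evaluates the displayed expression; the function names are illustrative, and the factors 0.85 and 0.8 are taken from the formula without further interpretation.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The quantity n / log_alphabet_size(n) appearing in the estimate."""
    return n / math.log(n, alphabet_size)

def expected_distance(n_combined, n_single, c1=0.85, c2=0.8):
    """Evaluate c1 * c2 * (lz_scale(n_combined) - lz_scale(n_single))."""
    return c1 * c2 * (lz_scale(n_combined) - lz_scale(n_single))

# 808 = 480 + 328: the lengths of 8HGE_1 and 7VTA_1 added together.
print(expected_distance(808, 328))
```

The result lies between 130 and 131, consistent with the rough value quoted.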
Status Protein1 Protein2 d d1/2
Query variables 8HGE_1 7VTA_1 162 138

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
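
The interpolation can be illustrated with the query values above (d = 162, d1/2 = 138). The sketch assumes that min and max refer to the same pair of one-sided quantities, presumably the two directed complexity differences of Otu and Sayood; the page does not spell this out, and the names `delta`, `lo` and `hi` are hypothetical.

```python
def delta(lo, hi, alpha):
    """alpha * min + (1 - alpha) * max of the two one-sided quantities."""
    return alpha * min(lo, hi) + (1 - alpha) * max(lo, hi)

# Values from the query row above.
d, half_d1 = 162, 138

# alpha = 0 gives max = d; alpha = 1/2 gives (min + max) / 2 = d1 / 2,
# so the smaller quantity is recovered as 2 * (d1 / 2) - d.
hi = d
lo = 2 * half_d1 - hi

print(lo, delta(lo, hi, 0), delta(lo, hi, 0.5))
```

Under this reading, the two one-sided quantities for the pair 8HGE_1, 7VTA_1 would be 114 and 162.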