CoV2D BrowserTM

Parikh vectors
2QSU_1 7NAL_1 1PZZ_1 Letter Amino acid
11 12 9 N Asparagine
4 14 11 H Histidine
11 21 5 F Phenylalanine
16 27 8 T Threonine
3 9 8 Y Tyrosine
8 52 6 R Arginine
22 36 7 D Aspartic acid
6 34 6 Q Glutamine
12 54 9 E Glutamic acid
13 28 11 K Lysine
5 8 1 M Methionine
15 44 10 S Serine
26 47 4 V Valine
2 17 3 C Cysteine
24 69 4 A Alanine
22 60 13 G Glycine
20 25 5 I Isoleucine
32 98 17 L Leucine
14 31 8 P Proline
1 11 1 W Tryptophan
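
As a minimal sketch of how one column of this table could be recomputed, the snippet below counts each amino-acid letter in a sequence. The function name `parikh_vector` and the constant `LETTERS` are illustrative names, not part of the CoV2D site; the letter order is copied from the table, and the sequence is the 1PZZ entry listed further down this page.

```python
from collections import Counter

# Letter order copied from the rows of the Parikh-vector table.
LETTERS = "NHFTYRDQEKMSVCAGILPW"

def parikh_vector(seq, letters=LETTERS):
    """Return the number of occurrences of each letter, in table order."""
    counts = Counter(seq)
    return [counts[a] for a in letters]

# 1PZZ sequence as listed on this page.
seq_1pzz = (
    "HHHHHHFNLPPGNYKKPKLLYCSNGGHFLRILPDGTVDGTRDRSDQHIQLQLSAESNGEVYIKSTETGQYLAM"
    "DTDGLLYGSQTPNEECLFLERLEENHYNTYISKKHAEKNWFVGLKKNGSCKRGPRTHYGQKAILFLPLPVSSD"
)

vec_1pzz = dict(zip(LETTERS, parikh_vector(seq_1pzz)))
for letter, count in vec_1pzz.items():
    print(letter, count)
```

The entries of a Parikh vector sum to the sequence length, which gives a quick sanity check against the Length column of the complexity table.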

>2QSU_1|Chains A, B|5'-methylthioadenosine nucleosidase|Arabidopsis thaliana (3702)
>7NAL_1|Chains A, B, C, D, E, F, G, H|NAD(+) hydrolase SARM1|Homo sapiens (9606)
>1PZZ_1|Chains A, B|Heparin-binding growth factor 1|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2QSU , Knot 120 267 0.83 40 163 257
MAPHGDGLSDIEEPEVDAQSEILRPISSVVFVIAMQAEALPLVNKFGLSETTDSPLGKGLPWVLYHGVHKDLRINVVCPGRDAALGIDSVGTVPASLITFASIQALKPDIIINAGTCGGFKVKGANIGDVFLVSDVVFHDRRIPIPMFDLYGVGLRQAFSTPNLLKELNLKIGRLSTGDSLDMSTQDETLIIANDATLKDMEGAAVAYVADLLKIPVVFLKAVTDLVDGDKPTAEEFLQNLTVVTAALEGTATKVINFINGRNLSDL
7NAL , Knot 259 697 0.81 40 269 612
LAVPGPDGGGGTGPWWAAGGRGPREVSPGAGTEVQDALERALPELQQALSALKQAGGARAVGAGLAEVFQLVEEAWLLPAVGREVAQGLCDAIRLDGGLDLLLRLLQAPELETRVQAARLLEQILVAENRDRVARIGLGVILNLAKEREPVELARSVAGILEHMFKHSEETCQRLVAAGGLDAVLYWCRRTDPALLRHCALALGNCALHGGQAVQRRMVEKRAAEWLFPLAFSKEDELLRLHACLAVAVLATNKEVEREVERSGTLALVEPLVASLDPGRFARCLVDASDTSQGRGPDDLQRLVPLLDSNRLEAQCIGAFYLCAEAAIKSLQGKTKVFSDIGAIQSLKRLVSYSTNGTKSALAKRALRLLGEEVPRPILPSVPSWKEAEVQTWLQQIGFSKYCESFREQQVDGDLLLRLTEEELQTDLGMKSGITRKRFFRELTELKTFANYSTCDRSNLADWLGSLDPRFRQYTYGLVSCGLDRSLLHRVSEQQLLEDCGIHLGVHRARILTAAREMLHSPLPCTGGKPSGDTPDVFISYRRNSGSQLASLLKVHLQLHGFSVFIDVEKLEAGKFEDKLIQSVMGARNFVLVLSPGALDKCMQDHDCKDWVHKEIVTALSCGKNIVPIIDGFEWPEPQVLPEDMQAVLTFNGIKWSHEYQEATIEKIIRFLQGRSSRDSSAGSDTSLEGAAPMGPT
1PZZ , Knot 69 146 0.78 40 111 137
HHHHHHFNLPPGNYKKPKLLYCSNGGHFLRILPDGTVDGTRDRSDQHIQLQLSAESNGEVYIKSTETGQYLAMDTDGLLYGSQTPNEECLFLERLEENHYNTYISKKHAEKNWFVGLKKNGSCKRGPRTHYGQKAILFLPLPVSSD
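
The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below recomputes that ratio from the tabulated LZ(w) and n, and also includes one common LZ76 (exhaustive-history) phrase count. Whether the page's LZ(w) uses exactly this parsing is an assumption, and the function names `lz76_phrases` and `normalized_lz` are illustrative.

```python
import math

def lz76_phrases(s):
    """Count phrases in an LZ76-style parsing: each phrase is the shortest
    prefix of the remaining text that has no occurrence starting earlier."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend while the candidate phrase already occurs, starting before i.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_alphabet_size(n)."""
    return lz / (n / math.log(n, alphabet_size))

# (LZ(w), n) pairs copied from the table above.
for code, lz, n in [("2QSU", 120, 267), ("7NAL", 259, 697), ("1PZZ", 69, 146)]:
    print(code, round(normalized_lz(lz, n), 4))
```

The computed ratios are roughly 0.838, 0.812 and 0.786, which agree with the tabulated 0.83, 0.81 and 0.78 if those values are truncated to two decimals rather than rounded.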

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
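
A minimal sketch of these two definitions, for a fixed subword length: the set of distinct subwords is collected with a sliding window, and the complexity is its size. The names `subwords` and `subword_complexity` are illustrative, and the toy word is not taken from the data above.

```python
def subwords(w, length):
    """Set of distinct contiguous subwords of the given length in w."""
    return {w[i:i + length] for i in range(len(w) - length + 1)}

def subword_complexity(w, length):
    """Number of distinct subwords of the given length in w."""
    return len(subwords(w, length))

toy = "abab"
print(sorted(subwords(toy, 2)))      # ['ab', 'ba']
print(subword_complexity(toy, 3))    # 2
```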

\(|P_{f(2QSU_1)}(2) \setminus P_{f(7NAL_1)}(2)|=31\), \(|P_{f(7NAL_1)}(2) \setminus P_{f(2QSU_1)}(2)|=137\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
111010110010010101000110110011111110101111100111000000111011111100110001010110110011111001101110110110101101011101100111010110110111100111000011111101011110011001011001010110100100101000000111100101001011111011011011111101100110100101001100101101110101001101101001001
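
The page does not state which residues count as hydrophobic. The sketch below uses a set inferred by comparing the first residues of Sequence 1 with the binary string above (1 for A, F, G, I, L, M, P, V, W and 0 otherwise); treat that set and the name `hp_encode` as assumptions rather than the site's documented mapping.

```python
# Assumed hydrophobic set, inferred from the binary string on this page.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' if hydrophobic under the assumed set, else '0'."""
    return "".join("1" if residue in hydrophobic else "0" for residue in seq)

# First 20 residues of the 2QSU sequence listed above.
prefix = "MAPHGDGLSDIEEPEVDAQS"
print(hp_encode(prefix))  # 11101011001001010100
```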
Pair \(Z_2\) Length of longest common subsequence
2QSU_1,7NAL_1 168 4
2QSU_1,1PZZ_1 182 3
7NAL_1,1PZZ_1 214 4
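
The \(Z_2\) entry 168 for the pair 2QSU_1, 7NAL_1 is the sum 31 + 137 of the two set-difference sizes reported above. A minimal sketch of that computation for arbitrary words follows; the function names are illustrative and the toy words are not from the data.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """Size of the symmetric difference of the length-k subword sets of x and y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: 'abab' has {ab, ba}; 'abba' has {ab, bb, ba}.
print(z_k("abab", "abba", 2))  # 1
```

Since \(Z_k\) is a symmetric difference, it also equals \(p_x(k)+p_y(k)\) minus twice the number of shared subwords; with the table values, 163 + 269 - 2(132) = 168.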

Newick tree

 
[
	1PZZ_1:10.92,
	[
		2QSU_1:84,7NAL_1:84
	]:19.92
]

Let \(d\) denote the Otu--Sayood distance \(d\).
Let \(d_1\) denote the Otu--Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{964}{\log_{20} 964}-\frac{267}{\log_{20}267}\right)\approx 188.\)
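
The arithmetic in this estimate can be checked with a short sketch. Reading 964 as 267 + 697, the combined length of the two query sequences, is an assumption; the factors 0.85 and 0.8 are taken from the formula as given, and the function names are illustrative.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The quantity n / log_alphabet_size(n) used to scale LZ-complexity."""
    return n / math.log(n, alphabet_size)

def expected_distance(n_x, n_y, c1=0.85, c2=0.8):
    """c1 * c2 * (scale of combined length minus scale of the first length)."""
    return c1 * c2 * (lz_scale(n_x + n_y) - lz_scale(n_x))

value = expected_distance(267, 697)
print(round(value, 1))  # 188.4
```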
Status Protein1 Protein2 d d1/2
Query variables 2QSU_1 7NAL_1 232 160
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
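
As a worked illustration of this interpolation, the sketch below plugs in the Query variables row, \(d=232\) and \(d_1/2=160\). Since \(\alpha=0\) gives the max and \(\alpha=1/2\) gives the midpoint, the implied min is \(2\cdot 160-232=88\). That value is derived here rather than reported on the page, and the name `delta` is illustrative.

```python
def delta(alpha, a, b):
    """Interpolate between min(a, b) and max(a, b) with weight alpha on the min."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

d = 232          # alpha = 0 case, from the Query variables row
d1_half = 160    # alpha = 1/2 case, from the Query variables row
implied_min = 2 * d1_half - d   # 88, derived from the two values above

print(delta(0, implied_min, d))    # 232
print(delta(0.5, implied_min, d))  # 160.0
```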