CoV2D Browser™

Parikh vectors
3GMR_1 3SQC_1 2QFY_1 Letter Amino acid
12 49 17 R Arginine
5 6 3 C Cysteine
7 24 35 I Isoleucine
25 57 34 L Leucine
8 16 11 M Methionine
11 22 15 F Phenylalanine
8 10 13 N Asparagine
14 43 28 D Aspartic acid
20 61 30 G Glycine
14 10 15 H Histidine
13 21 41 K Lysine
28 24 25 S Serine
13 25 4 W Tryptophan
8 32 17 Y Tyrosine
20 43 22 V Valine
17 57 30 A Alanine
22 23 15 Q Glutamine
15 41 27 E Glutamic acid
13 41 17 P Proline
14 26 28 T Threonine
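
A Parikh vector simply counts how often each letter occurs in a sequence. The sketch below shows one way such a row of counts could be computed; the function name `parikh_vector` and the alphabetical ordering of the 20 amino-acid letters are illustrative assumptions, not taken from this page.

```python
from collections import Counter

# Hypothetical helper: the 20 standard amino-acid letters, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of the alphabet to its number of occurrences."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}

if __name__ == "__main__":
    # Toy input: the first ten residues of the 2QFY_1 sequence listed on this page.
    vector = parikh_vector("MHHHHHHAMG")
    print(vector["H"], vector["M"], vector["W"])
```

Letters that do not occur get a count of 0, so every sequence yields a vector of the same length and the vectors can be compared column by column.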

>3GMR_1|Chain A|T-cell surface glycoprotein CD1d1|Mus musculus (10090)
>3SQC_1|Chains A, B, C|SQUALENE--HOPENE CYCLASE|Alicyclobacillus acidocaldarius (405212)
>2QFY_1|Chains A, B, C, D, E, F|Isocitrate dehydrogenase [NADP]|Saccharomyces cerevisiae (4932)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
3GMR , Knot 126 287 0.82 40 184 271
SEAQQKNYTFRCLQMSSFANRSWSRTDSVVWLGDLQTHRWSNDSATISFTKPWSQGKLSNQQWEKLQHMFQVYRVSFTRDIQELVKMMSPKEDYPIEIQLSAGCEMYPGNASESFLHVAFQGKYVVRFWGTSWQTVPGAPSWLDLPIKVLNADQGTSATVQMLLNDTCPLFVRGLLEAGKSDLEKQEKPVAWLSSVPSSAHGHRQLVCHVSGFYPKPVWVMWMRGDQEQQGTHRGDFLPNADETWYLQATLDVEAGEEAGLACRVKHSSLGGQDIILYWGSHHHHHH
3SQC , Knot 243 631 0.82 40 265 567
MAEQLVEAPAYARTLDRAVEYLLSCQKDEGYWWGPLLSNVTMEAEYVLLCHILDRVDRDRMEKIRRYLLHEQREDGTWALYPGGPPDLDTTIEAYVALKYIGMSRDEEPMQKALRFIQSQGGIESSRVFTRMWLALVGEYPWEKVPMVPPEIMFLGKRMPLNIYEFGSWARATVVALSIVMSRQPVFPLPERARVPELYETDVPPRRRGAKGGGGWIFDALDRALHGYQKLSVHPFRRAAEIRALDWLLERQAGDGSWGGIQPPWFYALIALKILDMTQHPAFIKGWEGLELYGVELDYGGWMFQASISPVWDTGLAVLALRAAGLPADHDRLVKAGEWLLDRQITVPGDWAVKRPNLKPGGFAFQFDNVYYPDVCDTAVVVWALNTLRLPDERRRRDAMTKGFRWIVGMQSSNGGWGAYDVDNTSDLPNHIPFCDFGEVTDPPSEDVTAHVLECFGSFGYDDAWKVIRRAVEYLKREQKPDGSWFGRWGVNYLYGTGAVVSALKAVGIDTREPYIQKALDWVEQHQNPDGGWGEDCRSYEDPAYAGKGASTPSQTAWALMALIAGGRAESEAARRGVQYLVETQRPDGGWDEPYYTGTGFPGDFYLGYTMYRHVFPTLALGRYKQAIERR
2QFY , Knot 177 427 0.83 40 231 402
MHHHHHHAMGIPGHAFSKIKVKQPVVELDGDEMTRIIWDKIKKKLILPYLDVDLKYYDLSVESRDATSDKITQDAAEAIKKYGVGIKCATITPDEARVKEFNLHKMWKSPNGTIRNILGGTVFREPIVIPRIPRLVPRWEKPIIIGRHAHGDQYKATDTLIPGPGSLELVYKPSDPTTAQPQTLKVYDYKGSGVAMAMYNTDESIEGFAHSSFKLAIDKKLNLFLSTKNTILKKYDGRFKDIFQEVYEAQYKSKFEQLGIHYEHRLIDDMVAQMIKSKGGFIMALKNYDGDVQSDIVAQGFGSLGLMTSILVTPDGKTFESEAAHGTVTRHYRKYQKGEETSTNSIASIFAWSRGLLKRGELDNTPALCKFANILESATLNTVQQDGIMTKDLALACGNNERSAYVTTEEFLDAVEKRLQKEIKSIE
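
The LZ-complexity and its normalization by \(n/\log_{20} n\) can be sketched as follows. This assumes \(\mathrm{LZ}(w)\) is the number of phrases in the Lempel-Ziv (1976) exhaustive parsing; the page does not spell out the exact variant, and the function names `lz76_complexity` and `normalized_lz` are hypothetical.

```python
import math

def lz76_complexity(w):
    """Number of phrases in the LZ76 exhaustive parsing of w (assumed variant).

    Each phrase is the shortest prefix of the unparsed remainder that does not
    already occur as a subword of the text read so far, last character excluded.
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_alphabet_size(n)); requires n > 1."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    # Classic example from the LZ76 literature: 0|001|10|100|1000|101.
    print(lz76_complexity("0001101001000101"))
    # Values from the 3GMR row of the table: LZ = 126, n = 287.
    print(normalized_lz(126, 287))
```

The substring test inside the loop makes this sketch quadratic or worse in the sequence length, which is acceptable for sequences of a few hundred residues but not for large inputs.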

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
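
The subword counts \(p_w(1)\), \(p_w(2)\), \(p_w(3)\) can be computed by sliding a window over the word and collecting the distinct windows in a set. This is a minimal sketch; the names `subwords` and `subword_complexity` are illustrative and do not come from the page.

```python
def subwords(w, length):
    """Set of distinct contiguous subwords of the given length in w."""
    return {w[i:i + length] for i in range(len(w) - length + 1)}

def subword_complexity(w, length):
    """Number of distinct contiguous subwords of the given length in w."""
    return len(subwords(w, length))

if __name__ == "__main__":
    w = "ABCAB"
    # Length-2 windows are AB, BC, CA, AB, so three of them are distinct.
    print([subword_complexity(w, k) for k in (1, 2, 3)])
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), which gives a quick sanity check on tabulated values.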

\(|P_{f(3GMR_1)}(2) \setminus P_{f(3SQC_1)}(2)|=54\), \(|P_{f(3SQC_1)}(2) \setminus P_{f(3GMR_1)}(2)|=135\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
00100000010010100110001000001111101000010000101010011001010000100100110100101000100110110100001101010110010110100011011101001101110010011111011011101101001001010111000011110111011000100000111110011001010001100101101011111110100000100010111010001010101010110011110010000111001110110000000
Pair \(Z_2\) Length of longest common subword
3GMR_1,3SQC_1 189 4
3GMR_1,2QFY_1 177 6
3SQC_1,2QFY_1 152 5
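
Both columns of this table can be sketched in a few lines. The code assumes the second column counts contiguous matches (for example, the run HHHHHH occurs in both 3GMR_1 and 2QFY_1, which matches the value 6), and the names `z_distance` and `longest_common_subword_length` are hypothetical.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block shared by x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block that ended just before both positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

if __name__ == "__main__":
    # Toy fragments: the tail of 3GMR_1 and the head of 2QFY_1.
    print(z_distance("ABAB", "ABBA", 2))
    print(longest_common_subword_length("GSHHHHHH", "MHHHHHHAMG"))
```

For the first pair in the table, \(Z_2 = 54 + 135 = 189\), the sum of the two set differences reported above.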

Newick tree

 
[
	3GMR_1:96.17,
	[
		2QFY_1:76,3SQC_1:76
	]:20.17
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{918}{\log_{20} 918}-\frac{287}{\log_{20} 287})\approx 170.\)
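
The arithmetic in this estimate can be checked directly. The sketch below assumes that 918 is the combined length 287 + 631 of the two query sequences (the page does not say so explicitly) and takes the factors 0.85 and 0.8 as given; the helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The normalizing quantity n / log_alphabet_size(n)."""
    return n / math.log(n, alphabet_size)

# Lengths of 3GMR_1 and 3SQC_1 from the table; their sum is 918 (assumed origin of that number).
n_x, n_y = 287, 631

# Factors 0.85 and 0.8 are copied from the page; their derivation is not given there.
expected = 0.85 * 0.8 * (lz_scale(n_x + n_y) - lz_scale(n_x))

if __name__ == "__main__":
    print(round(expected, 1))  # about 170.8
```

The unrounded value is a little under 171, consistent with the figure of roughly 170 quoted above.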
Status Protein1 Protein2 d d1/2
Query variables 3GMR_1 3SQC_1 215 155.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
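
As a worked instance, the query row above reports \(d=215\) and \(d_1/2=155.5\). Assuming (the page does not state this) that \(\mathrm{min}\) and \(\mathrm{max}\) are the smaller and larger of the two directed quantities being combined, the two cases of the formula give
\[ \mathrm{max} = d = 215, \qquad \tfrac{1}{2}(\mathrm{min}+\mathrm{max}) = d_1/2 = 155.5 \;\Longrightarrow\; \mathrm{min} = 2(155.5) - 215 = 96. \]
Under the same assumption, \(d_1 = \mathrm{min}+\mathrm{max} = 311\).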