CoV2D Browser™

Parikh vectors
2CET_1 3UZB_1 7RMC_1 Letter Amino acid
29 24 34 D Aspartic acid
30 31 55 L Leucine
25 18 17 F Phenylalanine
22 20 22 P Proline
23 18 29 S Serine
23 19 21 R Arginine
19 10 20 H Histidine
30 22 39 I Isoleucine
26 15 18 Y Tyrosine
39 18 46 V Valine
25 8 18 N Asparagine
1 6 8 C Cysteine
39 36 45 G Glycine
29 13 43 K Lysine
7 5 16 M Methionine
30 33 30 A Alanine
9 12 15 Q Glutamine
34 23 45 E Glutamic acid
13 22 34 T Threonine
15 5 6 W Tryptophan
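A Parikh vector simply counts how often each letter occurs in a word, so the entries of each column above sum to the length of the corresponding sequence (468, 358 and 561). The following is a minimal sketch of that count; the function name `parikh_vector` and the alphabet ordering are illustrative choices, not taken from the CoV2D code.

```python
from collections import Counter

# The 20 standard amino-acid letters (ordering here is arbitrary).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of the alphabet to its count in seq."""
    counts = Counter(seq)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# Example on the first 12 residues of the 2CET_1 sequence shown on this page.
prefix_counts = parikh_vector("MGSSHHHHHHSS")
```

Letters that do not occur are reported with count 0, so vectors for different sequences can be compared entry by entry.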

>2CET_1|Chains A, B|BETA-GLUCOSIDASE A|THERMOTOGA MARITIMA (2336)
>3UZB_1|Chains A, B, C, D|Branched-chain-amino-acid aminotransferase|Deinococcus radiodurans (1299)
>7RMC_1|Chains A[auth D], B[auth E], C[auth F], D[auth Q], E[auth R], F[auth S], G[auth T], H[auth U], I[auth V], J[auth W], K[auth g], L[auth h]|CTP synthase 1|Saccharomyces cerevisiae (4932)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2CET , Knot 189 468 0.82 40 238 440
MGSSHHHHHHSSGLVPRGSHMASNVKKFPEGFLWGVATASYQIEGSPLADGAGMSIWHTFSHTPGNVKNGDTGDVACDHYNRWKEDIEIIEKLGVKAYRFSISWPRILPEGTGRVNQKGLDFYNRIIDTLLEKGITPFVTIYHWDLPFALQLKGGWANREIADWFAEYSRVLFENFGDRVKNWITLNEPWVVAIVGHLYGVHAPGMRDIYVAFRAVHNLLRAHARAVKVFRETVKDGKIGIVFNNGYFEPASEKEEDIRAVRFMHQFNNYPLFLNPIYRGDYPELVLEFAREYLPENYKDDMSEIQEKIDFVGLNYYSGHLVKFDPDAPAKVSFVERDLPKTAMGWEIVPEGIYWILKKVKEEYNPPEVYITENGAAFDDVVSEDGRVHDQNRIDYLKAHIGQAWKAIQEGVPLKGYFVWSLLDNFEWAEGYSKRFGIVYVDYSTQKRIVKDSGYWYSNVVKNNGLED
3UZB , Knot 157 358 0.86 40 220 343
MRLTILGMTAHDSRPEQAKKLADIDWSTLGFSYIRTDLRYLAHWKDGEWDAGTLTEDNQIHLAEGSTALHYGQQCFEGLKAYRCADGSINLFRPDQNAARMRMSCRRLLMPELSDEQFIDACLQVVRANEHFLPPYGTGGSLYLRPFVIGVGDNIGVRTAPEFIFSVFCVPVGPYFKGGLTPTNFITSDYDRAAPHGTGAAKVGGNYAASLLPGYEAKKRDFADVIYLDPATHTTIEEAGAANFFAITQDGQKFVTPQSPSILPSITKYSLLWLAEHRLGLEVEEGDIRIDELGKFSEAGACGTAAVITPIGGIQHGDDFHVFYSESEPGPVTRRLYDELVGIQYGDKEAPEGWIVKV
7RMC , Knot 226 561 0.85 40 265 519
MKYVVVSGGVISGIGKGVLASSTGMLMKTLGLKVTSIKIDPYMNIDAGTMSPLEHGECFVLDDGGETDLDLGNYERYLGVTLTKDHNITTGKIYSHVIAKERKGDYLGKTVQIVPHLTNAIQDWIERVAKIPVDDTGMEPDVCIIELGGTVGDIESAPFVEALRQFQFKVGKENFALIHVSLVPVIHGEQKTKPTQAAIKGLRSLGLVPDMIACRCSETLDKPTIDKIAMFCHVGPEQVVNVHDVNSTYHVPLLLLEQKMIDYLHARLKLDEISLTEEEKQRGLELLSKWKATTGNFDESMETVKIALVGKYTNLKDSYLSVIKALEHSSMKCRRKLDIKWVEATDLEPEAQESNKTKFHEAWNMVSTADGILIPGGFGVRGTEGMVLAARWARENHIPFLGVCLGLQIATIEFTRSVLGRKDSHSAEFYPDIDEKNHVVVFMMRLGLRPTFFQNETEWSQIKKLYGDVSEVHERHRHRYEINPKMVDELENNGLIFVGKDDTGKRCEILELKNHPYYIATQYHPEYTSKVLDPSKPFLGLVAASAGILQDVIEGKYDLEA
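The LZ-complexity column and its normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) can be sketched as follows. This assumes the standard Lempel-Ziv (1976) exhaustive-history parsing, in which each new component is the shortest block that does not already occur starting at an earlier position; the page may use a different variant, so the sketch is not guaranteed to reproduce the tabulated LZ values. The names `lz76_complexity` and `normalized_lz` are hypothetical.

```python
import math

def lz76_complexity(w):
    """Number of components in an LZ76-style exhaustive parsing of w."""
    n = len(w)
    i = 0
    count = 0
    while i < n:
        length = 1
        # Extend the block while it still occurs starting at an earlier
        # position (overlap with the block itself is allowed).
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

# Normalizing the tabulated values for 3UZB (LZ = 157, n = 358).
ratio_3uzb = normalized_lz(157, 358)
```

Applied to the tabulated pairs (189, 468), (157, 358) and (226, 561), `normalized_lz` gives roughly 0.829, 0.861 and 0.851, which agrees with the table if its entries are truncated to two decimals.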

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence of the entry with 4-symbol Protein Data Bank code \(c\).
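As a small illustration of these definitions, the set of distinct subwords of a given length and its cardinality can be computed with a sliding window. This is only a sketch on a toy word; the function names `subwords` and `subword_complexity` are hypothetical and it makes no claim to reproduce the tabulated values.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous blocks) of w."""
    if k <= 0 or k > len(w):
        return set()
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy example: "abab" has the two length-2 subwords "ab" and "ba".
toy_count = subword_complexity("abab", 2)
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), which bounds the counts regardless of alphabet size.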

\(|P_{f(2CET_1)}(2) \setminus P_{f(3UZB_1)}(2)|=88\), \(|P_{f(3UZB_1)}(2) \setminus P_{f(2CET_1)}(2)|=70\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
110000000000111101001100100110111111101000101011101111011001000110100100101100000010001011001110100101011011101010100011010001100110011011101001011111010111100011011100001110011001001101001111111101011011110010111011001101010110110001001011111001010110000001011011001000111101100100101110110001100000010010001011110000101101010111010110001100111101110110111001000001101010001111001100010100000100101011011011001111010111011001011010000111101000000011000101000110001100
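The hydrophobic-polar string replaces each residue by 1 (hydrophobic) or 0 (polar). The page does not state which residues it treats as hydrophobic; the set used below (A, F, G, I, L, M, P, V, W) is an assumption inferred by comparing the first 80 residues of the 2CET_1 sequence with the 0/1 string above, and the name `hp_string` is hypothetical.

```python
# Assumed hydrophobic residues, inferred from the page's 0/1 string
# rather than from a stated definition.
HYDROPHOBIC = frozenset("AFGILMPVW")

def hp_string(seq, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' if it is in the hydrophobic set, else '0'."""
    return "".join("1" if residue in hydrophobic else "0" for residue in seq)

# First 21 residues of the 2CET_1 sequence shown on this page.
hp_prefix = hp_string("MGSSHHHHHHSSGLVPRGSHM")
```

Passing a different `hydrophobic` set lets other hydrophobicity conventions be tried without changing the function.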
Pair \(Z_2\) Length of longest common subsequence
2CET_1,3UZB_1 158 4
2CET_1,7RMC_1 165 4
3UZB_1,7RMC_1 155 4
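The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets; for the first pair the two one-sided differences 88 and 70 add up to the tabulated \(Z_2 = 158\). A minimal sketch, using hypothetical helper names and a toy pair of words rather than the protein sequences:

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """|P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|, i.e. the symmetric difference size."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: "abab" has {ab, ba}; "abba" has {ab, bb, ba}.
toy_z = z_distance("abab", "abba", 2)
```

The function is symmetric in `x` and `y` and returns 0 exactly when the two words have the same set of length-`k` subwords.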

Newick tree

 
[
	2CET_1:81.82,
	[
		3UZB_1:77.5,7RMC_1:77.5
	]:4.32
]
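The tree above is displayed with square brackets, whereas the usual Newick syntax uses parentheses and a terminating semicolon. The following sketch converts the displayed form into a one-line Newick string; the function name `brackets_to_newick` is hypothetical, and it assumes that leaf labels contain no brackets or whitespace.

```python
def brackets_to_newick(text):
    """Convert the bracketed tree display into a standard Newick string."""
    compact = "".join(text.split())  # drop all whitespace and line breaks
    return compact.replace("[", "(").replace("]", ")") + ";"

displayed_tree = """
[
    2CET_1:81.82,
    [
        3UZB_1:77.5,7RMC_1:77.5
    ]:4.32
]
"""
newick = brackets_to_newick(displayed_tree)
```

In this tree the branch lengths satisfy 77.5 + 4.32 = 81.82, so all three leaves lie at the same distance from the root.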

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{826}{\log_{20} 826}-\frac{358}{\log_{20}358})\approx 126.\)
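The arithmetic in this estimate can be checked directly. In the sketch below, 826 is the sum of the two sequence lengths (468 + 358); the factors 0.85 and 0.8 are taken from the formula as given, and the helper name `scaled_length` is hypothetical.

```python
import math

def scaled_length(n, alphabet_size=20):
    """n divided by the base-alphabet_size logarithm of n."""
    return n / math.log(n, alphabet_size)

# Evaluate (0.85)(0.8)(826/log_20(826) - 358/log_20(358)).
expected_distance = 0.85 * 0.8 * (scaled_length(826) - scaled_length(358))
```

The value comes out at about 126.5, consistent with the figure of 126 quoted above.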
Status Protein1 Protein2 d d1/2
Query variables 2CET_1 3UZB_1 158 139

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
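The interpolation in this formula can be sketched as a small function. The arguments `a` and `b` are hypothetical names for the two quantities whose minimum and maximum appear in the formula; the page does not spell them out. If the tabulated values \(d = 158\) and \(d_1/2 = 139\) for the pair 2CET_1, 3UZB_1 are taken at face value, the formula would imply a minimum of \(2 \cdot 139 - 158 = 120\), which is the value used in the example.

```python
def delta(alpha, a, b):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    smaller, larger = min(a, b), max(a, b)
    return alpha * smaller + (1 - alpha) * larger

# alpha = 0 gives the maximum (d); alpha = 1/2 gives the average (d_1 / 2).
d_value = delta(0, 120, 158)
half_d1_value = delta(0.5, 120, 158)
```

With these inputs `d_value` is 158 and `half_d1_value` is 139, matching the "Query variables" row.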