CoV2D Browser™

Parikh vectors
6GSZ_1 1VLI_1 4DXH_1 Letter Amino acid
43 14 12 R Arginine
59 24 17 D Aspartic acid
65 30 25 L Leucine
57 17 24 T Threonine
66 19 39 V Valine
35 30 24 I Isoleucine
31 19 18 F Phenylalanine
76 23 26 S Serine
31 11 8 N Asparagine
4 4 14 C Cysteine
27 16 7 H Histidine
12 10 9 M Methionine
44 23 20 P Proline
33 9 4 Y Tyrosine
55 47 28 A Alanine
34 13 8 Q Glutamine
52 26 21 E Glutamic acid
81 23 38 G Glycine
34 24 30 K Lysine
31 3 2 W Tryptophan
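The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch of that count follows; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative names, not taken from the CoV2D code, and the letter order is copied from the table above.

```python
from collections import Counter

# Letter order as it appears in the table above (assumed to be arbitrary).
AMINO_ACIDS = "RDLTVIFSNCHMPYAQEGKW"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in sequence."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]

# First 13 residues of 1VLI_1, used only as a short illustration.
prefix = "MGSDKIHHHHHHM"
for letter, count in zip(AMINO_ACIDS, parikh_vector(prefix)):
    if count:
        print(letter, count)
```

The entries of a Parikh vector sum to the sequence length, which is a quick sanity check on each column of the table (870, 385 and 374).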

>6GSZ_1|Chain A|Alpha-L-rhamnosidase|Aspergillus terreus (33178)
>1VLI_1|Chain A|Spore coat polysaccharide biosynthesis protein spsE|Bacillus subtilis (1423)
>4DXH_1|Chains A, B|Alcohol dehydrogenase E chain|Equus caballus (9796)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6GSZ , Knot 324 870 0.84 40 310 802
ALSISQVAFEHHRTALGIGETQPRVSWRFDGNVSDWEQRAYEIEVKRAGHDADVFRSESSDSVLVPWPSSPLQSGEEATVRVRSFGSDGQHDTPWSDAVTVEPGLLTPDDWHDAVVIASDRPTEVDATHRPIQFRKEFSVDDSYVSARLYITALGLYEARINDQRVGDHVMAPGWQSYQYRHEYNTYDVTDLLKQGPNAIGVTVGEGWYSGRIGYDGGKRNIYGDTLGLLSLLVVTKSDGSKLYIPSDSSWKSSTGPIISSEIYDGEEYDSRLEQKGWSQVGFNSTGWLGTHELSFPKERLASPDGPPVRRVAEHKLANVFSSASGKTVLDFGQNLVGWLRIRVKGPKGQTIRFVHTEVMENGEVATRPLRQAKATDHFTLSGEGVQEWEPSFTYHGFRYVQVDGWPADTPLDENSVTAIVVHSDMERTGYFECSNPLISKLHENILWSMRGNFFSIPTDCPQRDERLGWTGDIHAFSRTANFIYDTAGFLRAWLKDARSEQLNHSYSLPYVIPNIHGNGETPTSIWGDAIVGVPWQLYESFGDKVMLEEQYGGAKDWVDKGIVRNDVGLWDRSTFQWADWLDPKAPADDPGDATTNKYLVSDAYLLHSTDMLANISTSLSKGEEASNYTEWHAKLTKEFQKAWITSNGTMANETQTGLALPLYFDLFPSAEQAQSAAKRLVNIIKQNDYKVGTGFAGTHLLGHTLSKYGESDAFYSMLRQTEVPSWLYQVVMNGTTTWERWDSMLPNGSINPGQMTSFNHYAVGSVGSWLHEVIGGLSPAEPGWRRINIEVVPGGDLQQASTKFLTPYGMASTKWWLDGQDQSCGGFDFHLVAEVPPNTRATVVLPGKGGEKVDVGSGVHEYHVRCVKL
1VLI , Knot 160 385 0.82 40 216 360
MGSDKIHHHHHHMAAFQIANKTVGKDAPVFIIAEAGINHDGKLDQAFALIDAAAEAGADAVKFQMFQADRMYQKDPGLYKTAAGKDVSIFSLVQSMEMPAEWILPLLDYCREKQVIFLSTVCDEGSADLLQSTSPSAFKIASYEINHLPLLKYVARLNRPMIFSTAGAEISDVHEAWRTIRAEGNNQIAIMHCVAKYPAPPEYSNLSVIPMLAAAFPEAVIGFSDHSEHPTEAPCAAVRLGAKLIEKHFTIDKNLPGADHSFALNPDELKEMVDGIRKTEAELKQGITKPVSEKLLGSSYKTTTAIEGEIRNFAYRGIFTTAPIQKGEAFSEDNIAVLRPGQKPQGLHPRFFELLTSGVRAVRDIPADTGIVWDDILLKDSPFHE
4DXH , Knot 158 374 0.83 40 210 358
STAGKVIKCKAAVLWEEKKPFSIEEVEVAPPKAHEVRIKMVATGICRSDDHVVSGTLVTPLPVIAGHEAAGIVESIGEGVTTVRPGDKVIPLFTPQCGKCRVCKHPEGNFCLKNDLSMPRGTMQDGTSRFTCRGKPIHHFLGTSTFSQYTVVDEISVAKIDAASPLEKVCLIGCGFSTGYGSAVKVAKVTQGSTCAVFGLGGVGLSVIMGCKAAGAARIIGVDINKDKFAKAKEVGATECVNPQDYKKPIQEVLTEMSNGGVDFSFEVIGRLDTMVTALSCCQEAYGVSVIVGVPPDSQNLSMNPMLLLSGRTWKGAIFGGFKSKDSVPKLVADFMAKKFALDPLITHVLPFEKINEGFDLLRSGESIRTILTF
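The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below computes both quantities under the assumption that \(\mathrm{LZ}\) is the number of components in the Lempel-Ziv (1976) exhaustive-history parsing; the page does not spell out its exact variant, so counts produced this way may differ from the table. The names `lz76` and `normalized_lz` are illustrative.

```python
import math

def lz76(s):
    """Number of components in the LZ76 exhaustive-history parsing of s.

    Assumption: this is the complexity measure meant by LZ(w) on this page.
    """
    i, components, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the component while it can be copied from earlier text
        # (the copy may overlap the component itself).
        while i + length <= n and s.find(s[i:i + length], 0, i + length - 1) != -1:
            length += 1
        components += 1
        i += length
    return components

def normalized_lz(lz_value, n, alphabet_size=20):
    """LZ(w) divided by n / log_k(n), with k the alphabet size."""
    return lz_value / (n / math.log(n, alphabet_size))

# Classic example from Lempel and Ziv (1976): 0.001.10.100.1000.101
print(lz76("0001101001000101"))
# Normalization of the 6GSZ row, using the table's own values 324 and 870.
print(round(normalized_lz(324, 870), 2))
```

Applied to the 6GSZ row (324 and 870), the normalization gives about 0.84, matching the table.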

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
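As an illustration of these definitions, the following sketch collects the distinct length-\(k\) subwords of a word and counts them. The names `subwords` and `subword_complexity` are hypothetical and not part of the CoV2D code.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct contiguous subwords of length k in w."""
    return len(subwords(w, k))

w = "ABAAB"
print(sorted(subwords(w, 2)))    # ['AA', 'AB', 'BA']
print(subword_complexity(w, 2))  # 3
```

A word of length \(|w|\) has at most \(|w|-k+1\) subwords of length \(k\), and over a 20-letter alphabet at most \(20^k\).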

\(|P_{f(6GSZ_1)}(2) \setminus P_{f(1VLI_1)}(2)|=132\), \(|P_{f(1VLI_1)}(2) \setminus P_{f(6GSZ_1)}(2)|=38\).
Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
110100111000001111100010101010101001000100101001100101100000001111110011001001010100110010000110011010111101001001111100010010100011010001010000101010101111001010000110011111100000000000001001100110111101101100101100110001010011110111100001001011000010000111100010010000001000110011100011110001011000110101111001100011011001010011011001111101010110100101100011001011001100101000101010110010101000110010101111001100001011110001000101000011100100011101010110110001000001110101011000101100011110111001000010000011011101010100100111011111110100011001110000111001100111000111100001011011010111001101000001100101100001110100010010010000010101000100111000101100000111111010111010010011001101100000011011110011100100010001100110000110110011101000100100111010101101001000111011011001111101101110010101111101001000110101110001110100000111010111011100010111110110010110110000100101
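A hydrophobic-polar encoding maps each residue to 1 (hydrophobic) or 0 (polar). The page does not state which residues it treats as hydrophobic; the default set below (A, F, G, I, L, M, P, V, W) is an assumption inferred by comparing the start and end of the binary string with the 6GSZ sequence, and the function name `hydrophobic_polar` is illustrative.

```python
# Assumed hydrophobic residues, inferred from the binary string above;
# the source does not state this classification explicitly.
ASSUMED_HYDROPHOBIC = frozenset("AFGILMPVW")

def hydrophobic_polar(sequence, hydrophobic=ASSUMED_HYDROPHOBIC):
    """Encode a protein sequence as a string of 1 (hydrophobic) and 0 (polar)."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)

# First 20 residues of 6GSZ_1.
print(hydrophobic_polar("ALSISQVAFEHHRTALGIGE"))  # 11010011100000111110
```

Other hydrophobicity scales classify some residues (for example C or Y) differently, so the set is exposed as a parameter.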
Pair \(Z_2\) Length of longest common subword
6GSZ_1,1VLI_1 170 4
6GSZ_1,4DXH_1 170 5
1VLI_1,4DXH_1 148 4
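The \(Z_k\) column is the size of the symmetric difference between two sets of length-\(k\) subwords. A minimal sketch, with hypothetical names `subwords` and `z_distance`:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: "ABAB" has subwords {AB, BA}; "ABBA" has {AB, BB, BA}.
print(z_distance("ABAB", "ABBA", 2))  # 1
```

For the pair 6GSZ_1, 1VLI_1 the two one-sided differences reported above are 132 and 38, which add up to the tabulated \(Z_2 = 170\).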

Newick tree

 
[
	6GSZ_1:88.36,
	[
		1VLI_1:74,4DXH_1:74
	]:14.36
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1255 }{\log_{20} 1255}-\frac{385}{\log_{20}385})=226.\)
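The arithmetic in this estimate can be checked directly. The sketch below only evaluates the stated expression; it takes the factors 0.85 and 0.8 from the formula as given, and notes that 1255 = 870 + 385, the combined length of 6GSZ_1 and 1VLI_1. The function name `expected_distance` is illustrative.

```python
import math

def expected_distance(n_concat, n_single, c1=0.85, c2=0.8, alphabet_size=20):
    """Evaluate c1 * c2 * (n_concat / log_k n_concat - n_single / log_k n_single)."""
    def scale(n):
        return n / math.log(n, alphabet_size)
    return c1 * c2 * (scale(n_concat) - scale(n_single))

value = expected_distance(870 + 385, 385)
print(round(value, 1))  # roughly 226.6, which the page reports as 226
```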
Status Protein1 Protein2 d d1/2
Query variables 6GSZ_1 1VLI_1 290 207.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
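To make the interpolation concrete, here is a sketch that assumes the usual Otu-Sayood definitions \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=(c(xy)-c(x))+(c(yx)-c(y))\), with \(c\) taken to be an LZ76 component count. The exact complexity measure used by the page is not stated, so values computed this way need not reproduce the table; all function names are illustrative.

```python
def lz76(s):
    """Number of components in the LZ76 exhaustive-history parsing of s."""
    i, components, n = 0, 0, len(s)
    while i < n:
        length = 1
        while i + length <= n and s.find(s[i:i + length], 0, i + length - 1) != -1:
            length += 1
        components += 1
        i += length
    return components

def increments(x, y, c=lz76):
    """The two one-sided complexity increments c(xy) - c(x) and c(yx) - c(y)."""
    return c(x + y) - c(x), c(y + x) - c(y)

def delta(x, y, alpha, c=lz76):
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y, c)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

x, y = "MGSDKIHHHHHHMAAF", "STAGKVIKCKAAVLWE"
a, b = increments(x, y)
d, d1 = max(a, b), a + b
print(delta(x, y, 0) == d)         # alpha = 0 recovers d
print(delta(x, y, 0.5) == d1 / 2)  # alpha = 1/2 recovers d1 / 2
```

With the tabulated values \(d = 290\) and \(d_1/2 = 207.5\), the same relation gives \(\min = 2(207.5) - 290 = 125\).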