CoV2D BrowserTM

Parikh vectors
1YET_1 7YFE_1 3NGF_1 Letter Amino acid
18 100 25 L Leucine
1 23 3 W Tryptophan
7 40 10 Y Tyrosine
11 92 18 V Valine
15 103 28 A Alanine
15 59 20 G Glycine
6 42 9 M Methionine
5 97 14 P Proline
8 70 18 R Arginine
16 68 15 D Aspartic acid
4 25 14 H Histidine
20 77 10 I Isoleucine
17 35 6 K Lysine
10 43 16 F Phenylalanine
15 103 9 S Serine
8 71 13 N Asparagine
25 53 21 E Glutamic acid
18 103 9 T Threonine
0 14 3 C Cysteine
9 57 8 Q Glutamine
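
A Parikh vector simply records how often each letter occurs in a word, ignoring order. The following is a minimal sketch of how a column of the table above could be recomputed; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not taken from the CoV2D code.

```python
from collections import Counter

# One-letter codes in the row order of the table above (illustrative constant).
AMINO_ACIDS = "LWYVAGMPRDHIKFSNETCQ"

def parikh_vector(sequence: str) -> dict[str, int]:
    """Count the occurrences of each amino-acid letter in `sequence`."""
    counts = Counter(sequence)
    # Letters that never occur get an explicit zero.
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# First twelve residues of the 3NGF_1 sequence, used as a short example.
example = parikh_vector("MAHHHHHHMPRF")
print(example["H"], example["M"], example["C"])  # 6 2 0
```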

>1YET_1|Chain A|HEAT SHOCK PROTEIN 90|Homo sapiens (9606)
>7YFE_1|Chains A[auth 1], B[auth 2], C[auth 3], D[auth 4], E[auth 5], F[auth A], G[auth B], H[auth C], I[auth D], J[auth E], U[auth a], V[auth b], W[auth c], X[auth d], Y[auth e]|RNA helicase|Mammalian orthoreovirus 3 (538123)
>3NGF_1|Chains A, B|AP endonuclease, family 2|Brucella melitensis biovar Abortus (359391)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1YET , Knot 104 228 0.82 38 154 218
DQPMEEEEVETFAFQAEIAQLMSLIINTFYSNKEIFLRELISNSSDALDKIRYETLTDPSKLDSGKELHINLIPNKQDRTLTIVDTGIGMTKADLINNLGTIAKSGTKAFMEALQAGADISMIGQFGVGFYSAYLVAEKVTVITKHNDDEQYAWESSAGGSFTVRTDTGEPMGRGTKVILHLKEDQTEYLEERRIKEIVKKHSQFIGYPITLFVEKERDKEVSDDEAE
7YFE , Knot 453 1275 0.84 40 345 1121
MKRIPRKTKGKSSGKGNDSTSRSDDGSSQLRDKQSNKANPATAEPGTSNCEHYKARPGIASVQKATESAELPMKNNDEGTPDKRGNTKGALVNEHVEARDEADDATKKQAKDTEKAKAQVTYSDTGINNANELSRSGNVDNEGGSNQKPMSTRIAEATSAIVSKHPARVGLPPTASSGHGYQCHVCSAVLFSPLDLDAHVASHGLHGNMTLTSSEIQRHITEFISSWQNHPIVQVSADVENRKTAQLLHADTPRLVTWDAGLCTSFKIVPIVPAQVPQDVLAYTFFTSSYAIQSPFPEAAVSRIVVHTRWASNVDFDRDSSVIMAPPTENNIHLFKQLLNTETLSVRGANPLMFRANVLHMLLEFVLDNLYLNRHTGFSQDHTPFTEGANLRSLPGPDAEKWYSIMYPTRMGTPNVSKICNFVASCVRNRVGRFDRAQMMNGAMSEWVDVFETSDALTVSIRGRWMARLARMNINPTEIEWALTECAQGYVTVTSPYAPSVNRLMPYRISNAERQISQIIRVMNIGNNATVIQPVLQDISVLLQRISPLQIDPTIISNTMSTVSESTTQTLSPASSILGKLRPSNSDFSSFRVALAGWLYNGVVTTVIDDSSYPKDGGSVTSLENLWDFFILALALPLTTDPCAPVKAFMTLANMMVGFETIPMDNQIYTQSRRASAFSTPHTWPRCFMNIQLISPIDAPILRQWAEIIHRYWPNPSQIRYGTPNVFGSANLFTPPEVLLLPIDHQPANVTTPTLDFTNELTNWRARVCELMKNLVDNQRYQPGWTQSLVSSMRGTLGKLKLIKSMTPMYLQQLAPVELAVIAPMLPFPPFQVPYVRLDRDRVPTMVGVTRQSRDTITQPALSLSTTNTTVGVPLALDARAITVALLSGKYPPDLVTNVWYADAIYPMYADTEVFSNLQRDVITCEAVQTLVTLVAQISETQYPVDRYLDWIPSLRASAATAATFAEWVNTSMKTAFDLSDMLLEPLLSGDPRMTQLAIQYQQYNGRTFNVIPEMPGSVIADCVQLTAEVFNHEYNLFGIARGDIIIGRVQSTHLWSPLAPPPDLVFDRDTPGVHIFGRDCRISFGMNGAAPMIRDETGMMVPFEGNWIFPLALWQMNTRYFNQQFDAWIKTGELRIRIEMGAYPYMLHYYDPRQYANAWNLTSAWLEEITPTSIPSVPFMVPISSDHDISSAPAVQYIISTEYNDRSLFCTNSSSPQTIAGPDKHIPVERYNILTNPDAPPTQIQLPEVVDLYNVVTRYAYETPPITAVVMGVP
3NGF , Knot 123 269 0.85 40 179 262
MAHHHHHHMPRFAANLSTMFNEVPFLERFRLAAEAGFGGVEFLFPYDFDADVIARELKQHNLTQVLFNMPPGDWAAGERGMAAISGREQEFRDNVDIALHYALALDCRTLHAMSGITEGLDRKACEETFIENFRYAADKLAPHGITVLVEPLNTRNMPGYFIVHQLEAVGLVKRVNRPNVAVQLDLYHAQIMDGDLTRLIEKMNGAFSHVQIASVPDRHEPDEGELNYPYLFSVLESVGYRGWVGCEYNPRGKTESGLAWFAPYRDQSA
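
The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below shows one way to compute both quantities, assuming the 1976 Lempel-Ziv (exhaustive history) parsing in which each new phrase is extended for as long as it already occurs starting at an earlier position. The page does not state which LZ variant it uses, so `lz76_complexity` may not reproduce the tabulated counts exactly; both function names are illustrative.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of phrases in an LZ76-style parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while w[i:i+length] occurs starting before i
        # (the earlier occurrence may overlap the current phrase).
        while i + length <= n and w.find(w[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) divided by n / log_alphabet_size(n)."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example, parsed as 0 . 001 . 10 . 100 . 1000 . 101
print(lz76_complexity("0001101001000101"))  # 6
# Ratio for the 1YET row, using the tabulated LZ(w) = 104 and n = 228
print(normalized_lz(104, 228))
```

With the tabulated values for 1YET the ratio lands between 0.82 and 0.83, in line with the 0.82 shown in the table.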

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
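
As an illustration of these definitions, the following sketch collects the distinct contiguous subwords of a fixed length and counts them. It assumes the argument of \(P_w\) is the subword length; the function names are illustrative, and the values tabulated on the page may follow a different counting convention.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

w = "MAHHHHHHM"
print(subword_complexity(w, 1))  # 3: M, A, H
print(subword_complexity(w, 2))  # 4: MA, AH, HH, HM
print(subword_complexity(w, 3))  # 4: MAH, AHH, HHH, HHM
```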

\(|P_{f(1YET_1)}(2) \setminus P_{f(7YFE_1)}(2)|=13\), \(|P_{f(7YFE_1)}(2) \setminus P_{f(1YET_1)}(2)|=204\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of the two subword sets.

Hydrophobic-polar version of Sequence 1:
001100001001110101101101110010000011100110000011001000010010010010010101110000001011001111001011001101100100111011011101011101111100101110010110000000001100011101010000101110100111010000000100001001100000111011011100000001000010
Pair \(Z_2\) Length of longest common subword
1YET_1,7YFE_1 217 4
1YET_1,3NGF_1 173 3
7YFE_1,3NGF_1 194 4
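
Both columns of this table can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two length-\(k\) subword sets. For the last column, lengths of 3 and 4 are far too small for a non-contiguous common subsequence of proteins this long, so the sketch assumes the column reports the longest common contiguous subword. All function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x: str, y: str, k: int) -> int:
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x: str, y: str) -> int:
    """Length of the longest contiguous block shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_k("AAB", "ABB", 2))                            # 2: AA and BB
print(longest_common_subword_length("HEAT", "WHEAL"))  # 3: HEA
```

The dynamic program runs in time proportional to the product of the two lengths, which is ample for sequences of a few thousand residues.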

Newick tree

 
[
	7YFE_1:10.82,
	[
		1YET_1:86.5,3NGF_1:86.5
	]:21.32
]

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1503 }{\log_{20} 1503}-\frac{228}{\log_{20}228})\approx 333.\)
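
The arithmetic in this estimate can be checked directly. Note that 1503 = 228 + 1275, the combined length of the two query sequences; reading it as the length of their concatenation is an interpretation, not something the page states. The factors 0.85 and 0.8 are taken as given, and `lz_scale` is an illustrative helper name.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """n / log_alphabet_size(n), the normalizer used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

n_short, n_long = 228, 1275    # lengths of 1YET_1 and 7YFE_1
n_total = n_short + n_long     # 1503

# Factors 0.85 and 0.8 are copied from the formula above.
estimate = 0.85 * 0.8 * (lz_scale(n_total) - lz_scale(n_short))
print(round(estimate))  # 333
```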
Status Protein1 Protein2 d d1/2
Query variables 1YET_1 7YFE_1 428 250
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2. \end{cases} \]
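
The interpolation can be made concrete with a small sketch. It assumes that min and max range over the two directional complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), as in Otu and Sayood's definitions of \(d\) (their maximum) and \(d_1\) (their sum); the page does not spell this out. The example values are inferred from the query row above: \(d=428\) and \(d_1/2=250\) together imply a smaller increment of \(2\cdot 250-428=72\).

```python
def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Assumed directional increments for the pair 1YET_1, 7YFE_1,
# inferred from the reported d = 428 and d1/2 = 250.
larger, smaller = 428, 72

print(delta(larger, smaller, 0))    # 428, i.e. d
print(delta(larger, smaller, 0.5))  # 250.0, i.e. d1 / 2
```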