CoV2D Browser™

Parikh vectors
4XRR_1 6YTP_1 6JPK_1 Letter Amino acid
66 39 40 A Alanine
9 7 5 C Cysteine
21 18 25 E Glutamic acid
7 22 23 K Lysine
6 9 10 M Methionine
39 21 19 R Arginine
47 34 22 G Glycine
4 19 16 Y Tyrosine
53 30 24 V Valine
28 34 19 P Proline
5 12 8 W Tryptophan
31 26 18 D Aspartic acid
7 19 20 H Histidine
13 21 21 I Isoleucine
45 54 33 L Leucine
14 26 23 F Phenylalanine
10 19 22 N Asparagine
6 20 17 Q Glutamine
19 36 32 S Serine
26 31 20 T Threonine
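
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch of how one column of the table above could be computed follows; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative assumptions, not names used by the CoV2D Browser.

```python
from collections import Counter

# The 20 standard one-letter amino-acid codes (illustrative constant).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(sequence):
    """Map each amino-acid letter to the number of times it occurs in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in AMINO_ACIDS}

# Usage on the first 19 residues of the 4XRR_1 sequence shown on this page.
vec = parikh_vector("SNAMPFLPDPGEPSPLKVV")
print(vec["P"], vec["V"], vec["W"])
```

The entries of a Parikh vector sum to the sequence length, which gives a quick consistency check against the Length column further down the page.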

>4XRR_1|Chains A, B|CalS8|Micromonospora echinospora (1877)
>6YTP_1|Chains A[auth AAA], B[auth BBB]|Glucosylceramidase|Homo sapiens (9606)
>6JPK_1|Chains A, B|Aspartate aminotransferase, cytoplasmic|Schizosaccharomyces pombe 972h- (284812)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4XRR , Knot 175 456 0.78 40 193 406
SNAMPFLPDPGEPSPLKVVIAGAGYVGTCLAVTLAGRGAEVVAVDSDPGTVADLRAGRCRLPEPGLAGAVRDLAATGRLTASTSYDPVGAADVVIVTVGTPTDAGHEMVTDQLVAACEQIAPRLRAGQLVILKSTVSPGTTRTLVAPLLESGGLVHERDFGLAFCPERLAEGVALAQVRTLPVVVGGCGPRSAAAAERFWRSALGVDVRQVPSAESAEVVKLATNWWIDANVAIANELARYCAVLGVDVLDVIGAANTLPKGSSMVNLLLPGVGVGGSCLTKDPWMAWRDGRDRGVSLRTVETARAVNDDMPRHTAAVIADELVKLGRDRNDTTIAVLGAAFKNDTGDVRNTPVRGVVAALRDSGFRVRIFDPLADPAEIVARFGTAPAASLDEAVSGAGCLAFLAGHRQFHELDFGALAERVDEPCLVFDGRMHLPPARIRELHRFGFAYRGIGR
6YTP , Knot 208 497 0.86 40 260 481
ARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYESTRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPMASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDIYHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQRLLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHSIITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAVALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWHRQ
6JPK , Knot 182 417 0.87 40 252 398
MSDYGFANIEEAKADAIFKLNAQYHQDEDPKKVNMSVGAYRDDTGKPWILPAVKKASKIVEEQASFNHEYLPIAGLPRFTKAAAEVLFRPNPHLLSEDRVASMQSVSGTGANFLAASFIETFYVKHTGAHVYISNPTWPVHRTLWEKLGVTVETYPYWDAKNRSFDYEGMLSTIKSAPEGSIFLLHACAHNPTGIDPTREQWLSIFESLLSRKHLVVFDIAYQGFASGDLNRDSWALNEFVKYNKDFFVCQSFAKNMGLYGERTGCMHYVAKDASTKNKVLSQLCIVQRNTISNPPAYGARIAAEILNSPQLFAEWEQDLKTMSSRIIEMRKRLRDSLVALKTPGSWDHITQQIGMFSFTGLTPAQVQFCQERYHLYFSANGRISMAGLNNSNVEHVAQAFNHAVRELPLEHHHHHH
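
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns alone. The sketch below does only that arithmetic; it does not compute \(\mathrm{LZ}(w)\) itself, and the helper name `normalized_lz` is a hypothetical choice.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n) for a word of length n over b letters."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ-complexity, length) triples copied from the table above.
rows = [("4XRR", 175, 456), ("6YTP", 208, 497), ("6JPK", 182, 417)]
for code, lz, n in rows:
    print(code, round(normalized_lz(lz, n), 3))
```

Under this reading the three ratios come out near 0.784, 0.867 and 0.879, which agrees with the table if its values are truncated, rather than rounded, to two decimals.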

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(4XRR_1)}(2) \setminus P_{f(6YTP_1)}(2)|=48\), \(|P_{f(6YTP_1)}(2) \setminus P_{f(4XRR_1)}(2)|=115\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For example, \(Z_2(f(4XRR_1), f(6YTP_1)) = 48 + 115 = 163\).

Hydrophobic-polar version of sequence 1 (4XRR_1):
001111110110101101111111011001110111011011110001101101011000110111111100111010101000001111101111011010011001100011110001110101101111000101100001111110011110000111110100110111110100111111101100111100110011110100110100101101100111010111100110001111101101111100110100110111111111100100011111001000110100100101100011000111110011011000000011111111000010100011011111100011010110111011011101101111010011011101111110001001011111001001011101010111101001001111001110
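
The hydrophobic-polar string above replaces each residue by 1 (hydrophobic) or 0 (polar). The page does not state which residues count as hydrophobic; the set used in this sketch is an assumption inferred from the start of the displayed string, and the inclusion of W in particular is a guess.

```python
# Assumed hydrophobic residues (inferred from the displayed prefix, not
# documented on the page); every other residue is treated as polar.
HYDROPHOBIC = set("AFGILMPVW")

def hp_string(sequence):
    """Encode a protein sequence as a 0/1 string: 1 = hydrophobic, 0 = polar."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in sequence)

# First 36 residues of the 4XRR_1 sequence shown on this page.
prefix = "SNAMPFLPDPGEPSPLKVVIAGAGYVGTCLAVTLAG"
print(hp_string(prefix))
```

For this prefix the output coincides with the first 36 symbols of the string above.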
Pair \(Z_2\) Length of longest common subword
4XRR_1,6YTP_1 163 4
4XRR_1,6JPK_1 169 4
6YTP_1,6JPK_1 152 4
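
The \(Z_k\) column follows directly from the definitions of \(P_w(k)\) and \(p_w(k)\) given earlier. A minimal sketch, with hypothetical function names and a toy input rather than the protein sequences:

```python
def subwords(w, k):
    """P_w(k): the set of distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """p_w(k): the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: P("ABAB", 2) = {AB, BA} and P("ABCB", 2) = {AB, BC, CB}.
print(subword_complexity("ABAB", 2), z_distance("ABAB", "ABCB", 2))
```

Dividing `z_distance(x, y, k)` by the size of the union of the two subword sets would give the full Jaccard distance; the page reports only the numerator.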

Newick tree

 
[
	4XRR_1:85.22,
	[
		6YTP_1:76,6JPK_1:76
	]:9.22
]

Let \(d\) denote the Otu--Sayood distance of that name.
Let \(d_1\) denote the Otu--Sayood distance of that name. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{953}{\log_{20} 953}-\frac{456}{\log_{20} 456}\right)=131.\)
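
The arithmetic in the expected-distance estimate can be checked numerically. In this sketch, 953 is read as the combined length 456 + 497 of the two query sequences, and the factors 0.85 and 0.8 are taken from the formula as given; the helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n, alphabet_size=20):
    """Return n / log_b(n), the normalizing scale used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

n1, n2 = 456, 497  # lengths of 4XRR_1 and 6YTP_1
expected = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n1))
print(round(expected))
```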
Status Protein1 Protein2 d d1/2
Query variables 4XRR_1 6YTP_1 172 160
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
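
The interpolation \(\delta\) can be illustrated numerically. The sketch assumes the displayed values \(d=172\) and \(d_1/2=160\) are exact, so that the larger one-sided quantity is 172 and the smaller is \(2\cdot 160-172=148\); the names `delta`, `larger` and `smaller` are illustrative.

```python
def delta(alpha, a, b):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Values implied by the table row for 4XRR_1 and 6YTP_1, assuming the
# displayed d and d1/2 are exact (they may be rounded on the page).
larger, smaller = 172, 148
print(delta(0, smaller, larger), delta(0.5, smaller, larger))
```

With \(\alpha=0\) this recovers \(d\), and with \(\alpha=1/2\) it recovers \(d_1/2\), matching the two cases in the displayed formula.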