CoV2D BrowserTM

Parikh vectors
8XAC_1 4QRD_1 3LCA_1 Letter Amino acid
46 32 20 G Glycine
22 36 29 I Isoleucine
22 32 36 S Serine
4 9 0 W Tryptophan
64 30 53 A Alanine
5 0 4 C Cysteine
1 33 38 K Lysine
16 21 8 H Histidine
45 48 66 L Leucine
13 25 30 F Phenylalanine
18 28 29 T Threonine
33 34 33 D Aspartic acid
6 19 28 Q Glutamine
28 43 40 E Glutamic acid
36 28 22 P Proline
7 32 23 Y Tyrosine
40 34 16 V Valine
39 18 18 R Arginine
11 26 31 N Asparagine
11 19 9 M Methionine
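
The table above lists, for each sequence, how many times each amino-acid letter occurs (its Parikh vector). A minimal sketch of that count follows; the function name `parikh_vector` and the constant `AMINO` are hypothetical, and the letter order is simply copied from the rows of the table.

```python
from collections import Counter

# Letter order copied from the rows of the table above (an assumption about
# how the page orders its alphabet, not a standard ordering).
AMINO = "GISWACKHLFTDQEPYVRNM"

def parikh_vector(seq, alphabet=AMINO):
    """Return the number of occurrences of each alphabet letter in seq."""
    counts = Counter(seq)
    return [counts[a] for a in alphabet]

if __name__ == "__main__":
    # First ten residues of 8XAC_1, used here only as a small illustration.
    print(dict(zip(AMINO, parikh_vector("MGMSEPCHAT"))))
```

For a sequence made up only of these 20 letters, the entries of a Parikh vector sum to the sequence length; for example, the 8XAC_1 column sums to 467.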

>8XAC_1|Chains A, B, C, D|Amidase family protein|Pseudonocardia acaciae (551276)
>4QRD_1|Chain A|Methionyl-tRNA synthetase|Staphylococcus aureus (1280)
>3LCA_1|Chain A|Protein TOM71|Saccharomyces cerevisiae (4932)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8XAC , Knot 185 467 0.81 40 202 415
MGMSEPCHATIAELQAGIASGAYSREDVVAAHLGRTERINPVTNSYCELRGDQVLAEARAADREYGRELSGPLDGVPMSIKDSFAVRGLRRTDGLPVHADRVADEDDEVVARLRDAGGLVLGHANVPDICIRWNTISGLYGIARNPRDPSRTAGGSSGGDAANVAAGMATVGMGQDLGGSIRVPASFCGVYGLRPGAGTVPNLSVIPPFPASPTLDAMGTSGPFARSAADLRTMFSVIAGAHPHDPVSVPAPLAGTASPRVAVLRGETGAVLDAEIEARLDATVDALRRAGFEVAEDVVPDLRRAPEVWAAINGTELINIALPEVGAEMTGSGRQHIEDMFGIFDLGLDLRAYHAVWLERRALQDALVRFLEDYPIIVAPVAGMPAPPLDFDHLIGREASARLFDRMRCVPWVNLFGLPGLALPNGIQLVTRRFHEPDLLATAEAIEPLLPAVEVADPVLEHHHHHH
4QRD , Knot 218 547 0.83 38 268 511
MGHHHHHHDYDIPTTENLYFQGAHMASMAKETFYITTPIYYPSGNLHIGHAYSTVAGDVIARYKRMQGYDVRYLTGTDEHGQKIQEKAQKAGKTEIEYLDEMIAGIKQLWAKLEISNDDFIRTTEERHKHVVEQVFERLLKQGDIYLGEYEGWYSVPDETYYTESQLVDPQYENGKIIGGKSPDSGHEVELVKEESYFFNISKYTDRLLEFYDQNPDFIQPPSRKNEMINNFIKPGLADLAVSRTSFNWGVHVPSNPKHVVYVWIDALVNYISALGYLSDDESLFNKYWPADIHLMAKEIVRFHSIIWPILLMALDLPLPKKVFAHGWILMKDGKMSKSKGNVVDPNILIDRYGLDATRYYLMRELPFGSDGVFTPEAFVERTNFDLANDLGNLVNRTISMVNKYFDGELPAYQGPLHELDEEMEAMALETVKSYTESMESLQFSVALSTVWKFISRTNKYIDETTPWVLAKDDSQKDMLGNVMAHLVENIRYAAVLLRPFLTHAPKEIFEQLNINNPQFMEFSSLEQYGVLTESIMVTGQPKPIFP
3LCA , Knot 211 533 0.82 38 237 485
NGEPDIAQLKGLSPSQRQAYAVQLKNRGNHFFTAKNFNEAIKYYQYAIELDPNEPVFYSNISACYISTGDLEKVIEFTTKALEIKPDHSKALLRRASANESLGNFTDAMFDLSVLSLNGDFDGASIEPMLERNLNKQAMKVLNENLSKDEGRGSQVLPSNTSLASFFGIFDSHLEVSSVNTSSNYDTAYALLSDALQRLYSATDEGYLVANDLLTKSTDMYHSLLSANTVDDPLRENAALALCYTGIFHFLKNNLLDAQVLLQESINLHPTPNSYIFLALTLADKENSQEFFKFFQKAVDLNPEYPPTYYHRGQMYFILQDYKNAKEDFQKAQSLNPENVYPYIQLACLLYKQGKFTESEAFFNETKLKFPTLPEVPTFFAEILTDRGDFDTAIKQYDIAKRLEEVQEKIHVGIGPLIGKATILARQSSQDPTQLDEEKFNAAIKLLTKACELDPRSEQAKIGLAQLKLQMEKIDEAIELFEDSAILARTMDEKLQATTFAEAAKIQKRLRADPIISAKMELTLARYRAKGML
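
The columns \(\mathrm{LZ}(w)\) and \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) above can be sketched as follows. The page does not state which Lempel-Ziv variant it uses, so `lz76` below is an assumption: it counts phrases in the exhaustive-history (LZ76) parsing, and it may not reproduce the tabulated values exactly. The names `lz76` and `normalized_lz` are hypothetical.

```python
import math

def lz76(s):
    """Count phrases in the LZ76 exhaustive-history parsing of s.

    Each phrase is the shortest prefix of the remaining text that cannot be
    copied from an earlier (possibly overlapping) position; a final phrase
    cut off by the end of the string still counts.
    """
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend while s[i:i+length] already occurs starting before position i.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """Return lz / (n / log_alphabet_size(n)), as in the table's ratio column."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    print(lz76("0001101001000101"))          # parsing 0|001|10|100|1000|101
    print(round(normalized_lz(185, 467), 4))  # uses the tabulated LZ value for 8XAC
```

The quadratic-time substring search is chosen for readability; a suffix-structure implementation would be preferable for long sequences.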

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
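
A minimal sketch of \(P_w(n)\) and \(p_w(n)\), reading \(n\) as the subword length, is given below. The function names `subwords` and `subword_complexity` are hypothetical.

```python
def subwords(w, n):
    """Return P_w(n): the set of distinct length-n contiguous subwords of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """Return p_w(n), the cardinality of P_w(n)."""
    return len(subwords(w, n))

if __name__ == "__main__":
    w = "MGMSEPCHAT"  # first ten residues of 8XAC_1, for illustration only
    print(sorted(subwords(w, 2)))
    print(subword_complexity(w, 2))
```

If \(n\) exceeds the length of \(w\), the range is empty and the sketch returns the empty set.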

\(|P_{f(8XAC_1)}(2) \setminus P_{f(4QRD_1)}(2)|=49\), \(|P_{f(4QRD_1)}(2) \setminus P_{f(8XAC_1)}(2)|=115\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
11100100101101011110110000011110110000101100000010100111010110000100101110111101000111011000011110100110000011101001111111010110101010010110111001001000111001101101111110111100111010111010110110111101101011111110101011100111100110100110111110100110111111101010111101001111010101010101011001110110011101001101111101001101111011101010100010011111011101010011110001100111011000111111111111111010011100101011001001111011111111110110110001001011101011011111101101110000000
Pair \(Z_2\) Length of longest common subword
8XAC_1,4QRD_1 164 6
8XAC_1,3LCA_1 163 5
4QRD_1,3LCA_1 133 4
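
The quantity \(Z_k(x,y)\) tabulated above is the size of the symmetric difference of the two subword sets. A minimal sketch follows; the names `subword_set`, `z_k` and `jaccard_distance` are hypothetical, and dividing by the size of the union to obtain a Jaccard distance is the standard definition rather than something the page itself computes.

```python
def subword_set(w, k):
    """Set of distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """Z_k(x, y) = |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)|."""
    px, py = subword_set(x, k), subword_set(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """Z_k divided by |P_x(k) union P_y(k)| (assumed standard normalization)."""
    union = subword_set(x, k) | subword_set(y, k)
    return z_k(x, y, k) / len(union) if union else 0.0

if __name__ == "__main__":
    print(z_k("ABAB", "ABBA", 2))               # only "BB" is unshared
    print(jaccard_distance("ABAB", "ABBA", 2))  # one unshared out of three
```

For the first pair, the two one-sided counts reported above, 49 and 115, add up to the tabulated \(Z_2 = 164\).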

Newick tree

 
[
	8XAC_1:86.23,
	[
		3LCA_1:66.5,4QRD_1:66.5
	]:19.73
]

Let \(d\) denote the Otu–Sayood distance \(d\).
Let \(d_1\) denote the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1014 }{\log_{20} 1014}-\frac{467}{\log_{20}467})\approx 143.\)
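
The arithmetic in the expected-distance estimate can be checked with a short sketch. Here 1014 is the combined length 467 + 547 of the two query sequences; the factors 0.85 and 0.8 are taken verbatim from the formula, and the parameter names `factor_a`, `factor_b` and the function names are hypothetical.

```python
import math

def lz_scale(n, base=20):
    """Return n / log_base(n), the normalizing quantity used on this page."""
    return n / math.log(n, base)

def expected_distance(n1, n2, factor_a=0.85, factor_b=0.8):
    """Evaluate factor_a * factor_b * (lz_scale(n1 + n2) - lz_scale(n1))."""
    return factor_a * factor_b * (lz_scale(n1 + n2) - lz_scale(n1))

if __name__ == "__main__":
    # 467 and 547 are the lengths of 8XAC_1 and 4QRD_1.
    print(expected_distance(467, 547))
```

Evaluating this gives a value between 143 and 144, in line with the figure quoted above.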
Status Protein1 Protein2 d d1/2
Query variables 8XAC_1 4QRD_1 184 169
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
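
The interpolation \(\delta\) can be illustrated numerically. In the sketch below, `delta` is a hypothetical helper, and the two arguments stand for whatever pair of quantities the min and max are taken over. The value 154 does not appear on the page: it is back-solved from the tabulated \(d=184\) and \(d_1/2=169\), assuming both are exact.

```python
def delta(alpha, a, b):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    lo, hi = min(a, b), max(a, b)
    return alpha * lo + (1 - alpha) * hi

if __name__ == "__main__":
    # Assumed pair (154, 184): max = d = 184, and (min + max) / 2 = d1/2 = 169.
    print(delta(0, 154, 184))    # alpha = 0 gives the maximum, i.e. d
    print(delta(0.5, 154, 184))  # alpha = 1/2 gives the average, i.e. d1/2
```

With \(\alpha=0\) only the larger term survives, and with \(\alpha=1/2\) the two terms are averaged, which matches the two cases in the displayed formula.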