CoV2D Browser™

Parikh vectors
6GTD_1 5MIJ_1 1WHJ_1 Letter Amino acid
25 6 1 H Histidine
106 4 4 I Isoleucine
113 28 9 L Leucine
89 8 4 F Phenylalanine
28 2 5 P Proline
8 1 1 W Tryptophan
65 15 6 A Alanine
101 6 3 N Asparagine
68 6 2 Y Tyrosine
173 9 8 K Lysine
50 8 5 V Valine
99 12 4 D Aspartic acid
101 15 4 E Glutamic acid
81 10 14 S Serine
51 6 6 T Threonine
9 2 2 C Cysteine
57 11 3 Q Glutamine
13 3 1 M Methionine
37 11 4 R Arginine
55 11 16 G Glycine
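
The Parikh vector of a sequence records how often each letter occurs. A minimal Python sketch of that count follows; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not taken from the site, and the example input is only the first seven residues of 1WHJ_1.

```python
from collections import Counter

# The 20 standard one-letter amino acid codes (alphabetical order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of the alphabet to its count in w."""
    counts = Counter(w)
    return {a: counts[a] for a in alphabet}

# Toy input: the first seven residues of 1WHJ_1.
example = parikh_vector("GSSGSSG")
print(example["G"], example["S"])  # 3 4
```

Applying `parikh_vector` to a full FASTA sequence should give one column of the table above, assuming the site counts plain letter occurrences.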

>6GTD_1|Chain A|CRISPR-associated endonuclease Cas12a|Francisella tularensis subsp. novicida (strain U112) (401614)
>5MIJ_1|Chain A|Ferritin light chain|Equus caballus (9796)
>1WHJ_1|Chain A|RIKEN cDNA 1700024K14|Mus musculus (10090)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6GTD , Knot 459 1329 0.82 40 305 1071
MSIYQEFVNKYSLSKTLRFELIPQGKTLENIKARGLILDDEKRAKDYKKAKQIIDKYHQFFIEEILSSVCISEDLLQNYSDVYFKLKKSDDDNLQKDFKSAKDTIKKQISEYIKDSEKFKNLFNQNLIDAKKGQESDLILWLKQSKDNGIELFKANSDITDIDEALEIIKSFKGWTTYFKGFHENRKNVYSSNDIPTSIIYRIVDDNLPKFLENKAKYESLKDKAPEAINYEQIKKDLAEELTFDIDYKTSEVNQRVFSLDEVFEIANFNNYLNQSGITKFNTIIGGKFVNGENTKRKGINEYINLYSQQINDKTLKKYKMSVLFKQILSDTESKSFVIDKLEDDSDVVTTMQSFYEQIAAFKTVEEKSIKETLSLLFDDLKAQKLDLSKIYFKNDKSLTDLSQQVFDDYSVIGTAVLEYITQQIAPKNLDNPSKKEQELIAKKTEKAKYLSLETIKLALEEFNKHRDIDKQCRFEEILANFAAIPMIFDEIAQNKDNLAQISIKYQNQGKKDLLQASAEDDVKAIKDLLDQTNNLLHKLKIFHISQSEDKANILDKDEHFYLVFEECYFELANIVPLYNKIRNYITQKPYSDEKFKLNFENSTLANGWDKNKEPDNTAILFIKDDKYYLGVMNKKNNKIFDDKAIKENKGEGYKKIVYKLLPGANKMLPKVFFSAKSIKFYNPSEDILRIRNHSTHTKNGSPQKGYEKFEFNIEDCRKFIDFYKQSISKHPEWKDFGFRFSDTQRYNSIDEFYREVENQGYKLTFENISESYIDSVVNQGKLYLFQIYNKDFSAYSKGRPNLHTLYWKALFDERNLQDVVYKLNGEAELFYRKQSIPKKITHPAKEAIANKNKDNPKKESVFEYDLIKDKRFTEDKFFFHCPITINFKSSGANKFNDEINLLLKEKANDVHILSIDRGERHLAYYTLVDGKGNIIKQDTFNIIGNDRMKTNYHDKLAAIEKDRDSARKDWKKINNIKEMKEGYLSQVVHEIAKLVIEYNAIVVFQDLNFGFKRGRFKVEKQVYQKLEKMLIEKLNYLVFKDNEFDKTGGVLRAYQLTAPFETFKKMGKQTGIIYYVPAGFTSKICPVTGFVNQLYPKYESVSKSQEFFSKFDKICYNLDKGYFEFSFDYKNFGDKAAKGKWTIASFGSRLINFRNSDKNHNWDTREVYPTKELEKLLKDYSIEYGHGECIKAAICGESDKKFFAKLTSVLNTILQMRNSKTGTELDYLISPVADVNGNFFDSRQAPKNMPQDADANGAYHIGLKGLMLLGRIKNNQEGKKLNLVIKNEEYFEFVQNRNNGSEFELENLYFQGELRRQASALEHHHHHH
5MIJ , Knot 83 174 0.82 40 127 170
SSQIRQNYSTEVEAAVNRLVNLYLRASYTYLSLGFYFDRDDVALEGVCHFFRELAEEKREGAERLLKMQNQRGGRALFQDLQKPSQDEWGTTLDAMKAAIVLEKSLNQALLDLHALGSAQADPHLCDFLESHFLDEEVKLIKKMGDHLTNIQRLVGSQAGLGEYLFERLTLKHD
1WHJ , Knot 53 102 0.80 40 78 97
GSSGSSGLPNSDHTTSRAMLTSLGLKLGDRVVIAGQKVGTLRFCGTTEFASGQWAGIELDEPEGKNNGSVGRVQYFKCAPKYGIFAPLSKISKLKDSGPSSG
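
The LZ-complexity column can be illustrated with a short sketch of the Lempel-Ziv (1976) exhaustive-history parsing, together with the normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) from the table header. This is one common convention for LZ76; whether the site uses exactly this variant is an assumption, and the function names `lz76` and `normalized_lz` are illustrative.

```python
import math

def lz76(w):
    """Number of phrases in the LZ76 exhaustive-history parsing of w.

    Each phrase is the shortest block starting at position i that does not
    occur in the text preceding its own last symbol.
    """
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example, parsed as 0 | 001 | 10 | 100 | 1000 | 101.
print(lz76("0001101001000101"))          # 6
# Tabulated values for 5MIJ: LZ = 83, n = 174.
print(round(normalized_lz(83, 174), 2))  # 0.82
```

The quadratic substring search is adequate for sequences of a few thousand residues; longer inputs would call for a suffix-structure based implementation.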

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
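
As an illustration of these definitions, here is a minimal Python sketch that enumerates the distinct length-\(n\) subwords of a word. The names `subwords` and `subword_complexity` are hypothetical, and the sketch is not claimed to reproduce the tabulated \(p_w\) values, whose exact convention is not stated on the page.

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))

print(sorted(subwords("abab", 2)))      # ['ab', 'ba']
print(subword_complexity("abab", 3))    # 2
```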

\(|P_{f(6GTD_1)}(2) \setminus P_{f(5MIJ_1)}(2)|=190\), \(|P_{f(5MIJ_1)}(2) \setminus P_{f(6GTD_1)}(2)|=12\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be a LZ76 style (set of subwords) Jaccard distance numerator for \(P(k)\).Hydrophobic-polar version of Sequence 1:101000110000100010101110100100101011110000010000010011000001110011001010001100000101010000000100010010001000100010000010011000110100100001111100000011011010001001001101100101100010110000001000001100110011000110110001000010001101100001000110010101000000100011010011011010001000110010011110110100000011000101000010000100001011100110000000111001000001100100100011110010000100010111001010010100101000001001000110000111011100100011100100100000011100000100101001011100100000100000100111011111111001100000110101000001000110101000101100110000011001011010000001011000001011100001011011110001000100010000010101000011011000001000111110000001111000000110001100001010001100111110011101110100101001000110100000000010100100010101000001101000010001010011101000000001001000100010010100100001001100101011010000101000101010010101110000100110010101011000001100100110011100000010000110001100001000011100110101000110010001011100010010110100100011000110101011000010111000100000001111000000100010010010010010100110011011100011111001011100101010001000100111001001110000100011110100101110010011000111001111100010110111001010000100000110010010001001010101000011001101010110110011010000000010000101000100110000100101001011101000001110100110011010000010010011011101010110000110011001010110011101111110100000100101110000010110000010010100101010100010110000000
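
Comparing the first 150 residues of 6GTD_1 with the bit string above suggests that A, F, G, I, L, M, P, V and W are coded as 1 and the remaining eleven residues as 0. A minimal Python sketch under that assumption follows; the set `HYDROPHOBIC` is inferred from that comparison and is not a classification stated by the site, and `hp_encode` is an illustrative name.

```python
# Assumed hydrophobic residues, inferred from the displayed bit string.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(w):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if a in HYDROPHOBIC else "0" for a in w)

# First 40 residues of 6GTD_1.
prefix = "MSIYQEFVNKYSLSKTLRFELIPQGKTLENIKARGLILDD"
print(hp_encode(prefix))  # 1010001100001000101011101001001010111100
```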
Pair \(Z_2\) Length of longest common subword
6GTD_1,5MIJ_1 202 4
6GTD_1,1WHJ_1 247 4
5MIJ_1,1WHJ_1 145 3
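
The two columns of this table can be sketched in Python as follows. \(Z_k\) is the size of the symmetric difference of the two subword sets, so for the first pair \(Z_2 = 190 + 12 = 202\). The last column is assumed here to be the length of the longest common contiguous subword, since values of 3 and 4 are too small for a non-contiguous common subsequence of sequences this long. The function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length ending at y[j-1]
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("abab", "abba", 2))                           # 1
print(longest_common_subword_length("abab", "abba"))  # 2
```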

Newick tree

 
[
	6GTD_1:12.35,
	[
		5MIJ_1:72.5,1WHJ_1:72.5
	]:50.85
]
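
The bracketed tree above can be written as a standard parenthesized Newick string. The sketch below is one way to do that in Python, assuming the numbers after the colons are branch lengths; the nested-tuple representation and the name `to_newick` are illustrative choices, not the site's own data format.

```python
def to_newick(node):
    """Serialize a (label_or_children, branch_length) node to Newick text.

    A leaf is (name, length); an internal node is (list_of_children, length).
    A length of None means that no branch length is written.
    """
    label, length = node
    if isinstance(label, str):
        text = label
    else:
        text = "(" + ",".join(to_newick(child) for child in label) + ")"
    return text if length is None else f"{text}:{length}"

tree = (
    [
        ("6GTD_1", 12.35),
        ([("5MIJ_1", 72.5), ("1WHJ_1", 72.5)], 50.85),
    ],
    None,
)
newick = to_newick(tree) + ";"
print(newick)  # (6GTD_1:12.35,(5MIJ_1:72.5,1WHJ_1:72.5):50.85);
```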

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{1503 }{\log_{20} 1503}-\frac{174}{\log_{20}174})\approx 349.\)
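
The arithmetic in this estimate can be checked with a few lines of Python. Here 1503 = 1329 + 174 is the length of the concatenation of the two query sequences; the factors 0.85 and 0.8 are copied from the formula as given, and `lz_scale` is an illustrative name.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The scale n / log_{alphabet_size}(n) used to normalize LZ-complexity."""
    return n / math.log(n, alphabet_size)

estimate = 0.85 * 0.8 * (lz_scale(1503) - lz_scale(174))
print(round(estimate, 1))  # 349.8
```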
Status Protein1 Protein2 d d1/2
Query variables 6GTD_1 5MIJ_1 428 240.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
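
A minimal Python sketch of how \(d\), \(d_1\) and \(\delta\) fit together is given below. It assumes the usual Otu-Sayood definitions, with \(c\) an LZ-complexity, \(d(S,Q)=\max\{c(SQ)-c(S),\,c(QS)-c(Q)\}\) and \(d_1(S,Q)=(c(SQ)-c(S))+(c(QS)-c(Q))\), so that min and max in \(\delta\) refer to these two differences. The LZ76 variant and all function names are illustrative assumptions rather than the site's implementation.

```python
def lz76(w):
    """Number of phrases in the LZ76 exhaustive-history parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def increments(s, q, c=lz76):
    """The two complexity increments c(sq) - c(s) and c(qs) - c(q)."""
    return c(s + q) - c(s), c(q + s) - c(q)

def otu_sayood(s, q, c=lz76):
    """Return (d, d1): the max and the sum of the two increments."""
    a, b = increments(s, q, c)
    return max(a, b), a + b

def delta(s, q, alpha, c=lz76):
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(s, q, c)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

d, d1 = otu_sayood("ab", "ab")
print(d, d1)  # 1 2
```

With \(\alpha=0\) the function `delta` returns \(d\), and with \(\alpha=1/2\) it returns \(d_1/2\), matching the two cases displayed above. For the query pair in the table, \(d=428\) and \(d_1/2=240.5\) correspond to increments of 428 and 53.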