CoV2D Browser™

Parikh vectors
7WQM_1 5KZQ_1 3LCV_1 Letter Amino acid
12 18 9 Q Glutamine
14 43 12 G Glycine
3 11 2 W Tryptophan
14 20 7 Y Tyrosine
23 42 14 S Serine
23 34 19 V Valine
7 17 3 C Cysteine
20 34 17 E Glutamic acid
13 29 7 F Phenylalanine
18 30 18 P Proline
22 64 31 A Alanine
26 14 7 K Lysine
6 8 4 M Methionine
9 22 15 T Threonine
14 21 14 I Isoleucine
35 55 32 L Leucine
17 43 29 R Arginine
16 18 9 N Asparagine
21 29 19 D Aspartic acid
16 18 13 H Histidine
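
The table above lists, for each sequence, how many times each amino-acid letter occurs. A minimal sketch of that count, assuming the 20-letter one-letter amino-acid alphabet (the function name `parikh_vector` and the alphabet ordering are illustrative choices, not taken from the page):

```python
from collections import Counter

# Standard one-letter amino-acid codes; the ordering here is arbitrary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(seq: str, alphabet: str = AMINO_ACIDS) -> dict:
    """Return the number of occurrences of each alphabet letter in seq."""
    counts = Counter(seq)
    # Letters that never occur are reported with count 0.
    return {letter: counts.get(letter, 0) for letter in alphabet}

if __name__ == "__main__":
    vec = parikh_vector("MHHHHHHDSKHQ")
    print(vec["H"], vec["M"], vec["W"])
```

Applied to a full FASTA sequence, each column of the table corresponds to one such dictionary.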

>7WQM_1|Chains A, B|Aldo-keto reductase family 1 member C3|Homo sapiens (9606)
>5KZQ_1|Chain A|Metabotropic glutamate receptor 2|Homo sapiens (9606)
>3LCV_1|Chain A[auth B]|Sisomicin-gentamicin resistance methylase Sgm|Micromonospora zionensis (1879)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7WQM , Knot 143 329 0.84 40 206 306
MHHHHHHDSKHQCVKLNDGHFMPVLGFGTYAPPEVPRSKALEVTKLAIEAGFRHIDSAHLYNNEEQVGLAIRSKIADGSVKREDIFYTSKLWSTFHRPELVRPALENSLKKAQLDYVDLYLIHSPMSLKPGEELSPTDENGKVIFDIVDLCTTWEAMEKCKDAGLAKSIGVSNFNRRQLEMILNKPGLKYKPVCNQVECHPYFNRSKLLDFCKSKDIVLVAYSALGSQRDKRWVDPNSPVLLEDPVLCALAKKHKRTPALIALRYQLQRGVVVLAKSYNEQRIRQNVQVFEFQLTAEDMKAIDGLDRNLHYFNSDSFASHPNYPYSDEY
5KZQ , Knot 224 570 0.83 40 257 520
MGSLLALLALLLLWGAVAEGPAKKVLTLEGDLVLGGLFPVHQKGGPAEDCGPVNEHRGIQRLEAMLFALDRINRDPHLLPGVRLGAHILDSCSKDTHALEQALDFVRASLSRGADGSRHICPDGSYATHGDAPTAITGVIGGSYSDVSIQVANLLRLFQIPQISYASTSAKLSDKSRYDYFARTVPPDFFQAKAMAEILRFFNWTYVSTVASEGDYGETGIEAFELEARARNICVATSEKVGRAMSRAAFEGVVRALLQKPSARVAVLFTRSEDARELLAASQRLNASFTWVASDGWGALESVVAGSEGAAEGAITIELASYPISDFASYFQSLDPWNNSRNPWFREFWEQRFRCSFRQRDCAAHSLRAVPFEQESKIMFVVNAVYAMAHALHNMHRALCPNTTRLCDAMRPVNGRRLYKDFVLNVKFDAPFRPADTHNEVRFDRFGDGIGRYNIFTYLRAGSGRYRYQKVGYWAEGLTLDTSLIPWASPSAGPLPASRCSEPCLQNEVKSVQPGEVCCWLCIPCQPYEYRLDEFTCADCGLGYWPNASLTGCFELPQEYIREGHHHHHH
3LCV , Knot 121 281 0.81 40 166 263
HHHHHHHMTAPAADDRIDEIERAITKSRRYQTVAPATVRRLARAALVAARGDVPDAVKRTKRGLHEIYGAFLPPSPPNYAALLRHLDSAVDAGDDEAVRAALLRAMSVHISTRERLPHLDEFYRELFRHLPRPNTLRDLACGLNPLAAPWMGLPAETVYIASDIDARLVGFVDEALTRLNVPHRTNVADLLEDRLDEPADVTLLLKTLPCLETQQRGSGWEVIDIVNSPNIVVTFPTKSLGQRSKGMFQNYSQSFESQARERSCRIQRLEIGNELIYVIQK
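
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns alone. The sketch below does only that arithmetic; the function name `normalized_lz` is illustrative, and the LZ-complexity values are taken from the table rather than recomputed.

```python
import math

def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """Return LZ(w) / (n / log_b n) for a word of length n over b letters."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    # (LZ-complexity, length) pairs copied from the table above.
    for code, lz, n in [("7WQM", 143, 329), ("5KZQ", 224, 570), ("3LCV", 121, 281)]:
        print(code, round(normalized_lz(lz, n), 2))
```

Rounded to two decimals, the three rows give 0.84, 0.83 and 0.81, in agreement with the table.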

Let \(P_w(k)\) be the set of distinct subwords (contiguous intervals) of length \(k\) in a word \(w\), and let \(p_w(k)=|P_w(k)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(7WQM_1)}(2) \setminus P_{f(5KZQ_1)}(2)|=60\), \(|P_{f(5KZQ_1)}(2) \setminus P_{f(7WQM_1)}(2)|=111\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for the pair above, \(Z_2=60+111=171\).

Hydrophobic-polar version of Sequence 1:
10000000000001010010111111110011101100011010011101110010010100000011111000110101000011000011001001011011100010010100101011001101011001010000101110110100010110000011110011100100001011100111000110001000101000011010000011111001110000001101001111001110111000000111111000100111111000000010001011010101001011011000100100001100100100000
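
The page does not state which residues it counts as hydrophobic. Comparing the leading part of the binary string with the start of the 7WQM sequence suggests that A, F, G, I, L, M, P, V and W map to 1 and all other residues to 0. The sketch below encodes that inferred set; treat `HYDROPHOBIC` as an assumption, since other hydrophobic-polar conventions classify residues such as C or Y differently.

```python
# Assumed hydrophobic set, inferred from the page's output rather than documented.
HYDROPHOBIC = frozenset("AFGILMPVW")

def hp_encode(seq: str) -> str:
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in seq)

if __name__ == "__main__":
    # First 20 residues of the 7WQM sequence.
    print(hp_encode("MHHHHHHDSKHQCVKLNDGH"))
```

The output for these 20 residues agrees with the first 20 digits of the binary string shown for Sequence 1.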
Pair \(Z_2\) Length of longest common subword
7WQM_1,5KZQ_1 171 6
7WQM_1,3LCV_1 178 6
5KZQ_1,3LCV_1 155 6
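
Both columns of this table can be sketched in a few lines. The code below builds the subword sets \(P_w(k)\), computes \(Z_k\) as the size of their symmetric difference, and finds the longest common contiguous run by dynamic programming. Reading the last column as a contiguous run is an assumption: the value 6 is consistent with the HHHHHH run that all three sequences contain. All function names are illustrative.

```python
def subwords(w: str, k: int) -> set:
    """Return P_w(k): the set of distinct length-k contiguous subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x: str, y: str, k: int) -> int:
    """Return Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous run shared by x and y (O(|x||y|) DP)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common run ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

if __name__ == "__main__":
    print(z_distance("AAB", "ABB", 2))                   # {AA} and {BB} differ
    print(longest_common_subword("MHHHHHHD", "GHHHHHH"))  # shared HHHHHH run
```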

Newick tree

 
(
	7WQM_1:90.28,
	(
		5KZQ_1:77.5,3LCV_1:77.5
	):12.78
);

Let \(d\) be the Otu–Sayood distance \(d\), and let \(d_1\) be the Otu–Sayood distance \(d_1\). (The latter makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{899}{\log_{20} 899}-\frac{329}{\log_{20}329})\approx 153\), where \(899=329+570\) is the combined length of the two sequences.
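
The arithmetic behind the expected distance can be checked directly. In the sketch below the factors 0.85 and 0.8 are copied from the formula without interpretation, and 899 is taken to be the sum of the two sequence lengths, 329 + 570; the function names are illustrative.

```python
import math

def lz_scale(n: int, alphabet_size: int = 20) -> float:
    """Return n / log_b n, the normalizing scale used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

def expected_distance(n1: int, n2: int, a: float = 0.85, b: float = 0.8) -> float:
    """Evaluate a * b * (scale(n1 + n2) - scale(n1)) as in the formula above."""
    return a * b * (lz_scale(n1 + n2) - lz_scale(n1))

if __name__ == "__main__":
    print(expected_distance(329, 570))
```

The expression evaluates to roughly 153.6, which the page reports as 153.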
Status Protein1 Protein2 d d1/2
Query variables 7WQM_1 5KZQ_1 194 152

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
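
A minimal sketch of how \(d\), \(d_1\) and \(\delta\) fit together, under two assumptions that the page does not spell out. First, \(\mathrm{LZ}\) is taken to be the number of components in the LZ76 exhaustive parsing. Second, following Otu and Sayood's definitions as usually stated, the two quantities combined are \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\), with \(d\) their maximum and \(d_1\) their sum. The function names are illustrative, and the page's own implementation may differ in detail.

```python
def lz76_complexity(s: str) -> int:
    """Count components of the LZ76 exhaustive parsing of s (simple, not optimized)."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the component while it already occurs starting before position i
        # (the earlier occurrence may overlap the component itself).
        while i + length <= n and s.find(s[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count

def gains(x: str, y: str) -> tuple:
    """Extra complexity from appending y to x, and from appending x to y."""
    return (lz76_complexity(x + y) - lz76_complexity(x),
            lz76_complexity(y + x) - lz76_complexity(y))

def otu_sayood_d(x: str, y: str) -> int:
    return max(gains(x, y))

def otu_sayood_d1(x: str, y: str) -> int:
    return sum(gains(x, y))

def delta(x: str, y: str, alpha: float) -> float:
    """alpha * min + (1 - alpha) * max of the two gains."""
    g = gains(x, y)
    return alpha * min(g) + (1 - alpha) * max(g)

if __name__ == "__main__":
    x, y = "AAAA", "CCCC"
    print(otu_sayood_d(x, y), otu_sayood_d1(x, y), delta(x, y, 0.0), delta(x, y, 0.5))
```

With these definitions, `delta(x, y, 0)` equals \(d\) and `delta(x, y, 0.5)` equals \(d_1/2\), matching the two cases in the displayed formula.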