CoV2D Browser™

Parikh vectors
6YKG_1 1NNB_1 5MGK_1 Letter Amino acid
43 22 7 D Aspartic acid
7 18 2 C Cysteine
36 8 1 Q Glutamine
48 16 2 K Lysine
20 9 9 F Phenylalanine
30 30 4 T Threonine
29 26 4 N Asparagine
24 18 3 Y Tyrosine
28 23 9 R Arginine
17 5 6 M Methionine
46 35 9 S Serine
5 14 2 W Tryptophan
28 25 5 I Isoleucine
40 21 12 E Glutamic acid
28 30 5 G Glycine
20 7 3 H Histidine
81 18 8 L Leucine
32 22 4 P Proline
39 23 4 V Valine
35 17 6 A Alanine
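
Each column of the table above is the Parikh vector of one sequence, that is, the number of occurrences of each letter. A minimal sketch of that count follows; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative and not taken from the site's code.

```python
from collections import Counter

# The 20 standard one-letter amino acid codes (illustrative constant).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of `alphabet` to its count in `sequence`."""
    counts = Counter(sequence)
    return {letter: counts.get(letter, 0) for letter in alphabet}


# First ten residues of the 5MGK_1 sequence, used here only as a small input.
vector = parikh_vector("SMHSDLTFCE")
print(vector["S"], sum(vector.values()))
```

The entries of a Parikh vector sum to the sequence length, which gives a quick consistency check on each column.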

>6YKG_1|Chain A[auth AAA]|Phosphatidylinositol 3-kinase catalytic subunit type 3|Homo sapiens (9606)
>1NNB_1|Chain A|NEURAMINIDASE|Influenza A virus (11320)
>5MGK_1|Chain A|Bromodomain adjacent to zinc finger domain protein 2A|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6YKG , Knot 254 636 0.86 40 275 587
MHHHHHHSSGVDLGTENLYFQSMSKHHKLARSLRSGPSDHDLKPNAATRDQLNIIVSYPPTKQLTYEEQDLVWKFRYYLTNQEKALTKFLKCVNWDLPQEAKQALELLGKWKPMDVEDSLELLSSHYTNPTVRRYAVARLRQADDEDLLMYLLQLVQALKYENFDDIKNGLEPTKKDSQSSVSENVSNSGINSAEIDSSQIITSPLPSVSSPPPASKTKEVPDGENLEQDLCTFLISRACKNSTLANYLYWYVIVECEDQDTQQRDPKTHEMYLNVMRRFSQALLKGDKSVRVMRSLLAAQQTFVDRLVHLMKAVQRESGNRKKKNERLQALLGDNEKMNLSDVELIPLPLEPQVKIRGIIPETATLAKSANMPAQLFFKTEDGGKYPVLFKHGDDLRQDQLILQIISLMDKLLRKENLDLKLTPYKVLATSTKHGFLQWIQGSVPVAEVLDTEGSIQNFFRKYAPSENGPNGISAEVMDTYVKSCAGYCVITYILGVGDRHLDNLLLTKTGKLFHIDFGYILGRDPKPLPPPMKLNKEMVEGMGGTQSEQYQEFRKQCYTAFLHLRRYSNLILNLFSLMVDANIPDIALEPDKTVKKVQDKFRLDLSDEEAVHYMQSLIDESVHALFAAVVEQIH
1NNB , Knot 165 387 0.84 40 224 372
DFNNLTKGLCTINSWHIYGKDNAVRIGEDSDVLVTREPYVSCDPDECRFYALSQGTTIRGKHSNGTIHDRSQYRALISWPLSSPPTVYNSRVECIGWSSTSCHDGKTRMSICISGPNNNASAVIWYNRRPVTEINTWARNILRTQESECVCHNGVCPVVFTDGSATGPAETRIYYFKEGKILKWEPLAGTAKHIEECSCYGERAEITCTCRDNWQGSNRPVIRIDPVAMTHTSQYICSPVLTDNPRPNDPTVGKCNDPYPGNNNNGVKGFSYLDGVNTWLGRTISIASRSGYEMLKVPNALTDDKSKPTQGQTIVLNTDWSGYSGSFMDYWAEGECYRACFYVELIRGRPKEDKVWWTSNSIVSMCSSTEFLGQWDWPDGAKIEYFL
5MGK , Knot 55 105 0.81 40 88 102
SMHSDLTFCEIILMEMESHDAAWPFLEPVNPRLVSGYRRIIKNPMDFSTMRERLLRGGYTSSEEFAADALLVFDNCQTFNEDDSEVGKAGHIMRRFFESRWEEFY
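
The fourth column of the table above, \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\), can be recomputed from the LZ-complexity and length columns. The sketch below does only that arithmetic; it does not compute \(\mathrm{LZ}(w)\) itself, and the helper names are illustrative. The tabulated values appear to be truncated (not rounded) to two decimals, which is an assumption inferred from the three rows.

```python
import math


def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n)."""
    return lz / (n / math.log(n, alphabet_size))


def truncate2(x):
    """Truncate a non-negative number to two decimals (assumed display convention)."""
    return math.floor(x * 100) / 100


# (LZ-complexity, length) pairs copied from the table.
for code, lz, n in [("6YKG", 254, 636), ("1NNB", 165, 387), ("5MGK", 55, 105)]:
    print(code, truncate2(normalized_lz(lz, n)))
```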

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
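
As a small illustration of these definitions, the sketch below collects the length-\(n\) subwords of a word with a sliding window. The function names are illustrative, not the site's.

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_complexity(w, n):
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))


print(sorted(subwords("ABAB", 2)))
print(subword_complexity("ABAB", 2))
```

A word of length \(m\) has at most \(m-n+1\) subwords of length \(n\), so \(p_w(n)\le m-n+1\).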

\(|P_{f(6YKG_1)}(2) \setminus P_{f(1NNB_1)}(2)|=113\), \(|P_{f(1NNB_1)}(2) \setminus P_{f(6YKG_1)}(2)|=62\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For this pair, \(Z_2=113+62=175\).

Hydrophobic-polar version of Sequence 1:
100000000110110001010010000011001001100001010110000101110011000100000011101000100000110011001010110010011011101011010001011000000101000111010010000111011011011000010010011010000000010001000110010100001100111010011110000011010010001001110010000011001010111000000000001000010101100100111010001011001111000110011011011000010000000010111100001010010111111010101011110010110010111011100001100111100100100001110110110011000010101010011100000111011010111101100010100110001100011011010110001000110011001111100010011100010110101101110010111111010001101111000000001000000111010000011101101110101101110100010010001010100001100100110001011111110010
Pair \(Z_2\) Length of longest common subsequence
6YKG_1,1NNB_1 175 4
6YKG_1,5MGK_1 225 4
1NNB_1,5MGK_1 208 3
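
The \(Z_2\) column is the size of the symmetric difference of the two sets of length-2 subwords. A minimal sketch, with illustrative function names and toy inputs in place of the protein sequences:

```python
def subwords(w, k):
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_numerator(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


# Toy example: P_x(2) = {AB, BA}, P_y(2) = {AB, BB, BA}.
print(z_numerator("ABAB", "ABBA", 2))
```

Dividing \(Z_k(x,y)\) by \(|P_x(k)\cup P_y(k)|\) would give the Jaccard distance itself; the table reports only the numerator.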

Newick tree

 
(
	5MGK_1:11.43,
	(
		6YKG_1:87.5,1NNB_1:87.5
	):26.93
);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{1023}{\log_{20} 1023}-\frac{387}{\log_{20}387})\approx 168\), where \(1023=636+387\) is the combined length of the two sequences.
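
The arithmetic of this estimate can be checked directly. The sketch below only evaluates the expression as written; the factors 0.85 and 0.8 are taken from the formula above without further interpretation, and the function names are illustrative.

```python
import math


def n_over_log(n, alphabet_size=20):
    """n / log_{alphabet_size}(n), the normalizing quantity used on this page."""
    return n / math.log(n, alphabet_size)


def rough_expected_distance(n_combined, n_single, c1=0.85, c2=0.8):
    """Evaluate c1 * c2 * (n_combined/log n_combined - n_single/log n_single)."""
    return c1 * c2 * (n_over_log(n_combined) - n_over_log(n_single))


# 636 + 387 = 1023 is the combined length of 6YKG_1 and 1NNB_1.
print(round(rough_expected_distance(636 + 387, 387)))
```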
Status Protein1 Protein2 d d1/2
Query variables 6YKG_1 1NNB_1 217 173.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
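
Read against the table row above, the formula says \(d\) is the larger of the two underlying quantities and \(d_1/2\) is their average, so the smaller one can be recovered from the reported values. A minimal sketch of that bookkeeping, with illustrative variable names:

```python
def delta(alpha, smaller, larger):
    """delta = alpha * min + (1 - alpha) * max."""
    return alpha * smaller + (1 - alpha) * larger


# Values reported for the pair 6YKG_1, 1NNB_1.
d = 217.0
half_d1 = 173.5

larger = d                       # alpha = 0 gives delta = max = d
smaller = 2 * half_d1 - larger   # alpha = 1/2 gives (min + max) / 2 = d_1 / 2

print(smaller, delta(0, smaller, larger), delta(0.5, smaller, larger))
```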