CoV2D Browser™

Parikh vectors
6ICT_1 5TBJ_1 4JBP_1 Letter Amino acid
19 11 9 N Asparagine
24 18 11 D Aspartic acid
10 12 10 H Histidine
21 17 14 P Proline
30 22 12 T Threonine
8 4 4 W Tryptophan
27 14 21 R Arginine
44 19 24 E Glutamic acid
25 24 12 F Phenylalanine
38 17 21 K Lysine
36 27 19 S Serine
29 26 14 V Valine
21 16 12 Y Tyrosine
36 25 17 A Alanine
6 3 4 C Cysteine
20 20 10 Q Glutamine
25 23 15 G Glycine
23 19 13 I Isoleucine
51 32 34 L Leucine
11 12 3 M Methionine
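
The counts in the table above can be reproduced with a few lines of Python. This is a minimal sketch, not the site's own code: the function name `parikh_vector` and the constant `AMINO_ACIDS` are hypothetical, and the letter order is simply copied from the rows of the table.

```python
from collections import Counter

# One-letter amino acid codes, in the row order of the table above.
AMINO_ACIDS = "NDHPTWREFKSVYACQGILM"


def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter in the word w."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]


if __name__ == "__main__":
    # Toy word; a real call would pass a full FASTA sequence instead.
    w = "GMGKKS"
    for letter, count in zip(AMINO_ACIDS, parikh_vector(w)):
        if count:
            print(letter, count)
```

For a word over the 20-letter alphabet, the entries of the vector sum to the length of the word, which gives a quick consistency check against the Length column further down.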

>6ICT_1|Chains A, B, C, D|Histone-lysine N-methyltransferase setd3|Homo sapiens (9606)
>5TBJ_1|Chains A, B, C, D|Histone-arginine methyltransferase CARM1|Mus musculus (10090)
>4JBP_1|Chain A|Aurora Kinase A|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6ICT , Knot 209 504 0.86 40 261 479
GMGKKSRVKTQKSGTGATATVSPKEILNLTSELLQKCSSPAPGPGKEWEEYVQIRTLVEKIRKKQKGLSVTFDGKREDYFPDLMKWASENGASVEGFEMVNFKEEGFGLRATRDIKAEELFLWVPRKLLMTVESAKNSVLGPLYSQDRILQAMGNIALAFHLLCERASPNSFWQPYIQTLPSEYDTPLYFEEDEVRYLQSTQAIHDVFSQYKNTARQYAYFYKVIQTHPHANKLPLKDSFTYEDYRWAVSSVMTRQNQIPTEDGSRVTLALIPLWDMCNHTNGLITTGYNLEDDRCECVALQDFRAGEQIYIFYGTRSNAEFVIHSGFFFDNNSHDRVKIKLGVSKSDRLYAMKAEVLARAGIPTSSVFALHFTEPPISAQLLAFLRVFCMTEEELKEHLLGDSAIDRIFTLGNSEFPVSWDNEVKLWTFLEDRASLLLKTYKTTIEEDKSVLKNHDLSVRAKMAIKLRLGEKEILEKAVKSAAVNREYYRQQMEEKAPLPKYE
5TBJ , Knot 157 361 0.85 40 213 356
GHMGHTLERSVFSERTEESSAVQYFQFYGYLSQQQNMMQDYVRTGTYQRAILQNHTDFKDKIVLDVGCGSGILSFFAAQAGARKIYAVEASTMAQHAEVLVKSNNLTDRIVVIPGKVEEVSLPEQVDIIISEPMGYMLFNERMLESYLHAKKYLKPSGNMFPTIGDVHLAPFTDEQLYMEQFTKANFWYQPSFHGVDLSALRGAAVDEYFRQPVVDTFDIRILMAKSVKYTVNFLEAKEGDLHRIEIPFKFHMLHSGLVHGLAFWFDVAFIGSIMTVWLSTAPTEPLTHWYQVRCLFQSPLFAKAGDTLSGTCLLIANKRQSYDISIVAQVDQTGSKSSNLLDLKNPFFRYTGTTPSPPPG
4JBP , Knot 127 279 0.85 40 177 272
SKKRQWALEDFEIGRPLGKGKFGNVYLAREKQSKFILALKVLFKAQLEKAGVEHQLRREVEIQSHLRHPNILRLYGYFHDATRVYLILEYAPLGTVYRELQKLSKFDEQRTATYITELANALSYCHSKRVIHRDIKPENLLLGSAGELKIADFGWSVHAPSSRRTDLCGTLDYLPPEMIEGRMHDEKVDLWSLGVLCYEFLVGKPPFEANTYQETYKRISRVEFTFPDFVTEGARDLISRLLKHNPSQRPMLREVLEHPWITANSSKPSNCQNKESASK
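
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns alone. The sketch below assumes nothing about how \(\mathrm{LZ}(w)\) itself is computed; the function name `normalized_lz` is hypothetical, and the base 20 is taken from the column header.

```python
import math


def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_q n) for a word of length n over q letters."""
    return lz / (n / math.log(n, alphabet_size))


if __name__ == "__main__":
    # (code, LZ-complexity, length) copied from the table above.
    rows = [("6ICT", 209, 504), ("5TBJ", 157, 361), ("4JBP", 127, 279)]
    for code, lz, n in rows:
        print(code, normalized_lz(lz, n))
```

The recomputed ratios agree with the tabulated 0.86, 0.85 and 0.85 to within 0.01.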

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
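
As an illustration of these definitions, the set of distinct subwords of a fixed length and its cardinality can be sketched as follows. The function names `subwords` and `subword_complexity` are hypothetical, and the example uses a toy word rather than one of the sequences above.

```python
def subwords(w, k):
    """Return the set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))


if __name__ == "__main__":
    w = "ABAB"
    for k in (1, 2, 3):
        print(k, sorted(subwords(w, k)), subword_complexity(w, k))
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), so the set is empty once \(k\) exceeds \(n\).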

\(|P_{f(6ICT_1)}(2) \setminus P_{f(5TBJ_1)}(2)|=100\), \(|P_{f(5TBJ_1)}(2) \setminus P_{f(6ICT_1)}(2)|=52\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1: 111000010000010110101010011010001100000111111001000101001100100000110101010000011011011000110101101101000111101000101001111110011101001000111110000011011101111101100010100110101001100000110100001001000011001100000010001010011000101001110001000000111001100000110001001011111110100000111001001000000011100101100101101000010111001111000000010101110000010110101110111100011110100111010111110110100001000111001100110110001110100010110110001011100000010000011000010101011101011000110011001110000000010001111000

Pair \(Z_2\) Length of longest common subword
6ICT_1,5TBJ_1 152 4
6ICT_1,4JBP_1 176 4
5TBJ_1,4JBP_1 174 3
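
The two columns of this table can be sketched in Python as below. This is an illustrative reading, not the site's implementation: \(Z_k\) is computed as the size of the symmetric difference of the two length-\(k\) subword sets, and the last column is read as the length of the longest contiguous block shared by both words (an assumption). All function names are hypothetical.

```python
def subwords(w, k):
    """Return the set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_block(x, y):
    """Return the length of the longest contiguous block occurring in both words."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common block ending at x[i-2], y[j-2].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(z_distance("ABAB", "ABBA", 2))
    print(longest_common_block("ABCDE", "XBCDY"))
```

For the pair 6ICT_1, 5TBJ_1 the two set differences quoted above, 100 and 52, add up to the tabulated \(Z_2\) value 152.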

Newick tree

 
[
	4JBP_1:91.01,
	[
		6ICT_1:76,5TBJ_1:76
	]:15.01
]

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{865 }{\log_{20} 865}-\frac{361}{\log_{20}361})\approx 135.\)
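
The arithmetic in this estimate can be checked with a short script. It is only a sketch of the formula as written: the factors 0.85 and 0.8 and the lengths 865 and 361 are copied from the expression above (865 = 504 + 361, presumably the length of the concatenated pair), and the function names are hypothetical.

```python
import math


def lz_scale(n, alphabet_size=20):
    """Return n / log_q n, the normalizing quantity used in the table above."""
    return n / math.log(n, alphabet_size)


def rough_expected_distance(n_long, n_short, c1=0.85, c2=0.8):
    """Return c1 * c2 * (lz_scale(n_long) - lz_scale(n_short))."""
    return c1 * c2 * (lz_scale(n_long) - lz_scale(n_short))


if __name__ == "__main__":
    print(rough_expected_distance(865, 361))
```

The result lies between 135 and 136, in line with the rounded-down values shown elsewhere on the page.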
Status Protein1 Protein2 d d1/2
Query variables 6ICT_1 5TBJ_1 174 147
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
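
To make the interpolation concrete, here is a small sketch. The arguments of min and max are not given on this page, so the values 120 and 174 used below are hypothetical: they are chosen only so that the cases \(\alpha=0\) and \(\alpha=1/2\) reproduce the query row above (d = 174, d1/2 = 147). The function name `delta` is likewise an assumption.

```python
def delta(alpha, a, b):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    lo, hi = min(a, b), max(a, b)
    return alpha * lo + (1 - alpha) * hi


if __name__ == "__main__":
    a, b = 120, 174  # hypothetical values consistent with d = 174, d1/2 = 147
    print(delta(0, a, b))    # the alpha = 0 case, i.e. the max
    print(delta(0.5, a, b))  # the alpha = 1/2 case, i.e. the average
```

With \(\alpha=0\) the expression reduces to the maximum, and with \(\alpha=1/2\) to the average of the two values, which is why d1 appears halved in the table.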