CoV2D Browser™

Parikh vectors
1ARE_1 6JAO_1 8IAK_1 Letter Amino acid
5 34 17 R Arginine
1 16 30 N Asparagine
2 1 13 C Cysteine
1 15 24 Q Glutamine
0 34 26 G Glycine
2 10 10 H Histidine
0 16 27 I Isoleucine
1 32 58 L Leucine
0 7 12 M Methionine
2 11 25 F Phenylalanine
0 13 3 W Tryptophan
1 20 17 Y Tyrosine
2 32 44 V Valine
2 16 30 K Lysine
2 19 36 T Threonine
3 42 36 A Alanine
0 21 22 D Aspartic acid
3 26 35 E Glutamic acid
0 23 26 P Proline
2 20 43 S Serine
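
A column of the table above can be recomputed by counting letters in the corresponding sequence. The following is a minimal sketch, not the CoV2D implementation: the names `parikh_vector` and `ROW_ORDER` are hypothetical, and the row order is simply copied from the table.

```python
from collections import Counter

# Row order copied from the table above (one-letter amino-acid codes).
ROW_ORDER = "RNCQGHILMFWYVKTADEPS"


def parikh_vector(sequence, alphabet=ROW_ORDER):
    """Return the number of occurrences of each letter of `alphabet` in `sequence`."""
    counts = Counter(sequence)
    return [counts[letter] for letter in alphabet]


# Sequence 1ARE_1 as listed on this page.
seq_1are = "RSFVCEVCTRAFARQEALKRHYRSHTNEK"
for letter, count in zip(ROW_ORDER, parikh_vector(seq_1are)):
    print(count, letter)
```

The counts sum to the sequence length, which gives a quick sanity check on any column.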

>1ARE_1|Chain A|YEAST TRANSCRIPTION FACTOR ADR1|Saccharomyces cerevisiae (4932)
>6JAO_1|Chain A|ABC transporter, periplasmic substrate-binding protein|Thermus thermophilus (strain HB8 / ATCC 27634 / DSM 579) (300852)
>8IAK_1|Chains A, E|Long chain base biosynthesis protein 1,Serine palmitoyltransferase 1|Arabidopsis thaliana (3702)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
1ARE , Knot 18 29 0.69 28 26 27
RSFVCEVCTRAFARQEALKRHYRSHTNEK
6JAO , Knot 171 408 0.84 40 223 385
QSGPVIRVAGDSTAVGEGGRWMKEMVEAWGKKTGTRVEYIDSPADTNDRLALYQQYWAARSPDVDVYMIDVIWPGIVAPHALDLKPYLTEAELKEFFPRIVQNNTIRGKLTSLPFFTDAGILYYRKDLLEKYGYTSPPRTWNELEQMAERVMEGERRAGNRDFWGFVFQGKPYEGLTCDALEWIYSHGGGRIVEPDGTISVNNGRAALALNRAHGWVGRIAPQGVTSYAEEEARNVWQQGNSLFMRNWPYAYALGQAEGSPIRGKFGVTVLPKASADAPNAATLGGWQLMVSAYSRYPKEAVDLVKYLASYEVQKDNAVRLSRLPTRPALYTDRDVLARNPWFRDLLPVFQNAVSAPSDVAGARYNQVSEAIWTEVHSVLTGRKKGEQAVRDLEARIRRILRHHHHHH
8IAK , Knot 215 534 0.84 40 253 496
MASNLVEMFNAALNWVTMILESPSARVVLFGVPIRGHFFVEGLLGVVIIILLTRKSYKPPKRPLTEQEIDELCDEWVPEPLVDPSATDEQSWRVAKTPVTMEMPIQNHITITRNNLQEKYTNVFNLASNNFLQLSATEPVKEVVKTTIKNYGVGACGPAGFYGNQDVHYTLEYDLAQFFGTQGSVLYGQDFCAAPSVLPAFTKRGDVIVADDQVSLPVQNALQLSRSTVYYFNHNDMNSLECLLNELTEQEKLEKLPAIPRKFIVTEGIFHNSGDLAPLPELTKLKNKYKFRLFVDETFSIGVLGATGRGLSEHFNMDRATAIDITVGSMATALGSTGGFVLGDSVMCLHQRIGSNAYCFSACLPAYTVTSVSKVLKLMDSNNDAVQTLQKLSKSLHDSFASDDSLRSYVIVTSSPVSAVLHLQLTPAYRSRKFGYTCEQLFETMSALQKKSQTNKFIEPYEEEEKFLQSIVDHALINYNVLITRNTIVLKQETLPIVPSLKICCNAAMSPEELKNACESVKQSILACCQESNK
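
The LZ-complexity and normalized ratio columns can be illustrated with a short sketch. It assumes that \(\mathrm{LZ}(w)\) means the number of components in the Lempel-Ziv (1976) exhaustive-history parsing; the page does not say which variant it uses, so this code may not reproduce the listed values exactly. The function names are hypothetical.

```python
import math


def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parsing of `w`."""
    n, i, count = len(w), 0, 0
    while i < n:
        length = 1
        # Grow the component while it can still be copied from earlier text
        # (the copy source is allowed to overlap the component itself).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_b n), where b is the alphabet size (20 amino acids)."""
    return lz / (n / math.log(n, alphabet_size))


# Normalization of the 1ARE row: LZ = 18, n = 29.
print(round(normalized_lz(18, 29), 3))
```

For the 1ARE row, the normalization of the listed values 18 and 29 agrees with the tabulated 0.69 up to rounding.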

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
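
As a minimal sketch of this definition, read with the argument as the subword length, the set of subwords and its cardinality can be computed with a set comprehension. The names `subwords` and `subword_complexity` are hypothetical.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def subword_complexity(w, k):
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))


# Sequence 1ARE_1 as listed on this page.
seq_1are = "RSFVCEVCTRAFARQEALKRHYRSHTNEK"
print(subword_complexity(seq_1are, 2), subword_complexity(seq_1are, 3))
```

For 1ARE_1 this gives 26 and 27 for lengths 2 and 3, matching the table.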

\(|P_{f(1ARE_1)}(2) \setminus P_{f(6JAO_1)}(2)|=11\), \(|P_{f(6JAO_1)}(2) \setminus P_{f(1ARE_1)}(2)|=208\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\). For the pair above, \(Z_2 = 11 + 208 = 219\).

Hydrophobic-polar version of Sequence 1: 00110010001110001100000000000
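
The hydrophobic-polar string can be reproduced with a simple letter mapping. This is a sketch under an assumed classification: the string above only confirms that A, F, L and V map to 1 and that R, S, C, E, T, Q, K, H, Y and N map to 0. The treatment of G, I, M, W, D and P, which do not occur in Sequence 1, is a guess, and `HYDROPHOBIC` and `hp_encode` are hypothetical names.

```python
# Assumed hydrophobic set; only A, F, L and V are confirmed by the string above.
HYDROPHOBIC = set("AFILMVW")


def hp_encode(sequence, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in hydrophobic else "0" for aa in sequence)


print(hp_encode("RSFVCEVCTRAFARQEALKRHYRSHTNEK"))
```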
Pair \(Z_2\) Length of longest common subword
1ARE_1,6JAO_1 219 2
1ARE_1,8IAK_1 241 3
6JAO_1,8IAK_1 176 4
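
Both columns of this table can be sketched from the subword sets. The code below assumes that the last column measures the longest common contiguous subword, which is how its small values are read here. All function names are hypothetical.

```python
def subwords(w, k):
    """Set of distinct length-k subwords (contiguous intervals) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_distance(x, y, k):
    """Size of the symmetric difference of the length-k subword sets of x and y."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)


def longest_common_subword_length(x, y):
    """Largest k such that x and y share a subword of length k (0 if none)."""
    k = 0
    # subwords() returns an empty set once k exceeds a word's length,
    # so the loop always terminates.
    while subwords(x, k + 1) & subwords(y, k + 1):
        k += 1
    return k


print(z_distance("AAB", "ABB", 2), longest_common_subword_length("ABCD", "XBCY"))
```

The loop in `longest_common_subword_length` relies on the fact that a common subword of length k + 1 implies a common subword of length k, so the first empty intersection marks the maximum.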

Newick tree

 
(1ARE_1:12.85,(6JAO_1:88,8IAK_1:88):34.85);

Let \(d\) denote the Otu–Sayood distance \(d\).
Let \(d_1\) denote the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{437 }{\log_{20} 437}-\frac{29}{\log_{20}29})\approx 128.\)
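
The arithmetic in this estimate can be checked directly. In the sketch below, `lz_scale` is a hypothetical helper for \(n/\log_{20} n\); the factors 0.85 and 0.8 are taken from the formula as given. Note that 437 = 408 + 29, the combined length of 6JAO_1 and 1ARE_1.

```python
import math


def lz_scale(n, alphabet_size=20):
    """n / log_b(n), the normalizing scale used for LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)


estimate = 0.85 * 0.8 * (lz_scale(437) - lz_scale(29))
print(round(estimate, 1))
```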
Status Protein1 Protein2 d d1/2
Query variables 1ARE_1 6JAO_1 162 87
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
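
The interpolation can be sketched as follows. The inputs 12 and 162 are not stated on the page; they are back-solved here from the table row above under the assumption that \(d\) is the max and \(d_1\) is the sum of the same two quantities, so max = 162 and min = 2 · 87 − 162 = 12. The name `delta` mirrors the formula and is otherwise hypothetical.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)


# Inferred (assumed) pair of quantities for 1ARE_1 vs 6JAO_1.
a, b = 12, 162
print(delta(0, a, b))    # alpha = 0 gives the max, i.e. d
print(delta(0.5, a, b))  # alpha = 1/2 gives the average, i.e. d1/2
```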
