CoV2D Browser™

Parikh vectors
8SCX_1 1GON_1 2DSF_1 Letter Amino acid
5 12 19 N Asparagine
4 12 13 Q Glutamine
1 19 10 Y Tyrosine
5 4 1 M Methionine
10 22 21 S Serine
10 36 12 R Arginine
3 18 6 H Histidine
5 8 27 K Lysine
9 11 5 I Isoleucine
9 15 11 F Phenylalanine
11 31 12 P Proline
5 14 6 W Tryptophan
10 28 26 V Valine
5 39 21 D Aspartic acid
4 2 20 C Cysteine
4 21 20 E Glutamic acid
4 30 23 T Threonine
21 69 34 A Alanine
23 43 25 G Glycine
10 45 33 L Leucine
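
A Parikh vector simply counts how often each letter occurs in a word. The following is a minimal sketch of that count for protein sequences; the function name `parikh_vector` and the constant `AMINO` are illustrative and do not come from the CoV2D site.

```python
from collections import Counter

# The 20 standard one-letter amino acid codes (illustrative constant).
AMINO = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(w, alphabet=AMINO):
    """Return a dict mapping each letter of `alphabet` to its count in `w`."""
    counts = Counter(w)
    # Letters absent from w get an explicit count of 0.
    return {a: counts.get(a, 0) for a in alphabet}

if __name__ == "__main__":
    # First six residues of the 8SCX_1 sequence, used as a toy input.
    print(parikh_vector("MSADHS"))
```

Applied to a full sequence, the counts sum to the sequence length, which gives a quick consistency check against a column of the table above.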

>8SCX_1|Chain A|Mitochondrial import inner membrane translocase subunit TIM17|Saccharomyces cerevisiae (4932)
>1GON_1|Chains A, B|BETA-GLUCOSIDASE|STREPTOMYCES SP. (1931)
>2DSF_1|Chain A|Lactotransferrin|Bos taurus (9913)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8SCX , Knot 78 158 0.83 40 118 152
MSADHSRDPCPIVILNDFGGAFAMGAIGGVVWHGIKGFRNSPLGERGSGAMSAIKARAPVLGGNFGVWGGLFSTFDCAVKAVRKREDPWNAIIAGFFTGGALAVRGGWRHTRNSSITCACLLGVIEGVGLMFQRYAAWQAKPMAPPLPEAPSSQPLQA
1GON , Knot 187 479 0.80 40 223 439
MVPAAQQTATAPDAALTFPEGFLWGSATASYQIEGAAAEDGRTPSIWDTYARTPGRVRNGDTGDVATDHYHRWREDVALMAELGLGAYRFSLAWPRIQPTGRGPALQKGLDFYRRLADELLAKGIQPVATLYHWDLPQELENAGGWPERATAERFAEYAAIAADALGDRVKTWTTLNEPWCSAFLGYGSGVHAPGRTDPVAALRAAHHLNLGHGLAVQALRDRLPADAQCSVTLNIHHVRPLTDSDADADAVRRIDALANRVFTGPMLQGAYPEDLVKDTAGLTDWSFVRDGDLRLAHQKLDFLGVNYYSPTLVSEADGSGTHNSDGHGRSAHSPWPGADRVAFHQPPGETTAMGWAVDPSGLYELLRRLSSDFPALPLVITENGAAFHDYADPEGNVNDPERIAYVRDHLAAVHRAIKDGSDVRGYFLWSLLDNFEWAHGYSKRFGAVYVDYPTGTRIPKASARWYAEVARTGVLPTA
2DSF , Knot 146 345 0.82 40 199 326
YTRVVWCAVGPEEQKKCQQWSQQSGQNVTCATASTTDDCIVLVLKGEADALNLDGGYIYTAGKCGLVPVLAENRKSSKHSSLDCVLRPTEGYLAVAVVKKANEGLTWNSLKDKKSCHTAVDRTAGWNIPMGLIVNQTGSCAFDEFFSQSCAPGADPKSRLCALCAGDDQGLDKCVPNSKEKYYGYTGAFRCLAEDVGDVAFVKNDTVWENTNGESTADWAKNLKREDFRLLCLDGTRKPVTEAQSCHLAVAPNHAVVSRSDRAAHVEQVLLHQQALFGKNGKNCPDKFCLFKSETKNLLFNDNTECLAKLGGRPTYEEYLGTEYVTAIANLKKCSTSPLLEACAF
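
The fourth column of the table divides the LZ-complexity by \(n/\log_{20} n\). A short sketch of that normalization follows; the function name `normalized_lz` is illustrative, and the site's rounding convention is not stated, so the last digit may differ from the table.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_q n), with q the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    # (LZ-complexity, length) pairs copied from the table above.
    for code, lz, n in [("8SCX", 78, 158), ("1GON", 187, 479), ("2DSF", 146, 345)]:
        print(code, normalized_lz(lz, n))
```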

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(8SCX_1)}(2) \setminus P_{f(1GON_1)}(2)|=34\), \(|P_{f(1GON_1)}(2) \setminus P_{f(8SCX_1)}(2)|=139\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
10100000101111100111111111111111011011000111001011101101011111101111111100100110110000011011111110111111011100000001001011111011111100011101011111110110001101
Pair \(Z_2\) Length of longest common subsequence
8SCX_1,1GON_1 173 3
8SCX_1,2DSF_1 179 4
1GON_1,2DSF_1 170 4
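
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two sets of length-\(k\) subwords. A minimal sketch, assuming words are plain Python strings; the names `subwords` and `z` are illustrative.

```python
def subwords(w, k):
    """Return P_w(k): the set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Return Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

if __name__ == "__main__":
    # Toy example: P("ABAB", 2) = {AB, BA}, P("ABCA", 2) = {AB, BC, CA}.
    print(z("ABAB", "ABCA", 2))
```

For the pair 8SCX_1, 1GON_1 the two one-sided differences reported above are 34 and 139, and their sum 173 is the \(Z_2\) entry in the table.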

Newick tree

 
(8SCX_1:88.99,(1GON_1:85,2DSF_1:85):3.99);

Let \(d\) be the Otu-Sayood distance \(d\).
Let \(d_1\) be the Otu-Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{637}{\log_{20} 637}-\frac{158}{\log_{20}158}\right)\approx 137\), where \(637=158+479\) is the combined length of the two query sequences.
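
The arithmetic behind this estimate can be checked directly. The sketch below takes the factors 0.85 and 0.8 from the formula as given and does not interpret them; the helper name `lz_scale` is illustrative.

```python
import math

def lz_scale(n, q=20):
    """Return n / log_q(n), the scale that normalizes LZ-complexity."""
    return n / math.log(n, q)

# Lengths of 8SCX_1 and 1GON_1 from the table; 637 is their sum.
n1, n2 = 158, 479
estimate = 0.85 * 0.8 * (lz_scale(n1 + n2) - lz_scale(n1))

if __name__ == "__main__":
    print(round(estimate))
```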
Status Protein1 Protein2 d d1/2
Query variables 8SCX_1 1GON_1 167 111
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
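
The following sketch shows how \(d\), \(d_1\) and \(\delta\) fit together. It assumes the definitions of Otu and Sayood, \(d(x,y)=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1(x,y)=(c(xy)-c(x))+(c(yx)-c(y))\), with min and max in \(\delta\) taken over the same two increments, and it assumes \(c\) is the exhaustive-history LZ76 complexity. The LZ variant the site actually uses is not stated, so the values it reports may differ; all function names here are illustrative.

```python
def lz76(w):
    """Number of components in the LZ76 exhaustive-history parse of w."""
    i, c, n = 0, 0, len(w)
    while i < n:
        j = i + 1
        # Extend the component while it already occurs in w[:j-1].
        while j <= n and w[i:j] in w[:j - 1]:
            j += 1
        c += 1
        i = j
    return c

def increments(x, y, c=lz76):
    """Return (c(xy) - c(x), c(yx) - c(y))."""
    return c(x + y) - c(x), c(y + x) - c(y)

def otu_sayood(x, y, c=lz76):
    """Return (d, d1) under the definitions assumed in the lead-in."""
    a, b = increments(x, y, c)
    return max(a, b), a + b

def delta(alpha, a, b):
    """alpha * min + (1 - alpha) * max of the two increments."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

if __name__ == "__main__":
    a, b = increments("aaaa", "abab")
    print(otu_sayood("aaaa", "abab"), delta(0, a, b), delta(0.5, a, b))
```

With \(\alpha=0\) the interpolation returns the maximum, which is \(d\); with \(\alpha=1/2\) it returns the average of the two increments, which is \(d_1/2\).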