CoV2D Browser™

Parikh vectors
9BAA_1 4HSF_1 5FSA_1 Letter Amino acid
19 3 14 Q Glutamine
22 2 13 M Methionine
25 3 25 Y Tyrosine
41 6 30 V Valine
37 2 30 E Glutamic acid
53 12 30 G Glycine
21 1 17 H Histidine
87 8 41 L Leucine
37 3 31 F Phenylalanine
36 14 17 N Asparagine
42 7 29 D Aspartic acid
62 6 35 K Lysine
13 2 29 P Proline
51 10 32 S Serine
37 7 32 T Threonine
3 6 7 W Tryptophan
53 12 24 A Alanine
23 11 23 R Arginine
8 8 4 C Cysteine
80 6 27 I Isoleucine
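
The counts in the table above can be recomputed by tallying letters. The following is a minimal sketch, assuming a Parikh vector here simply means the per-letter occurrence counts of a sequence; the names `parikh_vector`, `SEQ_4HSF` and `AMINO_ACIDS` are illustrative and not part of the CoV2D site.

```python
from collections import Counter

# 4HSF_1 (lysozyme C) sequence as listed on this page.
SEQ_4HSF = (
    "KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCND"
    "GRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL"
)

# One-letter codes in the row order of the table above.
AMINO_ACIDS = "QMYVEGHLFNDKPSTWARCI"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return {letter: number of occurrences in seq} for each letter of alphabet."""
    counts = Counter(seq)
    return {letter: counts[letter] for letter in alphabet}

vec = parikh_vector(SEQ_4HSF)
# Compare a few entries with the 4HSF_1 column of the table.
print(vec["W"], vec["C"], sum(vec.values()))
```

The sum of the entries equals the sequence length, which gives a quick consistency check against the length column further down the page.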

>9BAA_1|Chains A, B|ABC-type bacteriocin transporter|Acetivibrio thermocellus ATCC 27405 (203119)
>4HSF_1|Chain A|Lysozyme C|Gallus gallus (9031)
>5FSA_1|Chains A, B|CYP51 VARIANT1|CANDIDA ALBICANS (5476)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9BAA , Knot 285 750 0.83 40 275 659
MGHHHHHHHHHHSSGHIDDDDKHMLRRLFKKKYVCVRQYDLTDCGAACLSSIAQYYGLKMSLAKIREMTGTDTQGTNAYGLIHAAKQLGFSAKGVKASKEDLLKDFRLPAIANVIVDNRLAHFVVIYSIKNRIITVADPGKGIVRYSMDDFCSIWTGGLVLLEPGEAFQKGDYTQNMMVKFAGFLKPLKKTVLCIFLASLLYTALGIAGSFYIKFLFDDLIKFEKLNDLHIISAGFAVIFLLQIFLNYYRSILVTKLGMSIDKSIMMEYYSHVLKLPMNFFNSRKVGEIISRFMDASKIRQAISGATLTIMIDTIMAVIGGILLYIQNSSLFFISFIIILLYGIIVTVFNKPIQNANRQIMEDNAKLTSALVESVKGIETIKSFGAEEQTEKSTRDKIETVMKSSFKEGMLYINLSSLTGIVAGLGGIVILWAGAYNVIKGNMSGGQLLAFNALLAYFLTPVKNLIDLQPLIQTAVVASNRLGEILELATEKELREDSDDFVISLKGDIEFRNVDFRYGLRKPVLKNINLTIPKGKTVAIVGESGSGKTTLAKLLMNFYSPEKGDILINGHSIKNISLELIRKKIAFVSQDVFIFSGTVKENLCLGNENVDMDEIIKAAKMANAHDFIEKLPLKYDTFLNESGANLSEGQKQRLAIARALLKKPDILILDEATSNLDSITENHIKDAIYGLEDDVTVIIIAHRLSTIVNCDKIYLLKDGEIVESGSHTELIALKGCYFKMWKQTENTLAS
4HSF , Knot 66 129 0.82 40 104 127
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL
5FSA , Knot 208 490 0.87 40 259 472
MAKKTPPLVFYWIPWFGSAASYGQQPYEFFESCRQKYGDVFSFMLLGKIMTVYLGPKGHEFVFNAKLSDVSAEEAYKHLTTPVFGTGVIYDCPNSRLMEQKKFAKFALTTDSFKRYVPKIREEILNYFVTDESFKLKEKTHGVANVMKTQPEITIFTASRSLFGDEMRRIFDRSFAQLYSDLDKGFTPINFVFPNLPLPHYWRRDAAQKKISATYMKEIKLRRERGDIDPNRDLIDSLLIHSTYKDGVKMTDQEIANLLIGILMGGQHTSASTSAWFLLHLGEKPHLQDVIYQEVVELLKEKGGDLNDLTYEDLQKLPSVNNTIKETLRMHMPLHSIFRKVTNPLRIPETNYIVPKGHYVLVSPGYAHTSERYFDNPEDFDPTRWDTAAAKANSVSFNSSDEVDYGFGKVSKGVSSPYLPFGGGRHRCIGEQFAYVQLGTILTTFVYNLRWTIDGYKVPDPDYSSMVVLPTEPAEIIWEKRETCMFHHHH
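
To illustrate the LZ-complexity and normalized-ratio columns, here is a minimal sketch. It assumes \(\mathrm{LZ}(w)\) is the number of phrases in the Lempel–Ziv (1976) exhaustive-history parsing, with a final incomplete phrase counted as one phrase; the site may use a different variant, so the output need not match the table for the real sequences. The function names are hypothetical.

```python
import math

def lz76_complexity(w):
    """Count phrases in an LZ76-style exhaustive-history parsing of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from an earlier
        # starting position (the copy is allowed to overlap the phrase).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_{alphabet_size} n), the ratio shown in the table."""
    return lz / (n / math.log(n, alphabet_size))

# Classic binary example: parsed as 0|001|10|100|1000|101.
print(lz76_complexity("0001101001000101"))  # 6

# Ratio for the tabulated 4HSF values LZ(w) = 66 and n = 129.
print(normalized_lz(66, 129))
```

For the 4HSF row the ratio comes out just under 0.83, close to the tabulated 0.82, which suggests the table truncates rather than rounds.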

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
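
As a small illustration of these definitions, the sketch below collects the distinct length-\(k\) subwords of a toy word and counts them. It assumes the argument of \(P_w\) is the subword length; the names `subwords` and `subword_complexity` are illustrative only.

```python
def subwords(w, k):
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """p_w(k): the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

print(sorted(subwords("abab", 2)))      # ['ab', 'ba']
print(subword_complexity("abab", 3))    # 2
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), so \(p_w(k)\) is bounded by both that quantity and the number of possible words of length \(k\) over the alphabet.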

\(|P_{f(9BAA_1)}(2) \setminus P_{f(4HSF_1)}(2)|=195\), \(|P_{f(4HSF_1)}(2) \setminus P_{f(9BAA_1)}(2)|=24\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
110000000000001010000001100110000101000010001110100110001101011010010100001001011101100111010110100001100101111101110001101111001000110110110111000100100110111111011011001000001110111110110001101111011001111110101011100110100100101101111111110111000001110011101000111000001101110110000110110011010010011011010111001111111111010000111101111110111101100110010001100010100111001011001001110000000000010011000100111010100101111111111111111001101010110111101111011011001101011100111100011011011000010000001110101010100101001100111001010110100111110010100011011101001001011101001001010110001111000111101010001011000101001101101101001100111000011000110100100001111011100101111001000100100001001101100010111110010011000010110010110010000111101001011000000110
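
The hydrophobic-polar encoding can be sketched as a per-residue lookup. The set of residues mapped to 1 below is an assumption inferred by aligning the beginning of the binary string above with the beginning of the 9BAA_1 sequence; it is not a definition documented on this page, and `hydrophobic_polar` is a hypothetical name.

```python
# Assumption: residues encoded as 1 (hydrophobic), inferred from the data above.
HYDROPHOBIC = set("AFGILMPVW")

def hydrophobic_polar(seq):
    """Map each residue to '1' if hydrophobic (per the assumed set), else '0'."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

# First 24 residues of the 9BAA_1 sequence.
prefix = "MGHHHHHHHHHHSSGHIDDDDKHM"
print(hydrophobic_polar(prefix))  # 110000000000001010000001
```

The printed string agrees with the first 24 symbols of the binary sequence above.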
Pair \(Z_2\) Length of longest common subword
9BAA_1,4HSF_1 219 3
9BAA_1,5FSA_1 138 4
4HSF_1,5FSA_1 211 3
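
The \(Z_2\) column is the size of the symmetric difference of the two sets of length-2 subwords. A minimal sketch on toy words follows; the helper names are illustrative and not taken from the site.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# P_x(2) = {ab, ba}, P_y(2) = {ab, bb, ba}; only "bb" is unshared.
print(z_distance("abab", "abba", 2))  # 1
```

For the pair 9BAA_1, 4HSF_1 the two one-sided terms reported on this page are 195 and 24, which sum to the tabulated 219.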

Newick tree

 
[
	4HSF_1:11.58,
	[
		9BAA_1:69,5FSA_1:69
	]:48.58
]

Let \(d\) and \(d_1\) be the Otu–Sayood distances of the same names. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{879}{\log_{20} 879}-\frac{129}{\log_{20}129}\right)\approx 210\), where \(879=750+129\) is the combined length of the two sequences.
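
The arithmetic behind this estimate can be checked directly. The sketch below only evaluates the expression as written; the factors 0.85 and 0.8 are taken from the text without further interpretation, and `lz_scale` is a hypothetical helper name.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_{alphabet_size}(n), the normalizing scale for LZ-complexity."""
    return n / math.log(n, alphabet_size)

# 879 = 750 + 129: the lengths of 9BAA_1 and 4HSF_1 added together.
estimate = 0.85 * 0.8 * (lz_scale(879) - lz_scale(129))
print(round(estimate))  # 210
```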
Status Protein1 Protein2 d d1/2
Query variables 9BAA_1 4HSF_1 264 154.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2. \end{cases} \]
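
The interpolation can be tried on the query pair above. This sketch assumes that min and max refer to two one-sided quantities whose maximum is \(d\) and whose sum is \(d_1\), as in the usual Otu–Sayood definitions; under that assumption, \(d=264\) and \(d_1/2=154.5\) give one-sided values 264 and 45. The value 45 is derived here and not reported on the page.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    lo, hi = min(a, b), max(a, b)
    return alpha * lo + (1 - alpha) * hi

# Assumed one-sided values for the pair 9BAA_1, 4HSF_1 (see lead-in).
a, b = 264, 2 * 154.5 - 264   # b = 45.0

print(delta(0, a, b))    # equals d
print(delta(0.5, a, b))  # equals d1 / 2
```

Setting \(\alpha=1\) would give the minimum of the two one-sided values, the other extreme of the same family.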