CoV2D Browser™

Parikh vectors
6GLA_1 3BLY_1 5MQS_1 Letter Amino acid
19 40 69 A Alanine
8 9 9 C Cysteine
18 34 68 E Glutamic acid
19 44 57 P Proline
3 45 56 T Threonine
4 10 30 W Tryptophan
16 41 53 V Valine
24 28 61 R Arginine
17 39 77 D Aspartic acid
19 41 87 G Glycine
10 26 58 I Isoleucine
39 43 75 L Leucine
7 17 36 M Methionine
4 31 60 N Asparagine
12 36 40 F Phenylalanine
29 50 68 S Serine
12 15 69 Y Tyrosine
16 29 39 Q Glutamine
7 14 32 H Histidine
11 31 64 K Lysine
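
Each numeric column above is the Parikh vector of one sequence: the number of occurrences of every letter, ignoring order (the first column sums to 294, the length of 6GLA_1). A minimal sketch of that count, assuming the standard 20-letter amino-acid alphabet; the name `parikh_vector` is chosen here for illustration and is not taken from the site:

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # standard 20 one-letter codes (assumed alphabet)

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return {letter: count} for every letter of the alphabet, zeros included."""
    counts = Counter(seq)
    return {letter: counts.get(letter, 0) for letter in alphabet}

if __name__ == "__main__":
    # First ten residues of the 6GLA_1 sequence shown on this page.
    vec = parikh_vector("SMQDPTIFEE")
    print(vec["E"], vec["S"], vec["W"])
```

Returning zeros for absent letters keeps the vectors of different proteins aligned, so they can be compared row by row as in the table.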

>6GLA_1|Chains A, B|Tyrosine-protein kinase JAK3|Homo sapiens (9606)
>3BLY_1|Chain A|Pyranose oxidase|Trametes multicolor (230624)
>5MQS_1|Chain A|Beta-L-arabinobiosidase|Bacteroides thetaiotaomicron (818)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6GLA , Knot 132 294 0.85 40 178 283
SMQDPTIFEERHLKYISQLGKGNFGSVELCRYDPLGDNTGALVAVKQLQHSGPDQQRDFQREIQILKALHSDFIVKYRGVSYGPGRQSLRLVMEYLPSGCLRDFLQRHRARLDASRLLLYSSQICKGMEYLGSRRCVHRALAARNILVESEAHVKIADFGLAKLLPLDKDYYVVREPGQSPIFWYAPESLSDNIFSRQSDVWSFGVVLYELFTYCDKSCSPSAEFLRMMGSERDVPALSRLLELLEEGQRLPAPPACPAEVHELMKLCWAPSPQDRPSFSALGPQLDMLWSGSR
3BLY , Knot 246 623 0.84 40 285 585
MSTSSSDPFFNFAKSSFRSAAAQKASASSLPPLPGPDKKVPGMDIKYDVVIVGSGPIGCTYARELVGAGYKVAMFDIGEIDSGLKIGAHKKNTVEYQKNIDKFVNVIQGQLMSVSVPVNTLVVDTLSPTSWQASTFFVRNGSNPEQDPLRNLSGQAVTRVVGGMSTHWTCATPRFDREQRPLLVKDDADADDAEWDRLYTKAESYFQTGTDQFKESIRHNLVLNKLTEEYKGQRDFQQIPLAATRRSPTFVEWSSANTVFDLQNRPNTDAPEERFNLFPAVACERVVRNALNSEIESLHIHDLISGDRFEIKADVYVLTAGAVHNTQLLVNSGFGQLGRPNPANPPELLPSLGSYITEQSLVFCQTVMSTELIDSVKSDMTIRGTPGELTYSVTYTPGASTNKHPDWWNEKVKNHMMQHQEDPLPIPFEDPEPQVTTLFQPSHPWHTQIHRDAFSYGAVQQSIDSRLIVDWRFFGRTEPKEENKLWFSDKITDAYNMPQPTFDFRFPAGRTSKEAEDMMTDMCVMSAKIGGFLPGSWPQFMKPGLVLHLGGTHRMGFDEKEDNCCVNTDSRVFGFKNLFLGGCGNIPTAYGANPTLTAMSLAIKSCEYIKQNFTPSPFTSEAQ
5MQS , Knot 405 1108 0.85 40 342 1007
MGSSHHHHHHSSGPQQGLRQAQTPQDRIHYTGKELSNPTYHDGQLSPVVGVHNIQLVRANREHPEASNGNGWTYNHQPMLAYWNGQFYYQYLADPSDEHVPPSQTFLMTSKDGYQWTNPEIVFPPYKVPDGYTKESRPGMQAKDLIAIMHQRVGFYVSKSGRLITMGNYGVALDKKDDPNDGNGIGRVVREIKKDGSFGPIYFIYYNHGFNEKNTDYPYFKKSKDREFVKACQEILDNPLYMMQWVEEADREDPIIPLKKGYKAFNCYTLPDGRIASLWKHALTSISEDGGHTWAEPVLRAKGFVNSNAKIWGQRLSDGTYATVYNPSEFRWPLAISLSKDGLEYTTLNLVHGEITPMRYGGNYKSYGPQYPRGIQEGNGVPADGDLWVSYSVNKEDMWISRIPVPVQINASAHADDDFSKSGSIAELTNWNIYSPVWAPVSLEGEWLKLQDKDPFDYAKVERKIPASKELKVSFDLSAGQNDKGILQIDFLDENSIACSRLELTPDGIFRMKGGSRFANMMNYEAGKTYHVEAVLSTADRNIQVYVDGKRVGLRMFYAPVATIERIVFRTGEMRTFPTVDTPADQTYDLPDAGGQEPLAEYRIANVKTSSTDKDASSAFLKYADFSHYAESFNGMEDENIVQAIPNAKASEWMEENIPLFECPQRNFEEMYYYRWWSLRKHIKETPVGYGMTEFLVQRSYSDKYNLIACAIGHHIYESRWLRDPKYLDQIIHTWYRGNDGGPMKKMDKFSSWNADAVLARYMVDGDKDFMLDMTKDLETEYQRWERTNRLKNGLYWQGDVQDGMEESISGGRNKKYARPTINSYMYGNAKALSIMGILSGDEGMAMRYGMRADTLKSLVENDLWNTRHQFFETMRTDSSANVREAIGYIPWYFNLPDTTKKYEVAWKEIMDEKGFSAPYGLTTAERRHPEFRTRGVGKCEWDGAIWPFASAQTLTAMANFMNNYPQTVLSDSVYFRQMELYVESQYHRGRPYIGEYLDEVTGYWLKGDQERSRYYNHSTFNDLMITGLIGLRPRLDDTIEINPLIPADKWDWFCLDNVLYHGHNLTILWDKNGDRYHCGKGLRIFVNGKEAGHADTLTRLVCENALK
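
The fourth column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below assumes the LZ76 (exhaustive-history) phrase count. The page does not say which Lempel-Ziv variant it uses, so `lz76_complexity` is an illustration and may not reproduce the tabulated values. `normalized_lz` only restates the column header.

```python
import math

def lz76_complexity(s):
    """Number of phrases in the LZ76 parsing of s (assumed variant)."""
    n, i, phrases = len(s), 0, 0
    while i < n:
        length = 1
        # Grow the phrase while it still occurs starting before position i
        # (the earlier occurrence may overlap the phrase itself).
        while i + length <= n and s.find(s[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_alphabet_size(n)), as in the column header."""
    return lz / (n / math.log(n, alphabet_size))

if __name__ == "__main__":
    # Textbook binary example, parsed as 0|001|10|100|1000|101.
    print(lz76_complexity("0001101001000101"))
    # 6GLA row of the table: LZ = 132, n = 294.
    print(round(normalized_lz(132, 294), 3))
```

For the 6GLA row this ratio comes out close to the tabulated 0.85.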

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
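
As a small illustration of these definitions, the following sketch builds \(P_w(k)\) as a Python set of contiguous length-\(k\) subwords and takes its size. The names `subwords` and `subword_complexity` are chosen here for illustration.

```python
def subwords(w, k):
    """P_w(k): the set of distinct contiguous subwords of w having length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """p_w(k) = |P_w(k)|."""
    return len(subwords(w, k))

if __name__ == "__main__":
    w = "SMQDPTIFEE"  # first ten residues of the 6GLA_1 sequence
    print(sorted(subwords(w, 2)))
    print([subword_complexity(w, k) for k in (1, 2, 3)])
```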

\(|P_{f(6GLA_1)}(2) \setminus P_{f(3BLY_1)}(2)|=41\), \(|P_{f(3BLY_1)}(2) \setminus P_{f(6GLA_1)}(2)|=148\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
010010110000100100110101101010000111000111111001000110000010001011011000111000110011100010111001101010011000010101001110000100110011000010011110011100010101101111011110000011001100111101100100011000001101111100110000000010101101110000111100110110010011111101101001101011101000101011110101110100
Pair \(Z_2\) Length of longest common subword
6GLA_1,3BLY_1 189 5
6GLA_1,5MQS_1 214 4
3BLY_1,5MQS_1 123 5
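
The \(Z_2\) column is the size of the symmetric difference of the two sets of length-2 subwords; for the first row, 41 + 148 = 189. The sketch below computes \(Z_k\) that way. It also computes the last column under the assumption that it means the longest common contiguous subword: values of 4 or 5 are far too small for a non-contiguous common subsequence of sequences this long. Function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of w having length k."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_k(x, y, k):
    """Size of the symmetric difference of the length-k subword sets of x and y."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword_length(x, y):
    """Length of the longest contiguous block shared by x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block that ended one position earlier in both words.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

if __name__ == "__main__":
    print(z_k("AAB", "ABB", 2))
    print(longest_common_subword_length("ABCDE", "XBCDY"))
```

The dynamic program uses time proportional to the product of the two lengths, which is adequate for sequences of about a thousand residues.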

Newick tree

 
[
	6GLA_1:11.01,
	[
		3BLY_1:61.5,5MQS_1:61.5
	]:49.51
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{917 }{\log_{20} 917}-\frac{294}{\log_{20}294})\approx 168.\)
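
The arithmetic in this estimate can be checked with a short sketch. Here 917 = 294 + 623 is the combined length of 6GLA_1 and 3BLY_1 from the table above. The constants 0.85 and 0.8 are copied from the formula as given; 0.85 matches the normalized LZ column, while the page does not explain 0.8. The function names are illustrative.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_alphabet_size(n)."""
    return n / math.log(n, alphabet_size)

def expected_distance(n_concat, n_first, c1=0.85, c2=0.8):
    """c1 * c2 * (scale of the concatenation minus scale of the first word)."""
    return c1 * c2 * (lz_scale(n_concat) - lz_scale(n_first))

if __name__ == "__main__":
    # Lengths from the table: 294 (6GLA_1) and 294 + 623 = 917 (concatenation).
    print(round(expected_distance(917, 294)))
```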
Status Protein1 Protein2 d d1/2
Query variables 6GLA_1 3BLY_1 218 158
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta = \alpha \min + (1-\alpha) \max = \begin{cases} d & \alpha=0,\\ d_1/2 & \alpha=1/2 \end{cases} \]
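
A worked instance of this interpolation, using the "Query variables" row above (d = 218, d1/2 = 158). The sketch assumes that the minimum and maximum are taken over two directional quantities, presumably the two one-sided complexity increases in Otu and Sayood's definitions; the page does not spell this out. Under that assumption the smaller quantity can be recovered from the two tabulated values.

```python
def delta(alpha, a, b):
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

if __name__ == "__main__":
    d, half_d1 = 218, 158        # "Query variables" row for 6GLA_1, 3BLY_1
    larger = d                   # alpha = 0 selects the maximum
    smaller = 2 * half_d1 - d    # because d1/2 = (min + max) / 2
    print(smaller, delta(0, smaller, larger), delta(0.5, smaller, larger))
```

With these inputs the smaller directional quantity is 98, and the two special cases return 218 and 158, matching the table row.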