CoV2D Browser™

Parikh vectors
8TXZ_1 7UKV_1 1URA_1 Letter Amino acid
60 20 64 A Alanine
64 16 13 R Arginine
57 21 27 D Aspartic acid
47 12 22 Q Glutamine
85 26 23 E Glutamic acid
66 20 45 G Glycine
31 12 8 M Methionine
41 8 8 F Phenylalanine
76 17 22 S Serine
56 13 39 T Threonine
34 13 11 Y Tyrosine
80 25 16 I Isoleucine
79 23 23 V Valine
53 8 21 N Asparagine
31 8 10 H Histidine
86 23 28 K Lysine
51 18 20 P Proline
16 6 3 W Tryptophan
27 6 4 C Cysteine
154 33 39 L Leucine
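The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch follows, assuming the letter order of the table above; the names `AMINO_ACIDS` and `parikh_vector` are hypothetical and not part of the CoV2D code.

```python
from collections import Counter

# Letter order taken from the rows of the table above (an assumption of this sketch).
AMINO_ACIDS = "ARDQEGMFSTYIVNHKPWCL"

def parikh_vector(w, alphabet=AMINO_ACIDS):
    """Return the list of letter counts of w, one entry per letter of alphabet."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# Toy input, not one of the three proteins.
vec = parikh_vector("MKLLV")
print(dict(zip(AMINO_ACIDS, vec)))
```

The entries of a Parikh vector sum to the length of the sequence; for 8TXZ_1 the column above sums to 1194.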

>8TXZ_1|Chain A|Leucine-rich repeat serine/threonine-protein kinase 2|Homo sapiens (9606)
>7UKV_1|Chain A|Epidermal growth factor receptor|Homo sapiens (9606)
>1URA_1|Chains A, B|ALKALINE PHOSPHATASE|Escherichia coli (562)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8TXZ 434 1194 0.85 40 339 1042
RMKLMIVGNTGSGKTTLLQQLMKTKKSDLGMQSATVGIDVKDWPIQIRDKRKRDLVLNVWDFAGREEFYSTHPHFMTQRALYLAVYDLSKGQAEVDAMKPWLFNIKARASSSPVILVGTHLDVSDEKQRKACMSKITKELLNKRGFPAIRDYHFVNATEESDALAKLRKTIINESLNFKIRDQLVVGQLIPDCYVELEKIILSERKNVPIEFPVIDRKRLLQLVRENQLQLDENELPHAVHFLNESGVLLHFQDPALQLSDLYFVEPKWLCKIMAQILTVKVEGCPKHPKGIISRRDVEKFLSKKRKFPKNYMSQYFKLLEKFQIALPIGEEYLLVPSSLSDHRPVIELPHCENSEIIIRLYEMPYFPMGFWSRLINRLLEISPYMLSGRERALRPNRMYWRQGIYLNWSPEAYCLVGSEVLDNHPESFLKITVPSCRKGCILLGQVVDHIDSLMEEWFPGLLEIDICGEGETLLKKWALYSFNDGEEHQKILLDDLMKKAEEGDLLVNPDQPRLTIPISQIAPDLILADLPRNIMLNNDELEFEQAPEFLLGDGSFGSVYRAAYEGEEVAVKIFNKHTSLRLLRQELVVLCHLHHPSLISLLAAGIRPRMLVMELASKGSLDRLLQQDKASLTRTLQHRIALHVADGLRYLHSAMIIYRDLKPHNVLLFTLYPNAAIIAKIADYGIAQYCCRMGIKTSEGTPGFRAPEVARGNVIYNQQADVYSFGLLLYDILTTGGRIVEGLKFPNEFDELEIQGKLPDPVKEYGCAPWPMVEKLIKQCLKENPQERPTSAQVFDILNSAELVCLTRRILLPKNVIVECMVATHHNSRNASIWLGCGHTDRGQLSFLDLNTEGYTSEEVADSRILCLALVHLPVEKESWIVSGTQSGTLLVINTEDGKKRHTLEKMTDSVTCLYCNSFSKQSKQKNFLLVGTADGKLAIFEDKTVKLKGAAPLKILNIGNVSTPLMCLSESTNSTERNVMWGGCGTKIFSFSNDFTIQKLIETRTSQLFSYAAFSDSNIITVVVDTALYIAKQNSPVVEVWDKKTEKLCGLIDCVHFLREVMVKENKESKHKMSYSGRVKTLCLQKNTALWIGTGGGHILLLDLSTRRLIRVIYNFCNSVRVMMTAQLGSLKNVMLVLGYNRKNTEGTQKQKEIQSCLTVWDINLPHEVQNLEKHIEVRKELAEKMRRTSVE
7UKV 145 328 0.85 40 210 322
SGEAPNQALLRILKETEFKKIKVLGSGAFGTVYKGLWIPEGEKVKIPVAIKELREATSPKANKEILDEAYVMASVDNPHVCRLLGICLTSTVQLITQLMPFGCLLDYVREHKDNIGSQYLLNWCVQIAKGMNYLEDRRLVHRDLAARNVLVKTPQHVKITDFGLAKLLGAEEKEYHAEGGKVPIKWMALESILHRIYTHQSDVWSYGVTVWELMTFGSKPYDGIPASEISSILEKGERLPQPPICTIDVYMIMVKCWMIDADSRPKFRELIIEFSKMARDPQRYLVIQGDERMHLPSPTDSNFYRALMDEEDMDDVVDADEYLIPQQG
1URA 183 446 0.83 40 215 416
MPVLENRAAQGDITAPGGARRLTGDQTAALRDSLSDKPAKNIILLIGNGMGDSEITAARNYAEGAGGFFKGIDALPLTGQYTHYALNKKTGKPDYVTDSAASATAWSTGVKTYNGALGVDIHEKDHPTILEMAKAAGLATGNVSTAELQDATPAALVAHVTSRKCYGPSATSEKCPGNALEKGGKGSITEQLLNARADVTLGGGAKTFAETATAGEWQGKTLREQAQARGYQLVSDAASLNSVTEANQQKPLLGLFADGNMPVRWLGPKATYHGNIDKPAVTCTPNPQRNDSVPTLAQMTDKAIELLSKNEKGFFLQVEGASIDKQDHAANPCGQIGETVDLDEAVQRALEFAKKEGNTLVIVTADHAHASQIVAPDTKAPGLTQALNTKDGAVMVMSYGNSEEDSQEHTGSQLRIAAYGPHAANVVGLTDQTDLFYTMKAALGLK
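The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the LZ-complexity and length columns. The sketch below is only an arithmetic check; the function name `normalized_lz` is hypothetical, and how the page rounds its values is not stated.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_sigma(n), with sigma the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ-complexity, length) as tabulated above.
for code, lz, n in [("8TXZ", 434, 1194), ("7UKV", 145, 328), ("1URA", 183, 446)]:
    print(code, round(normalized_lz(lz, n), 3))
```

The recomputed ratios agree with the tabulated 0.85, 0.85 and 0.83 to within about 0.01.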

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
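The set of distinct subwords of a given length and its cardinality can be sketched as follows. The names `subwords` and `subword_complexity` are hypothetical, and the sketch is not claimed to reproduce the tabulated \(p_w\) columns, whose exact computation is not described here.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Number of distinct contiguous subwords of length k in w."""
    return len(subwords(w, k))

# Toy input: "abab" has subwords {"ab", "ba"} of length 2.
print(sorted(subwords("abab", 2)), subword_complexity("abab", 2))
```

A word of length \(n\) has at most \(n-k+1\) distinct subwords of length \(k\), so for example \(p_w(3)\le 326\) for a word of length 328.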

\(|P_{f(8TXZ_1)}(2) \setminus P_{f(7UKV_1)}(2)|=144\), \(|P_{f(7UKV_1)}(2) \setminus P_{f(8TXZ_1)}(2)|=15\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
010111110010100011001100000011100101110100111010000000111011011100010000101100011011100100101010110111101010100011111100101000000010100100011000111110000110100000111010001100010101000111101110001010011100000111011110000110110000101000011011011000111101001110100101101011001110110101010100101110000100110000011000100010110010111111000111100100001110110000001110100110111111001100110101011010001101001010011010101010011100110001001101011000010111101100100110011111101010101001100111001001000001110011001001011101001010111001110111101100111000010100110111101011010011001001110110000010110001111001001011011111101011110110010100110000101000100011101101100100111100010100111101010111110110011100000111000010111011011010110000101001111100110011011011011001001010101101100010111111001100010001000100101101100101101000111100111001110000000101111010000101011010001000001100011011110111000011101000101111000010000010010001001000010000000011111010101111000010101111101101101001110100000000001111101001101000101001100000011001110000110111001101100001110110000001011100101100111000000000100010100101000011111011101111010000110110010001011101011010011111100000001000000100010110101100100100010100011001000010
Pair \(Z_2\) Length of longest common subword (contiguous)
8TXZ_1,7UKV_1 159 4
8TXZ_1,1URA_1 166 5
7UKV_1,1URA_1 151 4
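The quantity \(Z_k\) is the size of the symmetric difference of the two subword sets. For the first pair, \(144+15=159\), which matches the table. A minimal sketch, with hypothetical function names and a toy input rather than the protein sequences:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|: size of the symmetric difference."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy input: P_x(2) = {ab, ba}, P_y(2) = {ab, bb, ba}, so only "bb" is unshared.
print(z("abab", "abba", 2))
```

Dividing \(Z_k(x,y)\) by \(|P_x(k)\cup P_y(k)|\) would give the usual Jaccard distance; the table reports only the numerator.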

Newick tree

 
(
	8TXZ_1:83.10,
	(
		7UKV_1:75.5,1URA_1:75.5
	):7.60
);

Let \(d\) and \(d_1\) denote the Otu–Sayood distances of the same names. (This makes the 4TYN sequence AAAAAA a close match...)
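Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1522}{\log_{20} 1522}-\frac{328}{\log_{20}328}\right)\approx 307\), where \(1522=1194+328\) is the length of the concatenation of the two query sequences.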
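The arithmetic behind this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are taken from the text as given, and the helper name `capacity` is hypothetical.

```python
import math

def capacity(n, alphabet_size=20):
    """n / log_sigma(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

n1, n2 = 1194, 328  # lengths of 8TXZ_1 and 7UKV_1
expected = 0.85 * 0.8 * (capacity(n1 + n2) - capacity(n2))
print(expected)
```

The result lies between 307 and 308, consistent with the value quoted in the text.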
Status Protein1 Protein2 d d1/2
Query variables 8TXZ_1 7UKV_1 386 244.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
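The interpolation \(\delta\) can be illustrated with a short sketch. It assumes, following the usual statement of the Otu–Sayood distances, that "min" and "max" refer to the two complexity increments \(c(xy)-c(x)\) and \(c(yx)-c(y)\), and that \(c\) is the LZ76 exhaustive-history complexity. All function names are hypothetical, and the parsing shown is a plain quadratic-time version that may differ from the one used on this site.

```python
def lz76(s):
    """Number of components in the LZ76 exhaustive-history parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the component while it still occurs starting at an earlier position.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def increments(x, y, c=lz76):
    """The pair c(xy) - c(x), c(yx) - c(y) (assumed meaning of min/max above)."""
    return c(x + y) - c(x), c(y + x) - c(y)

def delta(x, y, alpha, c=lz76):
    """alpha * min + (1 - alpha) * max of the two increments."""
    a, b = increments(x, y, c)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def d(x, y):
    return max(increments(x, y))   # delta at alpha = 0

def d1(x, y):
    return sum(increments(x, y))   # twice delta at alpha = 1/2

x, y = "ACGTACGT", "TTTTGGGG"      # toy inputs, not the query proteins
print(d(x, y), d1(x, y) / 2, delta(x, y, 0), delta(x, y, 0.5))
```

Under these assumed definitions, the tabulated values \(d=386\) and \(d_1/2=244.5\) would correspond to increments of 386 and 103.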