CoV2D Browser™

Parikh vectors
8XVV_1 3NQH_1 1CCT_1 Letter Amino acid
54 13 1 M Methionine
333 23 17 P Proline
189 32 11 T Threonine
336 7 11 Q Glutamine
175 41 22 G Glycine
62 9 12 H Histidine
148 35 24 K Lysine
69 20 12 F Phenylalanine
237 34 18 S Serine
51 25 8 Y Tyrosine
164 17 7 R Arginine
176 25 14 E Glutamic acid
116 20 9 I Isoleucine
283 26 26 L Leucine
314 19 13 A Alanine
89 29 19 D Aspartic acid
25 8 1 C Cysteine
23 11 7 W Tryptophan
207 31 17 V Valine
72 16 10 N Asparagine
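
The Parikh vector of a sequence records how many times each letter occurs, so its entries sum to the sequence length (the 8XVV_1 column above sums to 3123). A minimal Python sketch, assuming the 20 standard one-letter amino-acid codes; the name `parikh_vector` is illustrative and not taken from the CoV2D code:

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet: 20 standard residues

def parikh_vector(seq: str) -> dict[str, int]:
    """Return the count of each amino-acid letter in seq (0 if absent)."""
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}

# First ten residues of the 8XVV_1 sequence shown on this page.
prefix = "MHHGTGPQNV"
vec = parikh_vector(prefix)
print(vec["H"], vec["G"], sum(vec.values()))  # 2 2 10
```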

>8XVV_1|Chain A[auth H]|Isoform 2 of E1A-binding protein p400|Homo sapiens (9606)
>3NQH_1|Chain A|Glycosyl hydrolase|Bacteroides thetaiotaomicron (226186)
>1CCT_1|Chain A|CARBONIC ANHYDRASE II|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8XVV , Knot 957 3123 0.82 40 365 1950
MHHGTGPQNVQHQLQRSRACPGSEGEEQPAHPNPPPSPAAPFAPSASPSAPQSPSYQIQQLMNRSPATGQNVNITLQSVGPVVGGNQQITLAPLPLPSPTSPGFQFSAQPRRFEHGSPSYIQVTSPLSQQVQTQSPTQPSPGPGQALQNVRAGAPGPGLGLCSSSPTGGFVDASVLVRQISLSPSSGGHFVFQDGSGLTQIAQGAQVQLQHPGTPITVRERRPSQPHTQSGGTIHHLGPQSPAAAGGAGLQPLASPSHITTANLPPQISSIIQGQLVQQQQVLQGPPLPRPLGFERTPGVLLPGAGGAAGFGMTSPPPPTSPSRTAVPPGLSSLPLTSVGNTGMKKVPKKLEEIPPASPEMAQMRKQCLDYHYQEMQALKEVFKEYLIELFFLQHFQGNMMDFLAFKKKHYAPLQAYLRQNDLDIEEEEEEEEEEEEKSEVINDEQQALAGSLVAGAGSTVETDLFKRQQAMPSTGMAEQSKRPRLEVGHQGVVFQHPGADAGVPLQQLMPTAQGGMPPTPQAAQLAGQRQSQQQYDPSTGPPVQNAASLHTPLPQLPGRLPPAGVPTAALSSALQFAQQPQVVEAQTQLQIPVKTQQPNVPIPAPPSSQLPIPPSQPAQLALHVPTPGKVQVQASQLSSLPQMVASTRLPVDPAPPCPRPLPTSSTSSLAPVSGSGPGPSPARSSPVNRPSSATNKALSPVTSRTPGVVASAPTKPQSPAQNATSSQDSSQDTLTEQITLENQVHQRIAELRKAGLWSQRRLPKLQEAPRPKSHWDYLLEEMQWMATDFAQERRWKVAAAKKLVRTVVRHHEEKQLREERGKKEEQSRLRRIAASTAREIECFWSNIEQVVEIKLRVELEEKRKKALNLQKVSRRGKELRPKGFDALQESSLDSGMSGRKRKASISLTDDEVDDEEETIEEEEANEGVVDHQTELSNLAKEAELPLLDLMKLYEGAFLPSSQWPRPKPDGEDTSGEEDADDCPGDRESRKDLVLIDSLFIMDQFKAAERMNIGKPNAKDIADVTAVAEAILPKGSARVTTSVKFNAPSLLYGALRDYQKIGLDWLAKLYRKNLNGILADEAGLGKTVQIIAFFAHLACNEGNWGPHLVVVRSCNILKWELELKRWCPGLKILSYIGSHRELKAKRQEWAEPNSFHVCITSYTQFFRGLTAFTRVRWKCLVIDEMQRVKGMTERHWEAVFTLQSQQRLLLIDSPLHNTFLELWTMVHFLVPGISRPYLSSPLRAPSEESQDYYHKVVIRLHRVTQPFILRRTKRDVEKQLTKKYEHVLKCRLSNRQKALYEDVILQPGTQEALKSGHFVNVLSILVRLQRICNHPGLVEPRHPGSSYVAGPLEYPSASLILKALERDFWKEADLSMFDLIGLENKITRHEAELLSKKKIPRKLMEEISTSAAPAARPAAAKLKASRLFQPVQYGQKPEGRTVAFPSTHPPRTAAPTTASAAPQGPLRGRPPIATFSANPEAKAAAAPFQTSQASASAPRHQPASASSTAASPAHPAKLRAQTTAQASTPGQPPPQPQAPSHAAGQSALPQRLVLPSQAQARLPSGEVVKIAQLASITGPQSRVAQPETPVTLQFQGSKFTLSHSQLRQLTAGQPLQLQGSVLQIVSAPGQPYLRAPGPVVMQTVSQAGAVHGALGSKPPAGGPSPAPLTPQVGVPGRVAVNALAVGEPGTASKPASPIGGPTQEEKTRLLKERLDQIYLVNERRCSQAPVYGRDLLRICALPSHGRVQWRGSLDGRRGKEAGPAHSYTSSSESPSELMLTLCRCGESLQDVIDRVAFVIPPVVAAPPSLRVPRPPPLYSHRMRILRQGLREHAAPYFQQLRQTTAPRLLQFPELRLVQFDSGKLEALAILLQKLKSEGRRVLILSQMILMLDILEMFLNFHYLTYVRIDENASSEQRQELMRSFNRDRRIFCAILSTHSRTTGINLVEADTVVFYDNDLNPVMDAKAQEWCDRIGRCKDIHIYRLVSGNSIEEKLLKNGT
KDLIREVAAQGNDYSMAFLTQRTIQELFEVYSPMDDAGFPVKAEEFVVLSQEPSVTETIAPKIARPFIEALKSIEYLEEDAQKSAQEGVLGPHTDALSSDSENMPCDEEPSQLEELADFMEQLTPIEKYALNYLELFHTSIEQEKERNSEDAVMTAVRAWEFWNLKTLQEREARLRLEQEEAELLTYTREDAYSMEYVYEDVDGQTEVMPLWTPPTPPQDDSDIYLDSVMCLMYEATPIPEAKLPPVYVRKERKRHKTDPSAAGRKKKQRHGEAVVPPRSLFDRATPGLLKIRREGKEQKKNILLKQQVPFAKPLPTFAKPTAEPGQDNPEWLISEDWALLQAVKQLLELPLNLTIVSPAHTPNWDLVSDVVNSCSRIYRSSKQCRNRYENVIIPREEGKSKNNRPLRTSQIYAQDENATHTQLYTSHFDLMKMTAGKRSPPIKPLLGMNPFQKNPKHASVLAESGINYDKPLPPIQVASLRAERIAKEKKALADQQKAQQPAVAQPPPPQPQPPPPPQQPPPPLPQPQAAGSQPPAGPPAVQPQPQPQPQTQPQPVQAPAKAQPAITTGGSAAVLAGTIKTSVTGTSMPTGAVSGNVIVNTIAGVPAATFQSINKRLASPVAPGALTTPGGSAPAQVVHTQPPPRAVGSPATATPDLVSMATTQGVRAVTSVTASAVVTTNLTPVQTPARSLVPQVSQATGVQLPGKTITPAHFQLLRQQQQQQQQQQQQQQQQQQQQQQQQQQQQQTTTTSQVQVPQIQGQAQSPAQIKAVGKLTPEHLIKMQKQKLQMPPQPPPPQAQSAPPQPTAQVQVQTSQPPQQQSPQLTTVTAPRPGALLTGTTVANLQVARLTRVPTSQLQAQGQMQTQAPQPAQVALAKPPVVSVPAAVVSSPGVTTLPMNVAGISVAIGQPQKAAGQTVVAQPVHMQQLLKLKQQAVQQQKAIQPQAAQGPAAVQQKITAQQITTPGAQQKVAYAAQPALKTQFLTTPISQAQKLAGAQQVQTQIQVAKLPQVVQQQTPVASIQQVASASQQASPQTVALTQATAAGQQVQMIPAVTATAQVVQQKLIQQQVVTTASAPLQTPGAPNPAQVPASSDSPSQQPKLQMRVPAVRLKTPTKPPCQ
3NQH , Knot 188 441 0.86 40 246 420
GRKTEKVVNNGIPWFDDRGEIVNAHGACIVEENGRYYLFGEYKSDKSNAFPGFSCYSSDDLVNWKFERVVLPMQSSGILGPDRVGERVKVMKCPSTGEYVMYMHADDMNYKDPHIGYATCSTIAGEYKLHGPLLYEGKPIRRWDMGTYQDTDGTGYLLLHGGIVYRLSKDYRTAEEKVVSGVGGSHGESPAMFKKDGTYFFLFSNLTSWEKNDNFYFTAPSVKGPWTRQGLFAPEGSLTYNSQTTFVFPLKCGEDTIPMFMGDRWSYPHQASAATYVWMPMQVDGTKLSIPEYWPSWDVDKLKPVNPLRKGKTVDLKKITFSKEADWKVEEGRISSNVKGSTLSIPFTGSCVAVMGETNCHSGYARMNILDKKGEKIYSSLVDFYSKANDHATRFKTPQLAEGEYTLVIEVTGISPTWTDKTKRIYGSDDCFVTITDIVKL
1CCT , Knot 111 259 0.79 40 174 248
SHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLKPLSVSYDQATSLRILNNGHAFNVEFDDSQDKAVLKGGPLDGTYRLIQFHFHWGSLDGQGSEHTVDKKKYAAELHLVHWNTKYGDFGKAVQQPDGLAVLGIFLKVGSAKPGLQKVVDVLDSIKTKGKSADFTNFDPRGLLPESLDYWTYPGSLETPPLLECVTWIVLKEPISVSSEQVLKFRKLNFNGEGEPEELMVDNWRPAQPLKNRQIKASFK
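
The page does not say which Lempel-Ziv variant produces the LZ-complexity column. The sketch below assumes the 1976 exhaustive-history parsing (LZ76), in which each new component is the shortest prefix of the remaining text that has not occurred earlier; it may therefore not reproduce the table's counts exactly. The function names are illustrative. The second helper computes the normalized ratio \(\mathrm{LZ}(w)/(n/\log_{20} n)\) from the table header.

```python
import math

def lz76_complexity(w: str) -> int:
    """Number of components in the LZ76 exhaustive history of w."""
    n, i, c = len(w), 0, 0
    while i < n:
        l = 1
        # Grow the component while it can still be copied from earlier text
        # (the copy source may overlap the component, hence w[:i + l - 1]).
        while i + l <= n and w[i:i + l] in w[:i + l - 1]:
            l += 1
        c += 1
        i += l
    return c

def normalized(c: int, n: int, alphabet: int = 20) -> float:
    """c divided by n / log_alphabet(n)."""
    return c / (n / math.log(n, alphabet))

print(lz76_complexity("0001101001000101"))  # 6: 0|001|10|100|1000|101
print(round(normalized(957, 3123), 2))      # 0.82, the 8XVV row above
```

This direct implementation rescans the prefix at every step, which is adequate for sequences of a few thousand residues but not for genome-scale input.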

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the sequence in FASTA format with 4-symbol Protein Data Bank code \(c\).
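
Reading the argument of \(P_w\) as the subword length, the set of subwords and its cardinality can be sketched in a few lines of Python. The helper names `subwords` and `subword_complexity` are illustrative only:

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w: str, k: int) -> int:
    """Cardinality of the set of length-k subwords of w."""
    return len(subwords(w, k))

# First ten residues of the 8XVV_1 sequence shown on this page.
w = "MHHGTGPQNV"
print(subword_complexity(w, 1), subword_complexity(w, 2))  # 8 9
```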

\(|P_{f(8XVV_1)}(2) \setminus P_{f(3NQH_1)}(2)|=136\), \(|P_{f(3NQH_1)}(2) \setminus P_{f(8XVV_1)}(2)|=17\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1001011001000100001011001000110101110111111101010110010001001100011010010101001111111000101111111010011101010100100101001010011000100001001011110110010111111111100001011110101110010101001101110010110011011010100110110100001001000011010011100111111111011101001001011101001101011000011011111011110001111111111111111001111001000111111001110011001100110010011110101101000010000001011001100011011110010101101111000001110101000010100000000000000011000001111011111100100011000011100111000001010110011110011101111100111010111110101101110000000001001111001101001110111011111110111001101100101101000101110000101111111000111110011011101101101010100100110111000111011110101110000001111010111101100011001001000110110000111110110010011001000000000010001010001000110100111100001101001101000100110010111001100001011110011001100000001000010000000100111001001001100100110101010100000011010010001001010110110000100110100001010100001000000100001001110000010011001011110110100111110001101010100001000100011000000011110011110010110010110101001101011101111010101000101011011011100000111011101000010111100111100101111110110001011101111000011010101001011101100110000101000011010010101000001101101100101001110010010110000101110100000111100110001101101101111110010100110110000000000111010010011110000001000100000011000100000110001110110001100101101101110100100011110100110001111100101011101100011001010110111100010000101100001100110010001111101111010100110110010010100111100011001110010111011101011110101010101111110000101011000110100011011011010100010100110111010110011100111001111001010110101101101101011000110100110101010010100001001011011010101101101110101011111110010011110111100111111011110101111101110111110110100110111110000000
11000100101100000001110100110101110010101010101001001111000000000100111010001001001100111111111111101011011110000101100110001110100100001101101101011010010101111110010001001111001111101101110100100101000100000001100100000110111000000011011010011100001011101010010001100001010011010010001100100011001110100001111000010011010011001111101001111000101000111011011101100100100010001001111100011000000110000100100110110010110001100101100010000000000111011011011010010000101010000101100000010010010001010001111101101100000101001101100101110101111010000000000101110000000101111100110010111101000100000011100011110111011010101100010111000111101100110111010110110010101100110000010000000000000111100010000001100001010000100001000010110101100011101111101100010010111001100001111101101010011000011100001001111011110101111100111111010111001111111101010101000101101110101110011011111101000101001101110101110011111110100100011011111110011101110110001110111011010101101100011011001010111000101100110011101001011011100101101011000000000000000000000000000000000000101101010100110101110101001101000010111011110100111010101010000110000101001011011111010011010110100110001010101000110110111101111011111100111001110111101111010011100111011010011010001100001101011011111000101001001110001101101110001100110010011110010001011011011000011101001101000101001110010111001011111010101100011000110010111001111011011100001000101010111101001001100
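
The hydrophobic-polar string maps each residue to 1 (hydrophobic) or 0 (polar). The page does not list the partition it uses. The sketch below assumes the hydrophobic set A, F, G, I, L, M, P, V, W, which reproduces the opening digits `1001011001` for the first ten residues `MHHGTGPQNV` of Sequence 1; the classification of residues not checked against the page (W in particular) is an assumption.

```python
HYDROPHOBIC = set("AFGILMPVW")  # assumed partition; not stated on the page

def hp_encode(seq: str) -> str:
    """Map each residue to '1' if hydrophobic, '0' otherwise."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

print(hp_encode("MHHGTGPQNV"))  # 1001011001
```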
Pair \(Z_2\) Length of longest common subword
8XVV_1,3NQH_1 153 4
8XVV_1,1CCT_1 203 5
3NQH_1,1CCT_1 182 4
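
\(Z_k\) is the size of the symmetric difference of the two subword sets, so for the first pair \(Z_2 = 136 + 17 = 153\). The sketch below computes \(Z_k\) together with the length of the longest common contiguous subword. Reading the last column as a contiguous match is an assumption, made because values of 4 or 5 are far too small for a non-contiguous common subsequence of sequences this long. Function names are illustrative.

```python
def subwords(w: str, k: int) -> set[str]:
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x: str, y: str, k: int) -> int:
    """Size of the symmetric difference of the length-k subword sets."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x: str, y: str) -> int:
    """Length of the longest contiguous word occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1], y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("AAB", "ABB", 2), longest_common_subword("AAB", "ABB"))  # 2 2
```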

Newick tree

 
(1CCT_1:10.16,(8XVV_1:76.5,3NQH_1:76.5):25.66);

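Let \(d\) be the Otu–Sayood distance \(d(x,y)=\max\{\mathrm{LZ}(xy)-\mathrm{LZ}(x),\ \mathrm{LZ}(yx)-\mathrm{LZ}(y)\}\).
Let \(d_1\) be the Otu–Sayood distance \(d_1(x,y)=\mathrm{LZ}(xy)-\mathrm{LZ}(x)+\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{3564 }{\log_{20} 3564}-\frac{441}{\log_{20}441})\approx 740.\)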
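
The arithmetic in the expected-distance estimate can be checked directly. Note that 3564 = 3123 + 441 is the length of the two query sequences concatenated. The constants 0.85 and 0.8 are taken from the page as given; their derivation is not stated there.

```python
import math

def lz_scale(n: int, alphabet: int = 20) -> float:
    """n / log_alphabet(n), the normalizing quantity used on this page."""
    return n / math.log(n, alphabet)

n_concat, n_short = 3564, 441  # 3564 = 3123 + 441
expected = 0.85 * 0.8 * (lz_scale(n_concat) - lz_scale(n_short))
print(round(expected))  # 740
```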
Status Protein1 Protein2 d d1/2
Query variables 8XVV_1 3NQH_1 911 522.5
Was not able to put for d
Was not able to put for d1
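
As a rough illustration of how the two distances in the table could be computed, the sketch below follows the Otu–Sayood definitions: the maximum, respectively the sum, of the number of extra components needed to parse one sequence after the other. The LZ76 parser used as the complexity function is an assumption (the page does not specify its variant), and all function names are illustrative.

```python
def lz76_complexity(w: str) -> int:
    """Number of components in the LZ76 exhaustive history of w."""
    n, i, c = len(w), 0, 0
    while i < n:
        l = 1
        # Extend the component while it still occurs in the preceding text.
        while i + l <= n and w[i:i + l] in w[:i + l - 1]:
            l += 1
        c += 1
        i += l
    return c

def otu_sayood(x: str, y: str, c=lz76_complexity) -> tuple[int, int]:
    """Return (d, d1) for sequences x and y under complexity function c."""
    a = c(x + y) - c(x)  # new components contributed by y after x
    b = c(y + x) - c(y)  # new components contributed by x after y
    return max(a, b), a + b

d, d1 = otu_sayood("ACGT", "AAAA")
print(d, d1 / 2)
```

A low-complexity sequence such as a run of A's adds very few components to any concatenation, which is why it can appear as a close match under \(d_1\).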

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
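
The interpolation can be checked against the query row above. Taking min and max to be the smaller and larger of the two one-sided complexity increments, the reported values \(d = 911\) and \(d_1/2 = 522.5\) imply \(\max = 911\) and \(\min = 2(522.5) - 911 = 134\). The value 134 is inferred here, not printed on the page.

```python
def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    lo, hi = min(a, b), max(a, b)
    return alpha * lo + (1 - alpha) * hi

a, b = 134, 911          # inferred one-sided increments for 8XVV_1 / 3NQH_1
print(delta(a, b, 0))    # 911   (equals d)
print(delta(a, b, 0.5))  # 522.5 (equals d1 / 2)
```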