CoV2D BrowserTM

CoV2D project home | Random page
Parikh vectors
9FTI_1 4GQS_1 8TKH_1 Letter Amino acid
33 18 152 A Alanine
32 15 127 Q Glutamine
50 28 134 G Glycine
17 32 166 K Lycine
39 32 114 F Phenylalanine
37 28 133 T Threonine
20 26 134 N Asparagine
8 14 78 H Histidine
8 2 22 W Tryptophan
38 33 199 E Glutamic acid
41 35 148 I Isoleucine
5 13 84 M Methionine
28 29 88 P Proline
41 24 179 S Serine
28 26 137 R Arginine
39 23 146 D Aspartic acid
6 11 51 C Cysteine
64 49 318 L Leucine
21 11 76 Y Tyrosine
54 28 185 V Valine

9FTI_1|Chains A, B, C, D, E|Neurotransmitter-gated ion-channel ligand-binding domain-containing protein|Desulfofustis sp. PB-SRB1 (1385624)
>4GQS_1|Chains A, B, C, D|Cytochrome P450 2C19|Homo sapiens (9606)
>8TKH_1|Chains A, B, C, D|Inositol 1,4,5-trisphosphate receptor type 3|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
9FTI , Knot 238 609 0.83 40 254 559
TEGRVQHFTGYIEDGRGIFYSLPDMKQGDIIYASMQNTGGNLDPLVGIMAEEIDPAVSLGQVLEKALASENDLISELTAVADRIFLGWDDDGGKGYSASLEFTIPRDGTYHIFAGSTITNQRLDKFQPTYTTGSFQLILGLNAPQVISGEGEPEGEVFASLASLEIKPEAHVQELEIRLDKDTRYLTQHTRNLQPGDTFHALVEPIGEAPLPRLRLTDSGGKPLAFGLIDQPGESVELNYTCDQDICELVVHVDGTDGQKDSGEAVYRLLVGINAPNLRESGQTPVGSSVFLESDLVTVGLAVDQIVGVDQRSENFSVVGTLKLSWHDPKLGFSPDQCGCTVKSFEDASIRAVAGEINLPLPSFSFYNQQGNRWSQNQVIFVTPDGRASYFERFTVTLQAPDFDFLAYPFDRQKFSIKVDLAVPTNMFIFNEIERFQQVVGDQLGEEEWVVTSYSQEITEVPFERGSTNSRFTTTLLVKRNLEYYILRIFVPLFLIISVSWVIFFLKDYGRQLEVASGNLLVFVAFNFTISGDLPRLGYLTVLDRFMIVSFCLTAIVVLISVCQKRLGAVGKQAVAAQIDTWVLVIYPLVYSLYIIWVYLRFFTDHIGW
4GQS , Knot 196 477 0.84 40 249 448
MAKKTSSGRGKLPPGPTPLPVIGNILQIDIKDVSKSLTNLSKIYGPVFTLYFGLERMVVLHGYEVVKEALIDLGEEFSGRGHFPLAERANRGFGIVFSNGKRWKEIRRFSLMTLRNFGMGKRSIEDRVQEEARCLVEELRKTKASPCDPTFILGCAPCNVICSIIFQKRFDYKDQQFLNLMEKLNENIRIVSTPWIQICNNFPTIIDYFPGTHNKLLKNLAFMESDILEKVKEHQESMDINNPRDFIDCFLIKMEKEKQNQQSEFTIENLVITAADLLGAGTETTSTTLRYALLLLLKHPEVTAKVQEEIERVVGRNRSPCMQDRGHMPYTDAVVHEVQRYIDLIPTSLPHAVTCDVKFRNYLIPKGTTILTSLTSVLHDNKEFPNPEMFDPRHFLDEGGNFKKSNYFMPFSAGKRICVGEGLARMELFLFLTFILQNFNLKSLIDPKDLDTTPVVNGFASVPPFYQLCFIPIHHHH
8TKH , Knot 871 2671 0.85 40 384 2055
MSEMSSFLHIGDIVSLYAEGSVNGFISTLGLVDDRCVVEPAAGDLDNPPKKFRDCLFKVCPMNRYSAQKQYWKAKQTKQDKEKIADVVLLQKLQHAAQMEQKQNDTENKKVHGDVVKYGSVIQLLHMKSNKYLTVNKRLPALLEKNAMRVTLDATGNEGSWLFIQPFWKLRSNGDNVVVGDKVILNPVNAGQPLHASNYELSDNAGCKEVNSVNCNTSWKINLFMQFRDHLEEVLKGGDVVRLFHAEQEKFLTCDEYKGKLQVFLRTTLRQSATSATSSNALWEVEVVHHDPCRGGAGHWNGLYRFKHLATGNYLAAEENPSYKGDASDPKAAGMGAQGRTGRRNAGEKIKYCLVAVPHGNDIASLFELDPTTLQKTDSFVPRNSYVRLRHLCTNTWIQSTNVPIDIEEERPIRLMLGTCPTKEDKEAFAIVSVPVSEIRDLDFANDASSMLASAVEKLNEGFISQNDRRFVIQLLEDLVFFVSDVPNNGQNVLDIMVTKPNRERQKLMREQNILKQVFGILKAPFREKGGEGPLVRLEELSDQKNAPYQHMFRLCYRVLRHSQEDYRKNQEHIAKQFGMMQSQIGYDILAEDTITALLHNNRKLLEKHITKTEVETFVSLVRKNREPRFLDYLSDLCVSNHIAIPVTQELICKCVLDPKNSDILIRTELRPVKEMAQSHEYLSIEYSEEEVWLTWTDKNNEHHEKSVRQLAQEARAGNAHDENVLSYYRYQLKLFARMCLDRQYLAIDEISQQLGVDLIFLCMADEMLPFDLRASFCHLMLHVHVDRDPQELVTPVKFARLWTEIPTAITIKDYDSNLNASRDDKKNKFANTMEFVEDYLNNVVSEAVPFANEEKNKLTFEVVSLAHNLIYFGFYSFSELLRLTRTLLGIIDCVQGPPAMLQAYEDPGGKNVRRSIQGVGHMMSTMVLSRKQSVFSAPSLSAGASAAEPLDRSKFEENEDIVVMETKLKILEILQFILNVRLDYRISYLLSVFKKEFVEVFPMQDSGADGTAPAFDSTTANMNLDRIGEQAEAMFGVGKTSSMLEVDDEGGRMFLRVLIHLTMHDYAPLVSGALQLLFKHFSQRQEAMHTFKQVQLLISAQDVENYKVIKSELDRLRTMVEKSELWVDKKGSGKGEEVEAGAAKDKKERPTDEEGFLHPPGEKSSENYQIVKGILERLNKMCGVGEQMRKKQQRLLKNMDAHKVMLDLLQIPYDKGDAKMMEILRYTHQFLQKFCAGNPGNQALLHKHLHLFLTPGLLEAETMQHIFLNNYQLCSEISEPVLQHFVHLLATHGRHVQYLDFLHTVIKAEGKYVKKCQDMIMTELTNAGDDVVVFYNDKASLAHLLDMMKAARDGVEDHSPLMYHISLVDLLAACAEGKNVYTEIKCTSLLPLEDVVSVVTHEDCITEVKMAYVNFVNHCYVDTEVEMKEIYTSNHIWTLFENFTLDMARVCSKREKRVADPTLEKYVLSVVLDTINAFFSSPFSENSTSLQTHQTIVVQLLQSTTRLLECPWLQQQHKGSVEACIRTLAMVAKGRAILLPMDLDAHISSMLSSGASCAAAAQRNASSYKATTRAFPRVTPTANQWDYKNIIEKLQDIITALEERLKPLVQAELSVLVDVLHWPELLFLEGSEAYQRCESGGFLSKLIQHTKDLMESEEKLCIKVLRTLQQMLLKKTKYGDRGNQLRKMLLQNYLQNRKSTSRGDLPDPIGTGLDPDWSAIAATQCRLDKEGATKLVCDLITSTKNEKIFQESIGLAIHLLDGGNTEIQKSFHNLMMSDKKSERFFKVLHDRMKRAQQETKSTVAVNMNDLGSQPHEDREPVDPTTKGRVASFSIPGSSSRYSLGPSLRRGHEVSERVQSSEMGTSVLIMQPILRFLQLLCENHNRDLQNFLRCQNNKTNYNLVCETLQFLDIMCGSTTGGLGLLGLYINEDNVGLVIQTLETLTEYCQGPCHENQTCIVTHESNGIDIITALILNDISPLCKYRMDLVLQLKDNASKLLLALMESRHDSENAERILISLRPQELVDVIKKAYLQEEERENSEVSPREVGHNIYILALQLSRHNKQLQHLLKPVKRIQEEEAEGISSMLSLNNKQLSQMLKSSAPAQEEEEDPLAYYENHTSQIEIVRQDRSMEQIVFPVPGICQFLTEETKHRLFTTTEQDEQGSKVSDFFDQSSFLHNEMEWQRKLRSMPLIYWFSRRMTLWGSISFNLAVFINIIIAFFYPYMEGASTGVLDSPLISLLFWILICFSIAALFTKRYSIRPLIVALILRSIYYLGIGPTLNILGALNLTNKIVFVVSFVGNRGTFIRGYKAMVMDMEFLYHVGYILTSVLGLFAHELFYSILLFDLIYREETLFNVIKSVTRNGRSILLTALLALILVYLFSIVGFLFLKDDFILEVDRLPNNHSTASPLGMPHGAAAFVDTCSGDKMDCVSGLSVPEVLEEDRELDSTERACDTLLMCIVTVMNHGLRNGGGVGDILRKPSKDESLFPARVVYDLLFFFIVIIIVLNLIFGVIIDTFADLRSEKQKKEEILKTTCFICGLERDKFDNKTVSFEEHIKLEHNMWNYLYFIVLVRVKNKTDYTGPESYVAQMIKNKNLDWFPRMRAMSLVSNEGEGEQNEIRILQDKLNSTMKLVSHLTAQLNELKEQMTEQRKRRQRLGFVDVQNCISR

Let \(P_w(n)\) be the set of distinct subwords (intervals) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(9FTI_1)}(2) \setminus P_{f(4GQS_1)}(2)|=80\), \(|P_{f(4GQS_1)}(2) \setminus P_{f(9FTI_1)}(2)|=75\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be a LZ76 style (set of subwords) Jaccard distance numerator for \(P(k)\).Hydrophobic-polar version of Sequence 1:001010010101001011100110100101101010001101011111110010111011011001110000110010111001111100011010010101011001000111100100001001010000101011111011011010101010111011010101010100101010000001000000101100101110111011110101000110111111100110010100000001001110101001000010110011111011010001001110011100011011111001111000000101110101010010111010001001001001010111101011110101000010010000111101010100100101010110101110110000101010111100111100100100111001100011100000010011100100000100011100010001101111111110101111110001001011010111111101010101101101011001111010101111110100001111100111101001111101110010111101011000111
Pair \(Z_2\) Length of longest common subsequence
9FTI_1,4GQS_1 155 4
9FTI_1,8TKH_1 138 4
4GQS_1,8TKH_1 137 5

Newick tree

 
[
	9FTI_1:74.92,
	[
		8TKH_1:68.5,4GQS_1:68.5
	]:6.42
]

Let d be the Otu--Sayood distance d.
Let d1 be the Otu--Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
A roughly speaking expected distance is \((0.85)(0.8)(\frac{1086 }{\log_{20} 1086}-\frac{477}{\log_{20}477})=158.\)
Status Protein1 Protein2 d d1/2
Query variables 9FTI_1 4GQS_1 201 179.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]