CoV2D BrowserTM

Parikh vectors
2FKE_1 6TAX_1 4AWH_1 Letter Amino acid
5 261 12 A Alanine
5 248 3 Q Glutamine
3 139 6 H Histidine
5 352 12 S Serine
3 130 7 Y Tyrosine
9 302 9 V Valine
1 161 7 N Asparagine
1 119 4 C Cysteine
7 217 12 T Threonine
6 250 15 R Arginine
7 317 25 E Glutamic acid
7 578 10 L Leucine
1 55 2 W Tryptophan
7 195 5 P Proline
6 244 11 D Aspartic acid
13 203 10 G Glycine
5 228 18 I Isoleucine
8 267 14 K Lysine
3 101 9 M Methionine
5 204 13 F Phenylalanine
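
The Parikh vector of a sequence counts how often each letter occurs. As a minimal sketch (the name `parikh_vector` is chosen here for illustration and is not taken from the CoV2D project), the 2FKE_1 column above can be recomputed from the 2FKE sequence listed on this page:

```python
from collections import Counter

# Sequence of 2FKE_1 as listed on this page (107 residues).
SEQ_2FKE = ("GVQVETISPGDGRTFPKRGQTCVVHYTGMLEDGKKFDSSRDRNKPFKFMLGKQEVIRGWE"
            "EGVAQMSVGQRAKLTISPDYAYGATGHPGIIPPHATLVFDVELLKLE")

def parikh_vector(sequence):
    """Return a dict mapping each letter to its number of occurrences."""
    return dict(Counter(sequence))

counts = parikh_vector(SEQ_2FKE)
for letter in sorted(counts):
    print(letter, counts[letter])
```

The counts sum to the sequence length, which gives a quick consistency check on any column of the table.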

>2FKE_1|Chain A|FK506 BINDING PROTEIN|Homo sapiens (9606)
>6TAX_1|Chain A|RNF213,E3 ubiquitin-protein ligase RNF213,E3 ubiquitin-protein ligase RNF213|Mus musculus (10090)
>4AWH_1|Chains A, B, C, D|POLYMERASE PA|INFLUENZA VIRUS (641501)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
2FKE , Knot 57 107 0.83 40 89 103
GVQVETISPGDGRTFPKRGQTCVVHYTGMLEDGKKFDSSRDRNKPFKFMLGKQEVIRGWEEGVAQMSVGQRAKLTISPDYAYGATGHPGIIPPHATLVFDVELLKLE
6TAX , Knot 1412 4638 0.85 42 394 3010
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXKLTLGLSILFMVEAAEFTVPKKDLDSLCYLLIPSAGSPEALHSDLSPVLRIRQRWRIYLTNLCLRCIDERCDRWLGILPLLHTCMQKSPPKKNSKSQPEDTWAGLEGISFSEFRDKAPTRSQPLQFMQSKMALLRVDEYLFRSWLSVVPLESLSSYLENSIDYLSDVPVRVLDCLQGISYRLPGLRKISNQNMKKDVENVFKMLMHLVDIYQHRIFGENLLQIYLTECLTLHETVCNITANHQFFEIPALSAELICKLLELSPPGHTDEGLPEKSYEDLVTSTLQEALATTRNWLRSLFKSRMLSISSAYVRLTYSEEMAVWRRLVEIGFPEKHGWKGSLLGDMEGRLKQEPPRLQISFFCSSQCRDGGLHDSVSRSFEKCVIEAVSSACQSQTSVLEGLSCQDLQKFGTLLSAVITKSWPVHNGEPVFDVDEIFKYLLKWPDVRQLFELCGTNEKIIDNITEEGRQLMATAESVFQKVAGELENGTIVVGQLELILEHQSQFLDIWNLNRRRLPSQEKACDVRSLLKRRRDDLLFLKQEKRYVESLLRQLGRVKHLVQVDFGNIEIIHSQDLSNKKLNEAVIKLPNSSSYKRETHYCLSPDIREMASKLDSLKDSHIFQDFWQETAESLNTLDKDPRELKVSLPEVLEYLYNPCYDNFYTLYENLKSGKITFAEVDAIFKDFVDKYDELKNDLKFMCTMNPQDQKGWISERVGQIKEYHTLHQAVSSAKVILQVRRALGVTGDFSVLNPLLNFADSFEDFGNEKLDQISPQFIKAKQLLQDISEPRQRCLEELARQTELVAWLHKALEDINELKVFVDLASISAGENDIDVDRVACFHDAVQGYASLLYKMDERTNFSDFMNHLQELWRALDNDQHLPDKLKDSARNLEWLKTVKESHGSVELSSLSLATAINSRGVYVIEAPKDGQKISPDTVLRLLLPDGHGYPEALRTYSTEELKELLNKLMLMSGKKDHNSNTEVEKFSEVFSNMQRLVHVFIKLHCAGNMLFRTWTAKVYCCPDGGIFMNFGLELLSQLTEKGDVIQLLGALCRQMEDFLDNWKTVVAQKRAEHFYLNFYTAEQLVYLSSELRKPRPSEAALMMLSFIKGKCTVQDLVQATSACESKADRYCLREVMKKLPQQLLSEPSLMGKLQVIMMQSLVYMSAFLPHCLDLDALGRCLAHLATMGGTPVERPLPKGLQAGQPNLILCGHSEVLPAALAIYMQAPRQPLPTFDEVLLCTPATTIEEVELLLRRCLTSGSQGHKVYSLLFADQLSYEVGCQAEEFFQSLCTRAHREDYQLVILCDAAREHCYIPSTFSQYKVPLVPQAPLPNIQAYLQSHYQVPKRLLSAATVFRDGLCVGIVTSERAGVGKSLYVNTLHTKLKAKLRDETVPLKIIRLTEPHLDENQVLSALLPFLKEKYQKMPVIFHIDISTSVQTGIPIFLFKLLILQYLMDINGKIWRRSPGHLYLVEIPQGLSVQPKRSSKLNARAPLFKFLDLFPKVTCRPPKEVIDMELTPERSHTDPAMDPVEFCSEAFQRPYQYLKRFHQQQNLDTFQYEKGSVEGSPEECLQHFLIYCGLINPSWSELRNFAWFLNCQLKDCEASIFCKSAFTGDTLRGFKNFVVTFMILMARDFATPTLHTSDQSPGRQSVTIGEVVEEDLAPFSLRKRWESEPHPYVFFNGDHMTMTFIGFHLETNNNGYVDAINPSNGKVIKKDVMTKELFDGLRLQRVPFNIDFDNLPRYEKLERLCLALGIEWPIDPDETYELTTDNMLKILAIEMRFRCGIPVIIMGETGCGKTRLIKFLSDLKRGSVEAETMKLVKVHGGTTPSMIYSKVKEAERTAFSNKAQHKLDTILFFDEANTTEAVSCIKEILCDRTVDGEHLHEDSGLHIIAACNPYRKHSQEMILRLESAGLGYRVSAEETADRLGSIPLR
QLVYRVHALPPSLIPLVWDFGQLNDSAEKLYIQQIVQRLVDSVSVNPSETCVIADVLSASQMFMRKRENECGFVSLRDVERCVKVFRWFHDHSDMLLKELDKFLHESSDSTHTFERDPVLWSLVMAIGVCYHASLEEKASYRTAIARCFPKPYNSSRAILDEVTHVQDLFLRGAPIRTNIARNLALKENVFMMVICIELKIPLFLVGKPGSSKSLAKIIVADAMQGQAAFSELFRCLKQVHLVSFQCSPHSTPQGIISTFKQCARFQQGKDLGQYVSVVVLDEVGLAEDSPKMPLKTLHPLLEDGCIEDDPAPYKKVGFVGISNWALDPAKMNRGIFVSRGSPNEKELIESAEGICSSDRLVQDKIRGYFAPFAKAYETVCQKQDKEFFGLRDYYSLIKMVFAKAKASKRGLSPQDITHAVLRNFSGKDNIQALSIFTASLPEARYKEEVSTVELIKQNIYPGPQASSRGLDGAESRYLLVLTRNYVALQILQQTFFEGQQPEIIFGSSFPQDQEYTQICRNINRVKICMETGKMVVLLNLQNLYESLYDALNQYYVYLGGQKYVDLGLGTHRVKCRVHTAFRLIVIEEKDVVYKQFPVPLINRLEKHYLDMNTVLQPWQKSIVQELQQWAHEFADVKADQFIARHKYSPADVFIGYHSDACASVVLQAVERQGCRDLTEELYRKVSEEARSILLDCATPDAVVRLSGSSLGSFTAKQLSQEYYYAQQHNSFVDFLQAHLRMTHHECRAVFTEITTFSRLLTGNDCDVLASELRGLASKPVVLSLQQYDTEYSFLKDVRSWLTNPGKRKVLVIQADFDDGTRSAQLVASAKYTAINEINKTQGTKDFVFVYFVTKLSRMGSGTSYVGFHGGLWRSVHIDDLRRSTIMASDVTKLQNVTISQLFKPEDKPEQEEMEIETSQSKELAEEQMEVEDSEEMKKASDPRSCDCSQFLDTTRLVQSCVQGAVGMLRDQNESCARNMRRVTILLDLLNEDNTRNASFLRESKMRLHVLLNKQEENQVRSLKEWVTREAANQDALQEAGTFRHTLWKRVQDVVTPILASMIAHIDRDGNLELLAQPDSPAWVQDLWMFIYSDIKFLNISLVLNNTRSNSEMSFILVQSHMNLLKDAYNAVPFSWRIRDYLEELWVQAQYITDTEGLSKKFVEIFQKTPLGVFLAQFPVAQQQKLLQSYLKDFLLLTMKVSSREELMFLQMALWSCLRELQEASGTPDETYKFPLSLPWVHLAFQHFRTRLQNFSRILTIHPQVLSSLSQAAEKHSLAGCEMTLDAFAAMACAEMLKGDLLKPSPKAWLQLVKNLSTPLELVCSEGYLCDSGSMTRSVIQEVRALWNRIFSIALFVEHVLLGTESHIPELSPLVTTYVSLLDKCLEEDSNLKTCRPFVAVMTTLCDCKDKASKKFSRFGIQPCFICHGDAQDPVCLPCDHVYCLRCIQTWLIPGQMMCPYCLTDLPDKFSPTVSQDHRKAIEKHAQFRHMCNSFFVDLVSTMCFKDNTPPEKSVIDTLLSLLFVQKELLRDASQKHREHTKSLSPFDDVVDQTPVIRSVLLKLLLKYSFHEVKDYIQNYLTQLEKKAFLTEDKTELYLLFISCLEDSVHQKTSAGCRNLEQVLREEGHFLRTYSPGLQGQEPVRIASVEYLQEVARVRLCLDLAADFLSELQEGSELAEDKRRFLKHVEEFCTRVNNDWHRVYLVRKLSSQRGMEFVQSFSKQGHPCQWVFPRKVIAQQKDHVSLMDRYLVHGNEYKAVRDATAKAVLECKTLDIGNALMACRSPKPQQTAYLLLALYTEVAALYRSPNGSLHPEAKQLEAVNKFIKESKILSDPNIRCFARSLVDNTLPLLKIRSANSILKGTVTEMAVHVATILLCGHNQILKPLRNLAFYPVNMANAFLPTMPEDLLVHARTWRGLENVTWYTCPRGHPCSVGECGRPMQESTCLDCGLPVGGLNHTPHEGFSAIRNNEDRTQTGHVLGSPQSSGVAEVSDRGQSP
VVFILTRLLTHLAMLVGATHNPQALTVIIKPWVQDPQGFLQQHIQRDLEQLTKMLGRSADETIHVVHLILSSLLRVQSHGVLNFNAELSTKGCRNNWEKHFETLLLRELKHLDKNLPAINALISQDERISSNPVTKIIYGDPATFLPHLPQKSIIHCSKIWSCRRKITVEYLQHIVEQKNGKETVPVLWHFLQKEAELRLVKFLPEILALQRDLVKQFQNVSRVEYSSIRGFIHSHSSDGLRKLLHDRITIFLSTWNALRRSLETNGEIKLPKDYCCSDLDLDAEFEVILPRRQGLGLCGTALVSYLISLHNNMVYTVQKFSNEDNSYSVDISEVADLHVISYEVERDLNPLILSNCQYQVQQGGETSQEFDLEKIQRQISSRFLQGKPRLTLKGIPTLVYRRDWNYEHLFMDIKNKMAQSSLPNLAISTISGQLQSYSDACEALSIIEITLGFLSTAGGDPGMDLNVYIEEVLRMCDQTAQVLKAFSRCQLRHIIALWQFLSAHKSEQRLRLNKELFREIDVQYKEELSTQHQRLLGTFLNEAGLDAFLLELHEMIVLKLKGPRAANSFNPNWSLKDTLVSYMETKDSDILSEVESQFPEEILMSSCISVWKIAATRKWDRQSRGGGHHHHHHHHHH
4AWH , Knot 97 204 0.84 40 145 197
MGSGMAMEDFVRQCFNPMIVELAEKAMKEYGEDPKIETNKFAAICTHLEVCFMYSDFHFIDERGESIIVESGDPNALLKHRFEIIEGRDRIMAWTVVNSICNTTGVEKPKFLPDLYDYKENRFIEIGVTRREVHIYYLEKANKIKSEKTHIHIFSFTGEEMATKADYTLDEESRARIKTRLFTIRQEMASRSLWDSFRQSERGE
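
The LZ-complexity and normalized columns can be approximated as follows. This is a sketch only: it assumes the exhaustive-history parsing of Lempel and Ziv (1976), which may differ in detail from the variant used to produce the table, and the function names are illustrative.

```python
import math

def lz76_complexity(w):
    """Number of components in the exhaustive (LZ76-style) parsing of w."""
    n, i, c = len(w), 0, 0
    while i < n:
        length = 1
        # Extend the component while it can still be copied from earlier text.
        # The copy may overlap the component but must start before position i.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        c += 1
        i += length
    return c

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_b n) with b = alphabet_size; requires n > 1."""
    return lz / (n / math.log(n, alphabet_size))

# Table values for 2FKE: LZ(w) = 57, n = 107.
print(normalized_lz(57, 107))  # about 0.83, as in the table
```

The normalization divides by \(n/\log_{20} n\), the order of growth of the LZ-complexity of a typical word of length \(n\) over a 20-letter alphabet.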

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
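
A short sketch of these definitions, reading \(P_w(n)\) as the set of subwords of length \(n\) (the function names are illustrative):

```python
def subwords(w, n):
    """Set P_w(n) of distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def p(w, n):
    """Subword complexity p_w(n), the cardinality of P_w(n)."""
    return len(subwords(w, n))

print(sorted(subwords("ABAB", 2)))
print(p("ABAB", 2))
```

For a word of length \(m\), \(p_w(n)\) is at most \(m-n+1\), and it is zero once \(n\) exceeds \(m\).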

\(|P_{f(2FKE_1)}(2) \setminus P_{f(6TAX_1)}(2)|=0\), \(|P_{f(6TAX_1)}(2) \setminus P_{f(2FKE_1)}(2)|=305\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be the numerator of an LZ76-style (set of subwords) Jaccard distance for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\).

Hydrophobic-polar version of Sequence 1: 11010010110100110010001100011100100100000000110111100011011001110101100101010100101101011111101011101011010
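
The hydrophobic-polar string can be reproduced with a simple letter-by-letter mapping. The page does not state which residues count as hydrophobic; the set used below (A, F, G, I, L, M, P, V, W) is inferred by comparing the displayed binary string with the 2FKE sequence, so treat it as an assumption.

```python
# Residues mapped to 1; inferred from this page's output, not stated by it.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in sequence)

# First 20 residues of the 2FKE sequence.
print(hp_encode("GVQVETISPGDGRTFPKRGQ"))
```

Other hydrophobicity scales classify residues such as C or Y differently, so a different set would give a different string.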
Pair \(Z_2\) Length of longest common subword
2FKE_1,6TAX_1 305 4
2FKE_1,4AWH_1 158 3
6TAX_1,4AWH_1 251 4
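
Both columns of this table can be sketched in a few lines. The code assumes that the last column reports the longest common contiguous block of the two sequences; the function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common block ending at the previous positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z("ABAB", "ABBA", 2), longest_common_subword("ABAB", "ABBA"))
```

When one set of subwords contains the other, as for 2FKE_1 and 6TAX_1 with \(k=2\), \(Z_k\) reduces to the difference of the two cardinalities.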

Newick tree

 
(
	6TAX_1:15.67,
	(
		2FKE_1:79,4AWH_1:79
	):75.67
);

Here \(d\) denotes the Otu--Sayood distance of that name.
Likewise, \(d_1\) denotes the Otu--Sayood distance of that name. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)(\frac{4745 }{\log_{20} 4745}-\frac{107}{\log_{20}107})\approx 1095\).
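
The arithmetic behind this estimate can be checked directly. In the sketch below, 4745 is the combined length 4638 + 107 of the two sequences; the factors 0.85 and 0.8 are taken from the formula as given and are not derived here.

```python
import math

def n_over_log20(n):
    """n / log_20(n), the normalizing quantity used for LZ-complexity."""
    return n / math.log(n, 20)

expected = 0.85 * 0.8 * (n_over_log20(4745) - n_over_log20(107))
print(round(expected))
```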
Status Protein1 Protein2 d d1/2
Query variables 2FKE_1 6TAX_1 1388 709.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
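
The interpolation can be illustrated with the values from the table. The sketch assumes the usual Otu--Sayood definitions, \(d=\max\{c(xy)-c(x),\,c(yx)-c(y)\}\) and \(d_1\) equal to the sum of the same two terms, where \(c\) is the LZ-complexity; under that assumption \(d_1/2\) is the average of the min and the max, in agreement with the case \(\alpha=1/2\). The function names are illustrative.

```python
def otu_sayood(c_x, c_y, c_xy, c_yx):
    """Return (d, d1) from LZ-complexities of x, y, xy and yx (assumed definitions)."""
    a = c_xy - c_x
    b = c_yx - c_y
    return max(a, b), a + b

def delta(alpha, smaller, larger):
    """alpha * min + (1 - alpha) * max for the two one-sided terms."""
    return alpha * smaller + (1 - alpha) * larger

# Table values for 2FKE_1 and 6TAX_1: d = 1388 and d1/2 = 709.5,
# so the smaller term is 2 * 709.5 - 1388 = 31.
print(delta(0, 31, 1388), delta(0.5, 31, 1388))
```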