CoV2D BrowserTM

Parikh vectors
4AYC_1 5DCA_1 6UJS_1 Letter Amino acid
3 66 36 Y Tyrosine
6 104 119 A Alanine
8 77 58 R Arginine
10 116 47 N Asparagine
5 35 34 M Methionine
1 80 31 P Proline
8 23 8 C Cysteine
1 82 102 G Glycine
12 146 104 I Isoleucine
17 133 85 K Lysine
5 70 63 Q Glutamine
2 39 24 H Histidine
1 26 11 W Tryptophan
10 157 89 S Serine
3 97 68 T Threonine
7 125 93 V Valine
4 105 55 D Aspartic acid
19 154 73 E Glutamic acid
12 223 116 L Leucine
4 90 66 F Phenylalanine
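
The Parikh vector of a sequence simply counts how often each letter occurs. A minimal sketch of how a column of the table above could be recomputed, assuming the standard 20-letter amino acid alphabet; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not part of the CoV2D code:

```python
from collections import Counter

# Assumption: the 20 standard one-letter amino acid codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parikh_vector(seq, alphabet=AMINO_ACIDS):
    """Return a dict mapping each letter of the alphabet to its count in seq."""
    counts = Counter(seq)
    return {letter: counts.get(letter, 0) for letter in alphabet}

# Usage on the first 12 residues of the 4AYC_1 sequence.
pv = parikh_vector("LGSMEELNRSKK")
print({letter: n for letter, n in pv.items() if n > 0})
```

Applied to a full FASTA sequence, the resulting counts should sum to the sequence length.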

>4AYC_1|Chain A|E3 UBIQUITIN-PROTEIN LIGASE RNF8|HOMO SAPIENS (9606)
>5DCA_1|Chain A|Pre-mRNA-splicing helicase BRR2|Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (559292)
>6UJS_1|Chain A|ATP-dependent translocase ABCB1|Mus musculus (10090)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
4AYC , Knot 66 138 0.78 40 100 134
LGSMEELNRSKKDFEAIIQAKNKELEQTKEEKEKMQAQKEEVLSHMNDVLENELQCIICSEYFIEAVTLNCAHSFCSYCINEWMKRKIECPICRKDIKSKTYSLVLDNCINKMVNNLSSEVKERRIVLIRERKAKRLF
5DCA , Knot 653 1948 0.84 40 365 1575
DPSNVETYEQILQWVTEVLGNDIPHDLIIGTADIFIRQLKENEENEDGNIEERKEKIQHELGINIDSLKFNELVKLMKNINKRALPNIENDIIKLSDSKTSNIESVPIYSIDEFFLQRKLRSELGYKDTSVIQDLSEKILNDIETLEHNPVALEQKLVDLLKFENISLAEFILKNRSTIFWGIRLAKSTENEIPNLIEKMVAKGLNDLVEQYKFRENPAIPPVIDLEKIKFDESSKLMTVTKVSLPEGSFKRVKPQYDEIHIPAPSKPVIDYELKEITSLPDWCQEAFPSSETTSLNPIQSKVFHAAFEGDSNMLICAPTGSGKTNIALLTVLKALSHHYNPKTKKLNLSAFKIVYIAPLKALVQEQVREFQRRLAFLGIKVAELTGDSRLSRKQIDETQVLVSTPEKWDITTRNSNNLAIVELVRLLIIDEIHLLHDDRGPVLESIVARTFWASKYGQEYPRIIGLSATLPNYEDVGRFLRVPKEGLFYFDSSFRPCPLSQQFCGIKERNSLKKLKAMNDACYEKVLESINEGNQIIVFVHSRKETSRTATWLKNKFAEENITHKLTKNDAGSKQILKTEAANVLDPSLRKLIESGIGTHHAGLTRSDRSLSEDLFADGLLQVLVCTATLAWGVNLPAHTVIIKGTDVYSPEKGSWEQLSPQDVLQMLGRAGRPRYDTFGEGIIITDQSNVQYYLSVLNQQLPIESQFVSKLVDNLNAEVVAGNIKCRNDAVNWLAYTYLYVRMLASPMLYKVPDISSDGQLKKFRESLVHSALCILKEQELVLYDAENDVIEATDLGNIASSFYINHASMDVYNRELDEHTTQIDLFRIFSMSEEFKYVSVRYEEKRELKQLLEKAPIPIREDIDDPLAKVNVLLQSYFSQLKFEGFALNSDIVFIHQNAGRLLRAMFEICLKRGWGHPTRMLLNLCKSATTKMWPTNCPLRQFKTCPVEVIKRLEASTVPWGDYLQLETPAEVGRAIRSEKYGKQVYDLLKRFPKMSVTCNAQPITRSVMRFNIEIIADWIWDMNVHGSLEPFLLMLEDTDGDSILYYDVLFITPDIVGHEFTLSFTYELKQHNQNNLPPNFFLTLISENWWHSEFEIPVSFNGFKLPKKFPPPTPLLENISISTSELGNDDFSEVFEFKTFNKIQSQVFESLYNSNDSVFVGSGKGTGKTAMAELALLNHWRQNKGRAVYINPSGEKIDFLLSDWNKRFSHLAGGKIINKLGNDPSLNLKLLAKSHVLLATPVQFELLSRRWRQRKNIQSLELMIYDDAHEISQGVYGAVYETLISRMIFIATQLEKKIRFVCLSNCLANARDFGEWAGMTKSNIYNFSPSERIEPLEINIQSFKDVEHISFNFSMLQMAFEASAAAAGNRNSSSVFLPSRKDCMEVASAFMKFSKAIEWDMLNVEEEQIVPYIEKLTDGHLRAPLKHGVGILYKGMASNDERIVKRLYEYGAVSVLLISKDCSAFACKTDEVIILGTNLYDGAEHKYMPYTINELLEMVGLASGNDSMAGKVLILTSHNMKAYYKKFLIEPLPTESYLQYIIHDTLNNEIANSIIQSKQDCVDWFTYSYFYRRIHVNPSYYGVRDTSPHGISVFLSNLVETCLNDLVESSFIEIDDTEAETEIISTLSNGLIASHYGVSFFTIQSFVSSLSNTSTLKNMLYVLSTAVEFESVPLRKGDRALLVKLSKRLPLRFPEHTSSGSVSFKVFLLLQAYFSRLELPVDFQNDLKDILEKVVPLINVVVDILSANGYLNATTAMDLAQMLIQGVWDVDNPLRQIPHFNNKILEKCKEINVETVYDIMALEDEERDEILTLTDSQLAQVAAFVNNYPNVELTYSLNNSDSLISGVKQKITIQLTRDVEPENLQVTSEKYPFDKLESWWLVLGEVSKKELYAIKKVTLNKETQQYELEFDTPTSGKHNLTIWCVCDSYLDADKELSFEINVK
6UJS , Knot 435 1282 0.81 40 317 1039
MELEEDLKGRADKNFSKMGKKSKKEKKEKKPAVSVLTMFRYAGWLDRLYMLVGTLAAIIHGVALPLMMLIFGDMTDSFASVGQVSKQSTQMSEADKRAMFAKLEEEMTTYAYYYTGIGAGVLIVAYIQVSFWCLAAGRQIHKIRQKFFHAIMNQEIGWFDVHDVGELNTRLTDDVSKINEGIGDKIGMFFQAMATFFGGFIIGFTRGWKLTLVILAISPVLGLSAGIWAKILSSFTDKELHAYAKAGAVAEEVLAAIRTVIAFGGQKKELERYNNNLEEAKRLGIKKAITANISMGAAFLLIYASYALAFWYGTSLVISKEYSIGQVLTVFFSVLIGAFSVGQASPNIEAFANARGAAYEVFKIIDNKPSIDSFSKSGHKPDNIQGNLEFKNIHFSYPSRKEVQILKGLNLKVKSGQTVALVGNSGCGKSTTVQLMQRLYDPLDGMVSIDGQDIRTINVRYLREIIGVVSQEPVLFATTIAENIRYGREDVTMDEIEKAVKEANAYDFIMKLPHQFDTLVGERGAQLSGGQKQRIAIARALVRNPKILLLDEATSALDTESEAVVQAALDKAREGRTTIVIAHRLSTVRNADVIAGFDGGVIVEQGNHDELMREKGIYFKLVMTQTAGNEIELGNEACKSKDEIDNLDMSSKDSGSSLIRRRSTRKSICGPHDQDRKLSTKEALDEDVPPASFWRILKLNSTEWPYFVVGIFCAIINGGLQPAFSVIASKVVGVFTNGGPPETQRQNSNLFSLLFLILGIISFITFFLQGFTFGKAGEILTKRLRYMVFKSMLRQDVSWFDDPKNTTGALTTRLANDAAQVKGATGSRLAVIFQNIANLGTGIIISLIYGWQLTLLLLAIVPIIAIAGVVEMKMLSGQALKDKKELEGSGKIATEAIENFRTVVSLTREQKFETMYAQSLQIPYRNAMKKAHVFGITFSFTQAMMYFSYAAAFRFGAYLVTQQLMTFENVLLVFSAIVFGAMAVGQVSSFAPDYAKATVSASHIIRIIEKTPEIDSYSTQGLKPNMLEGNVQFSGVVFNYPTRPSIPVLQGLSLEVKKGQTLALVGSSGCGKSTVVQLLERFYDPMAGSVFLDGKEIKQLNVQWLRAQLGIVSQEPILFDCSIAENIAYGDNSRVVSYEEIVRAAKEANIHQFIDSLPDKYNTRVGDKGTQLSGGQKQRIAIARALVRQPHILLLDEATSALDTESEKVVQEALDKAREGRTCIVIAHRLSTIQNADLIVVIQNGKVKEHGTHQQLLAQKGIYFSMVSVQAGAKRSHHHHHH
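
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be recomputed from the tabulated LZ-complexity and length. The sketch below takes those two numbers as given and does not recompute \(\mathrm{LZ}(w)\) itself; the function name `normalized_lz` is an illustrative assumption.

```python
import math

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_b(n), where b is the alphabet size."""
    return lz / (n / math.log(n, alphabet_size))

# (code, LZ-complexity, length) as listed in the table above.
rows = [("4AYC", 66, 138), ("5DCA", 653, 1948), ("6UJS", 435, 1282)]
ratios = {code: normalized_lz(lz, n) for code, lz, n in rows}
for code, ratio in ratios.items():
    print(code, round(ratio, 3))
```

The values obtained this way agree with the table if the tabulated ratios are read as truncated to two decimals.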

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
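
As a rough illustration of these definitions, the set of subwords and its cardinality can be computed with a sliding window. This is a sketch only; the helper names `subwords` and `subword_complexity` are hypothetical.

```python
def subwords(w, n):
    """P_w(n): the set of distinct contiguous subwords of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """p_w(n): the number of distinct subwords of length n in w."""
    return len(subwords(w, n))

# Small example: "ABAB" has subwords A, B (n=1), AB, BA (n=2), ABA, BAB (n=3).
example = [subword_complexity("ABAB", n) for n in (1, 2, 3)]
print(example)
```

A word of length \(m\) has at most \(m-n+1\) subwords of length \(n\), so for instance \(p_w(2)\le 137\) for a sequence of length 138.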

\(|P_{f(4AYC_1)}(2) \setminus P_{f(5DCA_1)}(2)|=7\), \(|P_{f(5DCA_1)}(2) \setminus P_{f(4AYC_1)}(2)|=272\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
110100100000010111010000100000000010100001100100110001001100001101101001001000010011000100110000100000011100010011001000100001111000010011
Pair \(Z_2\) Length of longest common subword
4AYC_1,5DCA_1 279 4
4AYC_1,6UJS_1 239 4
5DCA_1,6UJS_1 70 5
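
The quantity \(Z_k(x,y)\) is the size of the symmetric difference of the two subword sets. For the first pair, the two one-sided counts stated above give \(7+272=279\), matching the tabulated \(Z_2\). A minimal sketch follows; the function names are illustrative, and dividing by the size of the union to obtain a Jaccard distance is an assumption about how the numerator would be used.

```python
def subwords(w, k):
    """P_w(k): the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """Z_k(x, y) = |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def jaccard_distance(x, y, k):
    """Assumed normalization: Z_k divided by |P_x(k) union P_y(k)|."""
    union = subwords(x, k) | subwords(y, k)
    return z(x, y, k) / len(union) if union else 0.0

# "ABAB" has 2-subwords {AB, BA}; "ABBA" has {AB, BB, BA}.
print(z("ABAB", "ABBA", 2), jaccard_distance("ABAB", "ABBA", 2))
```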

Newick tree

 
(4AYC_1:14.61,(6UJS_1:35,5DCA_1:35):11.61);

Let \(d\) and \(d_1\) denote the two Otu–Sayood distances. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{2086}{\log_{20} 2086}-\frac{138}{\log_{20}138}\right)\approx 498\), where \(2086=138+1948\) is the combined length of the two sequences.
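
The arithmetic in this estimate can be checked directly. The sketch below takes the factors 0.85 and 0.8 from the text as given, without interpreting them; the helper name `lz_scale` is hypothetical.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_b(n), the normalizing scale used for LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)

n_short, n_long = 138, 1948      # lengths of 4AYC_1 and 5DCA_1
n_total = n_short + n_long       # 2086, the combined length

expected = 0.85 * 0.8 * (lz_scale(n_total) - lz_scale(n_short))
print(round(expected, 1))
```

The result is just under 499, consistent with the value 498 quoted in the text if that value is truncated.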
Status Protein1 Protein2 d d1/2
Query variables 4AYC_1 5DCA_1 630 335

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d, &\alpha=0,\\ d_1/2, &\alpha=1/2. \end{cases} \]
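
The interpolation can be illustrated with the values from the query row above (\(d=630\), \(d_1/2=335\)). This sketch assumes that \(\max\) and \(\min\) stand for the larger and smaller of the two quantities underlying the Otu–Sayood distances, so that \(\max=d\) and \(\min=d_1-\max\); the variable names are illustrative.

```python
def delta(alpha, smaller, larger):
    """delta = alpha * min + (1 - alpha) * max."""
    return alpha * smaller + (1 - alpha) * larger

d, half_d1 = 630, 335            # values from the query row
larger = d                       # alpha = 0 gives delta = max = d
smaller = 2 * half_d1 - larger   # from (min + max) / 2 = d1 / 2

print(smaller, delta(0, smaller, larger), delta(0.5, smaller, larger))
```

Under this reading the smaller of the two underlying quantities would be 40 for the pair 4AYC_1, 5DCA_1.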