CoV2D Browser™

Parikh vectors
8YQV_1 5IOY_1 2XNC_1 Letter Amino acid
86 18 12 R Arginine
47 5 12 M Methionine
17 0 5 C Cysteine
77 6 8 Q Glutamine
86 10 27 G Glycine
71 4 33 K Lysine
85 19 15 T Threonine
11 4 7 W Tryptophan
85 33 19 A Alanine
67 13 18 D Aspartic acid
77 15 25 E Glutamic acid
30 3 4 H Histidine
67 10 16 P Proline
91 13 23 V Valine
80 8 11 N Asparagine
127 8 16 I Isoleucine
138 21 22 L Leucine
48 9 14 F Phenylalanine
102 13 17 S Serine
58 4 11 Y Tyrosine
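
A Parikh vector simply records how often each letter occurs in a word, ignoring order. A minimal sketch of the computation follows; the function name `parikh_vector` and the constant `ALPHABET` are illustrative and not taken from the site, and the letter order is copied from the rows of the table above.

```python
from collections import Counter

# Letter order as it appears in the rows of the table above.
ALPHABET = "RMCQGKTWADEHPVNILFSY"

def parikh_vector(w, alphabet=ALPHABET):
    """Return the number of occurrences of each letter of `alphabet` in `w`."""
    counts = Counter(w)
    return [counts[a] for a in alphabet]

# Toy example on the first seven residues of 5IOY_1.
vector = parikh_vector("MTTSAAS")
print(dict(zip(ALPHABET, vector)))
```

The entries of a Parikh vector sum to the length of the word, which gives a quick consistency check against the Length column reported further down.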

>8YQV_1|Chain A|DNA-directed RNA polymerase subunit|African swine fever virus (10497)
>5IOY_1|Chain A|TetR-family transcriptional regulatory repressor protein|Mycobacterium tuberculosis (strain ATCC 25177 / H37Ra) (419947)
>2XNC_1|Chains A, B|FERREDOXIN--NADP REDUCTASE, LEAF ISOZYME, CHLOROPLASTIC|PISUM SATIVUM (3888)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
8YQV , Knot 522 1450 0.87 40 348 1261
MEAGYAEIAAVQFNIAGDNDHKRQGVMEVTISNLFEGTLPAEGGIYDARMGTTDHHYKCITCSHQRKQCMGHPGILQMHAPVLQPLFIAEIRRWLRVICLNCGAPIVDLKRYEHLIRPKRLIEAASSQTEGKQCYVCKAVHPKIVKDSEDYFTFWADQQGKIDKLYPQIIREIFSRVTYDTVVKLGRSKNSHPEKLVLKAIQIPPISIRPGIRLGIGSGPQSFHDINNVIQYLVRKNLLIPKDLQIVRGQKIPLNIDRNLQTIQQLYYNFLLDSVSTTATQGGTGKRGIVMGARPAPSIMRRLPRKEGRIRKSLLGSQVWSISRSTICGNSDLHLDEVGYPISFARTLQVAETVQHYNINRLMPYFLNGKRQYPGCSRVYKQITQSVHDIEGLKQDFRLEVGDILYRDVVTGDVAFFNRQPSLERSSIGVHRIVVLENPKISTFQMNVSACAWYNADFDGDQMNLWVPWSVMSRVEAELLCSVRNWFISTKSSGPVNGQVQDSTVGSFLLTRTNTPMGKNVMNKLHAMGLFQTTQTDPPCFANYSPTDLLDGKSVVSMLLRQTPINYQRAPTWYSEVYAPYMHYNKQDISTQIRNGELIEGVLDKKAVGAGSSGGIYHLISRRYGPQQALKMIFATQQLALNYVRNAGFTVSTADMLLTPEAHQEVQEIINELLLESEEINNRLLHGDIMPPIGLTTHDFYEKLQLNALKFPDRILKPIMNSINPETNGLFQMVATGAKGSNPNMIHIMAGIGQIEINTQRIQPQFSFGRTLVYYPRFALEAQAYGFICNSYIAGLTSPEFIFGEMNGRFDLINKALSTSSTGYANRKAIFGLQSCIVDYYRRVSIDTRLVQQLYGEDGLDARQLETVRFETIMLSDQELEDKFKYTGIQSPLFEEEFSRLKKDRDKYRQIFLNVENFNFSQLLTDVRQVPVNVASIVKNILLSSTSGVLPFDEKSILQKYAMVKTFCKNLPYVFINNIQERLQTPIPVYLKRAASLMRMLIRIELATVKTLNITCEQMSAILDLIRLQYTQSLINYGEAVGILAAQSVSEPLTQYMLDSHHRSVAGGTNKSGIVRPQEIFSAKPVEAEQSSEMLLRLKNPEVETNKTYAQEIANSIELITFERLILQWHLLYETYSSTKKNVMYPDFASDVEWMTDFLENHPLLQPPEDIANWCIRLELNKTTMILKSISLESIINSLRAKHPNTYIMHSVENTASGIPIIIRIYLRESAFRRSTNTRMATDEKIAVNVVDKLLNSTIRGIPGIKNANVVKLMRHRVDAQGKLVRLDNIYAIKTNGTNIFGAMLDDNIDPYTIVSSSIGDTMELYGIEAARQKIISEIRTVMGDKGPNHRHLLMYADLMTRTGQVTSLEKAGLNAREPSNVLLRMALSSPVQVLTDAAVDSAVNPIYGIAAPTLMGSVPRIGTMYSDIIMDEKYITENYKSVDSLIDML
5IOY , Knot 97 216 0.80 38 136 207
MTTSAASQASLPRGRRTARPSGDDRELAILATAENLLEDRPLADISVDDLAKGAGISRPTFYFYFPSKEAVLLTLLDRVVNQADMALQTLAENPADTDRENMWRTGINVFFETFGSHKAVTRAGQAARATSVEVAELWSTFMQKWIAYTAAVIDAERDRGAAPRTLPAHELATALNLMNERTLFASFAGEQPSVPEARVLDTLVHIWVTSIYGENR
2XNC , Knot 137 315 0.83 40 200 301
GSMAAAGRRIPGYRAQVTTEAPAKVVKHSKKQDENIVVNKFKPKEPYVGRCLLNTKITGDDAPGETWHMVFSTEGEVPYREGQSIGIVPDGIDKNGKPHKLRLYSIASSAIGDFGDSKTVSLCVKRVPDGVCSNFLCDLKPGSEVKITGPVGKEMLMPKDPNATVIMLGTGTGIAPFRSFLWKMFFEKHEDYQFNGLAWLFLGVPTSSSLLYKEEFEKMKEKAPENFRLDFAVSREQVNDKGEKMYIQTRMAQYAEELWELLKKDNTFVYMCGLKGMEKGIDDIMVSLAAKDGIDWIEYKRTLKKAEQWNVEVYW
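
The normalized column \(\frac{\mathrm{LZ}(w)}{n/\log_{20} n}\) can be checked directly from the LZ-complexity and Length columns. The sketch below also includes one common exhaustive-history (LZ76-style) phrase count; whether it matches the exact parsing convention used to produce the table is an assumption, and the names `lz76_complexity` and `normalized_lz` are illustrative.

```python
import math

def lz76_complexity(w):
    """Count phrases in an exhaustive-history (LZ76-style) parse of w."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it already occurs starting before position i
        # (the earlier occurrence may overlap the current phrase).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """Return LZ(w) / (n / log_b n) for alphabet size b."""
    return lz / (n / math.log(n, alphabet_size))

# Values (LZ(w), n) copied from the table above.
for code, lz, n in [("8YQV", 522, 1450), ("5IOY", 97, 216), ("2XNC", 137, 315)]:
    print(code, normalized_lz(lz, n))
```

The computed ratios agree with the tabulated 0.87, 0.80 and 0.83 to within 0.01; the table appears to truncate rather than round.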

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).
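
As a small illustration of these definitions, the set of distinct length-\(k\) subwords and its cardinality can be computed as follows. This is a sketch; the names `subwords` and `subword_count` are illustrative.

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_count(w, k):
    """Return the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Toy example: "ABAB" has the length-2 subwords AB and BA.
print(sorted(subwords("ABAB", 2)), subword_count("ABAB", 2))
```

For a word of length \(n\) the count is at most \(n-k+1\), and over a 20-letter alphabet it is also at most \(20^k\).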

\(|P_{f(8YQV_1)}(2) \setminus P_{f(5IOY_1)}(2)|=221\), \(|P_{f(5IOY_1)}(2) \setminus P_{f(8YQV_1)}(2)|=9\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\).

Hydrophobic-polar version of Sequence 1:
1011010111101011100000001110101001101011101110010110000000010000000001101111010111101111101001101101001111101000001101001101100000100001001101011000000101110001010010101100110010000110110000001001110110111101011101111011001001001100110001111001011010011101000100100100011100100010011010011111101110110011000101000111001101000010100010100110110110010110010000100111011010000110001000100010010110001010110110001101011110001010000111001111001010010101010110010101001011111011001010110010011100000111010100001101110000011100110010111110000001101100010011010011011100011000011010001011010000001000100101101110001111100111001100001100110111100011100100111010010111010100010011001110000100011010111111100001000101011011001101110010100011101110110100101101111110101000010101011001100101110101011100001111001011110101010110011000001010001111100011000001010001100101001101001001010011100001000100011001110001001000000000111010010100110010011101101100111000011111000011000111001000110111001000100111101001101101110101101001010000101110110100000110010111111100100110001100000011110000111010011010110100000111010010100000010011001011010011101011000000000011010110010110011000111011001101010101000011100101001100101001000110010001011111101010001100000001100001110110011000101111100101101100010101011010010110001001111110001010011000110010101101100011001001110011000011101011000101001001110100100111011100110110011100110110111110111011011010001110000100000010011011
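
The hydrophobic-polar string replaces each residue by 1 or 0. The page does not state which residues count as hydrophobic; the set used in the sketch below (A, F, G, I, L, M, P, V, W) is inferred by comparing the first 100 residues of 8YQV_1 with the first 100 digits of the string, so it is an assumption rather than the site's documented mapping. The names `HYDROPHOBIC` and `hp_encode` are illustrative.

```python
# Assumed hydrophobic residues, inferred from the start of the string above.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(seq):
    """Map each residue to '1' (hydrophobic) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)

# First 20 residues of 8YQV_1.
print(hp_encode("MEAGYAEIAAVQFNIAGDND"))
```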
Pair \(Z_2\) Length of longest common substring
8YQV_1,5IOY_1 230 4
8YQV_1,2XNC_1 186 4
5IOY_1,2XNC_1 184 3
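
The \(Z_2\) column is the size of the symmetric difference of the two sets of length-2 subwords; for the first pair this is \(221+9=230\). A minimal sketch of the computation on toy words, with illustrative names `subwords` and `z_distance`:

```python
def subwords(w, k):
    """Return the set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Return |P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Toy example: the only length-2 subword not shared is "BB".
print(z_distance("ABAB", "ABBA", 2))
```

Dividing by \(|P_x(k)\cup P_y(k)|\) would turn this numerator into the usual Jaccard distance.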

Newick tree

 
[
	8YQV_1:10.44,
	[
		2XNC_1:92,5IOY_1:92
	]:16.44
]

Let \(d\) and \(d_1\) denote the distances of the same names defined by Otu and Sayood. (The latter makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{1666}{\log_{20} 1666}-\frac{216}{\log_{20} 216}\right)\approx 375.\)
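
The arithmetic in this estimate can be reproduced as follows. Note that \(1666 = 1450 + 216\) is the length of the two query sequences concatenated; reading the two terms as estimates of the complexity of the concatenation and of the shorter sequence is an interpretation, not something the page states. The helper name `n_over_log` is illustrative.

```python
import math

def n_over_log(n, base=20):
    """Return n / log_base(n), the normalizer used for LZ-complexity."""
    return n / math.log(n, base)

# Constants 0.85 and 0.8 are copied from the estimate above.
estimate = 0.85 * 0.8 * (n_over_log(1666) - n_over_log(216))
print(estimate)
```

The result lies between 375 and 376, consistent with the stated value of 375.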
Status Protein1 Protein2 d d1/2
Query variables 8YQV_1 5IOY_1 487 276
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
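
The interpolation can be sketched numerically. The sketch assumes that min and max refer to the smaller and larger of two one-sided quantities, here given the hypothetical names `a` and `b`; the function names are also illustrative. Under that reading, \(d\) is the maximum and \(d_1/2\) is the average, so the minimum can be recovered from the two tabulated values.

```python
def delta(alpha, a, b):
    """Return alpha * min(a, b) + (1 - alpha) * max(a, b)."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def recover_min(d, half_d1):
    """Given d = max and d1/2 = (min + max) / 2, return min."""
    return 2 * half_d1 - d

# Values for the pair 8YQV_1, 5IOY_1 from the table above: d = 487, d1/2 = 276.
smaller = recover_min(487, 276)
print(smaller, delta(0, smaller, 487), delta(0.5, smaller, 487))
```

For the query pair this gives a smaller one-sided value of 65, and substituting it back reproduces \(d=487\) at \(\alpha=0\) and \(d_1/2=276\) at \(\alpha=1/2\).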