CoV2D Browser™

Parikh vectors
7MBV_1 4FDA_1 6HBU_1 Letter Amino acid
37 11 34 N Asparagine
20 7 12 C Cysteine
50 13 21 Q Glutamine
68 23 45 I Isoleucine
53 13 24 P Proline
63 17 38 T Threonine
28 3 10 H Histidine
137 21 77 L Leucine
68 27 56 S Serine
86 19 47 A Alanine
67 8 21 D Aspartic acid
77 13 43 V Valine
64 16 22 R Arginine
70 16 29 E Glutamic acid
61 16 48 G Glycine
67 17 40 K Lysine
34 10 19 M Methionine
58 7 42 F Phenylalanine
26 5 5 W Tryptophan
31 11 22 Y Tyrosine
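A Parikh vector records how many times each letter occurs in a word, ignoring order. The sketch below computes one in the same column order as the table above; the helper name `parikh_vector` and the constant `TABLE_ORDER` are illustrative, not part of the page.

```python
from collections import Counter
from typing import Dict

# Letter order used by the Parikh-vector table above (hypothetical constant name).
TABLE_ORDER = "NCQIPTHLSADVREGKMFWY"


def parikh_vector(w: str, alphabet: str = TABLE_ORDER) -> Dict[str, int]:
    """Count occurrences of each letter of `alphabet` in w, in table order."""
    counts = Counter(w)
    return {a: counts[a] for a in alphabet}
```

For example, applied to the 4FDA_1 sequence it should reproduce that column of the table, such as 7 for C and 5 for W.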

>7MBV_1|Chains A, B, C, D|Transient receptor potential melastatin 5|Danio rerio (7955)
>4FDA_1|Chain A|3-oxoacyl-[acyl-carrier-protein] reductase|Saccharomyces cerevisiae (559292)
>6HBU_1|Chains A, B|ATP-binding cassette sub-family G member 2|Homo sapiens (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7MBV , Knot 425 1165 0.85 40 343 1038
MVEKSSERFDKQMAGRLGDIDFTGVSRTRGKFVRVTSSTDPAEIYQILTKQWGLAPPHLVVALMGGDEVAQLKPWLRDTLRKGLVKAAQSTGAWILTSGLRFGITKNLGQAVRDHSLASTSPKVRVVAIGIAPWNMIQNRDLLLSAKPDHPATYPTEDLPYGAVYSLDCNHSHFILVDEDPKRPGATGEMRVKMLKHISLQRTGYGGTGSIEIPVLCLLVHGEPRILQKMYKNIQNSIPWLILAGSGGVADILVTLMDRGCWDADIVQELLINTFPDGLHSTEITSWTKLIQRILDHGHLLTVHDPEQDSELDTVILKALVKACKSQSQEAQDFLDELKLAVAWNRVDIAKSEIFSGDVQWSAQDLEEVMMEALVNDKPDFVRLFVDNGVNIKQFLTYGRLQELYCSVSEKNLLHTLLLKKNQERQAQLARKRMSGNPNNELGDRKFRFTFHEVSKVLKDFLDDTCKGFYQKLPAERMGKGRLFHSQKNLPDMDRRCEHPWRDLFLWAILQNRQEMANYFWAMGPEAVAAALVGCKIMKEMAHLATEAESARSMKNAKYEQFAMDLFSECYSNSEDRAYSLLVRKTCCWSKATVLNIATLAEAKCFFAHDGVQALLTKVWWGAMRTDTSISRLVLTFFIPPLVWTSLIKFNPEEQVSKDEGEPFAELDSLETEQALLLTDGDPVAGEGSAETAARSCSATFIRVVLRRWNRFWSAPVTVFMGNVIMYFAFLILFSYVLLLDFRPPPPYGPSAAEIILYFWVFTLVLEEIRQSFFTDEDMSILKKMKLYVEDNWNKCDMVAISLFVVGLSCRMAMSTYEAGRTVLALDFMVFTLRLIHIFAIHKQLGPKIIIVERMIKDVFFFLFFLSVWLIAYGVTTQALLHPNDPRIDWVFRRALYRPYLHIFGQIPLEEIDAAKMPDDNCTTDVQEIILGTLPPCPNIYANWLVILLLVIYLLVTNVLLLNLLIAMFSYTFQVVQENADIFWKFQRYNLIVEYHSRPALAPPFIIISHITQALLSFIKKTENTQDLLERELPSGLDQKLMTWETVQKENYLAKLEHEHRESSGERLRYTSSKVQTLLRMVGGFKDQEKRMATVETEVRYCGEVLSWIAECFHKSTLKCDRDAPKAPRSIAGSSRDQQPQGAKRQQPGGHPAYGTDKKLPFIDH
4FDA , Knot 124 273 0.85 40 183 266
MHYLPVAIVTGATRGIGKAICQKLFQKGLSCIILGSTKESIERTAIDRGQLQSGLSYQRQCAIAIDFKKWPHWLDYESYDGIEYFKDRPPLKQKYSTLFDPCNKWSNNERRYYVNLLINCAGLTQESLSVRTTASQIQDIMNVNFMSPVTMTNICIKYMMKSQRRWPELSGQSARPTIVNISSILHSGKMKVPGTSVYSASKAALSRFTEVLAAEMEPRNIRCFTISPGLVKGTDMIQNLPVEAKEMLERTIGASGTSAPAEIAEEVWSLYSR
6HBU , Knot 257 655 0.84 40 265 606
MSSSNVEVFIPVSQGNTNGFPATASNDLKAFTEGAVLSFHNICYRVKLKSGFLPCRKPVEKEILSNINGIMKPGLNAILGPTGGGKSSLLDVLAARKDPSGLSGDVLINGAPRPANFKCNSGYVVQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKVGTQFIRGVSGGERKRTSIGMELITDPSILFLDQPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQPRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTAVALNREEDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITVFKEISYTTSFCHQLRWVSKRSFKNLLGNPQASIAQIIVTVVLGLVIGAIYFGLKNDSTGIQNRAGVLFFLTTNQCFSSVSAVELFVVEKKLFIHEYISGYYRVSSYFLGKLLSDLLPMRMLPSIIFTCIVYFMLGLKPKADAFFVMMFTLMMVAYSASSMALAIAAGQSVVSVATLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNFCPGLNATGNNPCNYATCTGEEYLVKQGIDLSPWGLWKNHVALACMIVIFLTIAYLKLLFLKKYS
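The LZ-complexity column counts phrases in a Lempel-Ziv (1976) parsing, and the next column divides it by \(n/\log_{20} n\). Below is a minimal sketch of the exhaustive-history LZ76 parsing plus that normalization; the page may use a different LZ variant, so `lz76_complexity` is not guaranteed to reproduce the values 425, 124 and 257 exactly. The function names are hypothetical.

```python
import math


def lz76_complexity(s: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) exhaustive parsing of s."""
    n = len(s)
    pos = 0
    phrases = 0
    while pos < n:
        length = 1
        # Extend the current phrase while it can still be copied from the text
        # ending just before its last symbol (overlapping copies are allowed).
        while pos + length <= n and s[pos:pos + length] in s[:pos + length - 1]:
            length += 1
        phrases += 1  # the phrase ends with one new symbol (or at the end of s)
        pos += length
    return phrases


def normalized_lz(lz: int, n: int, alphabet_size: int = 20) -> float:
    """LZ(w) / (n / log_k n), with k = 20 for protein sequences."""
    return lz / (n / math.log(n, alphabet_size))
```

For instance, `lz76_complexity("0001101001000101")` parses the word as 0·001·10·100·1000·101 and returns 6. Using the tabulated LZ values, `normalized_lz(425, 1165)` is about 0.8598, so the table's entries appear to be truncated rather than rounded to two decimals.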

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with four-character Protein Data Bank code \(c\).
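The definitions of \(P_w(n)\) and \(p_w(n)\) translate directly into a set comprehension over all length-\(n\) windows. A small sketch, with hypothetical helper names:

```python
def subword_set(w: str, n: int) -> set:
    """P_w(n): the distinct length-n subwords (contiguous intervals) of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def subword_count(w: str, n: int) -> int:
    """p_w(n) = |P_w(n)|."""
    return len(subword_set(w, n))
```

For example, `subword_set("ABAB", 2)` is `{"AB", "BA"}`, so \(p_{ABAB}(2)=2\); when \(n\) exceeds the length of \(w\) the set is empty.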

\(|P_{f(7MBV_1)}(2) \setminus P_{f(4FDA_1)}(2)|=180\) and \(|P_{f(4FDA_1)}(2) \setminus P_{f(7MBV_1)}(2)|=20\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\); for example, \(Z_2(f(7MBV_1),f(4FDA_1))=180+20=200\).

Hydrophobic-polar version of Sequence 1 (7MBV_1):
1100000010001110110101011000010110100000110100110001111110111111110011010111000100111011000111110011011100011011000011000101011111111101100001110101001100100011011100100000011110001001110101010110010100010110101011110111010101100100010001111111101111011101100101010110011100110110000100100110011001011010010000010011101110100000001001100101111100101100011010101010010011101110001011011100110100110010100100010000110011100000001011000101010001100010101001001100110000011000111001101011000001101000000110011111110000011001111110111111110011001101100100100100100001110110000000000100111000001001011011011010011100110111001111110000010011101111111100110101000100001011101001000011110010111101010011000010110111001001101110111101110111111100111101011110110110111011110111001000110000101100101010001000011110111111000111000011001111011110101101111000111011110011001111111101111101100011101001010111001100101011101110010110110000000100111101110101010111111111011100111101111110001011000101110100001110000011111111110010011101100000000110001101100011010010000011010000000010010000001001101111100000011010001000101101110010000100000110110011100000010110000111011010000111100
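The page does not state which residues it counts as hydrophobic. Matching the first 170 residues of Sequence 1, which include all 20 amino acids, against the binary string above suggests that A, F, G, I, L, M, P, V, W map to 1 and every other residue maps to 0. The sketch below encodes that inferred mapping; `HYDROPHOBIC` and `hp_encode` are hypothetical names, and the page's own classification should be checked before relying on it.

```python
# Assumption: residue classes inferred from the page's string, not documented by it.
HYDROPHOBIC = frozenset("AFGILMPVW")


def hp_encode(seq: str) -> str:
    """Map each residue to '1' (hydrophobic, per the inferred set) or '0' (polar)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq)
```

For example, `hp_encode("MVEKSSERFD")` gives `"1100000010"`, the first ten symbols of the string above.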
Pair \(Z_2\) Length of longest common subword (contiguous)
7MBV_1,4FDA_1 200 4
7MBV_1,6HBU_1 122 4
4FDA_1,6HBU_1 170 4
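Both columns of this table can be sketched in a few lines. \(Z_k\) is the size of the symmetric difference of the two length-\(k\) subword sets. The last column is treated here as the longest common *contiguous* subword, since a non-contiguous common subsequence of sequences this long would be much longer than 4; that reading is an assumption. The function names are hypothetical.

```python
def subword_set(w: str, k: int) -> set:
    """P_w(k): distinct length-k subwords of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def z_k(x: str, y: str, k: int) -> int:
    """Z_k(x, y): |P_x(k) minus P_y(k)| + |P_y(k) minus P_x(k)| (symmetric difference)."""
    return len(subword_set(x, k) ^ subword_set(y, k))


def longest_common_subword_length(x: str, y: str) -> int:
    """Length of the longest contiguous word occurring in both x and y (O(|x||y|) DP)."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best
```

For example, `z_k("ABAB", "ABBA", 2)` is 1 (only "BB" is unshared), and `longest_common_subword_length("ABCDE", "XBCDY")` is 3 ("BCD").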

Newick tree

(
	4FDA_1:10.20,
	(
		7MBV_1:61,6HBU_1:61
	):40.20
);

Let \(d\) and \(d_1\) be the Otu–Sayood distances of the same names. (The distance \(d_1\) makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, the expected distance is \((0.85)(0.8)\left(\frac{1438}{\log_{20} 1438}-\frac{273}{\log_{20}273}\right)\approx 303\), where \(1438=1165+273\) is the combined length of 7MBV_1 and 4FDA_1.
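The estimate above is plain arithmetic and can be reproduced directly. The sketch takes 0.85 as the normalized LZ ratio from the table and 0.8 as the page's unexplained scaling constant; `expected_distance` and its parameter names are hypothetical.

```python
import math


def expected_distance(n_concat: int, n_short: int,
                      ratio: float = 0.85, factor: float = 0.8, k: int = 20) -> float:
    """ratio * factor * (n_concat / log_k n_concat - n_short / log_k n_short)."""
    def scale(n: int) -> float:
        return n / math.log(n, k)

    return ratio * factor * (scale(n_concat) - scale(n_short))
```

With `expected_distance(1165 + 273, 273)` the value is about 303.7, which the page reports as 303.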
Status Protein1 Protein2 d d1/2
Query variables 7MBV_1 4FDA_1 385 235.5

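In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)], with the minimum and maximum taken over the two increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\),
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]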
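A minimal sketch of the interpolation \(\delta\), assuming (as in Otu and Sayood's definitions of \(d\) and \(d_1\)) that the minimum and maximum range over the increments \(\mathrm{LZ}(xy)-\mathrm{LZ}(x)\) and \(\mathrm{LZ}(yx)-\mathrm{LZ}(y)\). The complexity function is passed in as a parameter so any LZ variant can be used; `increments` and `delta` are hypothetical names.

```python
from typing import Callable, Tuple


def increments(x: str, y: str, lz: Callable[[str], int]) -> Tuple[int, int]:
    """Return (LZ(xy) - LZ(x), LZ(yx) - LZ(y)) for a given complexity function lz."""
    return lz(x + y) - lz(x), lz(y + x) - lz(y)


def delta(a: float, b: float, alpha: float) -> float:
    """alpha * min(a, b) + (1 - alpha) * max(a, b): alpha = 0 gives d, alpha = 1/2 gives d_1/2."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)
```

The query row gives \(d=385\) and \(d_1/2=235.5\), so the two increments for 7MBV_1 and 4FDA_1 are 385 and 86; `delta(385, 86, 0)` returns 385 and `delta(385, 86, 0.5)` returns 235.5.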