CoV2D Browser™

Parikh vectors
7OQY_1 8GVA_1 3OXA_1 Letter Amino acid
76 149 7 L Leucine
20 29 6 M Methionine
40 60 4 T Threonine
8 10 2 W Tryptophan
67 87 9 R Arginine
48 50 7 D Aspartic acid
23 47 10 Q Glutamine
70 95 13 G Glycine
36 17 4 Y Tyrosine
28 62 4 F Phenylalanine
57 86 13 V Valine
10 10 1 C Cysteine
66 106 9 E Glutamic acid
83 52 7 I Isoleucine
53 45 1 K Lysine
34 20 5 N Asparagine
19 36 3 H Histidine
52 91 8 P Proline
42 106 12 A Alanine
48 83 6 S Serine
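
The Parikh vector of a word records how many times each letter of the alphabet occurs in it. A minimal sketch of that count follows; the function name `parikh_vector` and the constant `ALPHABET` are illustrative choices, and the letter order simply follows the rows of the table above.

```python
from collections import Counter

# The 20 amino-acid letters, in the row order of the table above.
ALPHABET = "LMTWRDQGYFVCEIKNHPAS"

def parikh_vector(sequence, alphabet=ALPHABET):
    """Return the number of occurrences of each alphabet letter, in alphabet order."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that never occur.
    return [counts[letter] for letter in alphabet]

# Usage on a short made-up word (not one of the proteins above).
print(dict(zip(ALPHABET, parikh_vector("MLLKW"))))
```

Summing the entries of a Parikh vector gives the length of the sequence, which is a quick consistency check against the Length column further down.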

>7OQY_1|Chain A|DNA-directed RNA polymerase subunit A'|Sulfolobus acidocaldarius DSM 639 (330779)
>8GVA_1|Chains A, B|Anion exchange protein 2|Homo sapiens (9606)
>3OXA_1|Chains A, B, C, D|Steroid Delta-isomerase|Pseudomonas putida (303)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7OQY , Knot 329 880 0.84 40 302 808
MSEKIIRGVKFGVLSPNEIRQMSVTAIITSEVYDEDGTPIEGGVMDPKLGVIEPGQKCPVCGNTLAGCPGHFGHIELIKPVIHIGYVKHIYDFLRSTCWRCGRIKIKEQDLERYKRIYNAIKLRWPSAARRLVEYIKKISIKNLECPHCGEKQFKIKLEKPYNFNEERNGSIVKLSPSEIRDRLERIPDSDVELLGYDPKSSRPEWMILTVLPVPPITIRPSITIESGIRAEDDLTHKLVDIIRLNERLKESIEAGAPQLIIEDLWDLLQYHVATYFDNEIPGLPPAKHRSGRPLRTLAQRLKGKEGRFRGNLSGKRVDFSARTVISPDPNLSIDEVGIPYTIARMLTVPERVTNINIERIRQYIINGPDKWPGANYVIKPDGRRIDLRYVKDRKELASSITAGYVVERHLVDGDVVLFNRQPSLHRISMMAHKVRVLPGRTFRLNLLDCPPYNADFDGDEMNLHVPQSEEAIAEARELMLVHKNIITPRYGGPIIGGGQDYISGAYLLSVKTTLLTVEEVATILGVTDFVGELGEPAILAPKPYYTGKQVISLFLPKDFNFHGPANISKGPRACKDEICPHDSFIVIKNGLLLEGVFDKKAIGNQQPESMLHWSIREYGTEYGKWLMDNVFKMFIRFLEMRGFTMTLEDITIPDEAQNEITTKIKEGYSQVDEYIRKFNEGQLEPIPGRTIEESLESYILDTLDKLRKVAGEIATKYLDPFNNVYIMAITGARGSELNITQMTALLGQQSVRGERIRRGYRERTLSLFKYGDIAPEARGFVKNSFMRGLSPYEMFFHAAGGREGLVDTAVKTSQSGYMQRRLINALSDLRIEYDGTVRSLYGDIVQVVYGDDAVHPMYSAHSKSVNVNRVIERVIGWKR
8GVA , Knot 431 1241 0.82 40 310 1034
MSSAPRRPAKGADSFCTPEPESLGPGTPGFPEQEEDELHRTLGVERFEEILQEAGSRGGEEPGRSYGEEDFEYHRQSSHHIHHPLSTHLPPDARRRKTPQGPGRKPRRRPGASPTGETPTIEEGEEDEDEASEAEGARALTQPSPVSTPSSVQFFLQEDDSADRKAERTSPSSPAPLPHQEATPRASKGAQAGTQVEEAEAEAVAVASGTAGGDDGGASGRPLPKAQPGHRSYNLQERRRIGSMTGAEQALLPRVPTDEIEAQTLATADLDLMKSHRFEDVPGVRRHLVRKNAKGSTQSGREGREPGPTPRARPRAPHKPHEVFVELNELLLDKNQEPQWRETARWIKFEEDVEEETERWGKPHVASLSFRSLLELRRTLAHGAVLLDLDQQTLPGVAHQVVEQMVISDQIKAEDRANVLRALLLKHSHPSDEKDFSFPRNISAGSLGSLLGHHHGQGAESDPHVTEPLMGGVPETRLEVERERELPPPAPPAGITRSKSKHELKLLEKIPENAEATVVLVGCVEFLSRPTMAFVRLREAVELDAVLEVPVPVRFLFLLLGPSSANMDYHEIGRSISTLMSDKQFHEAAYLADEREDLLTAINAFLDCSVVLPPSEVQGEELLRSVAHFQRQMLKKREEQGRLLPTGAGLEPKSAQDKALLQMVEAAGAAEDDPLRRTGRPFGGLIRDVRRRYPHYLSDFRDALDPQCLAAVIFIYFAALSPAITFGGLLGEKTQDLIGVSELIMSTALQGVVFCLLGAQPLLVIGFSGPLLVFEEAFFSFCSSNHLEYLVGRVWIGFWLVFLALLMVALEGSFLVRFVSRFTQEIFAFLISLIFIYETFYKLVKIFQEHPLHGCSASNSSEVDGGENMTWAGARPTLGPGNRSLAGQSGQGKPRGQPNTALLSLVLMAGTFFIAFFLRKFKNSRFFPGRIRRVIGDFGVPIAILIMVLVDYSIEDTYTQKLSVPSGFSVTAPEKRGWVINPLGEKSPFPVWMMVASLLPAILVFILIFMETQITTLIISKKERMLQKGSGFHLDLLLIVAMGGICALFGLPWLAAATVRSVTHANALTVMSKAVAPGDKPKIQEVKEQRVTGLLVALLVGLSIVIGDLLRQIPLAVLFGIFLYMGVTSLNGIQFYERLHLLLMPPKHHPDVTYVKKVRTLRMHLFTALQLLCLALLWAVMSTAASLAFPFILILTVPLRMVVLTRIFTDREMKCLDANEAEPVFDEREGVDEYNEMPMPV
3OXA , Knot 65 131 0.80 40 109 124
MNLPTAQEVQGLMARYIELVDVGDIEAIVQMYADDATVENPFGQPPIHGREQIAAFYRQGLGGGKVRASLTGPVRASHNGSGAMPFRVEMVWNGQPSALDVIDVMRFDEHGRIQTCQAYWSEVNLSVREPQ
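
The normalized column divides the LZ-complexity by \(n/\log_{20} n\). The sketch below assumes the LZ76 exhaustive-history parsing (each new phrase is the shortest block not seen earlier, overlaps allowed); the page does not say which variant it uses, so `lz76` may differ from the tabulated values, and both function names are illustrative.

```python
import math

def lz76(w):
    """Number of phrases in an LZ76-style parsing of w (assumed variant)."""
    i, phrases, n = 0, 0, len(w)
    while i < n:
        length = 1
        # Extend the phrase while it already occurs starting before position i.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) divided by n / log_{alphabet_size}(n), as in the table header."""
    return lz / (n / math.log(n, alphabet_size))

# Recompute the ratio for the first row from its tabulated LZ and length.
print(normalized_lz(329, 880))
```

The tabulated ratios (0.84, 0.82, 0.80) look truncated rather than rounded to two decimals, so a recomputed value can differ in the last digit.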

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\), and let \(p_w(n)=|P_w(n)|\) be its cardinality. Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
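
As a minimal sketch of these definitions, the set of subwords and its cardinality can be computed with a sliding window. The names `subwords` and `p` are illustrative, and subwords are taken to be contiguous, as "intervals" suggests.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def p(w, k):
    """Number of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Usage on a short made-up word.
print(sorted(subwords("ABCAB", 2)), p("ABCAB", 2))
```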

\(|P_{f(7OQY_1)}(2) \setminus P_{f(8GVA_1)}(2)|=47\), \(|P_{f(8GVA_1)}(2) \setminus P_{f(7OQY_1)}(2)|=55\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set-of-subwords) Jaccard distance numerator for \(P(k)\); for this pair, \(Z_2=47+55=102\).

Hydrophobic-polar version of Sequence 1: 1000110110111101001001010111000100001011011110101111011000110100111011011010110111011010010011000010010101000010000010011010110110011001001010010010010001010100100100000101101010010001001100010111001000010111101111111010101010011010001000110110100010001011110111001101100011001000111111100001011001100101001010101010010101001101010101001111001101101100100101001000110110011110011010100101001000001100101101100011010111100010100101110010111100101011001100101010010101100001110100111100011010011111111000101101101000110100110111100111011011111101000100110111100101011101001101000010100011110011110111000111000100110101000100010111001101110110101101010010110010001000100100010001001001010111100100010001100100100111011000101100101111011010010100101111000101001001000001011001011101011100011011010011101111001110011000001010001101100101000101001010110110100110110010000101001100111100
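
The 0/1 string above appears to mark hydrophobic residues with 1 and polar residues with 0. The sketch below assumes the hydrophobic set A, F, G, I, L, M, P, V, W; this set is inferred from the opening of the displayed string (where C and Y map to 0), not stated on the page, and the name `hp_encode` is illustrative.

```python
# Assumed hydrophobic letters; inferred from the displayed string, not documented.
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence, hydrophobic=HYDROPHOBIC):
    """Map each residue to '1' if hydrophobic (under the assumed set), else '0'."""
    return "".join("1" if residue in hydrophobic else "0" for residue in sequence)

# First 22 residues of 7OQY_1, copied from the Sequence column.
print(hp_encode("MSEKIIRGVKFGVLSPNEIRQM"))
```

The output can be compared with the first 22 characters of the string shown for Sequence 1.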
Pair \(Z_2\) Length of longest common subword
7OQY_1,8GVA_1 102 5
7OQY_1,3OXA_1 237 3
8GVA_1,3OXA_1 231 4
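
The \(Z_k\) column is the size of the symmetric difference of the two subword sets. A minimal sketch, with illustrative function names and contiguous subwords assumed:

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z(x, y, k):
    """|P_x(k) \\ P_y(k)| + |P_y(k) \\ P_x(k)|, the symmetric-difference size."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

# Usage on two short made-up words: only "BB" separates the two sets.
print(z("ABAB", "ABBA", 2))
```

Dividing `z(x, y, k)` by the size of the union of the two sets would give the Jaccard distance itself; the table reports only the numerator.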

Newick tree

 
(
	3OXA_1:13.86,
	(
		7OQY_1:51,8GVA_1:51
	):80.86
);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)\left(\frac{2121}{\log_{20} 2121}-\frac{880}{\log_{20}880}\right)\approx 299\), where \(2121=880+1241\) is the combined length of the two query sequences.
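
The arithmetic in this estimate can be checked with a few lines. The sketch takes the factors 0.85 and 0.8 as given in the text and reads 2121 as the combined length 880 + 1241, which is an interpretation rather than something the page states; `lz_scale` is an illustrative name.

```python
import math

def lz_scale(n, alphabet_size=20):
    """The normalizing quantity n / log_{alphabet_size}(n)."""
    return n / math.log(n, alphabet_size)

len_x, len_y = 880, 1241  # lengths of 7OQY_1 and 8GVA_1 from the table
expected = 0.85 * 0.8 * (lz_scale(len_x + len_y) - lz_scale(len_x))
print(expected)
```

The result lands a little under 300, in line with the 299 quoted above.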
Status Protein1 Protein2 d d1/2
Query variables 7OQY_1 8GVA_1 376 323.5
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
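
A small sketch of this interpolation, assuming min and max range over the two one-sided complexity increments (appending \(y\) to \(x\), and \(x\) to \(y\)). The increments 271 and 376 are back-computed from the tabulated \(d=376\) and \(d_1/2=323.5\) under that assumption; the page does not list them.

```python
def delta(increment_xy, increment_yx, alpha):
    """alpha * min + (1 - alpha) * max of the two one-sided increments."""
    lo = min(increment_xy, increment_yx)
    hi = max(increment_xy, increment_yx)
    return alpha * lo + (1 - alpha) * hi

# Hypothetical increments chosen so that max = 376 and the mean is 323.5.
a, b = 271, 376
print(delta(a, b, 0))    # alpha = 0 recovers d (the maximum)
print(delta(a, b, 0.5))  # alpha = 1/2 recovers d_1 / 2 (the mean)
```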