CoV2D Browser™

Parikh vectors
7XKX_1 7OKL_1 4AKI_1 Letter Amino acid
33 8 160 D Aspartic acid
16 4 96 Q Glutamine
19 9 194 I Isoleucine
16 7 149 F Phenylalanine
25 7 150 T Threonine
18 3 93 Y Tyrosine
30 10 166 V Valine
32 9 110 R Arginine
43 6 199 E Glutamic acid
46 5 130 G Glycine
14 4 203 K Lysine
9 6 64 M Methionine
66 8 112 A Alanine
7 7 134 N Asparagine
7 5 41 C Cysteine
46 15 318 L Leucine
13 0 40 W Tryptophan
15 3 55 H Histidine
40 3 79 P Proline
37 9 202 S Serine
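
A Parikh vector simply counts how often each letter occurs in a word. The sketch below is one minimal way to compute it; the function name `parikh_vector` and the constant `AMINO_ACIDS` are illustrative choices, not names used by the CoV2D site. The letter order follows the rows of the table above.

```python
from collections import Counter

# One-letter amino-acid codes in the row order of the table above
# (illustrative constant, not part of the CoV2D site).
AMINO_ACIDS = "DQIFTYVREGKMANCLWHPS"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the number of occurrences of each alphabet letter, in alphabet order."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that never occur.
    return [counts[letter] for letter in alphabet]

# Toy input: the first ten residues of 7XKX_1 (one M, one G, two S, six H).
print(parikh_vector("MGSSHHHHHH"))
```

Applied to a full sequence, the entries sum to the sequence length; the 7OKL_1 column above, for instance, sums to 128.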

>7XKX_1|Chains A, B|SQHop_cyclase_C domain-containing protein|Kitasatospora sp. CB02891 (2020329)
>7OKL_1|Chain A|B-cell lymphoma 6 protein|Homo sapiens (9606)
>4AKI_1|Chains A, B|GLUTATHIONE S-TRANSFERASE CLASS-MU 26 KDA ISOZYME, DYNEIN HEAVY CHAIN CYTOPLASMIC|SCHISTOSOMA JAPONICUM (6182)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
7XKX , Knot 207 532 0.81 40 235 492
MGSSHHHHHHSQDPGDENLYFQSMSDADRIAALLKDRAADPVTKFSPSPYETGQFLRISERADVGTPQIDYLLATQRPDGLWGSVGFELVPTLGAVAGLSSRPEYADRAGVTDAVARACEKLWELALGEGGLPRLPDTVASEIIVPSLIDLLGEVLQRHRPAAGGKAGHEQEFPSPPGAKPELWRRLSDRIARGQAIPETAWHTLEAFHPLPEQFAATVTPAADGAVTCSPSSTAAWVSAVGTDAGASTRAYLDEAQSRYGGAIPMGSSMPYFEVLWVLNLVLKYFPDVPIPREIIEEIAAGFSESGIGGGPGLPPDGDDTAYANLAGDKLGAPTHPEILMKFWAEDHFVSYPGEQTPSETVNAHALEYLNHLRLRRGIAEYGAVEDACAEWVISQQTEDGCWYDKWNVSPYYSTAACVEALLDARKQDEPQLDSLRRAREWLLRHQTDSGGWGMAEPSPEETAYAVMALDLFASRGGKGAEECAAAISRAKEFFKDESRENPPLWMGKDLYTPFRIVEVTVMCGRAVVSRY
7OKL , Knot 65 128 0.82 38 98 125
GPGADSCIQFTRHASDVLLNLNRLRSRDILTDVVIVVSREQFRAHKTVLMACSGLFYSIFTDQLKCNLSVINLDPEINPEGFCILLDFMYTSRLNLREGNIMAVMATAMYLQMEHVVDTCRKFIKASE
4AKI , Knot 886 2695 0.86 40 380 2037
SPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGDKWRNKKFELGLEFPNLPYYIDGDVKLTQSMAIIRYIADKHNMLGGCPKERAEISMLEGAVLDIRYGVSRIAYSKDFETLKVDFLSKLPEMLKMFEDRLCHKTYLNGDHVTHPDFMLYDALDVVLYMDPMCLDAFPKLVCFKKRIEAIPQIDKYLKSSKYIAWPLQGWQATFGGGDHPPKSDEFVIEKSLNRIKKFWKEAQYEVIEHSSGLKLVREWDVLEQACKEDLEELVSMKASNYYKIFEQDCLDLESKLTKLSEIQVNWVEVQFYWLDLYGILGENLDIQNFLPLETSKFKSLTSEYKMITTRAFQLDTTIEVIHIPNFDTTLKLTIDSLKMIKSSLSTFLERQRRQFPRFYFLGNDDLLKIIGSGKHHDQVSKFMKKMFGSIESIIFLEDFITGVRSVEGEVLNLNEKIELKDSIQAQEWLNILDTEIKLSVFTQFRDCLGQIKDGTDIEVVVSKYIFQAILLSAQVMWTELVEKCLQTNQFSKYWKEVDMKIKGLLDKLNKSSDNVKKKIEALLVEYLHFNNVIGQLKNCSTKEEARLLWAKVQKFYQKNDTLDDLNSVFISQSGYLLQYKFEYIGIPERLIYTPLLLIGFATLTDSLHQKYGGCFFGPAGTGKTETVKAFGQNLGRVVVVFNCDDSFDYQVLSRLLVGITQIGAWGCFDEFNRLDEKVLSAVSANIQQIQNGLQVGKSHITLLEEETPLSPHTAVFITLNPGYNGRSELPENLKKSFREFSMKSPQSGTIAEMILQIMGFEDSKSLASKIVHFLELLSSKCSSMNHYHFGLRTLKGVLRNCSPLISEFGEGEKTVVESLKRVILPSLGDTDELVFKDELSKIFDSAGTPLNSKAIVQCLKDAGQRSGFSMSEEFLKKCMQFYYMQKTQQALILVGKAGCGKTATWKTVIDAMAIFDGHANVVYVIDTKVLTKESLYGSMLKATLEWRDGLFTSILRRVNDDITGTFKNSRIWVVFDSDLDPEYVEAMNSVLDDNKILTLPNGERLPIPPNFRILFETDNLDHTTPATITRCGLLWFSTDVCSISSKIDHLLNKSYEALDNKLSMFELDKLKDLISDSFDMASLTNIFTCSNDLVHILGVRTFNKLETAVQLAVHLISSYRQWFQNLDDKSLKDVITLLIKRSLLYALAGDSTGESQRAFIQTINTYFGHDSQELSDYSTIVIANDKLSFSSFCSEIPSVSLEAHEVMRPDIVIPTIDTIKHEKIFYDLLNSKRGIILCGPPGSGKTMIMNNALRNSSLYDVVGINFSKDTTTEHILSALHRHTNYVTTSKGLTLLPKSDIKNLVLFCDEINLPKLDKYGSQNVVLFLRQLMEKQGFWKTPENKWVTIERIHIVGACNPPTDPGRIPMSERFTRHAAILYLGYPSGKSLSQIYEIYYKAIFKLVPEFRSYTEPFARASVHLYNECKARYSTGLQSHYLFSPRELTRLVRGVYTAINTGPRQTLRSLIRLWAYEAWRIFADRLVGVKEKNSFEQLLYETVDKYLPNQDLGNISSTSLLFSGLLSLDFKEVNKTDLVNFIEERFKTFCDEELEVPMVIHESMVDHILRIDRALKQVQGHMMLIGASRTGKTILTRFVAWLNGLKIVQPKIHRHSNLSDFDMILKKAISDCSLKESRTCLIIDESNILETAFLERMNTLLANADIPDLFQGEEYDKLLNNLRNKTRSLGLLLDTEQELYDWFVGEIAKNLHVVFTICDPTNNKSSAMISSPALFNRCIINWMGDWDTKTMSQVANNMVDVIPMEFTDFIVPEVNKELVFTEPIQTIRDAVVNILIHFDRNFYQKMKVGVNPRSPGYFIDGLRALVKLVTAKYQDLQENQRFVNVGLEKLNESVLKVNELNKTLSISLVKSLTFEKERWLNTTKQFSKTSQELIGNCIISSIYETYFGHLNERERADMLVILKRLLGKFAVKYDVNYRFIDYLVTLDEKMKWLECGLDKNDYFLENMSIV
MNSQDAVPFLLDPSSHMITVISNYYGNKTVLLSFLEEGFVKRLENAIRFGSVVIIQDGEFFDPIISRLISREFNHAGNRVTVEIGDHEVDVSGDFKLFIHSCDPSGDIPIFLRSRVRLVHFVTNKESIETRIFDITLTEENAEMQRKREDLIKLNTEYKLKLKNLEKRLLEELNNSQGNMLENDELMVTLNNLKKEAMNIEKKLSESEEFFPQFDNLVEEYSIIGKHSVKIFSMLEKFGQFHWFYGISIGQFLSCFKRVFIKKSRETRAARTRVDEILWLLYQEVYCQFSTALDKKFKMIMAMTMFCLYKFDIESEQYKEAVLTMIGVLSESSDGVPKLTVDTNNDLRYLWDYVTTKSYISALNWFKNEFFVDEWNIADVVANSDNNYFTMASERDVDGTFKLIELAKASKESLKIIPLGSIENLNYAQEEISKSKIEGGWILLQNIQMSLSWVKTYLHKHVEETKAAEEHEKFKMFMTCHLTGDKLPAPLLQRTDRFVYEDIPGILDTVKDLWGSQFFTGKISGVWSVYCTFLLSWFHALITARTRLVPHGFSKKYYFNDCDFQFASVYLENVLATNSTNNIPWAQVRDHIATIVYGGKIDEEKDLEVVAKLCAHVFCGSDNLQIVPGVRIPQPLLQQSEEEERARLTAILSNTIEPADSLSSWLQLPRESILNYERLQAKEVASSTEQLLQEM
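
The LZ-complexity column and its normalization can be sketched as follows. This assumes the classical Lempel-Ziv (1976) exhaustive-history parsing; the page does not state which parsing variant it uses, so `lz76_complexity` is an assumption and need not reproduce the tabulated counts exactly. The normalization \(\mathrm{LZ}(w)/(n/\log_{20} n)\) is taken directly from the table header.

```python
import math

def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parsing of w (assumed variant)."""
    n = len(w)
    i = 0        # start index of the current component
    count = 0
    while i < n:
        length = 1
        # Grow the component while it still occurs at an earlier start position
        # (the earlier occurrence may overlap the current one).
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_alphabet_size(n)); requires n > 1."""
    return lz / (n / math.log(n, alphabet_size))

# Textbook example: 0 | 001 | 10 | 100 | 1000 | 101 has 6 components.
print(lz76_complexity("0001101001000101"))
# Normalization of the tabulated values for 7XKX (LZ = 207, n = 532).
print(normalized_lz(207, 532))
```

For the tabulated pair (207, 532) the ratio is about 0.815, which agrees with the 0.81 shown in the table.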

Let \(P_w(n)\) be the set of distinct subwords (intervals) of length \(n\) in a word \(w\). Let \(p_w(n)\) be the cardinality of \(P_w(n)\). Let \(f(c)\) be the sequence in FASTA with 4-symbol Protein Data Bank code \(c\).

\(|P_{f(7XKX_1)}(2) \setminus P_{f(7OKL_1)}(2)|=172\), \(|P_{f(7OKL_1)}(2) \setminus P_{f(7XKX_1)}(2)|=35\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\); for example, \(Z_2(f(7XKX_1),f(7OKL_1))=172+35=207\).

Hydrophobic-polar version of Sequence 1 (7XKX_1):
1100000000000110001010010010011111000110110010101000101101000101101010011100010111101110111011111110001001001110011101000110111101111011001100111101101110110000111110110000110111101011001000110101110011001011011100111010111011100010001111011100111000101001000011111110011010111110111001101111001100111110001111111111010001010111001111001011101110001100110001000101011001001010011100111001010111000000101000101010000110101110100000101001001001110000001111110101000101111101110011011000111100100110000000111111001001101101011010111000
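
The hydrophobic-polar string can be reproduced by mapping each residue to 1 (hydrophobic) or 0 (polar). The page does not state its classification; the set `HYDROPHOBIC` below is an assumption inferred by comparing the start of the 7XKX_1 sequence with the start of the 0/1 string, and `hp_encode` is an illustrative name.

```python
# Residues treated as hydrophobic (assumption inferred from the data on this page,
# not a classification stated by the source).
HYDROPHOBIC = set("AFGILMPVW")

def hp_encode(sequence):
    """Map each residue to '1' if hydrophobic and '0' otherwise."""
    return "".join("1" if residue in HYDROPHOBIC else "0" for residue in sequence)

# First 24 residues of 7XKX_1.
print(hp_encode("MGSSHHHHHHSQDPGDENLYFQSM"))  # 110000000000011000101001
```

The output matches the first 24 symbols of the string above. Other hydrophobicity scales classify residues such as C and Y differently, so the set should be treated as specific to this page.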
Pair \(Z_2\) Length of longest common subword (contiguous)
7XKX_1,7OKL_1 207 4
7XKX_1,4AKI_1 165 4
7OKL_1,4AKI_1 286 5
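
The quantities \(P_w(k)\), \(p_w(k)\) and \(Z_k\) translate directly into set operations. The sketch below uses illustrative function names. It also computes the length of the longest contiguous block shared by two words, on the assumption that this is what the last column of the table measures (values as small as 4 or 5 are what a contiguous match between two proteins would give).

```python
def subwords(w, k):
    """P_w(k): the set of distinct length-k subwords (contiguous blocks) of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """p_w(k): the number of distinct length-k subwords of w."""
    return len(subwords(w, k))

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of P_x(k) and P_y(k)."""
    return len(subwords(x, k) ^ subwords(y, k))

def longest_common_subword(x, y):
    """Length of the longest contiguous block occurring in both x and y."""
    best = 0
    prev = [0] * (len(y) + 1)   # prev[j]: common suffix length ending at previous x letter, y[j-1]
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(z_distance("abab", "abba", 2))              # only "bb" differs
print(longest_common_subword("abcde", "xbcdy"))   # shared block "bcd"
```

On amino-acid sequences `subword_complexity(w, 1)` is at most 20, so the tabulated \(p_w(1)\) values presumably follow a convention that the page does not state.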

Newick tree

 
(7OKL_1:13.03,(7XKX_1:82.5,4AKI_1:82.5):53.53);

Let d be the Otu–Sayood distance d.
Let d1 be the Otu–Sayood distance d1. (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{660}{\log_{20} 660}-\frac{128}{\log_{20}128})\approx 153\), where \(660=532+128\) is the combined length of the two sequences.
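
The arithmetic behind this estimate can be checked with a few lines. The helper name `lz_scale` is illustrative; the constants 0.85 and 0.8 and the lengths 660 and 128 are taken from the formula above, where 660 appears to be 532 + 128, the combined length of 7XKX_1 and 7OKL_1.

```python
import math

def lz_scale(n, alphabet_size=20):
    """n / log_alphabet_size(n), the scale used to normalize LZ-complexity on this page."""
    return n / math.log(n, alphabet_size)

# Values copied from the formula above.
expected = 0.85 * 0.8 * (lz_scale(660) - lz_scale(128))
print(round(expected))  # 153
```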
Status Protein1 Protein2 d d1/2
Query variables 7XKX_1 7OKL_1 193 120.5

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \mathrm{min} + (1-\alpha) \mathrm{max}= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
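
A minimal sketch of how \(d\), \(d_1\) and \(\delta\) fit together, under two assumptions that the page does not spell out: that min and max range over the two Otu–Sayood terms \(c(xy)-c(x)\) and \(c(yx)-c(y)\) (so \(d\) is their maximum and \(d_1\) their sum), and that the complexity \(c\) is the LZ76 exhaustive-history count. All function names are illustrative.

```python
def lz76_complexity(w):
    """Number of components in the LZ76 exhaustive-history parsing of w (assumed variant)."""
    n, i, count = len(w), 0, 0
    while i < n:
        length = 1
        # Grow the component while it still occurs at an earlier start position.
        while i + length <= n and w[i:i + length] in w[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def otu_sayood_terms(x, y, c=lz76_complexity):
    """The two one-sided terms c(xy) - c(x) and c(yx) - c(y)."""
    return c(x + y) - c(x), c(y + x) - c(y)

def delta(x, y, alpha):
    """delta = alpha * min + (1 - alpha) * max over the two terms."""
    a, b = otu_sayood_terms(x, y)
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

def d(x, y):
    return delta(x, y, 0)          # alpha = 0 gives the maximum

def d1(x, y):
    return 2 * delta(x, y, 0.5)    # alpha = 1/2 gives (min + max) / 2 = d1 / 2

print(d("ab", "abab"), d1("ab", "abab"))
```

Under these assumptions the tabulated values \(d=193\) and \(d_1/2=120.5\) would correspond to a smaller term of \(2(120.5)-193=48\).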