CoV2D Browser™

Parikh vectors
6FBV_1 1YOJ_1 2VWI_1 Letter Amino acid
23 19 23 A Alanine
9 14 13 Q Glutamine
15 16 30 K Lysine
6 3 8 H Histidine
26 10 19 S Serine
24 17 14 T Threonine
1 7 4 W Tryptophan
9 12 9 Y Tyrosine
18 17 10 R Arginine
37 30 29 L Leucine
3 10 10 M Methionine
8 10 10 F Phenylalanine
20 17 12 P Proline
20 10 19 I Isoleucine
34 19 21 V Valine
10 8 8 N Asparagine
25 10 16 D Aspartic acid
1 6 5 C Cysteine
31 28 23 E Glutamic acid
27 20 20 G Glycine
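
A Parikh vector simply counts how often each letter occurs in a sequence. The following is a minimal sketch of how one column of the table could be recomputed; the names `AMINO_ACIDS` and `parikh_vector` are illustrative and not part of the CoV2D code.

```python
from collections import Counter

# One-letter codes in the row order of the table above (illustrative constant).
AMINO_ACIDS = "AQKHSTWYRLMFPIVNDCEG"

def parikh_vector(sequence, alphabet=AMINO_ACIDS):
    """Return the letter counts of `sequence`, ordered as in `alphabet`."""
    counts = Counter(sequence)
    # Counter returns 0 for letters that never occur.
    return [counts[letter] for letter in alphabet]

# Example on the first 13 residues of the 6FBV_1 sequence.
prefix_counts = parikh_vector("MLLSQRPTLSEDV")
```

The entries of a Parikh vector sum to the sequence length, which gives a quick consistency check against the length column further down the page.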

>6FBV_1|Chains A, B|DNA-directed RNA polymerase subunit alpha|Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv) (83332)
>1YOJ_1|Chains A, B|proto-oncogene tyrosine-protein kinase SRC|Homo sapiens (9606)
>2VWI_1|Chains A, B, C, D|SERINE/THREONINE-PROTEIN KINASE OSR1|HOMO SAPIENS (9606)
Protein code \(c\) LZ-complexity \(\mathrm{LZ}(w)\) Length \(n=|w|\) \(\frac{\mathrm{LZ}(w)}{n /\log_{20} n}\) \(p_w(1)\) \(p_w(2)\) \(p_w(3)\) Sequence \(w=f(c)\)
6FBV , Knot 146 347 0.82 40 188 338
MLLSQRPTLSEDVLTDNRSQFVIEPLEPGFGYTLGNSLRRTLLSSIPGAAVTSIRIDGVLHEFTTVPGVKEDVTEIILNLKSLVVSSEEDEPVTMYLRKQGPGEVTAGDIVPPAGVTVHNPGMHIATLNDKGKLEVELVVERGRGYVPAVQNRASGAEIGRIPVDSIYSPVLKVTYKVDATRVEQRTDFDKLILDVETKNSISPRDALASAGKTLVELFGLARELNVEAEGIEIGPSPAEADHIASFALPIDDLDLTVRSYNCLKREGVHTVGELVARTESDLLDIRNFGQKSIDEVKIKLHQLGLSLKDSPPSFDPSEVAGYDVATGTWSTEGAYDEQDYAETEQL
1YOJ , Knot 131 283 0.87 40 195 272
QTQGLAKDAWEIPRESLRLEVKLGQGCFGEVWMGTWNGTTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVVSEEPIYIVTEYMNKGSLLDFLKGETGKYLRLPQLVDMSAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVADFGLARLIEDNEWTARQGAKFPIKWTAPEAALYGRFTIKSDVWSFGILLTELTTKGRVPYPGMVNREVLDQVERGYRMPCPPECPESLHDLMCQCWRKEPEERPTFEYLQAFLEDYFTSTEPQYQPGENL
2VWI , Knot 134 303 0.84 40 193 294
MSEDSSALPWSINRDDYELQEVIGSGATAVVQAAYCAPKKEKVAIKRINLEKCQTSMDELLKEIQAMSQCHHPNIVSYYTSFVVKDELWLVMKLLSGGSVLDIIKHIVAKGEHKSGVLDESTIATILREVLEGLEYLHKNGQIHRDVKAGNILLGEDGSVQIADFGVSAFLATGGDITRNKVRKTFVGTPCWMAPEVMEQVRGYDFKADIWSFGITAIELATGAAPYHKYPPMKVLMLTLQNDPPSLETGVQDKEMLKKYGKSFRKMISLCLQKDPEKRPTAAELLRHKFFQKAKNKEFLQEK
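
The normalized column divides \(\mathrm{LZ}(w)\) by \(n/\log_{20} n\). The sketch below reproduces that arithmetic and, as an assumption, counts phrases with the LZ76 exhaustive-history parsing; the page does not state which LZ variant it uses, so `lz76_complexity` may not match the tabulated values exactly. Both function names are illustrative.

```python
import math

def lz76_complexity(s):
    """Number of phrases in an LZ76-style exhaustive-history parsing (assumed variant)."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text
        # (the copy source may overlap the phrase, but must start before position i).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def normalized_lz(lz, n, alphabet_size=20):
    """LZ(w) / (n / log_alphabet_size(n)), the ratio shown in the table."""
    return lz / (n / math.log(n, alphabet_size))

# Ratios recomputed from the tabulated LZ values and lengths.
ratios = [round(normalized_lz(lz, n), 2) for lz, n in [(146, 347), (131, 283), (134, 303)]]
```

The recomputed ratios round to the 0.82, 0.87 and 0.84 listed in the table.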

Let \(P_w(k)\) be the set of distinct subwords (intervals) of length \(k\) in a word \(w\). Let \(p_w(k)\) be the cardinality of \(P_w(k)\). Let \(f(c)\) be the FASTA sequence with 4-symbol Protein Data Bank code \(c\).
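
As a small illustration of these definitions, the set of distinct subwords of a fixed length and its cardinality can be computed with a sliding window. This is only a sketch; the function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def subword_complexity(w, k):
    """Cardinality of the set of distinct length-k subwords of w."""
    return len(subwords(w, k))

# Example on the first five residues of the 6FBV_1 sequence.
pairs = subwords("MLLSQ", 2)
```

For a word shorter than the window the set is empty, so the count is 0.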

\(|P_{f(6FBV_1)}(2) \setminus P_{f(1YOJ_1)}(2)|=76\), \(|P_{f(1YOJ_1)}(2) \setminus P_{f(6FBV_1)}(2)|=83\). Let \( Z_k(x,y)=|P_x(k)\setminus P_y(k)|+|P_y(k)\setminus P_x(k)| \) be an LZ76-style (set of subwords) Jaccard distance numerator for \(P(k)\), that is, the size of the symmetric difference of \(P_x(k)\) and \(P_y(k)\).

Hydrophobic-polar version of Sequence 1:
11100010100011000000111011011110011001000110011111100101011100100111100010011101001110000001101010001110101101111111010011101101000101010111001010111100010110110111001001110100010100100000100111010000010100111011001101111100101010110111011010011011111001010100000100011001101110000011010011000100101010011101000110101001110011010100011000000100001

Pair \(Z_2\) Length of longest common subword
6FBV_1,1YOJ_1 159 5
6FBV_1,2VWI_1 149 3
1YOJ_1,2VWI_1 162 5
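
Both columns of this table can be sketched in a few lines. The second function assumes the last column counts contiguous common subwords, which the small values (3 to 5) suggest; all function names are illustrative.

```python
def subwords(w, k):
    """Set of distinct contiguous subwords of length k in w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def z_distance(x, y, k):
    """Z_k(x, y): size of the symmetric difference of the length-k subword sets."""
    px, py = subwords(x, k), subwords(y, k)
    return len(px - py) + len(py - px)

def longest_common_subword_length(x, y):
    """Length of the longest contiguous subword shared by x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common suffix ending at x[i-1] and y[j-1].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best
```

For the first pair the two one-sided differences are 76 and 83, which add up to the tabulated \(Z_2 = 159\).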

Newick tree

 
(1YOJ_1:82.08,(6FBV_1:74.5,2VWI_1:74.5):7.58);

Let \(d\) be the Otu–Sayood distance \(d\).
Let \(d_1\) be the Otu–Sayood distance \(d_1\). (This makes the 4TYN sequence AAAAAA a close match...)
Roughly speaking, an expected distance is \((0.85)(0.8)(\frac{630 }{\log_{20} 630}-\frac{283}{\log_{20}283})\approx 96.9\)
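
The arithmetic in this estimate can be checked directly. In the sketch below the factors 0.85 and 0.8 are taken as given from the text, and 630 = 347 + 283 is read as the combined length of the two query sequences (an interpretation, not something the page states). The function names are illustrative.

```python
import math

def scaled_length(n, alphabet_size=20):
    """n / log_alphabet_size(n), the normalizer used for LZ-complexity."""
    return n / math.log(n, alphabet_size)

def expected_distance(n_combined, n_single, factor_a=0.85, factor_b=0.8):
    """Evaluate factor_a * factor_b * (scaled_length(n_combined) - scaled_length(n_single))."""
    return factor_a * factor_b * (scaled_length(n_combined) - scaled_length(n_single))

estimate = expected_distance(630, 283)
```

Evaluating this gives a value of about 97, in line with the figure quoted above.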
Status Protein1 Protein2 d d1/2
Query variables 6FBV_1 1YOJ_1 120 110
Was not able to put for d
Was not able to put for d1

In notation analogous to [Theorem 16, Kjos-Hanssen, Niraula and Yoon (2022)],
\[ \delta= \alpha \min + (1-\alpha) \max= \begin{cases} d &\alpha=0,\\ d_1/2 &\alpha=1/2 \end{cases} \]
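
The interpolation can be illustrated numerically. The sketch assumes that \(\min\) and \(\max\) range over two directed quantities (in Otu and Sayood's setup, the complexity gained by appending one sequence to the other, in either order). The inputs 120 and 100 are illustrative values chosen to be consistent with \(d=120\) and \(d_1/2=110\) in the table row; they are not read from the page.

```python
def delta(a, b, alpha):
    """alpha * min(a, b) + (1 - alpha) * max(a, b) for two directed quantities a and b."""
    return alpha * min(a, b) + (1 - alpha) * max(a, b)

# Hypothetical directed quantities, chosen to match the query row above.
a, b = 120, 100
d = delta(a, b, 0)          # alpha = 0 keeps only the maximum
half_d1 = delta(a, b, 0.5)  # alpha = 1/2 averages the two, i.e. (a + b) / 2
```

With these inputs the two cases evaluate to 120 and 110, since at \(\alpha=1/2\) the expression reduces to half the sum of the two quantities.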