XenoSite Output

XenoSite provides prediction output as a tab-separated (TSV) file with eight columns and one row per atom. The first column names the model that made the prediction (for example, the 1A2 isozyme model). The second column is the molecule title (or, if the SDF or SMILES input provides no titles, a number giving the molecule's position in the input). The third column is the atom number, following the order of atoms in the input file. The fourth column contains numerical group labels: atoms that share the same topology or are part of the same reaction receive the same label. The fifth column gives the atom type. The sixth column gives the probability of metabolism predicted by XenoSite. The seventh column is a boolean value indicating whether the atom is annotated as a known site of metabolism (SOM) in the literature sources used to train the XenoSite models. The final column is the background probability of observing a SOM for that model, which can be used to interpret the prediction.

Example output (tab-separated in the downloaded file; columns are aligned here for readability):

Model  Molecule      Atom  Group  Atom Type  Prediction           Observed Site  Background
1A2    Vortioxetine  1     1      C3         0.07523750779849031  0              0.12157598499061914
1A2    Vortioxetine  2     2      Car        0.00358783317872466  0              0.12157598499061914
...
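A minimal Python sketch of reading this output, assuming the first line of the file is the header shown in the example, fields are tab-separated, and the Observed Site column holds 0 or 1 as in the example rows. The function names `load_predictions` and `ranked_sites` are hypothetical, and the prediction-to-background ratio is only one possible way to use the Background column (a ratio above 1 means the atom scores above the model's background rate); it is not a XenoSite-defined quantity.

```python
import csv
import io
from collections import defaultdict

# Assumption: the TSV starts with the header from the example above,
# and "Observed Site" is written as 0 or 1.
SAMPLE_TSV = (
    "Model\tMolecule\tAtom\tGroup\tAtom Type\tPrediction\tObserved Site\tBackground\n"
    "1A2\tVortioxetine\t1\t1\tC3\t0.07523750779849031\t0\t0.12157598499061914\n"
    "1A2\tVortioxetine\t2\t2\tCar\t0.00358783317872466\t0\t0.12157598499061914\n"
)


def load_predictions(handle):
    """Group atom rows by (model, molecule)."""
    molecules = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        molecules[(row["Model"], row["Molecule"])].append(
            {
                "atom": int(row["Atom"]),
                "group": int(row["Group"]),
                "atom_type": row["Atom Type"],
                "prediction": float(row["Prediction"]),
                "observed": row["Observed Site"].strip() == "1",
                "background": float(row["Background"]),
            }
        )
    return molecules


def ranked_sites(atoms):
    """Return (atom, prediction, prediction/background), most likely SOM first."""
    ranked = sorted(atoms, key=lambda a: a["prediction"], reverse=True)
    return [
        (
            a["atom"],
            a["prediction"],
            # Hypothetical interpretation aid: how far the score sits above background.
            a["prediction"] / a["background"] if a["background"] > 0 else float("nan"),
        )
        for a in ranked
    ]


if __name__ == "__main__":
    predictions = load_predictions(io.StringIO(SAMPLE_TSV))
    for (model, molecule), atoms in predictions.items():
        for atom, prob, ratio in ranked_sites(atoms):
            print(f"{model} {molecule} atom {atom}: p={prob:.3f} ({ratio:.2f}x background)")
```

For a downloaded file, pass `open(path, newline="")` in place of the in-memory sample.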

XenoSite also provides visual output for each molecule and each CYP isozyme. Potential SOMs are labeled by a color gradient.

Images of individual molecules with SOM predictions can be downloaded in PNG or Portable Document Format (PDF). If a molecule is annotated with known SOMs, those sites are circled.

Performance Comparison

The following performance results use the Top-2 metric: the percentage of molecules for which at least one of the two highest-ranked predicted sites is a known SOM. All scores were obtained by leave-one-out cross-validation.
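The Top-2 definition can be written out as a short Python function. This is a sketch under stated assumptions: the input layout, the name `top_k_score`, and the tie-breaking rule are illustrative, and the sketch ranks individual atoms without merging atoms that share a Group label, which the published evaluation may handle differently. In the reported numbers each molecule's predictions come from a model trained without that molecule (leave-one-out); the function covers only the final counting step.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout: molecule name -> list of
# (predicted probability, is_known_som) pairs, one pair per atom.
Atoms = List[Tuple[float, bool]]


def top_k_score(molecules: Dict[str, Atoms], k: int = 2) -> float:
    """Percentage of molecules whose k highest-scoring atoms include a known SOM."""
    if not molecules:
        raise ValueError("need at least one molecule")
    hits = 0
    for atoms in molecules.values():
        # Highest prediction first; sorted() is stable, so ties keep input order.
        top = sorted(atoms, key=lambda pair: pair[0], reverse=True)[:k]
        if any(is_som for _, is_som in top):
            hits += 1
    return 100.0 * hits / len(molecules)


example = {
    "mol_a": [(0.9, False), (0.8, True), (0.1, False)],  # known SOM ranked 2nd: hit
    "mol_b": [(0.7, False), (0.6, False), (0.5, True)],  # known SOM ranked 3rd: miss
}
print(top_k_score(example, k=2))  # 50.0
```

Raising `k` to 3 would count both molecules as hits, which is why Top-k scores are only comparable at the same `k`.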

Isozyme               1A2    2A6    2B6    2C8    2C9    2C19   2D6    2E1    3A4            HLM    Average
Number of substrates  271    105    151    142    226    218    270    145    475            680
XenoSite [1]          87.1   85.7   83.4   88.7   86.7   89.0   88.5   83.5   87.6           89.4   87.0
RS-Predictor [4]      83.4   85.7   82.1   83.8   84.5   86.2   85.9   82.8   82.3           86.2   84.3
SMARTCyp [2] 82.1
MetaSite [5]                                                                  77.41 (a) [6]         75.8
StarDrop [7]                                      78.0          75.3          74.1                  75.8
Schrödinger [8]                                   72.1          68.1          76.4                  72.2
Random model          26.0   31.9   24.8   22.6   22.…
  • (a) This result was taken from the cited work [6] and was computed on an earlier version of the 3A4 dataset that included only 394 of the 475 substrates. We would like to report updated MetaSite scores for all of the models, but the MetaSite end-user agreement explicitly forbids scientific comparisons such as these.
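The Average column is consistent with a plain (unweighted) mean of each method's per-isozyme scores, rounded to one decimal; the Python sketch below checks this for XenoSite, RS-Predictor, StarDrop and Schrödinger using values transcribed from the table. Treating Average as an unweighted mean is an inference from the numbers, not something the table states. Because the isozyme datasets range from 105 to 680 substrates, an unweighted mean gives every isozyme equal influence regardless of dataset size.

```python
# Per-isozyme Top-2 scores transcribed from the table.
# XenoSite and RS-Predictor cover 1A2 through HLM (10 columns);
# StarDrop and Schrödinger report three isozyme scores each.
SCORES = {
    "XenoSite": [87.1, 85.7, 83.4, 88.7, 86.7, 89.0, 88.5, 83.5, 87.6, 89.4],
    "RS-Predictor": [83.4, 85.7, 82.1, 83.8, 84.5, 86.2, 85.9, 82.8, 82.3, 86.2],
    "StarDrop": [78.0, 75.3, 74.1],
    "Schrödinger": [72.1, 68.1, 76.4],
}
REPORTED_AVERAGE = {
    "XenoSite": 87.0,
    "RS-Predictor": 84.3,
    "StarDrop": 75.8,
    "Schrödinger": 72.2,
}


def unweighted_mean(values):
    """Plain mean over the isozymes a method reports, rounded like the table."""
    return round(sum(values) / len(values), 1)


for method, values in SCORES.items():
    print(f"{method}: computed {unweighted_mean(values)}, reported {REPORTED_AVERAGE[method]}")
```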


  • [1] Zaretzki, J., Matlock, M., & Swamidass, S. J. (2013). XenoSite: Accurately predicting CYP-mediated sites of metabolism with neural networks. Journal of Chemical Information and Modeling, 53(12), 3373-3383.
  • [2] Rydberg, P., Gloriam, D. E., Zaretzki, J., Breneman, C., & Olsen, L. (2010). SMARTCyp: A 2D method for prediction of cytochrome P450-mediated drug metabolism. ACS Medicinal Chemistry Letters, 1(3), 96-100.
  • [3] Swamidass, S. J., Azencott, C. A., Lin, T. W., Gramajo, H., Tsai, S. C., & Baldi, P. (2009). Influence relevance voting: An accurate and interpretable virtual high throughput screening method. Journal of Chemical Information and Modeling, 49(4), 756-766.
  • [4] Zaretzki, J., Rydberg, P., Bergeron, C., Bennett, K. P., Olsen, L., & Breneman, C. M. (2012). RS-Predictor models augmented with SMARTCyp reactivities: Robust metabolic regioselectivity predictions for nine CYP isozymes. Journal of Chemical Information and Modeling, 52(6), 1637-1659.
  • [5] MetaSite. http://www.moldiscovery.com/docs/metasite/
  • [6] Zaretzki, J. M. (2011). RS-Predictor: Creation of cytochrome P450 regioselectivity models.
  • [7] StarDrop, version 4.2.1; Optibrium Ltd.: Cambridge, United Kingdom. 2009.
  • [8] Schrödinger, Portland, OR, USA. http://www.schrodinger.com/