Welcome to CSAR -- A Resource for Docking and Scoring Development

2014 Benchmark Exercise - click here for information

2013 Benchmark Exercise - click here

2012 Datasets – Full Release click here

Computational chemists need reliable experimental data. The Community Structure-Activity Resource (CSAR) provides experimental datasets of crystal structures and binding affinities for diverse protein-ligand complexes. Some datasets will be generated in-house at Michigan, while others will be collected from the literature or deposited by academic labs, national centers, and the pharmaceutical industry.

We aim to provide the highest quality data for a diverse collection of proteins and small molecule ligands. We need input from the community in developing our target priorities. Ideal targets will have many high-quality crystal structures (apo and 10-20 bound to diverse ligands) and affinity data for ≥25 compounds that range in size, scaffold, and logP. It is best if the ligand set has several congeneric series that span a broad range of affinity, with low nanomolar to mid-micromolar being most desirable. We prefer Kd data over Ki data, and Ki data over IC50 data; we do not accept % activity data. We will determine solubility, pKa, and logP/logD for the ligands whenever possible. We have augmented some donated IC50 data by determining kon/koff and ITC data.
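The selection criteria above can be sketched as a small screening check. This is only an illustration, not CSAR's actual curation code: the names `Measurement`, `best_measurement_per_ligand`, and `meets_target_criteria` are hypothetical, and the thresholds assume the lower bounds stated in the text (an apo structure, at least 10 ligand-bound structures, and affinity data for at least 25 compounds).

```python
from dataclasses import dataclass

# Preference order stated in the text: Kd over Ki over IC50.
# Any other type (e.g. % activity) is not accepted.
AFFINITY_PREFERENCE = ("Kd", "Ki", "IC50")


@dataclass
class Measurement:
    """Hypothetical record for one affinity measurement."""
    ligand: str
    kind: str        # "Kd", "Ki", "IC50", or an unsupported type
    value_nM: float


def best_measurement_per_ligand(measurements):
    """Keep one measurement per ligand, choosing the most preferred type."""
    rank = {kind: i for i, kind in enumerate(AFFINITY_PREFERENCE)}
    best = {}
    for m in measurements:
        if m.kind not in rank:
            continue  # skip unsupported types such as % activity
        current = best.get(m.ligand)
        if current is None or rank[m.kind] < rank[current.kind]:
            best[m.ligand] = m
    return best


def meets_target_criteria(n_bound_structures, has_apo, measurements,
                          min_structures=10, min_compounds=25):
    """Check the assumed lower bounds: apo structure, bound structures, compounds."""
    best = best_measurement_per_ligand(measurements)
    return (has_apo
            and n_bound_structures >= min_structures
            and len(best) >= min_compounds)
```

A candidate with an apo structure, 12 ligand-bound structures, and usable affinity data for 25 distinct compounds would pass this check. Diversity of scaffold, size, and logP, and the spread of affinities, are not modeled here and would still need expert review.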

CSAR is funded by a U01 grant from the National Institute of General Medical Sciences. The original RFA can be found at http://grants.nih.gov/grants/guide/rfa-files/RFA-GM-08-008.html. Press releases about CSAR can be found at:

Why should my company donate proprietary data to the public domain?

Computational techniques are very successful at enriching hit rates when identifying sets of compounds for experimental testing. However, it is not possible to reliably rank nanomolar-level compounds over those with micromolar affinities. First, by donating data, a company outsources the development of better tools: pharma has the data, but not the time, to develop improved tools. Second, you have nothing to lose because we are asking for “old” data. Abandoned projects have the kind of data we need, and some could be donated without compromising a company’s competitive advantage on current projects. Third, participation in CSAR can provide visibility in the field. In particular, the donated data could be used to conduct a community-wide blind evaluation of docking and scoring methods. Lastly, there may be a financial benefit. Data has value, and it might be possible for the company to declare a charitable donation (of course, this requires consultation with the company’s legal and accounting teams).

Our first dataset has been contributed by Abbott (urokinase), and we have reached a legal agreement with GSK to obtain data. We are working with scientists at BMS, Vertex, Pfizer, Merck, Genentech, and Eli Lilly to identify possible depositions. For the community to improve its approaches, we need exceptional datasets to train scoring functions and develop new docking algorithms. That is the goal of the CSAR project.



The CSAR 2014 Benchmark Exercise has begun. Go to the 2014 Benchmark page to download the files. Phase 1 will end on Friday, May 2nd. We have also added 123 new PDB entries to the HiQ set. Go to “Download Datasets” in the Datasets block of the left panel to get the set, or go to “Browse Datasets” to see what is there.

We have updated the HiQ set with 123 new structures deposited in the PDB from 2009 to 2011. These have been set up in a similar fashion to the original HiQ set. See the Download Datasets area.

A special issue of JCIM devoted to the 2011-2012 CSAR Benchmark Exercise is due out shortly.


2013 Benchmark exercise:

The answers to Phase 1 are: DIG5, DIG10, DIG18, and DIG19
The answers to Phase 2 (closest to the actual bound ligand) are: DIG18_25 and DIG20_39

CSAR 2013 Benchmark Exercise: Phase 3 has begun. Go to the 2013 Benchmark Exercise page to download the files. Phase 3 ends 30 Aug 2013.

LogD at pH 7.4 has been measured for all the synthesized CSAR compounds. The 2012 download page has an Excel spreadsheet that collects all the measured physical properties in one place.

CSAR 2013 Benchmark Exercise: Phase 2 has begun. Go to the 2013 Benchmark Exercise page to download the files. Phase 2 ends 31 May 2013.

The CSAR 2013 Benchmark Exercise has begun. Go to the 2013 Benchmark Exercise page to download the files.

The CSAR 2013 Benchmark Exercise will start on March 25th, 2013. Click on the 2013 Benchmark Exercise in the left-hand sidebar for more details.