By Craig Macaulay | 8 March 2018 | Forensic Blog
A ground-breaking U.S. case has added useful guidance on two ‘grey’ areas of discovery: the best methods for identifying relevant documents, and how to validate the accuracy of results obtained using Technology Assisted Review ('TAR').
The orders, issued by renowned TAR pioneer Special Master Maura Grossman and Magistrate Judge Jeffrey Gilbert in a large and complex antitrust class action suit in the Northern District of Illinois, described how the parties are to go about:
  • Increasing transparency by disclosing the source of documents and recording any pre-search deduplication and culling of collected data
  • Conducting keyword searches and machine learning-based TAR searches, and disclosing the details of the methodology to the opposing party, and
  • Validating the results of the search process and demonstrating that an acceptably low number of documents were incorrectly classified as irrelevant to Requests for Production ('RFPs') and excluded from the set of documents provided.
The validation protocol section is particularly interesting, as it is relevant to any electronic document collection and search process, whether using machine learning or more traditional methods.
During discovery, the producing party uses either TAR or a manual review process to classify documents as either ‘responsive’ (relevant) or ‘non-responsive’ (not relevant) to an RFP. In either case, the results should be validated by selecting a random sample of documents (the 'Validation Sample') and manually checking whether each one was classified correctly. How this sample is selected is an important consideration. In this case, the order stated that the Validation Sample was to be selected as follows:

The orders also stated that, if a manual review process was used instead of TAR, a similar Validation Sample should be selected, made up of 500 documents classified as ‘responsive’ and 2,500 documents classified as ‘non-responsive’.
The order then discussed the relevance of an appropriate sample size.
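As a general illustration of why sample size matters, the sketch below computes the approximate margin of error for a proportion estimated from a simple random sample. It is a textbook calculation rather than the order's own reasoning, and it assumes a normal approximation, a 95% confidence level (z = 1.96) and the worst-case proportion of 0.5. The function name `margin_of_error` is hypothetical.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumptions (not taken from the order): simple random sampling,
    normal approximation, z = 1.96 for roughly 95% confidence.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Compare the two sample sizes mentioned for a manual review.
for n in (500, 2500):
    print(f"n = {n}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Under these assumptions, a sample of 500 gives a margin of roughly 4.4 percentage points and a sample of 2,500 roughly 2.0, which shows why a larger sample supports a tighter estimate of how many documents were misclassified.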
Importantly, the order specified that the Validation Sample shall be reviewed and coded by a subject matter expert ('SME') who is knowledgeable about the subject matter of the litigation. This should be an attorney who is familiar with the RFPs and the issues in the case. During the course of the review of the Validation Sample, the SME shall not be provided with any information concerning the Subcollection or Subsample from which any document was derived or the prior coding of any of the documents being reviewed. The intent of this requirement is to ensure that the review of the Validation Sample is blind; however, it does not preclude a Party from selecting as an SME an attorney who may have had prior involvement in the original review process.
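One way to picture the blind review requirement is sketched below for the manual review case (500 ‘responsive’ and 2,500 ‘non-responsive’ documents). This is a minimal illustration, not the procedure prescribed by the order: the function name `draw_blind_sample`, the document ID format and the population sizes are all hypothetical. The point is that the SME receives a shuffled list of document IDs only, while the record of each document's origin and prior coding is held back.

```python
import random

def draw_blind_sample(responsive_ids, non_responsive_ids,
                      n_responsive=500, n_non_responsive=2500, seed=None):
    """Draw a random sample from each group and split it into a
    review list (for the SME) and an answer key (withheld)."""
    rng = random.Random(seed)
    picked = (
        [(doc_id, "responsive")
         for doc_id in rng.sample(responsive_ids, n_responsive)]
        + [(doc_id, "non-responsive")
           for doc_id in rng.sample(non_responsive_ids, n_non_responsive)]
    )
    rng.shuffle(picked)  # mix the groups so review order reveals nothing
    review_list = [doc_id for doc_id, _ in picked]  # all the SME sees
    answer_key = dict(picked)  # prior coding, kept from the SME
    return review_list, answer_key

# Hypothetical collection: 10,000 responsive and 90,000 non-responsive IDs.
responsive = [f"DOC-{i:06d}" for i in range(10_000)]
non_responsive = [f"DOC-{i:06d}" for i in range(10_000, 100_000)]

review_list, answer_key = draw_blind_sample(responsive, non_responsive, seed=42)
print(len(review_list))  # 3000 documents for the SME to code
```

Once the SME has finished coding, the answer key can be joined back to the SME's decisions to count how many documents in each group were classified correctly.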
Finally, the orders require that the producing party produce copies of the relevant non-privileged documents to the requesting party, along with detailed statistics, a recall estimate, and a table with the following details:
  1. Document number
  2. The subsample from which the document came
  3. SME’s coding
  4. SME’s privilege coding 
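The recall estimate mentioned above can be sketched as follows. The exact formula required by the order is not reproduced here, so this uses a standard stratified estimate as an assumption: within each group, the SME's responsive rate in the sample is scaled up to the group's population, and recall is the share of all estimated responsive documents that fall in the produced group. The `Stratum` class, the function names and all figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Stratum:
    name: str
    population: int      # documents in this group
    sampled: int         # documents drawn into the Validation Sample
    sme_responsive: int  # of those, how many the SME coded responsive
    produced: bool       # True if the group was classified 'responsive'

def estimated_responsive(s):
    """Scale the SME's responsive rate in the sample up to the group."""
    return s.population * s.sme_responsive / s.sampled

def estimate_recall(strata):
    """Estimated responsive documents produced / estimated responsive overall."""
    found = sum(estimated_responsive(s) for s in strata if s.produced)
    total = sum(estimated_responsive(s) for s in strata)
    return found / total if total else float("nan")

# Hypothetical manual review: 500 and 2,500 documents sampled.
strata = [
    Stratum("coded responsive", 40_000, 500, 450, produced=True),
    Stratum("coded non-responsive", 200_000, 2_500, 50, produced=False),
]
print(f"Estimated recall: {estimate_recall(strata):.0%}")  # Estimated recall: 90%
```

In this made-up example, an estimated 36,000 responsive documents were produced and an estimated 4,000 were missed, giving a recall of about 90%.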


Validating any document review is very important, particularly where TAR has been employed. This was an issue for Justice Vickery in the second McConnell Dowell decision. In that case, his Honour decided not to adopt the recommendation of the special referee, who had recommended that no validation be undertaken. Instead, he required that the parties perform validation (note that a validation round was agreed to in the original TAR protocol).
When using a Simple Active Learning ('SAL') protocol, as was the case in McConnell Dowell, validation is a well-documented process. Where a Continuous Active Learning ('CAL') protocol is implemented, the approach to adopt for validation is less clear. The protocol described in this order provides a reasonably transparent way to validate all forms of TAR and linear review. It will be interesting to see how widely it is adopted given the cost of engaging an SME (by our estimate, approximately $20,000 to $25,000).
The order in this case was a big step forward. The validation protocol mandated in the order adds robustness and transparency to the process, providing a practical solution for a difficult problem.