Best Practices in Information Retrieval and Records Management: Analysis and Recommendations from the 2007 Sedona Conference
By Steven Essig
The Sedona Conference Journal, Volume 8 (Fall 2007), includes extensive commentary on best practices and other key concerns in the effective retrieval of legal documents. Issues raised range from precision and recall in searching, to indexing strategies, word choice, and email retention policies for courts and other legal organizations. Of particular interest to librarians is the section of the issue entitled “ESI Symposium,” which contains a report from “The Sedona Conference® Working Group on Best Practices for Document Retention and Production (WG1), Search & Retrieval Sciences Special Project Team” (the August 2007 Public Comment Version).
Based on the premise that the explosion in the volume of electronic information has made traditional search approaches “no longer practical or financially feasible,” this report confronts the inability of human “natural language” approaches, whether manual review or simple keyword searching, to fully access the wealth of potentially relevant legal information. The authors posit that information science, linguistics, and other disciplines have much to teach us about developing more effective and comprehensive information retrieval processes. Among the search tools that could usefully supplement Boolean logic and other forms of traditional “keyword searching” are “fuzzy logic,” which more effectively captures “variations on words”; “conceptual searching” based on taxonomies and ontologies; and other tools “that employ mathematical probabilities.” In addition, the information science metrics of “precision” and “recall,” used to measure the effectiveness of various forms of information retrieval, are also judged worthy of future study.
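To make the two metrics concrete, here is a minimal illustrative sketch (not from the report itself; the document IDs and relevance judgments are invented) of how precision and recall are computed over a search result:

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved documents that are relevant.
    Recall: fraction of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# A hypothetical keyword search returns 4 documents, of which 3 are truly
# responsive; 6 responsive documents exist in the whole collection.
p, r = precision_recall(
    {"d1", "d2", "d3", "d7"},
    {"d1", "d2", "d3", "d4", "d5", "d6"},
)
# p = 3/4 = 0.75 (the results are fairly precise)
# r = 3/6 = 0.50 (half the responsive documents were missed)
```

The tension the report highlights is visible here: a narrow query can score high on precision while leaving much of the responsive “universe” unretrieved.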
After exploring the strengths and weaknesses of these various methods, the report’s authors go on to present eight “Practice Points” that might usefully inform the evaluation of search and retrieval methods. Among the major conclusions are these:
1. It is “infeasible or unwarranted” to rely solely on a manual review to obtain “responsive documents.” Automated search methods are also vital.
2. Before employing any of these automated methods, substantial “human input” is necessary. For one thing, the applicable “universe” of relevant documents must be carefully defined.
3. Before choosing a specific search and retrieval method, the specific legal context must also be clearly understood. For instance, is “precision” or “recall” more important? Is the goal to find the highest possible number of responsive documents, or is “efficiency” more crucial?
4. Legal research practitioners must ask careful and well-considered questions of product vendors concerning the capabilities of the tool, administrative and licensing issues, etc.
5. There are no “perfect searches”; differing search methods will produce differing results.
6. Various parties involved in a case should seek to collaborate “on the use of particular information search and retrieval methods, tools and protocols”.
7. The various counsels on the case should be prepared to explain their search methods in “subsequent legal contexts” (e.g., depositions, evidentiary proceedings, and trials).
8. Each of the parties, as well as the courts, needs to stay alert to newer information search tools and technologies.
The article concludes with two major recommendations:
1. “The legal community should support collaborative research with the scientific and academic sectors aimed at establishing the efficacy of a range of automated search and information retrieval methods” and
2. “The legal community should encourage the establishment of objective benchmarking criteria, for use in assisting lawyers in evaluating the competitive legal and regulatory search and retrieval services market.”
There then follows an Appendix demonstrating and describing in more detail specific types of search models, such as Boolean searching, “probabilistic” Bayesian classifiers, “fuzzy searching,” statistical “clustering,” semantic representation, categorization tools such as thesauri, ontologies, and taxonomies, and various presentation and visualization tools.
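The “fuzzy searching” idea from the Appendix can be sketched in a few lines. This is a hypothetical illustration (the vocabulary and query term are invented, and Python’s standard-library `difflib` stands in for a commercial tool): where a strict Boolean query matches only exact terms, fuzzy matching also captures misspellings and inflected forms.

```python
import difflib

# A tiny, invented index vocabulary, including a misspelling and an inflection.
vocabulary = ["privilege", "privileged", "priviledge", "private", "proviso"]

# A strict Boolean-style match finds only the exact term:
exact = [w for w in vocabulary if w == "privilege"]

# A fuzzy match (similarity cutoff 0.8) also captures the variants:
fuzzy = difflib.get_close_matches("privilege", vocabulary, n=5, cutoff=0.8)
```

Here `exact` contains only `"privilege"`, while `fuzzy` also picks up `"privileged"` and `"priviledge"`, which is precisely the “variations on words” behavior the report describes.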
The next article in this section, “Search and Information Retrieval Science” by Herbert L. Roitblat, focuses further on the issues of precision and recall. Roitblat presents a fairly detailed examination of various forms of text analysis, such as the “vector space model,” term weighting, “query expansion,” syntactic techniques, the design of user interfaces, and others. The article concludes with a very brief discussion of alternatives to precision and recall.
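The “vector space model” Roitblat examines can be illustrated with a minimal sketch (the documents and query below are invented for this example, and plain term counts stand in for the weighting schemes he discusses): documents and queries become term vectors, and results are ranked by cosine similarity.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Three invented documents, represented as bags of words.
docs = {
    "d1": "email retention policy for litigation holds",
    "d2": "keyword searching of email archives",
    "d3": "taxonomy and ontology based conceptual searching",
}
query = Counter("email retention policy".split())

# Rank documents by similarity to the query, best first.
ranked = sorted(docs, key=lambda d: cosine(Counter(docs[d].split()), query),
                reverse=True)
# "d1" ranks first: it shares the most terms with the query.
```

In practice the raw counts would be replaced by weighted terms (e.g., giving rare terms more influence), which is the role of the term-weighting schemes Roitblat describes.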
The next Sedona Conference commentary provides “Guidelines For the Selection of Retention Policy.” Four guidelines on email policy development are followed by a framework covering retention considerations, mailbox and storage options, and the need for care with “litigation holds.” A brief discussion of the issues and a concluding commentary round out the piece.