Law and the Meaning of Data Science
In conversations about the transformation of the legal industry, we cannot escape discussions about the role of data and data science. When used correctly, data can illuminate the past, inspire the present and influence the future.
But what is data, and more specifically, how do we define it? How do you acquire data? What is data science? And how does it connect to the practice of law?
Big data is a combined term that describes both the staging and the analysis of massive amounts of data. Big data in the legal field can include all the text from e-discovery; all the emails generated by a given e-mail domain; social media conversations; contracts and bills of every state in the country; court rulings from every U.S. circuit; and more, much more.
Data scientists have had their profiles raised in the past five years, but the science of analyzing data to predict outcomes or reveal hidden patterns is not new. As far back as 2011, McKinsey & Company produced an extensive report on big data in the legal sector which included both a description of the science itself and its application in law firms and corporations.
As more of us realize the power of aligning data with human endeavor, we learn how data science can assist us in many areas, including:
How do you get great insights from data? You work with people who are data and analytics professionals. More broadly, we need a mix of professionals and educators who have a Blues Brothers-esque "we’re on a mission" mentality about working with data in the field of law.
That said, you can do data analysis without being a data scientist. You can use the tools and services of others to leverage the value of your own data collection efforts. But, in the end, having a data scientist on your team can prove to be vital.
What are the qualifications you want to look for in a data scientist?
Most law firms and corporate legal departments don’t have a full-time staff data scientist. But they do have analysts, programmers and other technical staff who can help harvest the data and provide useful insights to lawyers.
Also, there are now numerous companies that sell robust analytics tools to firms and corporations. Examples include Lex Machina, Premonition and Ravel.ai. These software and service solutions come with their own data scientists (at least in the case of Lex Machina), so you can hit the ground running with your data analytics needs.
One of the clearest cases of how a data scientist can impact the industry is when they work with courts.
According to court records transparency advocate Harris County (TX) District Clerk Chris Daniel, many courts are looking for ways to put their digital data to work to improve the processes and outcomes associated with the handling of cases. In particular, many courts are interested in the backlog of cases that have yet to be adjudicated and how data can help them make better decisions about managing those cases.
My colleague Shawn Henson has written about Harris County’s efforts in this regard in The Legal Industry Just Got a Useful New Tool To Analyze Data. In general, courts appreciate additional resources, such as software and the analysis of their data, that can have a tangible impact on their dockets. As courts seek to find more efficiencies, data scientists will be there to help.

Using Predictive Analytics in Legal Matters
Given the complexity of the law and the ever-increasing data available, evaluating litigation risk can be difficult. Predictive analytics, a branch of data science, allows firms to gain valuable insights into their cases by adding statistical rigor to the legal decision-making process. Of course, successful lawyers have always used their instincts and experience to try to forecast case outcomes. But adding predictive analytics to the mix can help every lawyer cut through the noise and achieve better outcomes for clients. Predictive analytics is an extension of evidence-based law, which has long been a part of the practice of law.
Put simply, evidence-based law applies scientific research to the practical questions that arise in legal work, much as evidence-based medicine does for the practice of medicine. It helps practitioners determine the right course of action by parsing data and bringing the evidence that emerges from that data into the decision-making process. The power of predictive analytics in law comes from combining evidence-based law with powerful computer systems and statistical analysis of large amounts of data.
For example, companies and law firms are increasingly relying on judicial analytics programs to make better decisions regarding litigation strategy, settlement, and budgeting. By collecting and analyzing huge volumes of court records, such as pleadings and court rulings, law firms can now break down their areas of practice by judge, or even by a judge’s law clerks, to better understand their proclivities and the issues that resonate with them. Courts, too, are relying on these analytics programs to help them determine, for example, the best times to schedule trials or how to allocate scarce judicial resources.
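To make the idea of breaking court records down by judge more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the judges, motion types, and rulings are invented sample data, and the `grant_rates` function is a hypothetical helper, not the method used by any commercial judicial analytics product. With real dockets, small sample sizes per judge would also call for much more careful statistics than a raw percentage.

```python
from collections import defaultdict

# Hypothetical sample data: (judge, motion_type, granted) for past rulings.
RULINGS = [
    ("Judge A", "summary_judgment", True),
    ("Judge A", "summary_judgment", False),
    ("Judge A", "summary_judgment", True),
    ("Judge A", "motion_to_dismiss", False),
    ("Judge B", "summary_judgment", False),
    ("Judge B", "summary_judgment", False),
    ("Judge B", "motion_to_dismiss", True),
]


def grant_rates(rulings):
    """Return {(judge, motion_type): (granted, total, rate)}."""
    counts = defaultdict(lambda: [0, 0])  # [granted, total] per key
    for judge, motion, granted in rulings:
        counts[(judge, motion)][1] += 1
        if granted:
            counts[(judge, motion)][0] += 1
    return {key: (g, n, g / n) for key, (g, n) in counts.items()}


if __name__ == "__main__":
    for (judge, motion), (g, n, rate) in sorted(grant_rates(RULINGS).items()):
        print(f"{judge:8} {motion:18} {g}/{n} granted ({rate:.0%})")
```

A lawyer deciding whether to file a summary judgment motion could compare the historical rate for the assigned judge with the rate across all judges, treating the numbers as one input alongside experience rather than as a prediction.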
While the application of predictive analytics to litigation is just beginning to take hold, it is having a profound impact, allowing law firms and corporations to reinvent their litigation portfolios with a greater understanding of their various risks and potential rewards. From predicting systemic risk in portfolios of matters, to spotting trends in past judicial rulings, predictive analytics is having a transformative impact on how to navigate the case triage process as well as develop and maintain legal strategy. It enables law firms and clients to look at vast amounts of information and find patterns in those data spaces that do not jump out as obvious solutions.
Data Science Programs for Attorneys
A multitude of tools is available to legal professionals, ranging from data visualizers and workflow applications to SQL (Structured Query Language) databases and programming languages such as R and Python; often SQL, R, and Python are used in combination. The best tool depends on firm size, resources, budget, and area of practice.
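As a rough illustration of how SQL and Python can work together, the sketch below loads a few matters into an in-memory SQLite database (using Python's built-in `sqlite3` module) and totals outside spend by practice area. The table name, column names, and figures are all assumptions made up for this example; a real firm would query its own matter management or billing system.

```python
import sqlite3

# Hypothetical sample data: (matter_id, practice_area, outside_spend in USD).
MATTERS = [
    ("M-001", "patent", 120000.0),
    ("M-002", "patent", 80000.0),
    ("M-003", "employment", 45000.0),
    ("M-004", "antitrust", 150000.0),
]


def spend_by_practice_area(rows):
    """Return (practice_area, matter_count, total_spend), highest spend first."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute(
            "CREATE TABLE matters ("
            "matter_id TEXT, practice_area TEXT, outside_spend REAL)"
        )
        conn.executemany("INSERT INTO matters VALUES (?, ?, ?)", rows)
        return conn.execute(
            "SELECT practice_area, COUNT(*), SUM(outside_spend) "
            "FROM matters "
            "GROUP BY practice_area "
            "ORDER BY SUM(outside_spend) DESC"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for area, count, total in spend_by_practice_area(MATTERS):
        print(f"{area:12} {count} matter(s)  ${total:,.0f}")
```

The SQL does the grouping and sorting, while Python handles loading and presentation; the same division of labor applies when the results are handed to a visualization tool.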
For example, Microsoft Excel and Tableau, a standalone visualization application that can connect to Excel workbooks among other data sources, are frequently used by legal professionals to present data and create sophisticated graphics. Legal described how in-house counsel combined Excel, Tableau, and machine learning to create compelling visualizations for judges and adversaries.
In addition, Tableau offers a free desktop app with limited functionality that allows users to visualize data quickly and easily. Tableau versions 9 and later have many more features and are readily available in the cloud.
Business Intelligence (BI) tools are integrated applications that allow business users to translate data into actionable information. Incorporating a wide range of features and functions, these tools serve as a one-stop shop to import, export, and analyze data. Some of the most popular BI tools include Microsoft Power BI, Tableau, Domo, Sisense, MicroStrategy, SAP BusinessObjects, and Qlik. Many BI tools can be used either on a standalone basis or integrated directly with Microsoft Excel for deep dives into Excel worksheets.
There also are software programs specifically designed for attorneys and law firms to manage document assets, such as transcripts, decisions, facts, legal research results, and designations. Sophisticated software applications such as Relativity and Everlaw are designed for large and midsize firms, with fees starting at $450 per month, per user, and with monthly billing options available.
Many of these applications are available for download in well-known app stores such as Google Play Store or Apple’s App Store. In addition, the body of free software accessible via the internet is large and growing. Statista.com listed 90 open source and downloadable tools, and an internet search will turn up many more.
These tools simplify and automate the process for organizing documents by type and descriptor, and for searching through millions of documents quickly for specific words or phrases. These applications may even take advantage of advanced software applications that measure and predict litigation outcomes using large data sets.
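The core of searching a document collection for specific words or phrases can be sketched in a few lines of Python. This is a simplified illustration, not how Relativity, Everlaw, or any other product is implemented: the documents are invented, the function names are hypothetical, and production systems add stemming, ranking, metadata filters, and indexes built to scale to millions of documents.

```python
import re
from collections import defaultdict

# Hypothetical sample documents keyed by a made-up document id.
DOCS = {
    "depo_01": "The witness reviewed the purchase agreement before signing.",
    "email_17": "Please send the signed agreement to purchasing.",
    "memo_03": "Breach of the Purchase Agreement is alleged in count two.",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(docs):
    """Build an inverted index: word -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search_phrase(docs, index, phrase):
    """Return sorted ids of documents containing the exact phrase (case-insensitive)."""
    words = tokenize(phrase)
    if not words:
        return []
    # Use the index to narrow down to documents containing every word.
    candidates = set.intersection(*(index.get(w, set()) for w in words))
    hits = []
    for doc_id in sorted(candidates):
        tokens = tokenize(docs[doc_id])
        n = len(words)
        # Confirm the words appear consecutively and in order.
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            hits.append(doc_id)
    return hits


if __name__ == "__main__":
    index = build_index(DOCS)
    print(search_phrase(DOCS, index, "purchase agreement"))
```

Building the index once and reusing it for every query is what lets such tools answer searches quickly; only the small set of candidate documents is rescanned to confirm the phrase.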
Legal research applications have offered predictive analysis for nearly a decade. In fact, Bloomberg recently announced that it will integrate all of its forecasting and analytics products into the Bloomberg Law platform. Other applications also are widely available for predictive analytics, including Lex Machina (LexisNexis); Premonition; Lexis Analytics; and Ravel Law’s Judicial Analytics.
These applications are available in a variety of legal practice areas, from patents to antitrust to family law, and offer access to large databases of prior cases and trends that help lawyers deliver accurate assessments of a particular case or their client’s potential position in litigation.
The development of data analytics tools will continue to affect the legal profession. As recently as 2014, there were no legal analytics companies offering only legal-focused products to lawyers; however, now there are several data analytics companies that specialize in law. As predicted by The Economist, some 3.5 quintillion bytes of data will be created each day by 2020.
Data science and analytics tools can help attorneys access, analyze, and use this volume of data. By combining computer science and statistics, they extract value from complex data sets, support the development of strategy, and help visualize and communicate insights.
The Development of Legal Research Through Data
For decades, legal research has been synonymous with slogging through case law and following long trails of legal precedent. It was an inexact science, much like a crossword puzzle that keeps growing as new cases are added to the corpus of legal knowledge. Enter big data and machine learning. Technology now helps lawyers automate legal research by compiling data gathered from various sources into large repositories that a lawyer can use to complete a research project. Using algorithms that connect legal concepts and their instances in statutes, case law, and third-party legal analysis, data science takes digital legal research from arcane to advanced.
Legal researchers can use technology to compile and parse thousands of cases in seconds, rather than the weeks or months it would have taken years ago. In their rush to bring products to market, however, many firms overlook compliance concerns that can create nightmares down the road once the full project has been completed. Heed the warning: plowing ahead without a clear conception of the ethics and compliance concerns can undermine the entire project.
Obstacles to Data Science in the Legal System
As with any new technology that businesses seek to implement, naivety and inexperience create potential pitfalls. These may be even more problematic in the field of law, given the publicity surrounding data science in many other industries and the possibility of self-interested parties on both sides of the fence. Regarding data science in law, there are several issues facing adoption of new technology, including:
The evolving uses of data science and machine learning are truly revolutionary, but they are not without drawbacks. The examples above are not exhaustive, but they do highlight the fact that lawyers need to understand the state of the art before implementing it in a real-world environment. The solution, as with almost all issues in the technology space, is education combined with a prudent approach to implementation.
The Future of Legal Data Science
The going-in assumption is that (1) things will continue to become more complex as more and more corporations enter the market as "legal tech" firms (and some of the existing firms merge into other entities); (2) data science and AI will continue to improve; and (3) a significant portion of a corporate legal department’s budget will continue to move toward better, more efficient technology that helps it monitor its outside spend. There’s little point in trying to guess the future (particularly when considering the pace of change and the many variables that could influence things), so I’ll just sketch a few general areas that administrators at law firms and legal departments would be wise to pay attention to in the next 3-5 years.
The sheer number of systems will increase. Right now, the number of basic tools that even large firms have is dwarfed by that of most sophisticated corporate law departments; the difference is that the corporations understand how to use data science much better than most law firms.
Potential business models will include SaaS offerings, a reliance on standard interfaces and APIs, and evolving standards for legal data and exchanged files.
Much of what could happen will depend on the quality of your existing data, the data that other groups (opposing counsel, corporations increasingly internalizing legal functions) publish or make available, and the amount of creative energy that is focused on the legal sector rather than on domain-specific areas.
Some things are already happening that are gradually improving the maturity of legal functions in the corporate sector. For example, IBM Watson’s roll-out of Watson Legal is an attempt to shift the value of AI from billing and other performance calculators to something more directly useful to corporate clients.
So the bottom line is that we’re living in one of the most challenging and interesting times for those involved in legal services.