In this blog, we will focus on AI development in LegalTech, beginning with a brief overview of TAR, CAL, and GenAI, followed by an in-depth look at GenAI and LLMs (Large Language Models). We will then turn our attention to the potential applications of this burgeoning technology, and close by touching on the current state of acceptance of GenAI-based solutions across the legal services landscape.
TAR, Predictive Coding and CAL
In the legal services industry, AI first gained meaningful traction as Technology Assisted Review (“TAR”), which leveraged computer algorithms to help identify relevant documents within a data set. These machine-learning algorithms analyze prior coding decisions made by human reviewers, together with document content, to identify similar documents likely to receive the same coding. This capability gave rise to the notion that trained algorithms can justifiably expedite the identification of relevant documents, which in turn led to the widely accepted, interchangeable use of the terms “predictive coding” and “TAR”.
These initial machine-learning algorithms, new to legal technology solutions, continued to improve and evolve, soon giving way to a new approach that significantly reduced the initial human review component and became widely referred to as Continuous Active Learning, or “CAL”.
TAR and CAL
The fundamental difference between TAR and CAL is the underlying machine-learning algorithm used to predict which documents are relevant to a case. The choice of algorithm affects how human reviewers will train the system, how resilient the algorithm is against incorrect human tagging, and the practical considerations needed to achieve the best results for your review. As the use of CAL gained popularity over the past 5 to 7 years, it soon became referred to as “TAR 2.0” in the legal community, thereby rendering the initial iteration of predictive coding – which is still widely used today and recognized by courts – as “TAR 1.0”.
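For readers who want to see the mechanics, below is a minimal sketch of a continuous active learning loop in Python. It assumes scikit-learn and a hypothetical review_batch callable standing in for the human review step; real review platforms use their own classifiers, batch sizes, and stopping criteria.

```python
# Minimal sketch of a continuous active learning (CAL) loop.
# The classifier choice and the review_batch callable are illustrative
# placeholders, not any vendor's actual implementation.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def cal_review(documents, seed_labels, review_batch, batch_size=50):
    """documents   : list of document texts
    seed_labels : dict {doc_index: 0/1} from the initial seed review
                  (must contain both relevant and not-relevant examples)
    review_batch: callable taking a list of doc indexes and returning {index: 0/1}
    """
    vectorizer = TfidfVectorizer(stop_words="english")
    features = vectorizer.fit_transform(documents)
    labels = dict(seed_labels)

    while len(labels) < len(documents):
        # Retrain on everything coded so far.
        reviewed = list(labels)
        model = LogisticRegression(max_iter=1000)
        model.fit(features[reviewed], [labels[i] for i in reviewed])

        # Score the unreviewed documents and surface the most likely relevant ones.
        unreviewed = [i for i in range(len(documents)) if i not in labels]
        scores = model.predict_proba(features[unreviewed])[:, 1]
        ranked = sorted(zip(unreviewed, scores), key=lambda pair: -pair[1])
        next_batch = [idx for idx, _ in ranked[:batch_size]]

        # Human reviewers code the batch; their decisions feed the next iteration.
        # In practice the loop stops once batches stop yielding relevant documents.
        labels.update(review_batch(next_batch))

    return labels
```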
How Different Is GenAI Compared to Prior AI in LegalTech?
Generative AI represents a significant leap in Artificial Intelligence development and is drastically different from most previous forms of machine learning, not only in the legal industry but across virtually every industry and aspect of human life. The critical difference between GenAI and previous AI iterations is the capacity to generate completely new content.
Unlike previous models and algorithms, which were powerful in their own right and could quickly interpret, sort, and classify data, Generative AI can interact with human users in a question-and-answer format to produce complete, coherent answers to natural language queries. In this manner, unlike the chatbots of yore, GenAI does not rely on preprogrammed responses, and can pull context for answers from the various sources available to the system or offer opinions on data outside the system’s training scope.
Naturally, these differences have given rise to a wide variety of potential applications of Generative AI, such as the ability to generate contract language, create case summaries, or help legal professionals quickly access laws and regulations, all of which were previously unattainable with earlier natural language models.
LLMs - Their Role in Powering GenAI & How They Differ from Existing Algorithms
Generative AI and LLMs (Large Language Models) are often used interchangeably, but it is important to recognize that, while these are related concepts, they are not the same thing. Generative AI is a catch-all term for AI systems capable of generating new content, whether audio, video, written text, or something else. LLMs, on the other hand, are machine-learning models focused primarily on interpreting, analyzing, and composing text-based data. In other words, all LLMs are a form of generative AI, but not all generative AI relies on LLMs. Since the legal field is heavily text-driven, LLMs will usually be the only form of generative AI e-discovery practitioners encounter, although as the variety of data sources used in collections expands, generative AI driven by other types of media may come to the fore as well. For the purposes of this blog, we will cover only LLMs and the text-based aspects of Generative AI, setting aside the types that deal with image or video generation.
With that out of the way, there are a few key differences between LLMs and previously used models such as LSI (Latent Semantic Indexing) and SVM (Support Vector Machines), the two technologies that underpinned TAR 1.0 and TAR 2.0 (“CAL”).
The most important difference is the ability to generate new output. Both LSI and SVM were designed as simple, effective approaches to classifying, labeling, or clustering document sets of varying size using a limited number of sample documents as training material. Their primary focus is to analyze the characteristics of the data they are given, not to interpret the language in that data in any meaningful way. Instead, these models apply a statistical approach (LSI) or a hyperplane method (SVM) to solve a specific problem (document classification) using pre-coded seed documents as training data.
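As a rough illustration of this classification approach (not any vendor's actual implementation), the sketch below chains an LSI-style reduction over TF-IDF features with a linear SVM using scikit-learn. The seed documents and labels are hypothetical examples.

```python
# Illustrative sketch of TAR-style classification: an LSI-style reduction
# (TruncatedSVD over TF-IDF features) feeding a linear SVM classifier.

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.svm import LinearSVC

# Hypothetical seed set coded by human reviewers (1 = relevant, 0 = not relevant).
seed_docs = [
    "Email discussing the merger timeline and due diligence requests.",
    "Cafeteria menu for the week of March 4th.",
    "Draft indemnification clause for the asset purchase agreement.",
    "Company picnic sign-up sheet.",
]
seed_labels = [1, 0, 1, 0]

tar_model = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),  # term-frequency features
    ("lsi", TruncatedSVD(n_components=2)),             # latent semantic space
    ("svm", LinearSVC()),                               # separating hyperplane
])
tar_model.fit(seed_docs, seed_labels)

# Predict coding for an unreviewed document.
print(tar_model.predict(["Questions about the due diligence data room"]))
```

The point of the sketch is that the model only learns a decision boundary over the seed documents it is shown; it never produces new text of its own.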
Conversely, LLMs are trained on an incomparably greater volume of documents, and to a much greater depth, which allows them to capture the context and syntax of human language well enough to create new content that can even mimic the writing style used in a specific industry.
A general overview of the differences in practice is provided below for quick reference:
[Comparison table: TAR 1.0 | CAL/TAR 2.0 | LLM]
Breadth & Scope of GenAI Usage Driven by LLM Networks
The increased capability of Generative AI models does present certain challenges. For example, LLMs are computationally intensive and may require significant IT resources to train with additional data. They also typically require a large amount of data for training and will suffer if trained on insufficiently large data sets; this is one of the instances in which an older model, like LSI or SVM, may be the more appropriate choice.
Last but not least, most pre-existing LLMs are trained on general language text and may suffer some degradation in performance when asked to evaluate text laden with industry-specific jargon that is absent from their initial training data. This underscores that the technology is still in its infancy. Nevertheless, the drawbacks of exploring the use of LLMs are currently limited, and those that exist are likely to be resolved by future technological improvements. This, combined with the upside of discovering a wide range of new opportunities to leverage AI in LegalTech, points to a promising future for innovative solutions.
Potential Applications of GenAI
Let’s take a look at some examples of new ways in which LLMs can help legal professionals with their daily activities:
- Drafting legal documents and generating contract language
- Powering Q&A chatbots that can search internal policies and state and federal regulations
- Summarizing documents and speeding up research and investigation work
- Identifying likely relevant materials before review by classifying data against multiple tags
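By way of illustration, the sketch below asks an LLM to produce a short document summary. It assumes the OpenAI Python client; the model name, prompt, and document text are placeholders intended only to show the question-and-answer interaction pattern, not to represent any particular product.

```python
# Illustrative sketch of LLM-driven document summarization.
# Assumes the OpenAI Python client; model name and document text are placeholders.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

document_text = "…full text of a deposition transcript or contract…"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "You are a legal assistant. Summarize documents factually and "
                    "flag any passage you are unsure about instead of guessing."},
        {"role": "user",
         "content": f"Summarize the key facts, parties, and dates in this document:\n\n{document_text}"},
    ],
    temperature=0,  # keep the output as deterministic as possible
)

print(response.choices[0].message.content)
```

Even with careful prompting, outputs like this still need human verification, which ties directly into the hallucination concerns discussed in the conclusion below.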
Conclusion
There are very few industries that will remain directly unaffected by GenAI, and fewer still that will not benefit in some way from this technology. Today, the magnitude of the expected disruption can be felt all around us: conference halls are buzzing with AI seminars, stock markets have pushed shares of companies such as NVIDIA, a primary provider of AI-enabling chips, up by more than 300%, and courts are beginning to respond to the use of GenAI in LegalTech.
Electronic discovery is certain to benefit from Generative AI as well. As the past decade has shown, other automation technologies have become permanent fixtures in many discovery workflows, and GenAI is now primed to augment or add to existing processes thanks to the user-friendly, intuitive interfaces that many solutions provide.
At this time, however, there remain certain challenges with using LLMs in the legal field, and AI hallucinations are at the top of the list. It is our belief that most of these issues will be overcome through reinforcement learning and the general improvement of LLMs over time. It is therefore possible that classification workflows (most of which leverage some form of CAL) will be supplanted by LLM-driven workflows, much as CAL replaced TAR 1.0 for many users. That said, TAR 1.0 remains recognized by courts worldwide and is commonly used by legal teams. At FRONTEO, for example, our proprietary KIBIT AI engine continues to power technology assisted solutions across all business lines, and although we continue to innovate and evolve KIBIT with cutting-edge technology, we anticipate that GenAI-based solutions will most likely serve as an additional option for legal teams over the next few years, rather than as outright replacements for existing, proven solutions that are ingrained in workflows and used to great effect.
The massive and rapid changes to the way we execute and deliver legal services will surely make for a bumpy road, but as long as certain precautions are taken, GenAI will significantly improve the practice of law. It is important to keep in mind, however, that GenAI is not a miracle solution for every problem: it should be viewed as “a” tool rather than “the” tool, and treated with the same diligence and care as any other application or technology in your software stack.
Sources:
Hallucinating Law: Legal Mistakes with Large Language Models are Pervasive
https://hai.stanford.edu/news/hallucinating-law-legal-mistakes-large-language-models-are-pervasive
Artificial Intelligence for Lawyers Explained
https://pro.bloomberglaw.com/insights/technology/ai-in-legal-practice-explained/#genAI
TAR vs CAL – A Primer
https://www.iltanet.org/blogs/rachel-mcadams1/2021/04/21/tar-vs-cal-a-primer
A (Very) Long Discussion of Legal Document Summarization Using LLMs
https://www.linkedin.com/pulse/very-long-discussion-legal-document-summarization-using-leonard-park
In contrast to the conventional AI already used in the legal industry, namely machine-learning algorithms that require training data and are known as Technology Assisted Review (TAR) and Continuous Active Learning (CAL), the technology now attracting attention is the LLM (Large Language Model), one type of generative AI. LLMs hold great promise for applications such as drafting legal documents, Q&A chatbots capable of searching internal policies and state and federal regulations, document summarization, faster research and investigation, and the pre-review identification of relevant materials through fine-grained classification of data based on multiple tags.
However, generative AI, including LLMs, still faces issues such as hallucination (generating information not grounded in fact), so its use at this point still requires caution.