Introducing the Generative AI Module for Efficient Document Analysis and Review

Project Summary:

The goal was to develop a generative AI module that reduces the time legal teams spend getting summaries, extracting important phrases & keywords, checking for bias, and verifying that a document contains all the required legal clauses, all through a chat interface.



1. Build a knowledge base suitable for all types of legal documents and parse the formatted structure of those documents.


Technical Challenges:

1. While parsing formatted documents, overfitting to similar, repeated structures was observed. To overcome this, vectors were generated for each repeated section, which resulted in better output quality.

2. Data security & anonymity were key requirements for clients. To meet them, we modified the vectorized documents so that names are not revealed unless an admin has granted authorization.

3. Scanned pages uploaded as images had to be restricted to English only, because a user can upload any kind of image and ask to run an algorithm on it.
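
Challenges 2 and 3 can be illustrated with a minimal sketch. It is not the project's actual implementation: the function names `redact_names` and `looks_english`, the `[PARTY_n]` placeholder format, and the 0.9 threshold are all assumptions made for illustration. The English check is a crude script-based heuristic standing in for a real language detector, so it only separates Latin-script text from non-Latin-script text.

```python
import re


def redact_names(text, names, authorized=False):
    """Replace known names with placeholders unless the caller is authorized.

    `names` is a hypothetical list of party names already known for the
    document; how those names are discovered is outside this sketch.
    """
    if authorized:
        return text
    for index, name in enumerate(names, start=1):
        # re.escape keeps characters such as "." in a name from acting as regex syntax.
        text = re.sub(re.escape(name), f"[PARTY_{index}]", text, flags=re.IGNORECASE)
    return text


def looks_english(text, threshold=0.9):
    """Crude gate for OCR output: share of letters that are ASCII letters.

    Assumption: a production system would use a proper language detector.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    ascii_letters = sum(1 for ch in letters if ch.isascii())
    return ascii_letters / len(letters) >= threshold


if __name__ == "__main__":
    sample = "Acme Corp leases the premises to John Doe."
    print(redact_names(sample, ["Acme Corp", "John Doe"]))
    print(looks_english(sample))
```

Passing `authorized=True` returns the text unchanged, which mirrors the rule that names are shown only when an admin has granted access.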


The Solution That We Proposed:

The solution is a web application with a chat interface in which a user can upload Word docs, PDF files & scanned images of documents and then get the results they need. Pre-built prompts were assigned to each task, while the user could also enter their own prompts to generate the results.
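
The pairing of pre-built prompts with an optional user prompt could look roughly like the sketch below. The task keys, the prompt wording, and the `build_prompt` helper are hypothetical; only the idea of one default prompt per task plus a user override comes from the description above.

```python
# Hypothetical default prompts, one per task named in the project summary.
PREBUILT_PROMPTS = {
    "summary": "Summarize the following legal document.",
    "keywords": "List the important phrases and keywords in the following legal document.",
    "bias": "Identify any biased language in the following legal document.",
    "clauses": "Check whether the following legal document is missing any required legal clauses.",
}


def build_prompt(task, document_text, custom_prompt=None):
    """Return the text sent to the model for one request.

    A non-empty custom prompt from the user takes priority; otherwise the
    pre-built prompt for the task is used. Unknown tasks raise KeyError.
    """
    if custom_prompt and custom_prompt.strip():
        instruction = custom_prompt.strip()
    else:
        instruction = PREBUILT_PROMPTS[task]
    return f"{instruction}\n\n---\n{document_text}"


if __name__ == "__main__":
    print(build_prompt("summary", "This Agreement is made between the parties."))
```

Keeping the defaults in one dictionary makes it easy to review or reword a prompt without touching the request-handling code.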


OCR was to be run on the scanned images to extract text, which was then tokenized and sent as input tokens and knowledge base for the GenAI model. Word docs & PDF files were tokenized directly. Answers to all questions related to a legal document were to be given within 1-2 seconds, and analysis of the document was to be completed within 8-10 seconds regardless of its size.
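
The ingestion flow described above (OCR for scanned images, direct reading for Word and PDF files, then tokenization) can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the `run_ocr` and `read_document` callables are placeholders for whichever OCR engine and document parser are actually used, the extension lists are assumed, and the regex tokenizer is a simple stand-in for the model's real tokenizer.

```python
import re
from pathlib import Path

# Assumed extension lists; the real application may accept other formats.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}
DOCUMENT_EXTENSIONS = {".doc", ".docx", ".pdf"}


def extract_text(path, read_document, run_ocr):
    """Route a file to OCR or to direct text extraction based on its extension.

    `read_document` and `run_ocr` are caller-supplied functions that take a
    path and return a string (hypothetical hooks for real libraries).
    """
    suffix = Path(path).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return run_ocr(path)
    if suffix in DOCUMENT_EXTENSIONS:
        return read_document(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")


def tokenize(text):
    """Split text into word and punctuation tokens (stand-in tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text)


if __name__ == "__main__":
    # Fake extractors so the sketch runs without any OCR or PDF dependency.
    fake_ocr = lambda path: "Scanned lease agreement."
    fake_reader = lambda path: "Typed lease agreement."
    for name in ("scan.png", "contract.pdf"):
        print(name, tokenize(extract_text(name, fake_reader, fake_ocr)))
```

Injecting the extractors as parameters keeps the routing logic testable on its own and leaves the choice of OCR engine open.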


Measurable Benefits Post Implementation:

The use of GenAI products in the legal space is still considered new. The law firm for which this GenAI module was developed observed the following benefits:


1. Quick summary & keyword extraction of past documents.

2. Understanding documents & sharing knowledge became easier.

3. Easy bias detection across hundreds of pages.

4. Flagging of any unsuitable piece of text in a document.

5. A productivity boost of 14% on initial implementation, which resulted in an increase in revenue.