LAWBOT
Abin Eldho1*, Adith Arun1, Mohamed Bilal K H1, Rohit Jaison1, Sheena Kurian K1
1 Dept. of Computer Science and Engineering, KMEA Engineering College, Ernakulam, India
ABSTRACT
LawBot applies artificial intelligence (AI) to narrow the gap between legal knowledge and its
accessibility within Indian law and ethics. The project develops an AI chatbot trained on the
Indian Constitution, the laws of India, and ethical dilemmas. The chatbot offers several features,
beginning with case outcome prediction: users input case details and receive a prediction of the
likely verdict, helping both individuals and legal professionals make well-informed decisions.
The chatbot also supports legal information retrieval: users request information on a specific
Indian law by providing the law's name or section number. LawBot returns the relevant legal
information together with concise, understandable explanations, so that the information is usable
by anyone who seeks it. Beyond its practical legal applications, the AI model is also trained to
navigate ethical and moral dilemmas. A user-friendly interface serves a wide spectrum of users.
By combining AI capabilities with an extensive knowledge base, the project aims to empower
individuals and raise legal literacy across India, contributing to a more informed and just society.
1 INTRODUCTION
LawBot addresses the pressing need for easily available and knowledgeable legal guidance at a
time of rapidly developing technology and constantly changing legal frameworks. Designed to shed
light on the complex areas covered by the Indian Constitution, LawBot combines current AI
technology with legal domain knowledge and aims to change the way legal knowledge is accessed
and understood. The platform is intended for individuals, companies, and society as a whole. It
consists of two chatbots: LawBot Info and LawBot Predict.
LawBot Info draws on legal datasets covering the Indian Penal Code (IPC), the Criminal Procedure
Code (CrPC), the Constitution of India (COI), and the Civil Procedure Code (CPC). Its strength is
its user-friendly design, which lets users navigate this complex legal material easily. Users
access specific legal information by entering prompts or questions. For example, typing
"IPC section 302" into LawBot Info's text input field causes it to retrieve and display the
pertinent information about IPC Section 302. The feature goes beyond simple retrieval of facts by
giving access to accurate and thorough legal knowledge. The retrieval mechanism will be valuable
for researchers, legal professionals, students, and the general public alike, enhancing their
capacity for detailed research and understanding of legal content.
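The query-to-section lookup described above can be sketched as follows. This is only an illustration under stated assumptions, not the LawBot implementation: the `SECTIONS` table, the regular expression, and the function names `parse_query` and `lookup` are hypothetical, and the two entries stand in for the full IPC, CrPC, CPC, and COI datasets.

```python
import re

# Hypothetical in-memory stand-in for the legal datasets (assumption):
# keys are (code, section number), values are short section headings.
SECTIONS = {
    ("IPC", "302"): "Punishment for murder",
    ("IPC", "420"): "Cheating and dishonestly inducing delivery of property",
}

# Canonical spelling for each supported code name.
CANONICAL = {"ipc": "IPC", "crpc": "CrPC", "cpc": "CPC", "coi": "COI"}

# A code name, then any non-digit filler such as " section ", then the number.
QUERY_PATTERN = re.compile(
    r"\b(IPC|CrPC|CPC|COI)\b\D*?(\d+[A-Za-z]*)", re.IGNORECASE
)


def parse_query(query):
    """Return (code, section) extracted from a free-text query, or None."""
    match = QUERY_PATTERN.search(query)
    if match is None:
        return None
    return CANONICAL[match.group(1).lower()], match.group(2).upper()


def lookup(query):
    """Return the stored heading for the section named in the query."""
    key = parse_query(query)
    if key is None:
        return "Could not identify a code and section number."
    return SECTIONS.get(key, f"No entry found for {key[0]} section {key[1]}.")


print(lookup("IPC section 302"))
```

Normalizing the code name and section number before the lookup lets variants such as "ipc 302" and "IPC Section 302" resolve to the same entry.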
LawBot Predict will be trained and tested using the Indian Legal Documents Corpus (ILDC)
dataset [10]. ILDC contains Supreme Court of India (SCI) case proceedings together with the
original court decisions. A portion of the corpus is additionally annotated by legal
professionals with gold-standard explanations of the judgment and is set aside as a separate test
set, which is used to assess how well judgment prediction algorithms explain their predictions.
A substantial backlog of court cases slows down the legal system in populous nations like India,
frequently as a result of issues such as a lack of qualified judges. Consequently, it may be
possible to speed up the legal system by developing a system that can advise judges on potential
outcomes in pending cases. However, for an automated decision system to be accepted in court, its
output must be explained in language that people can comprehend. It is therefore necessary not
only to predict the outcome of a court case but also to explain the reasoning behind that outcome.
2 RELATED WORKS
Dharwadkar and Deshpande [1] explore the development of a medical chatbot that uses Natural
Language Processing (NLP) to answer health-related queries. The system employs the Support Vector
Machine (SVM) algorithm for disease prediction, uses NLP to understand user queries, and uses
word order similarity to analyze sentence structure. A comparison with Naïve Bayes and KNN shows
that SVM achieves higher accuracy, which is particularly beneficial for medical institutions.
Trained on a large dataset, the chatbot predicts diseases from symptoms. The authors propose
integrating voice and face recognition in the future to enrich patient interaction.
Myung Sun Baek et al. [2] introduce a smart policing system that uses machine learning to
predict crime types and risk scores from text-based criminal case summaries. Implemented as a
GUI-based platform, it lets field personnel rapidly identify crime types and assess risk.
Performance evaluations show that the system outperforms traditional algorithms. The methodology
involves constructing a keyword dictionary, curating datasets, and developing prediction models.
The GUI application platform provides real-time capability, illustrating the applicability of
deep learning to real-world crime prediction and risk assessment.
D. Nagamallika et al. [3] introduce a criminal identification system based on deep learning,
using MTCNN for face detection, FaceNet for face embedding, and OpenCV for image and video
processing. With an accuracy of 86%, the system locates and matches criminal faces and retrieves
the corresponding records from a database to alert law enforcement. The paper details the
algorithm used for face detection, suggests future improvements, stresses continuous adaptation
to evolving technologies, and highlights the system's potential impact on law enforcement.
Junyun Cui et al. [4] present a comprehensive survey of Legal Judgment Prediction (LJP),
analyzing 31 datasets across six languages and evaluating metrics and models. The survey
categorizes LJP tasks, legal systems, and law domains, and emphasizes the need for additional
datasets for specific tasks. It covers machine- and expert-driven metadata extraction, annotation
of rationale sentences, and dataset categorization, and it explores the use of pre-trained
language models, multi-language corpora, and diverse learning frameworks. It reports performance
metrics for various NLP models, offers recommendations, and proposes future research directions
that address challenges such as legal reasoning and interpretability in LJP tasks.
Varun Mandalapu et al. [5] assess over 150 articles on crime prediction using machine learning
and deep learning. The analysis focuses on 51 selected articles covering diverse algorithms and
datasets. The authors use word cloud analysis, distribution mapping, and literature trends to
extract key insights. They examine the effectiveness of regression and classification methods,
noting the efficacy of traditional models. Ethical considerations in predictive policing are
discussed, and the review concludes with future research directions.
Umair Muneer Butt et al. [6] conduct a systematic literature review (SLR) of spatio-temporal
crime hotspot detection and prediction, focusing on data mining, machine learning, and time series
analysis. The methodology includes quality assessment, SLR validation, and analysis of
performance measures. The paper categorizes crime forecasting approaches, highlights dataset
challenges, and proposes future research directions. It emphasizes the importance of
high-quality, spatio-temporally labeled crime datasets for robust predictive models.
Marc Queudot et al. [7] present legal chatbots as a response to limited access to legal
representation. Focused on immigration and banking, the chatbots integrate legal data using NLP
techniques such as Bag-of-Words and TF-IDF. The bank employee chatbot uses grammatical analysis
and cosine similarity for intent classification, and testing with real interactions validates its
efficacy. The immigration chatbot, open-sourced for collaborative development, aims to give
immigrants precise legal information. Overall, the work envisions user-friendly chatbots that
narrow gaps in access to legal guidance.
Juin-Hao Ho et al. [8] explore the implementation of a legal AI bot for sustainable development
in legal advisory institutions. The authors employ a Multicriteria Decision-Making (MCDM) model
and an Analytical Network Process (ANP) to address the complexities of adopting legal AI bots.
The methodology integrates the DEMATEL, DANP, and M-VIKOR models into a framework for analyzing
user behavior. The study emphasizes the significance of legal AI bots and introduces the DDANPV
model for sustainable development, while acknowledging the need for further research to improve
accuracy and the understanding of interrelationships among factors.
3 METHODOLOGY
3.1 Data Collection
The process of obtaining and putting together relevant information that will be utilized to test,
validate, or train a machine learning model is known as data collection. The model's performance
and capacity for generalization are strongly influenced by the caliber and volume of data that were
gathered. The aim is to create or gather a representative set of data that matches the kinds of real-
world situations the model is likely to face. The complexity of legal language, the diversity of
legal documents, data authenticity and quality, and legal annotation and labeling are a few of the
numerous difficulties encountered when collecting legal data. To produce solid and trustworthy
datasets for legal research and model development, legal professionals, domain experts, and
machine learning experts must work together to address these problems.
4 IMPLEMENTATION
4.1 Data Collection
To gather data, LawBot Info consults key legal sources: the Indian Penal Code (IPC), the Criminal
Procedure Code (CrPC), the Civil Procedure Code (CPC), and the Constitution of India (COI). The
IPC.csv, CPC.csv, CrPC.csv, and COI.csv files were selected from a GitHub repository [9]
containing CSV files of legal texts and documents, so as to cover a broad spectrum of legal
provisions and concepts. For LawBot Predict, the Indian Legal Documents Corpus (ILDC) was
obtained. ILDC contains Supreme Court of India (SCI) case proceedings together with the original
court decisions. The data was acquired from the researchers who created it in ILDC for CJPE:
Indian Legal Documents Corpus for Court Judgment Prediction and Explanation [10]. To guarantee
accuracy, it goes through a rigorous validation procedure. The gathered data forms the
groundwork for training the LawBots, enabling them to offer knowledgeable and contextually
appropriate legal perspectives to users and to formulate predictions based on the information
provided about a case.
LawBot's performance benefits considerably from a thorough data preprocessing step that uses
Python's json module to handle null values, combine section numbers and titles, and remove
chapter titles and numbers, among other tasks. Feature extraction consists of tokenization,
lowercase conversion, stopword removal, removal of non-alphanumeric characters, and
lemmatization. The preprocessed phrases are then converted into features with TF-IDF
vectorization, which improves the performance of the supervised learning model. A labeled dataset
is used for model training, and the fitted TF-IDF vectorizer and matrix are stored for reuse at
deployment. LawBot responds to test statements accurately; however, ongoing optimization is still
necessary. With the help of labeled datasets and input, the implementation highlights the
potential of supervised learning and natural language processing for improving access to legal
information.
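The preprocessing and TF-IDF steps above can be illustrated with a small standard-library sketch. It rests on several assumptions: the record field names (`chapter`, `chapter_title`, `section`, `title`, `description`) are hypothetical, the descriptions are paraphrased placeholders rather than statutory text, the stopword list is deliberately tiny, and lemmatization is omitted because it needs an external NLP library. Matching a query by cosine similarity is a simple stand-in used here for illustration; it is not the supervised model described in the text.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "for", "is", "in", "to", "and", "or", "what", "by"}


def clean_records(raw_records):
    """Skip records with null fields, merge section number and title,
    and leave out the chapter number and chapter title."""
    cleaned = []
    for rec in raw_records:
        if rec.get("section") is None or not rec.get("title") or not rec.get("description"):
            continue
        cleaned.append({
            "label": f"Section {rec['section']}: {rec['title']}",
            "text": f"{rec['title']} {rec['description']}",
        })
    return cleaned


def preprocess(text):
    """Lowercase, replace non-alphanumeric characters, tokenize, drop stopwords."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS]


def fit_idf(token_lists):
    """Smoothed inverse document frequency for every token in the corpus."""
    n_docs = len(token_lists)
    doc_freq = Counter(tok for tokens in token_lists for tok in set(tokens))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}


def vectorize(tokens, idf):
    """L2-normalized TF-IDF vector as a sparse dict; unknown tokens are ignored."""
    counts = Counter(tok for tok in tokens if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {tok: w / norm for tok, w in vec.items()} if norm else {}


def cosine(u, v):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(w * v.get(tok, 0.0) for tok, w in u.items())


def best_match(query, records, idf, doc_vectors):
    """Return the label and score of the record most similar to the query."""
    q_vec = vectorize(preprocess(query), idf)
    scores = [cosine(q_vec, d_vec) for d_vec in doc_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return records[best]["label"], scores[best]


RAW = [
    {"chapter": 16, "chapter_title": "Of offences affecting the human body",
     "section": "302", "title": "Punishment for murder",
     "description": "Whoever commits murder shall be punished with death or imprisonment for life."},
    {"chapter": 17, "chapter_title": "Of offences against property",
     "section": "379", "title": "Punishment for theft",
     "description": "Whoever commits theft shall be punished with imprisonment or fine."},
    {"chapter": 17, "chapter_title": "Of offences against property",
     "section": "420", "title": None, "description": None},  # null fields: skipped
]

records = clean_records(RAW)
token_lists = [preprocess(r["text"]) for r in records]
idf = fit_idf(token_lists)  # fitted once, then reusable at query time
doc_vectors = [vectorize(tokens, idf) for tokens in token_lists]
label, score = best_match("What is the punishment for theft?", records, idf, doc_vectors)
print(label, round(score, 3))
```

Fitting the IDF weights once and keeping the document vectors mirrors the idea of storing the TF-IDF vectorizer and matrix so that only the query has to be vectorized at deployment.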
We ran tests on a dataset of 100 inputs using SVM, Decision Tree, and Random Forest classifiers.
The confusion matrices below show the performance of the three algorithms. The evaluation showed
that Random Forest achieved higher accuracy than both SVM and Decision Tree. SVM was more
accurate than Decision Tree, although it took longer to run.
FIGURE 4 Confusion Matrices of SVM, Random Forest, Decision Tree
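As a reminder of how a confusion matrix and the accuracy derived from it are computed, the following sketch uses made-up toy labels. The class names and predictions are hypothetical and are not the results reported in this paper.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes, in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy data only (assumption): six test inputs labeled by the code they belong to.
LABELS = ["IPC", "CrPC", "CPC"]
y_true = ["IPC", "IPC", "CrPC", "CPC", "CPC", "IPC"]
y_pred = ["IPC", "CrPC", "CrPC", "CPC", "IPC", "IPC"]

matrix = confusion_matrix(y_true, y_pred, LABELS)
for label, row in zip(LABELS, matrix):
    print(f"{label:>5}: {row}")
print(f"accuracy = {accuracy(matrix):.3f}")
```

Comparing classifiers then amounts to building one such matrix per model on the same test inputs and comparing the resulting accuracies.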
6 CONCLUSION
LawBot improves accessibility and comprehension in the complex world of India's legal system.
Its two components, LawBot Info and LawBot Predict, together offer immediate, cost-effective,
round-the-clock legal guidance. LawBot's relevance stems from its capacity to lower obstacles to
legal knowledge. By breaking down complicated legal terminology, it empowers individuals,
companies, and marginalized communities and promotes a culture in which legal knowledge is seen
as a fundamental right rather than a privilege. LawBot Predict's potential to reduce delays and
speed up decision-making is in line with the urgent demand for quicker decisions within the legal
system. Furthermore, LawBot's accessibility mitigates financial barriers by reducing the need for
frequent expert consultations, which could make legal aid more widely available. By narrowing the
gap between accessibility and legal knowledge, LawBot supports well-informed decision-making and
enables people to interact with the law with confidence and effectiveness. In doing so, it helps
individuals and communities move toward a fairer, more inclusive legal system.
KEYWORDS
Artificial Intelligence (AI)
Machine Learning
Legal Knowledge
References
1. Dharwadkar, R., & Deshpande, N. (2018). A Medical ChatBot. International Journal of Computer
Trends and Technology, 60, 41–45. https://doi.org/10.14445/22312803/IJCTT-V60P106
2. Baek, M.S., Park, W., Park, J., Jang, K.H., & Lee, Y.T. (2021). Smart Policing Technique With
Crime Type and Risk Score Prediction Based on Machine Learning for Early Awareness of Risk
Situation. IEEE Access, PP, 1–1. https://doi.org/10.1109/ACCESS.2021.3112682
3. Nagamallika, D., Vandana, P.G., Dakshayani, P., Manikanta, R.A., & Kumar, K.K. (2021).
Criminal Identification System Using Deep Learning.
4. Cui, J., Shen, X., Nie, F., Wang, Z., Wang, J., & Chen, Y. (2022). A Survey on Legal Judgment
Prediction: Datasets, Metrics, Models and Challenges.
5. Mandalapu, V., Elluri, L., Vyas, P., & Roy, N. (2023). Crime Prediction Using Machine
Learning and Deep Learning: A Systematic Review and Future Directions. IEEE Access, PP, 1–1.
https://doi.org/10.1109/ACCESS.2023.3286344
6. Butt, U., Letchmunan, S., Hassan, F. H., Ali, M., Baqir, A., & Sherazi, H. (2020).
Spatio-Temporal Crime HotSpot Detection and Prediction: A Systematic Literature Review. IEEE
Access, 8, 166553–166574. https://doi.org/10.1109/ACCESS.2020.3022808
7. Queudot, M., Charton, É., & Meurs, M.J. (2020). Improving Access to Justice with Legal
Chatbots. Stats, 3, 356–375. https://doi.org/10.3390/stats3030023
8. Ho, J.H., Lee, G.G., & Lu, M.T. (2020). Exploring the Implementation of a Legal AI Bot for
Sustainable Development in Legal Advisory Institutions. Sustainability, 12, 5991.
https://doi.org/10.3390/su12155991
9. https://github.com/civictech-India/Indian-Law-Penal-Code-Json/tree/main
10. Malik, V., Sanjay, R., Kumar Nigam, S., Ghosh, K., Guha, S., Bhattacharya, A., & Modi, A.
(2021). ILDC for CJPE: Indian Legal Documents Corpus for Court Judgment Prediction and
Explanation. https://doi.org/10.18653/v1/2021.acl-long.313