PDF File Extraction

Uploaded by

mahesh Kumar
Copyright © All Rights Reserved

Problem Statement

1. Summarization
2. Table Extraction
3. Exact Value Extraction
Objectives
• Summarization: Develop an automated text summarization system that generates concise, coherent summaries of lengthy documents.
• Table Extraction: Create an efficient algorithm to identify and extract tables from various document formats.
• Exact Value Extraction: Implement a method to accurately extract numerical data and other relevant information from the identified tables.
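The deck does not show how exact values are read out of a table. Below is a minimal sketch. It assumes tables arrive as pdfplumber-style lists of rows, where each cell is a string or None. It also assumes the row label sits in the first column. `parse_number` and `find_value` are hypothetical helpers. The regular expression only handles plain numbers with optional comma thousands separators and an optional decimal part.

```python
import re
from typing import List, Optional

# Matches "1,234.50", "12345", "-7": either comma-grouped digits or a plain digit run,
# each with an optional decimal part. Currency symbols and units around it are ignored.
_NUMBER = re.compile(r"-?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?")


def parse_number(cell: Optional[str]) -> Optional[float]:
    """Return the first number found in a table cell, or None (hypothetical helper)."""
    if not cell:
        return None
    match = _NUMBER.search(cell)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group().replace(",", ""))


def find_value(table: List[List[Optional[str]]], row_label: str, column: int) -> Optional[float]:
    """Find the row whose first cell matches row_label (case-insensitive) and parse one column."""
    target = row_label.strip().lower()
    for row in table:
        if row and row[0] and row[0].strip().lower() == target:
            return parse_number(row[column]) if column < len(row) else None
    return None


table = [["Item", "Amount"], ["Subtotal", "1,234.50"], ["Tax", "$ 98"], ["Total", None]]
print(find_value(table, "subtotal", 1))
```

Matching on the row label rather than a fixed row index keeps the lookup stable when a table gains or loses rows. A real document may need stricter matching, such as exact header names or unit checks.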
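Summarizing a PDF

1. Hugging Face Transformers: Use the transformers library from Hugging Face, which provides access to pre-trained models.
2. BART Model: Use the BART model (facebook/bart-base) for summarization. BART is an encoder-decoder model whose autoregressive decoder generates the summary one token at a time. Note that facebook/bart-base is a general pre-trained checkpoint, not one fine-tuned for summarization. A fine-tuned checkpoint such as facebook/bart-large-cnn usually produces better summaries.
3. Extract Text: Extract the text of the first page of the PDF using pdfplumber.
4. Tokenization: Tokenize the text with the BART tokenizer to prepare the model input. BART accepts at most 1,024 input tokens, so longer text must be truncated or split into chunks.
5. Summarization: Use the BART model to generate summary token IDs, then decode them into the final text.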
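The five steps above can be combined into one function. This is a minimal sketch, assuming pdfplumber, transformers, and PyTorch are installed. `summarize_first_page` is a hypothetical name. The beam-search settings (`num_beams=4`, `max_summary_tokens=130`) are illustrative choices, not values from the deck.

```python
def summarize_first_page(
    pdf_path: str,
    model_name: str = "facebook/bart-base",
    max_summary_tokens: int = 130,
) -> str:
    """Summarize the text on the first page of a PDF with a BART model (hypothetical helper)."""
    # Heavy dependencies are imported inside the function so that defining it needs neither package.
    import pdfplumber
    from transformers import BartForConditionalGeneration, BartTokenizer

    # Step 3: extract the first page's text. The result may be empty (or None) for image-only pages.
    with pdfplumber.open(pdf_path) as pdf:
        text = pdf.pages[0].extract_text() or ""
    if not text.strip():
        return ""

    # Steps 1-2: load the pre-trained tokenizer and model from the Hugging Face Hub.
    tokenizer = BartTokenizer.from_pretrained(model_name)
    model = BartForConditionalGeneration.from_pretrained(model_name)

    # Step 4: tokenize, truncating to BART's 1,024-token input limit.
    inputs = tokenizer(text, max_length=1024, truncation=True, return_tensors="pt")

    # Step 5: generate summary token IDs with beam search, then decode them to text.
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        num_beams=4,
        max_length=max_summary_tokens,
        early_stopping=True,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

A typical call would be `summarize_first_page("report.pdf", model_name="facebook/bart-large-cnn")`, where the file name is a placeholder. The function reloads the model on every call. A batch job would load the tokenizer and model once and reuse them.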
Extracting Tables from a PDF

1. Use the pdfplumber library: By default, pdfplumber detects tables from the ruling lines drawn on the page, so it works best on PDFs whose tables have visible borders.
2. Page-by-page processing: Process each page of the PDF individually to extract its text content and tables.
3. Table extraction: Call pdfplumber's page.extract_tables() method on each page. It returns every table on the page as a list of rows, and each row is a list of cells (strings, or None for empty cells).
4. Storing results: Append each page's tables to a list, producing a list of tables for all pages.
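The page-by-page procedure above is sketched below, assuming pdfplumber is installed. `extract_tables_per_page` and `clean_table` are hypothetical names. The cleanup step (replacing None cells and collapsing whitespace) is an assumed post-processing choice, not part of the deck.

```python
from typing import List, Optional

# pdfplumber represents a table as a list of rows; each cell is a string or None.
Table = List[List[Optional[str]]]


def extract_tables_per_page(pdf_path: str) -> List[List[Table]]:
    """Return one list of tables per page, in page order (hypothetical helper)."""
    # Imported here so clean_table() below can be used without pdfplumber installed.
    import pdfplumber

    tables_per_page: List[List[Table]] = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_tables() uses pdfplumber's default table settings.
            tables_per_page.append(page.extract_tables())
    return tables_per_page


def clean_table(table: Table) -> List[List[str]]:
    """Replace empty (None) cells with "" and collapse whitespace and line breaks inside cells."""
    return [[" ".join((cell or "").split()) for cell in row] for row in table]
```

Keeping one list per page preserves which page each table came from. That helps when a later step has to report where an extracted value was found.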
Generating Python Code from PDF
• Process Each Line: Iterate through each line of text extracted from the PDF.
• Custom Function: Create a function that turns each line into a Python print() statement.
• Skip Empty Lines: Exclude empty lines from the output code to avoid unnecessary print statements.
• Customized Information: Adapt the code generation to specific requirements, such as filtering lines by keyword or changing the output format.
• Output Code: Store the generated Python code in a variable for later use or presentation.
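The bullets above can be sketched as follows. This assumes the PDF text has already been extracted (for example with pdfplumber's page.extract_text()) and split into lines. `generate_print_code` and its `keywords` parameter are hypothetical. Keyword filtering stands in for the "customized information" step.

```python
from typing import Iterable, List, Optional, Sequence


def generate_print_code(lines: Iterable[str], keywords: Optional[Sequence[str]] = None) -> str:
    """Turn lines of extracted PDF text into Python print() statements (hypothetical helper)."""
    statements: List[str] = []
    for line in lines:
        text = line.strip()
        # Skip empty lines so the output has no blank print() calls.
        if not text:
            continue
        # Optional customization: keep only lines containing one of the keywords (case-insensitive).
        if keywords and not any(k.lower() in text.lower() for k in keywords):
            continue
        # repr() yields a valid string literal even when the line contains quotes or backslashes.
        statements.append(f"print({text!r})")
    return "\n".join(statements)


# Store the generated code in a variable for later use.
sample_lines = ["Invoice No. 42", "", "  Total: 100  "]
generated_code = generate_print_code(sample_lines)
print(generated_code)
```

Here `generated_code` holds `print('Invoice No. 42')` and `print('Total: 100')` on separate lines, and the empty line is dropped. Using `repr()` instead of wrapping the text in quotes by hand keeps the generated code valid when a PDF line contains apostrophes.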
