com.lowagie.text.pdf.parser

Class PdfTextExtractor

public class PdfTextExtractor extends Object

Extracts text from a PDF file.

Since: 2.1.4

Field Summary
SimpleTextExtractingPdfContentStreamProcessor extractionProcessor
The processor that will extract the text.
PdfReader reader
The PdfReader that holds the PDF file.
Constructor Summary
PdfTextExtractor(PdfReader reader)
Creates a new Text Extractor object.
Method Summary
byte[] getContentBytesForPage(int pageNum)
Gets the content stream of a page.
String getTextFromPage(int page)
Gets the text from a page.

Field Detail

extractionProcessor

private final SimpleTextExtractingPdfContentStreamProcessor extractionProcessor
The processor that will extract the text.

reader

private final PdfReader reader
The PdfReader that holds the PDF file.

Constructor Detail

PdfTextExtractor

public PdfTextExtractor(PdfReader reader)
Creates a new Text Extractor object.

Parameters: reader - the reader with the PDF

Method Detail

getContentBytesForPage

private byte[] getContentBytesForPage(int pageNum)
Gets the content stream of a page.

Parameters: pageNum - the number of the page whose content stream you want to get

Returns: a byte array with the content stream of a page

Throws: IOException

getTextFromPage

public String getTextFromPage(int page)
Gets the text from a page.

Parameters: page - the number of the page to extract text from

Returns: a String with the content as plain text (without PDF syntax)

Throws: IOException
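
Example: getTextFromPage works on one page at a time, so extracting a whole document means calling it once per page. The following is a minimal sketch of that loop, not part of this class. The PageText interface, the extractAll helper, and the file name input.pdf are hypothetical names introduced for illustration, and the sketch assumes pages are numbered from 1, following PdfReader's page-numbering convention. A fake two-page source stands in for a real PDF so the code runs without iText on the classpath.

```java
import java.io.IOException;

public class PageTextExample {

    /** Hypothetical stand-in for the single PdfTextExtractor method used here. */
    interface PageText {
        String getTextFromPage(int page) throws IOException;
    }

    /**
     * Concatenates the text of pages 1..pageCount, separated by newlines.
     * Assumes 1-based page numbers.
     */
    static String extractAll(PageText extractor, int pageCount) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int page = 1; page <= pageCount; page++) {
            if (page > 1) {
                sb.append('\n');
            }
            sb.append(extractor.getTextFromPage(page));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Fake two-page document (illustration only, not a real PDF).
        String[] pages = { "first page", "second page" };
        PageText fake = page -> pages[page - 1];
        System.out.println(extractAll(fake, pages.length));

        // With iText on the classpath, the wiring might look like this:
        //   PdfReader reader = new PdfReader("input.pdf"); // hypothetical file name
        //   PdfTextExtractor extractor = new PdfTextExtractor(reader);
        //   String text = extractAll(extractor::getTextFromPage,
        //                            reader.getNumberOfPages());
    }
}
```

Because getTextFromPage declares IOException, the helper propagates it rather than swallowing it, leaving the caller to decide how to handle an unreadable page.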