org.cyberneko.html.filters

Class ElementRemover

Implemented Interfaces:
XMLComponent, XMLDocumentFilter, HTMLComponent

public class ElementRemover
extends DefaultFilter

This class is a document filter capable of removing specified elements from the processing stream. There are two options for processing document elements:

The first option allows the application to specify which elements appearing in the event stream should be accepted and, therefore, passed on to the next stage in the pipeline. All elements not in the list of acceptable elements have their start and end tags stripped from the event stream unless those elements appear in the list of elements to be removed.

The second option allows the application to specify which elements should be completely removed from the event stream. When an element appears that is to be removed, the element's start and end tags, as well as all of that element's content, are removed from the event stream.

A common use of this filter is to allow only rich-text and linking elements, along with character content, to pass through the filter; all other elements are stripped. The following code shows how to configure this filter to perform this task:

  ElementRemover remover = new ElementRemover();
  remover.acceptElement("b", null);
  remover.acceptElement("i", null);
  remover.acceptElement("u", null);
  remover.acceptElement("a", new String[] { "href" });
 

However, this would still allow the text content of other elements to pass through, which may not be desirable. In order to further "clean" the input, the removeElement option can be used. The following piece of code adds the ability to completely remove any <SCRIPT> tags and content from the stream.

  remover.removeElement("script");
 

Note: All text and accepted element children of a stripped element are retained. To completely remove an element's content, use the removeElement method.

Note: Care should be taken when using this filter because the output may not be a well-balanced tree. Specifically, if the application removes the <HTML> element (with or without retaining its children), the resulting document event stream will no longer be well-formed.
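The three outcomes described above (accepted, stripped, removed) can be illustrated with a small stand-alone model. This is only a sketch under simplifying assumptions, not the filter's actual implementation: the class ElementFilterSketch and its filter method are hypothetical, the input is a pre-split list of tokens with no attributes, and treating an element that appears in both lists as accepted is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical toy model of the accept/strip/remove behaviour; it does not use NekoHTML.
public class ElementFilterSketch {

    // Tokens are "<name>", "</name>", or plain text (an assumption of this sketch).
    public static List<String> filter(List<String> tokens, Set<String> accepted, Set<String> removed) {
        List<String> out = new ArrayList<>();
        int depth = 0;          // current element depth
        int removalDepth = -1;  // depth at which removal started, or -1 if none
        for (String t : tokens) {
            if (t.startsWith("</")) {
                String name = t.substring(2, t.length() - 1).toLowerCase(Locale.ROOT);
                depth--;
                if (removalDepth >= 0) {
                    // Still inside a removed element: drop the tag, and stop
                    // removing once the element that started removal closes.
                    if (depth == removalDepth) {
                        removalDepth = -1;
                    }
                } else if (accepted.contains(name)) {
                    out.add(t);
                }
                // Otherwise the end tag is stripped.
            } else if (t.startsWith("<")) {
                String name = t.substring(1, t.length() - 1).toLowerCase(Locale.ROOT);
                if (removalDepth < 0) {
                    if (accepted.contains(name)) {
                        out.add(t);            // accepted: tag passes through
                    } else if (removed.contains(name)) {
                        removalDepth = depth;  // removed: drop tag and all content
                    }
                    // Otherwise stripped: tag dropped, content kept.
                }
                depth++;
            } else if (removalDepth < 0) {
                out.add(t);                    // text outside removed elements is kept
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
            "<p>", "Hello ", "<b>", "world", "</b>",
            "<script>", "alert(1)", "</script>", "</p>");
        Set<String> accepted = new HashSet<>(Arrays.asList("b"));
        Set<String> removed = new HashSet<>(Arrays.asList("script"));
        // prints: Hello <b>world</b>
        System.out.println(String.join("", filter(tokens, accepted, removed)));
    }
}
```

In this run the <p> tags are stripped but their text survives, the <b> element passes through unchanged, and the <script> element disappears together with its content.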

Version:
$Id: ElementRemover.java,v 1.5 2005/02/14 03:56:54 andyc Exp $
Author:
Andy Clark

Field Summary

protected static Object
NULL
A "null" object.
protected Hashtable
fAcceptedElements
Accepted elements.
protected int
fElementDepth
The element depth.
protected int
fRemovalElementDepth
The element depth at element removal.
protected Hashtable
fRemovedElements
Removed elements.

Fields inherited from class org.cyberneko.html.filters.DefaultFilter

fDocumentHandler, fDocumentSource

Method Summary

void
acceptElement(String element, String[] attributes)
Specifies that the given element should be accepted and, optionally, which attributes of that element should be kept.
void
characters(XMLString text, Augmentations augs)
Characters.
void
comment(XMLString text, Augmentations augs)
Comment.
protected boolean
elementAccepted(String element)
Returns true if the specified element is accepted.
protected boolean
elementRemoved(String element)
Returns true if the specified element should be removed.
void
emptyElement(QName element, XMLAttributes attributes, Augmentations augs)
Empty element.
void
endCDATA(Augmentations augs)
End CDATA section.
void
endElement(QName element, Augmentations augs)
End element.
void
endGeneralEntity(String name, Augmentations augs)
End general entity.
void
endPrefixMapping(String prefix, Augmentations augs)
End prefix mapping.
protected boolean
handleOpenTag(QName element, XMLAttributes attributes)
Handles an open tag.
void
ignorableWhitespace(XMLString text, Augmentations augs)
Ignorable whitespace.
void
processingInstruction(String target, XMLString data, Augmentations augs)
Processing instruction.
void
removeElement(String element)
Specifies that the given element should be completely removed.
void
startCDATA(Augmentations augs)
Start CDATA section.
void
startDocument(XMLLocator locator, String encoding, Augmentations augs)
Start document.
void
startDocument(XMLLocator locator, String encoding, NamespaceContext nscontext, Augmentations augs)
Start document.
void
startElement(QName element, XMLAttributes attributes, Augmentations augs)
Start element.
void
startGeneralEntity(String name, XMLResourceIdentifier id, String encoding, Augmentations augs)
Start general entity.
void
startPrefixMapping(String prefix, String uri, Augmentations augs)
Start prefix mapping.
void
textDecl(String version, String encoding, Augmentations augs)
Text declaration.

Methods inherited from class org.cyberneko.html.filters.DefaultFilter

characters, comment, doctypeDecl, emptyElement, endCDATA, endDocument, endElement, endGeneralEntity, endPrefixMapping, getDocumentHandler, getDocumentSource, getFeatureDefault, getPropertyDefault, getRecognizedFeatures, getRecognizedProperties, ignorableWhitespace, merge, processingInstruction, reset, setDocumentHandler, setDocumentSource, setFeature, setProperty, startCDATA, startDocument, startDocument, startElement, startGeneralEntity, startPrefixMapping, textDecl, xmlDecl

Field Details

NULL

protected static final Object NULL
A "null" object.

fAcceptedElements

protected Hashtable fAcceptedElements
Accepted elements.

fElementDepth

protected int fElementDepth
The element depth.

fRemovalElementDepth

protected int fRemovalElementDepth
The element depth at element removal.

fRemovedElements

protected Hashtable fRemovedElements
Removed elements.

Method Details

acceptElement

public void acceptElement(String element,
                          String[] attributes)
Specifies that the given element should be accepted and, optionally, which attributes of that element should be kept.
Parameters:
element - The element to accept.
attributes - The list of attributes to be kept or null if no attributes should be kept for this element.
See Also:
removeElement(String)
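
The effect of the attributes parameter can be pictured with the following stand-alone sketch. It is a hypothetical helper, not part of this class: the name keepAttributes, the use of a Map to stand in for XMLAttributes, and the case-insensitive name comparison are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of keeping only a listed set of attributes.
public class AttributeFilterSketch {

    // Returns the attributes whose names appear in 'allowed';
    // a null 'allowed' array means no attributes are kept.
    public static Map<String, String> keepAttributes(Map<String, String> attrs, String[] allowed) {
        Map<String, String> kept = new LinkedHashMap<>();
        if (allowed == null) {
            return kept;
        }
        Set<String> names = new HashSet<>();
        for (String a : allowed) {
            names.add(a.toLowerCase(Locale.ROOT)); // case-insensitive match (assumption)
        }
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            if (names.contains(e.getKey().toLowerCase(Locale.ROOT))) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("href", "index.html");
        attrs.put("onclick", "steal()");
        // prints: {href=index.html}
        System.out.println(keepAttributes(attrs, new String[] { "href" }));
        // prints: {}
        System.out.println(keepAttributes(attrs, null));
    }
}
```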

characters

public void characters(XMLString text,
                       Augmentations augs)
            throws XNIException
Characters.
Overrides:
characters in class DefaultFilter

comment

public void comment(XMLString text,
                    Augmentations augs)
            throws XNIException
Comment.
Overrides:
comment in class DefaultFilter

elementAccepted

protected boolean elementAccepted(String element)
Returns true if the specified element is accepted.

elementRemoved

protected boolean elementRemoved(String element)
Returns true if the specified element should be removed.

emptyElement

public void emptyElement(QName element,
                         XMLAttributes attributes,
                         Augmentations augs)
            throws XNIException
Empty element.
Overrides:
emptyElement in class DefaultFilter

endCDATA

public void endCDATA(Augmentations augs)
            throws XNIException
End CDATA section.
Overrides:
endCDATA in class DefaultFilter

endElement

public void endElement(QName element,
                       Augmentations augs)
            throws XNIException
End element.
Overrides:
endElement in class DefaultFilter

endGeneralEntity

public void endGeneralEntity(String name,
                             Augmentations augs)
            throws XNIException
End general entity.
Overrides:
endGeneralEntity in class DefaultFilter

endPrefixMapping

public void endPrefixMapping(String prefix,
                             Augmentations augs)
            throws XNIException
End prefix mapping.
Overrides:
endPrefixMapping in class DefaultFilter

handleOpenTag

protected boolean handleOpenTag(QName element,
                                XMLAttributes attributes)
Handles an open tag.

ignorableWhitespace

public void ignorableWhitespace(XMLString text,
                                Augmentations augs)
            throws XNIException
Ignorable whitespace.
Overrides:
ignorableWhitespace in class DefaultFilter

processingInstruction

public void processingInstruction(String target,
                                  XMLString data,
                                  Augmentations augs)
            throws XNIException
Processing instruction.
Overrides:
processingInstruction in class DefaultFilter

removeElement

public void removeElement(String element)
Specifies that the given element should be completely removed. If an element on the remove list is encountered during processing, the element's start and end tags, as well as all of the content contained within the element, will be removed from the processing stream.
Parameters:
element - The element to completely remove.
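
Because removal covers everything nested inside the element, a filter needs to know when the element that triggered removal finally closes, even if elements of the same name are nested within it. One plausible way to do this, suggested by the fElementDepth and fRemovalElementDepth fields, is to record the depth at which removal began. The class below is a hypothetical sketch of that idea and is not taken from this class's source.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical depth tracker: remembers the depth at which removal started.
public class RemovalDepthSketch {
    private final Set<String> removed;
    private int depth = 0;
    private int removalDepth = -1; // -1 means not inside a removed element

    public RemovalDepthSketch(Set<String> removed) {
        this.removed = removed;
    }

    // Call on a start tag; returns true if the tag falls inside a removed region.
    public boolean open(String name) {
        if (removalDepth < 0 && removed.contains(name)) {
            removalDepth = depth;
        }
        depth++;
        return removalDepth >= 0;
    }

    // Call on an end tag; returns true if the tag falls inside a removed region.
    public boolean close() {
        depth--;
        boolean suppressed = removalDepth >= 0;
        if (depth == removalDepth) {
            removalDepth = -1; // the element that started removal has closed
        }
        return suppressed;
    }

    public boolean insideRemoved() {
        return removalDepth >= 0;
    }

    public static void main(String[] args) {
        RemovalDepthSketch t = new RemovalDepthSketch(new HashSet<>(Arrays.asList("div")));
        t.open("div");  // outer <div>: removal starts at depth 0
        t.open("div");  // nested <div>: still being removed
        t.close();      // nested </div>: removal continues
        System.out.println(t.insideRemoved()); // prints: true
        t.close();      // outer </div>: removal ends
        System.out.println(t.insideRemoved()); // prints: false
    }
}
```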

startCDATA

public void startCDATA(Augmentations augs)
            throws XNIException
Start CDATA section.
Overrides:
startCDATA in class DefaultFilter

startDocument

public void startDocument(XMLLocator locator,
                          String encoding,
                          Augmentations augs)
            throws XNIException
Start document.
Overrides:
startDocument in class DefaultFilter

startDocument

public void startDocument(XMLLocator locator,
                          String encoding,
                          NamespaceContext nscontext,
                          Augmentations augs)
            throws XNIException
Start document.
Overrides:
startDocument in class DefaultFilter

startElement

public void startElement(QName element,
                         XMLAttributes attributes,
                         Augmentations augs)
            throws XNIException
Start element.
Overrides:
startElement in class DefaultFilter

startGeneralEntity

public void startGeneralEntity(String name,
                               XMLResourceIdentifier id,
                               String encoding,
                               Augmentations augs)
            throws XNIException
Start general entity.
Overrides:
startGeneralEntity in class DefaultFilter

startPrefixMapping

public void startPrefixMapping(String prefix,
                               String uri,
                               Augmentations augs)
            throws XNIException
Start prefix mapping.
Overrides:
startPrefixMapping in class DefaultFilter

textDecl

public void textDecl(String version,
                     String encoding,
                     Augmentations augs)
            throws XNIException
Text declaration.
Overrides:
textDecl in class DefaultFilter

(C) Copyright 2002-2005, Andy Clark. All rights reserved.