com.norconex.collector.http.handler.impl
Class DefaultURLExtractor
java.lang.Object
com.norconex.collector.http.handler.impl.DefaultURLExtractor
- All Implemented Interfaces:
- IURLExtractor, IXMLConfigurable, Serializable
public class DefaultURLExtractor
- extends Object
- implements IURLExtractor, IXMLConfigurable
Default implementation of IURLExtractor.
XML configuration usage (not required since default):
<urlExtractor class="com.norconex.collector.http.handler.impl.DefaultURLExtractor">
<maxURLLength>
(Optional maximum URL length. Longer URLs won't be extracted.
Default is 2048.)
</maxURLLength>
</urlExtractor>
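To illustrate what the maxURLLength setting implies, here is a minimal, self-contained sketch of length-based filtering. It is not the library's implementation: the class MaxUrlLengthSketch and the method keepShortEnough are hypothetical names, and treating a URL of exactly the maximum length as acceptable is an assumption, since the text above only says that longer URLs are not extracted.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of the Norconex API.
final class MaxUrlLengthSketch {

    // Mirrors the documented default of 2048.
    static final int DEFAULT_MAX_URL_LENGTH = 2048;

    // Keeps only the URLs whose length does not exceed maxLength.
    // Assumption: a URL of exactly maxLength characters is kept.
    static Set<String> keepShortEnough(Iterable<String> urls, int maxLength) {
        Set<String> kept = new LinkedHashSet<>();
        for (String url : urls) {
            if (url.length() <= maxLength) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> found = Arrays.asList(
                "http://example.com/a",
                "http://example.com/a-much-longer-path");
        // With a limit of 25 characters, only the first URL is short enough.
        System.out.println(keepShortEnough(found, 25));
    }
}
```

With the real class, the equivalent limit would presumably be set through the maxURLLength element shown above or through setMaxURLLength(int).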
- Author:
- Pascal Essiembre
- See Also:
- Serialized Form
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DEFAULT_MAX_URL_LENGTH
public static final int DEFAULT_MAX_URL_LENGTH
- See Also:
- Constant Field Values
DefaultURLExtractor
public DefaultURLExtractor()
extractURLs
public Set<String> extractURLs(Reader document,
String documentUrl,
ContentType contentType)
throws IOException
- Description copied from interface:
IURLExtractor
- Extracts URLs out of a document.
- Specified by:
extractURLs
in interface IURLExtractor
- Parameters:
document
- the document
documentUrl
- document url
contentType
- the document content type
- Returns:
- a set of URLs
- Throws:
IOException
- problem reading the document
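The following sketch shows the general shape of an extraction method like the one described above: read the document from a Reader, find links, and resolve them against the document URL. It is a simplified stand-in rather than the actual DefaultURLExtractor logic. The class HrefExtractorSketch is hypothetical, the ContentType parameter is omitted to keep the example free of library dependencies, and matching only quoted href attributes with a regular expression is an assumption made for brevity.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; it does not implement IURLExtractor.
final class HrefExtractorSketch {

    // Assumption: only quoted href attribute values are considered.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    static Set<String> extractURLs(Reader document, String documentUrl)
            throws IOException {
        // Read the whole document into memory.
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        int read;
        while ((read = document.read(buffer)) != -1) {
            text.append(buffer, 0, read);
        }

        URI base = URI.create(documentUrl);
        Set<String> urls = new LinkedHashSet<>();
        Matcher matcher = HREF.matcher(text);
        while (matcher.find()) {
            try {
                // Resolve relative links against the document URL.
                urls.add(base.resolve(matcher.group(1).trim()).toString());
            } catch (IllegalArgumentException e) {
                // Skip values that are not valid URIs.
            }
        }
        return urls;
    }

    public static void main(String[] args) throws IOException {
        String html = "<a href=\"page2.html\">Next</a> <a href='/about'>About</a>";
        Set<String> urls = extractURLs(
                new StringReader(html), "http://example.com/docs/index.html");
        System.out.println(urls);
    }
}
```

Returning a Set mirrors the documented return type and means a link that appears several times in a document is reported once.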
getMaxURLLength
public int getMaxURLLength()
setMaxURLLength
public void setMaxURLLength(int maxURLLength)
loadFromXML
public void loadFromXML(Reader in)
- Specified by:
loadFromXML
in interface IXMLConfigurable
saveToXML
public void saveToXML(Writer out)
throws IOException
- Specified by:
saveToXML
in interface IXMLConfigurable
- Throws:
IOException
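As a rough illustration of how a loadFromXML / saveToXML pair can round-trip the maxURLLength setting, here is a self-contained sketch built on the JDK's StAX API. It is an assumption-laden stand-in, not the library's code: the class UrlExtractorConfigSketch is hypothetical, the class attribute of the urlExtractor element is left out, and wrapping XML errors in IOException or IllegalStateException is a choice made only to match the method signatures listed above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical stand-in for illustration; not the Norconex implementation.
final class UrlExtractorConfigSketch {

    private int maxURLLength = 2048; // documented default

    int getMaxURLLength() {
        return maxURLLength;
    }

    void setMaxURLLength(int maxURLLength) {
        this.maxURLLength = maxURLLength;
    }

    // Writes <urlExtractor><maxURLLength>N</maxURLLength></urlExtractor>.
    void saveToXML(Writer out) throws IOException {
        try {
            XMLStreamWriter xml =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("urlExtractor");
            xml.writeStartElement("maxURLLength");
            xml.writeCharacters(Integer.toString(maxURLLength));
            xml.writeEndElement();
            xml.writeEndElement();
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IOException("Could not write configuration", e);
        }
    }

    // Reads maxURLLength if present; otherwise the current value is kept.
    void loadFromXML(Reader in) {
        try {
            XMLStreamReader xml =
                    XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (xml.hasNext()) {
                if (xml.next() == XMLStreamConstants.START_ELEMENT
                        && "maxURLLength".equals(xml.getLocalName())) {
                    maxURLLength = Integer.parseInt(xml.getElementText().trim());
                }
            }
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read configuration", e);
        }
    }

    public static void main(String[] args) throws IOException {
        UrlExtractorConfigSketch original = new UrlExtractorConfigSketch();
        original.setMaxURLLength(512);
        StringWriter saved = new StringWriter();
        original.saveToXML(saved);

        UrlExtractorConfigSketch restored = new UrlExtractorConfigSketch();
        restored.loadFromXML(new StringReader(saved.toString()));
        System.out.println(restored.getMaxURLLength());
    }
}
```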
Copyright © 2009-2013 Norconex Inc. All Rights Reserved.