A

abbreviate(String) - Method in class com.manning.hip.common.PaddedTable
 
add(Delta) - Method in class org.hedera.io.RevisionDiff
Deprecated.
 
add(long, long, String[]) - Method in class org.hedera.io.RevisionSplits
 
addColumnTitle(String) - Method in class com.manning.hip.common.PaddedTable
 
addColumnValue(String) - Method in class com.manning.hip.common.PaddedTable
 
addColumnValue(int) - Method in class com.manning.hip.common.PaddedTable
 
addColumnValue(long) - Method in class com.manning.hip.common.PaddedTable
 
addColumnValue(double) - Method in class com.manning.hip.common.PaddedTable
 
addColumnValueNoAbbreviate(String) - Method in class com.manning.hip.common.PaddedTable
 
addDirToCache(File, FileSystem, Set<String>) - Static method in class com.manning.hip.common.JobHelper
 
addJarForJob(Configuration, String) - Static method in class com.manning.hip.common.JobHelper
 
addLink(LinkProfile.Link) - Method in class org.hedera.io.LinkProfile
 
addPatch(String) - Method in class org.hedera.io.RevisionConcatText
 
addToCache(String, FileSystem, Set<String>) - Static method in class com.manning.hip.common.JobHelper
 
addToCache(Configuration) - Static method in class com.manning.hip.common.JobHelper
 
AnnotatingMarkupParser - Class in pignlproc.markup
Parses MediaWiki markup to strip the formatting info and extract a simple text version suitable for NLP, along with header, paragraph, and link position annotations.
AnnotatingMarkupParser() - Constructor for class pignlproc.markup.AnnotatingMarkupParser
 
AnnotatingMarkupParser(String) - Constructor for class pignlproc.markup.AnnotatingMarkupParser
 
AnnotatingMarkupParser.CountingAppendable - Class in pignlproc.markup
 
AnnotatingMarkupParser.CountingAppendable(Appendable) - Constructor for class pignlproc.markup.AnnotatingMarkupParser.CountingAppendable
 
Annotation - Class in pignlproc.markup
 
Annotation(int, int, String, String) - Constructor for class pignlproc.markup.Annotation
 
ApacheCommonLogParser - Class in com.manning.hip.common
A modified form of CSVParser which handles the Apache Log file format.
ApacheCommonLogParser() - Constructor for class com.manning.hip.common.ApacheCommonLogParser
Constructs CSVParser using a comma for the separator.
ApacheCommonLogParser(char[]) - Constructor for class com.manning.hip.common.ApacheCommonLogParser
Constructs CSVParser with supplied separator.
ApacheCommonLogParser(char[], char) - Constructor for class com.manning.hip.common.ApacheCommonLogParser
Constructs CSVParser with supplied separator and quote char.
ApacheCommonLogParser(char[], char, char) - Constructor for class com.manning.hip.common.ApacheCommonLogParser
Constructs CSVReader with supplied separator and quote char.
ApacheCommonLogParser(char[], char, char, boolean) - Constructor for class com.manning.hip.common.ApacheCommonLogParser
Constructs CSVReader with supplied separator and quote char.
ApacheCommonLogParser(char[], char, char, boolean, boolean) - Constructor for class com.manning.hip.common.ApacheCommonLogParser
Constructs CSVReader with supplied separator and quote char.
ApacheCommonLogReader - Class in com.manning.hip.common
 
ApacheCommonLogReader() - Constructor for class com.manning.hip.common.ApacheCommonLogReader
 
append(CharSequence) - Method in class pignlproc.markup.AnnotatingMarkupParser.CountingAppendable
 
append(char) - Method in class pignlproc.markup.AnnotatingMarkupParser.CountingAppendable
 
append(CharSequence, int, int) - Method in class pignlproc.markup.AnnotatingMarkupParser.CountingAppendable
 
appendToBuilder(StringBuilder, int, String) - Method in class com.manning.hip.common.PaddedTable
 

B

bags - Variable in class org.hedera.pig.load.ClueWeb09WarcLoader
 
bags - Variable in class org.hedera.pig.load.LiteWikipediaLoader
 
bags - Variable in class org.hedera.pig.load.TimeseriesLoader
 
bags - Variable in class org.hedera.pig.load.WikiPageLoadTest
 
bags - Variable in class org.hedera.pig.load.WikiRevisionFullTextFilter
 
bags - Variable in class org.hedera.pig.load.WikiRevisionLoader
 
bags - Variable in class org.hedera.pig.load.WikiRevisionLoaderTest
 
bags - Variable in class org.hedera.pig.load.WikiRevisionPairLoader
 
BasicComputeTermStats - Class in org.hedera.mapreduce
Compute the term statistics from Wikipedia Revision.
BasicComputeTermStats() - Constructor for class org.hedera.mapreduce.BasicComputeTermStats
 
begin - Variable in class pignlproc.markup.Annotation
 
buf - Variable in class org.hedera.io.input.WikiRevisionReader
 
buildBOW(List<String>) - Method in class org.hedera.io.RevisionBOW
Builds the bag-of-words model.
BuildDictionary - Class in org.hedera.mapreduce
Reuses the clueweb-tools code to build the dictionary for Wikipedia revision articles (non-article pages are skipped).
BuildDictionary() - Constructor for class org.hedera.mapreduce.BuildDictionary
 
BuildDictionary.Terms - Enum in org.hedera.mapreduce
 
BuildPForDocVectors - Class in org.hedera.mapreduce
 
BuildPForDocVectors() - Constructor for class org.hedera.mapreduce.BuildPForDocVectors
 
BuildVByteDocVectors - Class in org.hedera.mapreduce
 
BuildVByteDocVectors() - Constructor for class org.hedera.mapreduce.BuildVByteDocVectors
 
byte2opt(byte) - Static method in class org.hedera.io.RevisionDiff
Deprecated.
 
ByteMatcher - Class in org.hedera.util
 
ByteMatcher(InputStream, Seekable) - Constructor for class org.hedera.util.ByteMatcher
 
ByteMatcher(SeekableInputStream) - Constructor for class org.hedera.util.ByteMatcher
 

C

CatHdfs - Class in com.manning.hip.common
 
CatHdfs() - Constructor for class com.manning.hip.common.CatHdfs
 
CF_BY_ID_DATA - Static variable in class org.hedera.mapreduce.BuildDictionary
 
CF_BY_TERM_DATA - Static variable in class org.hedera.mapreduce.BuildDictionary
 
check(META, META) - Method in interface org.hedera.io.etl.ETLExtractor
Compares two revisions based on their metadata.
check(RevisionHeader, RevisionHeader) - Method in class org.hedera.io.etl.RevisionBOWInputFormat.RevisionBOWExtractor
 
check(RevisionHeader, RevisionHeader) - Method in class org.hedera.io.etl.RevisionConcatInputFormat.RevisionConcatTextExtractor
 
check(RevisionHeader, RevisionHeader) - Method in class org.hedera.io.etl.RevisionIdsFormat.IdExtractor
 
check(RevisionHeader, RevisionHeader) - Method in class org.hedera.io.etl.RevisionLinkInputFormat.LinkExtractor
 
clear() - Method in class org.hedera.io.LinkProfile
 
clear() - Method in class org.hedera.io.Revision
 
clear() - Method in class org.hedera.io.RevisionBOW
 
clear() - Method in class org.hedera.io.RevisionConcatText
 
clear() - Method in class org.hedera.io.RevisionDiff
Deprecated.
 
clear() - Method in class org.hedera.io.RevisionHeader
 
clear() - Method in class org.hedera.io.RevisionSplits
 
clearRevisions() - Method in class org.hedera.io.etl.RevisionBOWInputFormat.RevisionBOWReader
 
clearRevisions() - Method in class org.hedera.io.etl.RevisionETLReader
 
clearRows() - Method in class com.manning.hip.common.PaddedTable
 
clone(T) - Method in interface org.hedera.io.CloneableObject
Overrides the fields of this object with values from the source.
clone(RevisionHeader) - Method in class org.hedera.io.RevisionHeader
 
CloneableObject<T> - Interface in org.hedera.io
A typed version of Java's Cloneable interface.
close() - Method in class com.manning.hip.common.CommonLogInputFormat.CommonLogRecordReader
 
close() - Method in class org.hedera.io.etl.RevisionETLReader
 
close() - Method in class org.hedera.io.input.FileNullInputFormat.FileNullRecordReader
 
close() - Method in class org.hedera.io.input.WikiFullRevisionJsonInputFormat.JsonRevisionReader
 
close() - Method in class org.hedera.io.input.WikiRevisionReader
 
ClueWeb09WarcLoader - Class in org.hedera.pig.load
 
ClueWeb09WarcLoader() - Constructor for class org.hedera.pig.load.ClueWeb09WarcLoader
 
com.manning.hip.common - package com.manning.hip.common
 
com.twitter.elephantbird.util - package com.twitter.elephantbird.util
 
CommonLogEntry - Class in com.manning.hip.common
 
CommonLogEntry() - Constructor for class com.manning.hip.common.CommonLogEntry
 
CommonLogInputFormat - Class in com.manning.hip.common
Assumes one line per log entry object
CommonLogInputFormat() - Constructor for class com.manning.hip.common.CommonLogInputFormat
 
CommonLogInputFormat.CommonLogRecordReader - Class in com.manning.hip.common
 
CommonLogInputFormat.CommonLogRecordReader() - Constructor for class com.manning.hip.common.CommonLogInputFormat.CommonLogRecordReader
 
compareTo(SplitUnitOld) - Method in class org.hedera.io.SplitUnitOld
Deprecated.
 
compressed - Variable in class org.hedera.io.input.WikiRevisionReader
 
compressionCodecs - Variable in class org.hedera.io.input.WikiFullRevisionJsonInputFormat
 
compressionCodecs - Variable in class org.hedera.io.input.WikiRevisionInputFormat
 
configure(Configuration) - Method in class org.hedera.io.input.WikiFullRevisionJsonInputFormat
 
configure(Configuration) - Method in class org.hedera.io.input.WikiRevisionInputFormat
 
contains(T) - Method in class com.manning.hip.common.Range
 
convert(String, boolean) - Static method in class org.hedera.io.LinkProfile.Link
 
COUNT_OPTION - Static variable in class org.hedera.mapreduce.BuildDictionary
 
createRecordReader(InputSplit, TaskAttemptContext) - Method in class com.manning.hip.common.CommonLogInputFormat
 
createRecordReader(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.etl.RevisionBOWInputFormat
 
createRecordReader(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.etl.RevisionConcatInputFormat
 
createRecordReader(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.etl.RevisionIdsFormat
 
createRecordReader(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.etl.RevisionLinkInputFormat
 
createRecordReader(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.input.FileNullInputFormat
 
createRecordReader(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.input.WikiFullRevisionJsonInputFormat
 
createRecordReader(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.input.WikiRevisionDiffInputFormat
 
createRecordReader(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.input.WikiRevisionFullInputFormat
 
createRecordReader(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.input.WikiRevisionHeaderInputFormat
 
createRecordReader(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.input.WikiRevisionInputFormat
 
createRecordReader(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.input.WikiRevisionPageInputFormat
 
createRecordReader(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.input.WikiRevisionPairInputFormat
 
createRecordReader(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.input.WikiRevisionTextInputFormat
 
createRecordReader(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.input.WikiRevisionTimeInputFormat
 
currentPosition - Variable in class pignlproc.markup.AnnotatingMarkupParser.CountingAppendable
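 
The CloneableObject<T> entry above describes clone(T) as overriding the fields of the receiver with values from a source object. A minimal sketch of that pattern follows, assuming a void return type and two illustrative fields; the interface is re-declared here so the example compiles on its own, DemoHeader is a hypothetical stand-in rather than the real org.hedera.io.RevisionHeader, and the actual signatures in org.hedera.io may differ.

```java
// Hypothetical re-declaration for illustration only; the real
// org.hedera.io.CloneableObject may declare a different return type.
interface CloneableObject<T> {
    // Overwrite this object's fields with the values held by source.
    void clone(T source);
}

// Illustrative stand-in, not the real org.hedera.io.RevisionHeader.
class DemoHeader implements CloneableObject<DemoHeader> {
    long pageId;
    String pageTitle;

    @Override
    public void clone(DemoHeader source) {
        // Copy field by field so an existing instance can be reused
        // instead of allocating a new object for every record.
        this.pageId = source.pageId;
        this.pageTitle = source.pageTitle;
    }
}

class CloneableObjectDemo {
    public static void main(String[] args) {
        DemoHeader source = new DemoHeader();
        source.pageId = 42L;
        source.pageTitle = "Hadoop";

        DemoHeader reused = new DemoHeader();
        reused.clone(source);
        System.out.println(reused.pageId + " " + reused.pageTitle);
    }
}
```

Copying into an existing instance, rather than returning a new one as Object.clone() does, suits readers that recycle key and value objects between records, which may be why the index also lists freeKey and freeValue methods.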
 

D

DAY_SCALE_OPT - Static variable in class org.hedera.io.etl.IntervalRevisionETLReader
 
decodeLine(Text) - Method in class com.manning.hip.common.ApacheCommonLogReader
 
decodeLine(String) - Method in class com.manning.hip.common.ApacheCommonLogReader
 
decodeLineToJson(JsonParser, Text, FullRevision) - Method in class org.hedera.io.input.WikiFullRevisionJsonInputFormat.JsonRevisionReader
 
DEFAULT_ESCAPE_CHARACTER - Static variable in class com.manning.hip.common.ApacheCommonLogParser
The default escape character to use if none is supplied to the constructor.
DEFAULT_IGNORE_LEADING_WHITESPACE - Static variable in class com.manning.hip.common.ApacheCommonLogParser
The default leading whitespace behavior to use if none is supplied to the constructor.
DEFAULT_MAX_BLOCK_SIZE - Static variable in class org.hedera.io.etl.RevisionETLReader
 
DEFAULT_MAX_BLOCK_SIZE - Static variable in class org.hedera.io.input.WikiFullRevisionJsonInputFormat.JsonRevisionReader
 
DEFAULT_MAX_BLOCK_SIZE - Static variable in class org.hedera.io.input.WikiRevisionReader
 
DEFAULT_QUOTE_CHARACTER - Static variable in class com.manning.hip.common.ApacheCommonLogParser
The default quote character to use if none is supplied to the constructor.
DEFAULT_SEPARATOR - Static variable in class com.manning.hip.common.ApacheCommonLogParser
The default separators to use if none is supplied to the constructor.
DEFAULT_STRICT_QUOTES - Static variable in class com.manning.hip.common.ApacheCommonLogParser
The default strict quote behavior to use if none is supplied to the constructor.
DefaultRevisionETLReader<KEYIN,VALUEIN> - Class in org.hedera.io.etl
A default WikiRevisionETLReader that extracts the title, page id, and namespace from the page header.
DefaultRevisionETLReader() - Constructor for class org.hedera.io.etl.DefaultRevisionETLReader
 
defineSchema() - Method in class org.hedera.pig.load.ClueWeb09WarcLoader
 
defineSchema() - Method in class org.hedera.pig.load.FileNameLoader
 
defineSchema() - Method in class org.hedera.pig.load.LiteWikipediaLoader
 
defineSchema() - Method in class org.hedera.pig.load.TimeseriesLoader
 
defineSchema() - Method in class org.hedera.pig.load.WikiPageLoadTest
 
defineSchema() - Method in class org.hedera.pig.load.WikiRevisionLoader
 
defineSchema() - Method in class org.hedera.pig.load.WikiRevisionLoaderTest
 
defineSchema() - Method in class org.hedera.pig.load.WikiRevisionPairLoader
 
DF_BY_ID_DATA - Static variable in class org.hedera.mapreduce.BuildDictionary
 
DF_BY_TERM_DATA - Static variable in class org.hedera.mapreduce.BuildDictionary
 
DF_MIN_OPTION - Static variable in class org.hedera.mapreduce.BasicComputeTermStats
 
DICTIONARY_OPTION - Static variable in class org.hedera.mapreduce.BuildPForDocVectors
 
DICTIONARY_OPTION - Static variable in class org.hedera.mapreduce.BuildVByteDocVectors
 
DICTIONARY_OPTION - Static variable in class org.hedera.util.VectorizeAnchorMap
 
doWhenMatch() - Method in class org.hedera.io.input.WikiRevisionDiffInputFormat.DiffReader
 
doWhenMatch() - Method in class org.hedera.io.input.WikiRevisionFullInputFormat.RevisionReader
 
doWhenMatch() - Method in class org.hedera.io.input.WikiRevisionHeaderInputFormat.RevisionReader
 
doWhenMatch() - Method in class org.hedera.io.input.WikiRevisionPageInputFormat.RevisionReader
 
doWhenMatch() - Method in class org.hedera.io.input.WikiRevisionPairInputFormat.RevisionReader
 
doWhenMatch() - Method in class org.hedera.io.input.WikiRevisionReader
What to do when encountering one relevant tag
doWhenMatch() - Method in class org.hedera.io.input.WikiRevisionTextInputFormat.RevisionReader
 
doWhenMatch() - Method in class org.hedera.io.input.WikiRevisionTimeInputFormat.RevisionReader
 

E

end - Variable in class org.hedera.io.input.WikiRevisionReader
 
end - Variable in class pignlproc.markup.Annotation
 
END_COMMENT - Static variable in class org.hedera.io.input.WikiRevisionFullInputFormat
 
END_CONTRIBUTOR - Static variable in class org.hedera.io.input.WikiRevisionFullInputFormat
 
END_ID - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
END_NAMESPACE - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
END_PAGE - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
END_PAGE_TAG - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
END_PARENT_ID - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
END_REDIRECT - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
END_REVISION - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
END_TEXT - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
END_TIME_OPT - Static variable in class org.hedera.io.etl.IntervalRevisionETLReader
 
END_TIMESTAMP - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
END_TIMESTAMP_TAG - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
END_TITLE - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
equals(Object) - Method in class org.hedera.io.RevisionHeader
 
equalsName(String) - Method in enum org.hedera.io.input.WikiRevisionTimeInputFormat.TimeScale
 
ETLExtractor<KEY,VALUE,META> - Interface in org.hedera.io.etl
API to provide algorithms for extracting information right in readers
exec(Tuple) - Method in class org.hedera.pig.eval.OneDayMore
 
exec(Tuple) - Method in class org.hedera.pig.eval.PageFunc
 
exec(Tuple) - Method in class org.hedera.pig.eval.UnixToElasticTime
 
exec(Tuple) - Method in class org.hedera.pig.eval.UnixToYYYYMMdd
 
exec(Tuple) - Method in class org.hedera.pig.eval.YYYYMMddHHToYYYYMMdd
 
extract(DataOutputBuffer, META, KEY, VALUE) - Method in interface org.hedera.io.etl.ETLExtractor
Extracts the revision content and populates the output.
extract(DataOutputBuffer, RevisionHeader, LongWritable, RevisionBOW) - Method in class org.hedera.io.etl.RevisionBOWInputFormat.RevisionBOWExtractor
 
extract(DataOutputBuffer, RevisionHeader, LongWritable, RevisionConcatText) - Method in class org.hedera.io.etl.RevisionConcatInputFormat.RevisionConcatTextExtractor
 
extract(DataOutputBuffer, RevisionHeader, LongWritable, PairOfLongs) - Method in class org.hedera.io.etl.RevisionIdsFormat.IdExtractor
 
extract(DataOutputBuffer, RevisionHeader, LongWritable, LinkProfile) - Method in class org.hedera.io.etl.RevisionLinkInputFormat.LinkExtractor
 
extractor - Variable in class org.hedera.io.etl.RevisionETLReader
 
extractParagraph(byte[], int, int) - Static method in class org.hedera.io.input.WikiRevisionDiffInputFormat.DiffReader
 
ExtractRevisionIds - Class in org.hedera.mapreduce
 
ExtractRevisionIds() - Constructor for class org.hedera.mapreduce.ExtractRevisionIds
 
ExtractTemplate - Class in org.hedera.pig.eval.wikipedia
Extracts a list of templates from a raw Wikipedia page and returns it as a data bag.
ExtractTemplate() - Constructor for class org.hedera.pig.eval.wikipedia.ExtractTemplate
 
ExtractTemporalAnchorText - Class in org.hedera.mapreduce
This job extracts temporal anchor text from Wikipedia revisions.
ExtractTemporalAnchorText() - Constructor for class org.hedera.mapreduce.ExtractTemporalAnchorText
 
ExtractTemporalAnchorText.Link - Class in org.hedera.mapreduce
 

F

FastExtractTemporalAnchorText - Class in org.hedera.mapreduce
 
FastExtractTemporalAnchorText() - Constructor for class org.hedera.mapreduce.FastExtractTemporalAnchorText
 
fetchMore() - Method in class org.hedera.io.etl.RevisionETLReader
Read the stream and update the internal buffer if necessary.
FileNameLoader - Class in org.hedera.pig.load
A simple UDF loader that reads a file and returns its path
FileNameLoader() - Constructor for class org.hedera.pig.load.FileNameLoader
 
FileNullInputFormat - Class in org.hedera.io.input
A virtual input format that checks one file and returns its name to the mapper.
FileNullInputFormat() - Constructor for class org.hedera.io.input.FileNullInputFormat
 
FileNullInputFormat.FileNullRecordReader - Class in org.hedera.io.input
 
FileNullInputFormat.FileNullRecordReader() - Constructor for class org.hedera.io.input.FileNullInputFormat.FileNullRecordReader
 
fileSplit() - Method in class org.hedera.io.SplitUnitOld
Deprecated.
Get the FileSplit from this unit
flag - Variable in class org.hedera.io.input.WikiRevisionReader
 
freeKey(LongWritable) - Method in class org.hedera.io.etl.RevisionBOWInputFormat.RevisionBOWReader
 
freeKey(LongWritable) - Method in class org.hedera.io.etl.RevisionConcatInputFormat.IntervalRevisionConcatTextReader
 
freeKey(KEYIN) - Method in class org.hedera.io.etl.RevisionETLReader
 
freeKey(LongWritable) - Method in class org.hedera.io.etl.RevisionIdsFormat.RevisionIdsReader
 
freeKey(LongWritable) - Method in class org.hedera.io.etl.RevisionLinkInputFormat.RevisionLinkReader
 
freeValue(RevisionBOW) - Method in class org.hedera.io.etl.RevisionBOWInputFormat.RevisionBOWReader
 
freeValue(RevisionConcatText) - Method in class org.hedera.io.etl.RevisionConcatInputFormat.IntervalRevisionConcatTextReader
 
freeValue(VALUEIN) - Method in class org.hedera.io.etl.RevisionETLReader
 
freeValue(PairOfLongs) - Method in class org.hedera.io.etl.RevisionIdsFormat.RevisionIdsReader
 
freeValue(LinkProfile) - Method in class org.hedera.io.etl.RevisionLinkInputFormat.RevisionLinkReader
 
fsin - Variable in class org.hedera.io.input.WikiRevisionReader
 
FullRevision - Class in org.hedera.io
A full revision that, besides the revision header and text, also stores user and comment info.
FullRevision() - Constructor for class org.hedera.io.FullRevision
 

G

getAnchorText() - Method in class org.hedera.io.LinkProfile.Link
 
getAnchorText() - Method in class org.hedera.mapreduce.ExtractTemporalAnchorText.Link
 
getArgToFuncMapping() - Method in class org.hedera.pig.eval.OneDayMore
 
getArgToFuncMapping() - Method in class org.hedera.pig.eval.UnixToElasticTime
 
getArgToFuncMapping() - Method in class org.hedera.pig.eval.UnixToYYYYMMdd
 
getArgToFuncMapping() - Method in class org.hedera.pig.eval.YYYYMMddHHToYYYYMMdd
 
getAsLong(String) - Static method in class com.manning.hip.common.ApacheCommonLogReader
 
getAsString(String) - Static method in class com.manning.hip.common.ApacheCommonLogReader
 
getComment() - Method in class org.hedera.io.FullRevision
 
getConfiguration(JobContext) - Static method in class com.manning.hip.common.HadoopCompat
Invoke getConfiguration() on JobContext.
getContent(String) - Method in class org.hedera.util.MediaWikiProcessor
Extract the content from the mark-up text.
getCounter(TaskInputOutputContext, String, String) - Static method in class com.manning.hip.common.HadoopCompat
Invoke getCounter() on TaskInputOutputContext.
getCurrentKey() - Method in class com.manning.hip.common.CommonLogInputFormat.CommonLogRecordReader
 
getCurrentKey() - Method in class org.hedera.io.etl.RevisionETLReader
 
getCurrentKey() - Method in class org.hedera.io.input.FileNullInputFormat.FileNullRecordReader
 
getCurrentKey() - Method in class org.hedera.io.input.WikiFullRevisionJsonInputFormat.JsonRevisionReader
 
getCurrentKey() - Method in class org.hedera.io.input.WikiRevisionReader
 
getCurrentValue() - Method in class com.manning.hip.common.CommonLogInputFormat.CommonLogRecordReader
 
getCurrentValue() - Method in class org.hedera.io.etl.RevisionETLReader
 
getCurrentValue() - Method in class org.hedera.io.input.FileNullInputFormat.FileNullRecordReader
 
getCurrentValue() - Method in class org.hedera.io.input.WikiFullRevisionJsonInputFormat.JsonRevisionReader
 
getCurrentValue() - Method in class org.hedera.io.input.WikiRevisionReader
 
getDiffs() - Method in class org.hedera.io.RevisionDiff
Deprecated.
 
getEpoch() - Method in class com.manning.hip.common.CommonLogEntry
 
getFilePath() - Method in class org.hedera.io.RevisionSplits
 
getFilePath() - Method in class org.hedera.io.SplitUnitOld
Deprecated.
 
getHeaderAnnotations() - Method in class pignlproc.markup.AnnotatingMarkupParser
 
getHeaders() - Method in class pignlproc.markup.AnnotatingMarkupParser
 
getHosts() - Method in class org.hedera.io.SplitUnitOld
Deprecated.
 
getInputFormat() - Method in class org.hedera.pig.load.ClueWeb09WarcLoader
 
getInputFormat() - Method in class org.hedera.pig.load.FileNameLoader
 
getInputFormat() - Method in class org.hedera.pig.load.LiteWikipediaLoader
 
getInputFormat() - Method in class org.hedera.pig.load.TimeseriesLoader
 
getInputFormat() - Method in class org.hedera.pig.load.WikiPageLoadTest
 
getInputFormat() - Method in class org.hedera.pig.load.WikiRevisionFullTextFilter
 
getInputFormat() - Method in class org.hedera.pig.load.WikiRevisionLoader
 
getInputFormat() - Method in class org.hedera.pig.load.WikiRevisionLoaderTest
 
getInputFormat() - Method in class org.hedera.pig.load.WikiRevisionPairLoader
 
getInstance(Path, long, long, FileSystem, CompressionCodecFactory) - Static method in class org.hedera.util.SeekableInputStream
 
getInstance(FileSplit, FileSystem, CompressionCodecFactory) - Static method in class org.hedera.util.SeekableInputStream
 
getLastRevision() - Method in class org.hedera.io.RevisionConcatText
 
getLastRevisionId() - Method in class org.hedera.io.RevisionBOW
 
getLastTimestamp() - Method in class org.hedera.io.RevisionBOW
 
getLastUnmatchPos() - Method in class org.hedera.util.ByteMatcher
 
getLength() - Method in class org.hedera.io.RevisionHeader
 
getLength() - Method in class org.hedera.io.SplitUnitOld
Deprecated.
 
getLinks() - Method in class org.hedera.io.LinkProfile
 
getMethod() - Method in class com.manning.hip.common.CommonLogEntry
 
getNamespace() - Method in class org.hedera.io.RevisionBOW
 
getNamespace() - Method in class org.hedera.io.RevisionHeader
 
getNext() - Method in class org.hedera.pig.load.ClueWeb09WarcLoader
 
getNext() - Method in class org.hedera.pig.load.FileNameLoader
 
getNext() - Method in class org.hedera.pig.load.LiteWikipediaLoader
 
getNext() - Method in class org.hedera.pig.load.TimeseriesLoader
 
getNext() - Method in class org.hedera.pig.load.WikiPageLoadTest
 
getNext() - Method in class org.hedera.pig.load.WikiRevisionFullTextFilter
 
getNext() - Method in class org.hedera.pig.load.WikiRevisionLoader
 
getNext() - Method in class org.hedera.pig.load.WikiRevisionLoaderTest
 
getNext() - Method in class org.hedera.pig.load.WikiRevisionPairLoader
 
getObjSize() - Method in class com.manning.hip.common.CommonLogEntry
 
getPageId() - Method in class org.hedera.io.RevisionBOW
 
getPageId() - Method in class org.hedera.io.RevisionHeader
 
getPageTitle() - Method in class org.hedera.io.RevisionHeader
 
getParagraphAnnotations() - Method in class pignlproc.markup.AnnotatingMarkupParser
 
getParagraphs() - Method in class pignlproc.markup.AnnotatingMarkupParser
 
getParentId() - Method in class org.hedera.io.RevisionHeader
 
getPartitionKeys(String, Job) - Method in class org.hedera.pig.load.ClueWeb09WarcLoader
 
getPartitionKeys(String, Job) - Method in class org.hedera.pig.load.FileNameLoader
 
getPartitionKeys(String, Job) - Method in class org.hedera.pig.load.LiteWikipediaLoader
 
getPartitionKeys(String, Job) - Method in class org.hedera.pig.load.TimeseriesLoader
 
getPartitionKeys(String, Job) - Method in class org.hedera.pig.load.WikiPageLoadTest
 
getPartitionKeys(String, Job) - Method in class org.hedera.pig.load.WikiRevisionFullTextFilter
 
getPartitionKeys(String, Job) - Method in class org.hedera.pig.load.WikiRevisionLoader
 
getPartitionKeys(String, Job) - Method in class org.hedera.pig.load.WikiRevisionLoaderTest
 
getPartitionKeys(String, Job) - Method in class org.hedera.pig.load.WikiRevisionPairLoader
 
getPos() - Method in class org.hedera.util.ByteMatcher
 
getPos() - Method in class org.hedera.util.SeekableInputStream
 
getProgress() - Method in class com.manning.hip.common.CommonLogInputFormat.CommonLogRecordReader
 
getProgress() - Method in class org.hedera.io.etl.RevisionETLReader
 
getProgress() - Method in class org.hedera.io.input.FileNullInputFormat.FileNullRecordReader
 
getProgress() - Method in class org.hedera.io.input.WikiFullRevisionJsonInputFormat.JsonRevisionReader
 
getProgress() - Method in class org.hedera.io.input.WikiRevisionReader
 
getProtocol() - Method in class com.manning.hip.common.CommonLogEntry
 
getReadBytes() - Method in class org.hedera.util.ByteMatcher
 
getRedirect() - Method in class pignlproc.markup.AnnotatingMarkupParser
 
getRemoteAddress() - Method in class com.manning.hip.common.CommonLogEntry
 
getRemoteLogname() - Method in class com.manning.hip.common.CommonLogEntry
 
getRequestLine() - Method in class com.manning.hip.common.CommonLogEntry
 
getResource() - Method in class com.manning.hip.common.CommonLogEntry
 
getRevisionId() - Method in class org.hedera.io.RevisionBOW
 
getRevisionId() - Method in class org.hedera.io.RevisionHeader
 
getSchema(String, Job) - Method in class org.hedera.pig.load.ClueWeb09WarcLoader
 
getSchema(String, Job) - Method in class org.hedera.pig.load.FileNameLoader
 
getSchema(String, Job) - Method in class org.hedera.pig.load.LiteWikipediaLoader
 
getSchema(String, Job) - Method in class org.hedera.pig.load.TimeseriesLoader
 
getSchema(String, Job) - Method in class org.hedera.pig.load.WikiPageLoadTest
 
getSchema(String, Job) - Method in class org.hedera.pig.load.WikiRevisionFullTextFilter
 
getSchema(String, Job) - Method in class org.hedera.pig.load.WikiRevisionLoader
 
getSchema(String, Job) - Method in class org.hedera.pig.load.WikiRevisionLoaderTest
 
getSchema(String, Job) - Method in class org.hedera.pig.load.WikiRevisionPairLoader
 
getSplitCompressionInputStream() - Method in class org.hedera.util.SeekableInputStream
 
getSplits(JobContext) - Method in class org.hedera.io.input.WikiRevisionInputFormat
This code is copied from StreamWikiDumpNewInputFormat.java by Yusuke Matsubara.
getSplits(JobContext, FileStatus, long) - Method in class org.hedera.io.input.WikiRevisionInputFormat
This code is copied from StreamWikiDumpNewInputFormat.java by Yusuke Matsubara.
getStart() - Method in class org.hedera.io.SplitUnitOld
Deprecated.
 
getStatistics(String, Job) - Method in class org.hedera.pig.load.ClueWeb09WarcLoader
 
getStatistics(String, Job) - Method in class org.hedera.pig.load.FileNameLoader
 
getStatistics(String, Job) - Method in class org.hedera.pig.load.LiteWikipediaLoader
 
getStatistics(String, Job) - Method in class org.hedera.pig.load.TimeseriesLoader
 
getStatistics(String, Job) - Method in class org.hedera.pig.load.WikiPageLoadTest
 
getStatistics(String, Job) - Method in class org.hedera.pig.load.WikiRevisionFullTextFilter
 
getStatistics(String, Job) - Method in class org.hedera.pig.load.WikiRevisionLoader
 
getStatistics(String, Job) - Method in class org.hedera.pig.load.WikiRevisionLoaderTest
 
getStatistics(String, Job) - Method in class org.hedera.pig.load.WikiRevisionPairLoader
 
getStatusCode() - Method in class com.manning.hip.common.CommonLogEntry
 
getTarget() - Method in class org.hedera.io.LinkProfile.Link
 
getTarget() - Method in class org.hedera.mapreduce.ExtractTemporalAnchorText.Link
 
getTaskAttemptContext() - Method in class org.hedera.io.etl.RevisionETLReader
 
getTaskAttemptID(TaskAttemptContext) - Static method in class com.manning.hip.common.HadoopCompat
Returns TaskAttemptContext.getTaskAttemptID().
getText() - Method in class org.hedera.io.Revision
 
getTime() - Method in class com.manning.hip.common.CommonLogEntry
 
getTimestamp() - Method in class org.hedera.io.RevisionBOW
 
getTimestamp() - Method in class org.hedera.io.RevisionHeader
 
getUser() - Method in class org.hedera.io.FullRevision
 
getUserId() - Method in class com.manning.hip.common.CommonLogEntry
 
getUserId() - Method in class org.hedera.io.FullRevision
 
getWikiLinkAnnotations() - Method in class pignlproc.markup.AnnotatingMarkupParser
 
getWords() - Method in class org.hedera.io.RevisionBOW
 

H

HADOOP_SPLIT_OPTION - Static variable in class org.hedera.mapreduce.IndexSplits
 
HadoopCompat - Class in com.manning.hip.common
Utility methods that let applications deal with inconsistencies in the MapReduce Context Objects API between Hadoop 1.x and 2.x.
HadoopCompat() - Constructor for class com.manning.hip.common.HadoopCompat
 
hasData() - Method in class org.hedera.io.etl.RevisionETLReader
Checks whether there is still data to read.
hashCode() - Method in class org.hedera.io.RevisionHeader
 
headers - Variable in class pignlproc.markup.AnnotatingMarkupParser
 
HEADING_TAGS - Static variable in class pignlproc.markup.AnnotatingMarkupParser
 
HOUR_SCALE_OPT - Static variable in class org.hedera.io.etl.IntervalRevisionETLReader
 
HREF_ATTR_KEY - Static variable in class pignlproc.markup.AnnotatingMarkupParser
 

I

imageNodeToText(TagNode, ImageFormat, Appendable, IWikiModel) - Method in class pignlproc.markup.AnnotatingMarkupParser
 
incrementCounter(Counter, long) - Static method in class com.manning.hip.common.HadoopCompat
Increment the counter.
IndexSplits - Class in org.hedera.mapreduce
This tool parses the list of dump files for Wikipedia revisions, splits them, and then repacks the splits into a sequence file.
IndexSplits() - Constructor for class org.hedera.mapreduce.IndexSplits
 
INITIAL_READ_SIZE - Static variable in class com.manning.hip.common.ApacheCommonLogParser
 
initialize(InputSplit, TaskAttemptContext) - Method in class com.manning.hip.common.CommonLogInputFormat.CommonLogRecordReader
 
initialize(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.etl.DefaultRevisionETLReader
 
initialize(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.etl.IntervalRevisionETLReader
 
initialize(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.etl.RevisionBOWInputFormat.RevisionBOWReader
 
initialize(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.etl.RevisionConcatInputFormat.IntervalRevisionConcatTextReader
 
initialize(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.etl.RevisionETLReader
 
initialize(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.input.FileNullInputFormat.FileNullRecordReader
 
initialize(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.input.WikiFullRevisionJsonInputFormat.JsonRevisionReader
 
initialize(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.input.WikiRevisionDiffInputFormat.DiffReader
 
initialize(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.input.WikiRevisionFullInputFormat.RevisionReader
 
initialize(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.input.WikiRevisionHeaderInputFormat.RevisionReader
 
initialize(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.input.WikiRevisionPageInputFormat.RevisionReader
 
initialize(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.input.WikiRevisionPairInputFormat.RevisionReader
 
initialize(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.input.WikiRevisionReader
 
initialize(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.input.WikiRevisionTextInputFormat.RevisionReader
 
initialize(InputSplit, TaskAttemptContext) - Method in class org.hedera.io.input.WikiRevisionTimeInputFormat.RevisionReader
 
initializeExtractor() - Method in class org.hedera.io.etl.RevisionBOWInputFormat.RevisionBOWReader
 
initializeExtractor() - Method in class org.hedera.io.etl.RevisionConcatInputFormat.IntervalRevisionConcatTextReader
 
initializeExtractor() - Method in class org.hedera.io.etl.RevisionETLReader
 
initializeExtractor() - Method in class org.hedera.io.etl.RevisionIdsFormat.RevisionIdsReader
 
initializeExtractor() - Method in class org.hedera.io.etl.RevisionLinkInputFormat.RevisionLinkReader
 
initializeKey() - Method in class org.hedera.io.etl.RevisionBOWInputFormat.RevisionBOWReader
 
initializeKey() - Method in class org.hedera.io.etl.RevisionConcatInputFormat.IntervalRevisionConcatTextReader
 
initializeKey() - Method in class org.hedera.io.etl.RevisionETLReader
 
initializeKey() - Method in class org.hedera.io.etl.RevisionIdsFormat.RevisionIdsReader
 
initializeKey() - Method in class org.hedera.io.etl.RevisionLinkInputFormat.RevisionLinkReader
 
initializeMeta() - Method in class org.hedera.io.etl.DefaultRevisionETLReader
 
initializeMeta() - Method in class org.hedera.io.etl.RevisionETLReader
 
initializeValue() - Method in class org.hedera.io.etl.RevisionBOWInputFormat.RevisionBOWReader
 
initializeValue() - Method in class org.hedera.io.etl.RevisionConcatInputFormat.IntervalRevisionConcatTextReader
 
initializeValue() - Method in class org.hedera.io.etl.RevisionETLReader
 
initializeValue() - Method in class org.hedera.io.etl.RevisionIdsFormat.RevisionIdsReader
 
initializeValue() - Method in class org.hedera.io.etl.RevisionLinkInputFormat.RevisionLinkReader
 
INPUT_OPTION - Static variable in class org.hedera.mapreduce.BasicComputeTermStats
 
INPUT_OPTION - Static variable in class org.hedera.mapreduce.BuildDictionary
 
INPUT_OPTION - Static variable in class org.hedera.mapreduce.BuildPForDocVectors
 
INPUT_OPTION - Static variable in class org.hedera.mapreduce.BuildVByteDocVectors
 
INPUT_OPTION - Static variable in class org.hedera.mapreduce.IndexSplits
 
INPUT_OPTION - Static variable in class org.hedera.mapreduce.SampleRevisionPair
 
INPUT_OPTION - Static variable in class org.hedera.mapreduce.TestFileNullInputFormat
 
INPUT_OPTION - Static variable in class org.hedera.util.VectorizeAnchorMap
 
INPUT_TYPE_OPTION - Static variable in class org.hedera.mapreduce.IndexSplits
 
IntervalRevisionETLReader<KEYIN,VALUEIN> - Class in org.hedera.io.etl
A revision ETL reader that skips all revisions outside a specific range.
IntervalRevisionETLReader() - Constructor for class org.hedera.io.etl.IntervalRevisionETLReader
 
INTERWIKI_PATTERN - Static variable in class pignlproc.markup.AnnotatingMarkupParser
 
InvertedIndexByBOW - Class in org.hedera.mapreduce
Build a simple inverted index from revisions' bag-of-words.
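As an illustration only, the idea of an inverted index over bags of words can be sketched in plain Java without Hadoop. The class name `InvertedIndexSketch`, the method `build`, and the in-memory map layout are hypothetical and are not taken from `InvertedIndexByBOW`; the real tool runs as a MapReduce job and may differ in every detail.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, single-process sketch; not the InvertedIndexByBOW implementation.
final class InvertedIndexSketch {

    /** Maps every word to the sorted set of revision ids whose bag of words contains it. */
    static SortedMap<String, SortedSet<Long>> build(Map<Long, List<String>> bagsByRevision) {
        SortedMap<String, SortedSet<Long>> index = new TreeMap<>();
        for (Map.Entry<Long, List<String>> entry : bagsByRevision.entrySet()) {
            for (String word : entry.getValue()) {
                // Create the posting set on first sight of the word, then add the revision id.
                index.computeIfAbsent(word, w -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Long, List<String>> bags = new LinkedHashMap<>();
        bags.put(1L, Arrays.asList("wiki", "revision"));
        bags.put(2L, Arrays.asList("wiki", "hadoop"));
        System.out.println(build(bags));
    }
}
```

Sorted maps and sets are used here only to make the printed result deterministic; a distributed job would emit (word, revision id) pairs from the mapper and group them in the reducer instead.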
InvertedIndexByBOW() - Constructor for class org.hedera.mapreduce.InvertedIndexByBOW
 
isAllWhiteSpace(CharSequence) - Method in class com.manning.hip.common.ApacheCommonLogParser
precondition: sb.length() > 0
isMinor() - Method in class org.hedera.io.RevisionHeader
 
isNextCharacterEscapable(String, boolean, int) - Method in class com.manning.hip.common.ApacheCommonLogParser
precondition: the current character is an escape
isNull(String) - Static method in class com.manning.hip.common.ApacheCommonLogReader
 
isPending() - Method in class com.manning.hip.common.ApacheCommonLogParser
 
isSplitable(JobContext, Path) - Method in class com.manning.hip.common.CommonLogInputFormat
 
isSplitable(JobContext, Path) - Method in class org.hedera.io.etl.RevisionIdsFormat
 
isSplitable(JobContext, Path) - Method in class org.hedera.io.input.FileNullInputFormat
 
isSplitable(JobContext, Path) - Method in class org.hedera.io.input.WikiFullRevisionJsonInputFormat
 
isSplitable(JobContext, Path) - Method in class org.hedera.io.input.WikiRevisionHeaderInputFormat
 
isSplitable(JobContext, Path) - Method in class org.hedera.io.input.WikiRevisionInputFormat
 
isVersion2x() - Static method in class com.manning.hip.common.HadoopCompat
True if runtime Hadoop version is 2.x, false otherwise.
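One common way to make such a runtime check is to probe the classpath for a class that exists in only one of the two API generations. The sketch below shows that technique in isolation; `HadoopVersionProbe`, `isClassPresent`, and the choice of marker class are assumptions made for illustration, and this index does not say how `HadoopCompat.isVersion2x()` is actually implemented.

```java
// Hypothetical sketch of classpath probing; not the HadoopCompat source.
final class HadoopVersionProbe {

    // Assumption: this class ships with the Hadoop 2.x MapReduce API but not with 1.x.
    static final String V2_MARKER_CLASS = "org.apache.hadoop.mapreduce.task.JobContextImpl";

    /** Returns true if the named class can be located on the current classpath. */
    static boolean isClassPresent(String className) {
        try {
            // 'false' avoids running static initializers of the probed class.
            Class.forName(className, false, HadoopVersionProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Reports whether the marker class for the 2.x API is visible. */
    static boolean looksLikeHadoop2() {
        return isClassPresent(V2_MARKER_CLASS);
    }

    public static void main(String[] args) {
        System.out.println("java.lang.String present: " + isClassPresent("java.lang.String"));
        System.out.println("Hadoop 2.x API detected: " + looksLikeHadoop2());
    }
}
```

Without Hadoop on the classpath, `looksLikeHadoop2()` simply returns false; code that needs to distinguish "Hadoop 1.x" from "no Hadoop at all" would have to probe a second marker class.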

J

JobHelper - Class in com.manning.hip.common
 
JobHelper() - Constructor for class com.manning.hip.common.JobHelper
 

K

key - Variable in class org.hedera.io.input.WikiRevisionReader
 
KEY_SKIP_FACTOR - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
KEY_SKIP_FACTOR - Static variable in class org.hedera.mapreduce.IndexSplits
I keep the parameter name to honour Matsubara
keyBuf - Variable in class org.hedera.io.input.WikiRevisionReader
 

L

label - Variable in class pignlproc.markup.Annotation
 
languageCode - Variable in class pignlproc.markup.AnnotatingMarkupParser
 
LinkEntityDocAnnot - Class in org.hedera.graph
This tool builds the (entity, doc) mapping based on explicit links.
LinkEntityDocAnnot() - Constructor for class org.hedera.graph.LinkEntityDocAnnot
 
LinkProfile - Class in org.hedera.io
This object represents the outlink profile of a Wikipedia page at a specific moment
LinkProfile() - Constructor for class org.hedera.io.LinkProfile
 
LinkProfile.Link - Class in org.hedera.io
 
LinkProfile.Link(String, String) - Constructor for class org.hedera.io.LinkProfile.Link
 
LiteWikipediaLoader - Class in org.hedera.pig.load
This is a UDF loader that pipes records from a Wikipedia XML dump into Pig tuples, using WikipediaPageInputFormat.
LiteWikipediaLoader() - Constructor for class org.hedera.pig.load.LiteWikipediaLoader
 
loadContributor(String) - Method in class org.hedera.io.FullRevision
 
loadText(byte[], int, int) - Method in class org.hedera.io.Revision
 
loadText(byte[], int, int) - Method in class org.hedera.io.RevisionConcatText
 
log - Static variable in class com.manning.hip.common.JobHelper
 
LOG - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 

M

main(String...) - Static method in class com.manning.hip.common.CatHdfs
 
main(String...) - Static method in class com.manning.hip.common.StreamToHdfs
 
main(String[]) - Static method in class org.hedera.graph.LinkEntityDocAnnot
 
main(String[]) - Static method in class org.hedera.mapreduce.BasicComputeTermStats
Dispatches command-line arguments to the tool via the ToolRunner.
main(String[]) - Static method in class org.hedera.mapreduce.BuildDictionary
Dispatches command-line arguments to the tool via the ToolRunner.
main(String[]) - Static method in class org.hedera.mapreduce.BuildPForDocVectors
Dispatches command-line arguments to the tool via the ToolRunner.
main(String[]) - Static method in class org.hedera.mapreduce.BuildVByteDocVectors
Dispatches command-line arguments to the tool via the ToolRunner.
main(String[]) - Static method in class org.hedera.mapreduce.ExtractRevisionIds
 
main(String[]) - Static method in class org.hedera.mapreduce.ExtractTemporalAnchorText
 
main(String[]) - Static method in class org.hedera.mapreduce.FastExtractTemporalAnchorText
 
main(String[]) - Static method in class org.hedera.mapreduce.IndexSplits
 
main(String[]) - Static method in class org.hedera.mapreduce.InvertedIndexByBOW
 
main(String[]) - Static method in class org.hedera.mapreduce.SampleRevisionPair
 
main(String[]) - Static method in class org.hedera.mapreduce.TestFileNullInputFormat
 
main(String[]) - Static method in class org.hedera.mapreduce.TestWikipediaPageInputFormat
 
main(String[]) - Static method in class org.hedera.mapreduce.WikiRevIndex4NonTemporalSearch
 
main(String[]) - Static method in class org.hedera.mapreduce.WikiRevLength
 
main(String[]) - Static method in class org.hedera.util.VectorizeAnchorMap
 
makeSplit(Path, long, long, String[]) - Method in class org.hedera.io.input.WikiRevisionInputFormat
Tuan, Tu (22.05.2014) - For some reason, the Pig version in the Hadoop@L3S does not recognize this method in FileInputFormat.
makeWikiModel(String) - Method in class pignlproc.markup.AnnotatingMarkupParser
 
maxTime - Variable in class org.hedera.io.input.WikiFullRevisionJsonInputFormat.JsonRevisionReader
 
maxTime - Variable in class org.hedera.io.input.WikiRevisionReader
 
MediaWikiProcessor - Class in org.hedera.util
Utility for handling MediaWiki syntax.
MediaWikiProcessor() - Constructor for class org.hedera.util.MediaWikiProcessor
 
MINOR_TAG - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
minTime - Variable in class org.hedera.io.input.WikiFullRevisionJsonInputFormat.JsonRevisionReader
 
minTime - Variable in class org.hedera.io.input.WikiRevisionReader
 
model - Variable in class pignlproc.markup.AnnotatingMarkupParser
 
MONTH_SCALE_OPT - Static variable in class org.hedera.io.etl.IntervalRevisionETLReader
 

N

newGenericCounter(String, String, long) - Static method in class com.manning.hip.common.HadoopCompat
 
newJobContext(Configuration, JobID) - Static method in class com.manning.hip.common.HadoopCompat
Creates a JobContext from a Configuration and JobID, using the correct constructor for the Hadoop version.
newMapContext(Configuration, TaskAttemptID, RecordReader, RecordWriter, OutputCommitter, StatusReporter, InputSplit) - Static method in class com.manning.hip.common.HadoopCompat
Instantiates MapContext under Hadoop 1 and MapContextImpl under Hadoop 2.
newRow() - Method in class com.manning.hip.common.PaddedTable
 
newTaskAttemptContext(Configuration, TaskAttemptID) - Static method in class com.manning.hip.common.HadoopCompat
Creates a TaskAttemptContext from a Configuration and TaskAttemptID, using the correct constructor for the Hadoop version.
nextByte() - Method in class org.hedera.io.etl.RevisionETLReader
Get the next byte in the stream and move the cursor forward
nextKeyValue() - Method in class com.manning.hip.common.CommonLogInputFormat.CommonLogRecordReader
 
nextKeyValue() - Method in class org.hedera.io.etl.RevisionETLReader
 
nextKeyValue() - Method in class org.hedera.io.input.FileNullInputFormat.FileNullRecordReader
 
nextKeyValue() - Method in class org.hedera.io.input.WikiFullRevisionJsonInputFormat.JsonRevisionReader
 
nextKeyValue() - Method in class org.hedera.io.input.WikiRevisionReader
 
nodesToText(List<? extends Object>, Appendable, IWikiModel) - Method in class pignlproc.markup.AnnotatingMarkupParser
 
noLinks() - Method in class pignlproc.markup.AnnotatingMarkupParser
 
NULL_CHARACTER - Static variable in class com.manning.hip.common.ApacheCommonLogParser
This is the "null" character - if a value is set to this then it is ignored.

O

OneDayMore - Class in org.hedera.pig.eval
A simple utility that accepts a day and returns the following day in the format "YYYYmmDD".
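The date arithmetic behind such a function can be sketched with `java.time`, independently of Pig. This sketch assumes that "YYYYmmDD" means a four-digit year, two-digit month, and two-digit day (for example 20140522); `NextDaySketch` and `nextDay` are hypothetical names, and the actual `OneDayMore` UDF may parse and format differently.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch of the "one day later" computation; not the OneDayMore source.
final class NextDaySketch {

    // BASIC_ISO_DATE reads and writes dates such as 20140522 (assumed input format).
    private static final DateTimeFormatter FMT = DateTimeFormatter.BASIC_ISO_DATE;

    /** Returns the day after the given day, in the same eight-digit format. */
    static String nextDay(String day) {
        return LocalDate.parse(day, FMT).plusDays(1).format(FMT);
    }

    public static void main(String[] args) {
        System.out.println(nextDay("20140522")); // prints 20140523
        System.out.println(nextDay("20141231")); // prints 20150101
    }
}
```

Using `LocalDate` rather than adding 86,400 seconds to a timestamp keeps month ends, year ends, and leap days correct without any time zone handling.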
OneDayMore() - Constructor for class org.hedera.pig.eval.OneDayMore
 
opt2byte(Delta.TYPE) - Static method in class org.hedera.io.RevisionDiff
Deprecated.
 
org.hedera.graph - package org.hedera.graph
 
org.hedera.io - package org.hedera.io
 
org.hedera.io.etl - package org.hedera.io.etl
 
org.hedera.io.input - package org.hedera.io.input
 
org.hedera.mapreduce - package org.hedera.mapreduce
 
org.hedera.pig.eval - package org.hedera.pig.eval
 
org.hedera.pig.eval.wikipedia - package org.hedera.pig.eval.wikipedia
 
org.hedera.pig.load - package org.hedera.pig.load
 
org.hedera.util - package org.hedera.util
 
OUTPUT_OPTION - Static variable in class org.hedera.mapreduce.BasicComputeTermStats
 
OUTPUT_OPTION - Static variable in class org.hedera.mapreduce.BuildDictionary
 
OUTPUT_OPTION - Static variable in class org.hedera.mapreduce.BuildPForDocVectors
 
OUTPUT_OPTION - Static variable in class org.hedera.mapreduce.BuildVByteDocVectors
 
OUTPUT_OPTION - Static variable in class org.hedera.mapreduce.IndexSplits
 
OUTPUT_OPTION - Static variable in class org.hedera.mapreduce.SampleRevisionPair
 
OUTPUT_OPTION - Static variable in class org.hedera.mapreduce.TestFileNullInputFormat
 
OUTPUT_OPTION - Static variable in class org.hedera.util.VectorizeAnchorMap
 
outputSchema(Schema) - Method in class org.hedera.pig.eval.OneDayMore
 
outputSchema(Schema) - Method in class org.hedera.pig.eval.UnixToElasticTime
 
outputSchema(Schema) - Method in class org.hedera.pig.eval.UnixToYYYYMMdd
 
outputSchema(Schema) - Method in class org.hedera.pig.eval.wikipedia.ExtractTemplate
 
outputSchema(Schema) - Method in class org.hedera.pig.eval.YYYYMMddHHToYYYYMMdd
 

P

PaddedTable - Class in com.manning.hip.common
 
PaddedTable() - Constructor for class com.manning.hip.common.PaddedTable
 
PaddedTable(int) - Constructor for class com.manning.hip.common.PaddedTable
 
PageFunc<T> - Class in org.hedera.pig.eval
A custom eval function that wraps Pig's EvalFunc and lets the user handle the raw content of a page that has an id, a title and content.
PageFunc() - Constructor for class org.hedera.pig.eval.PageFunc
 
PARAGRAPH_TAGS - Static variable in class pignlproc.markup.AnnotatingMarkupParser
 
paragraphs - Variable in class pignlproc.markup.AnnotatingMarkupParser
 
parse(long, String, String) - Method in class org.hedera.pig.eval.PageFunc
 
parse(long, String, String) - Method in class org.hedera.pig.eval.wikipedia.ExtractTemplate
 
parse(String) - Method in class pignlproc.markup.AnnotatingMarkupParser
Convert WikiMarkup to a simple text representation suitable for NLP analysis.
parseLine(String) - Method in class com.manning.hip.common.ApacheCommonLogParser
 
parseLineMulti(String) - Method in class com.manning.hip.common.ApacheCommonLogParser
 
pignlproc.markup - package pignlproc.markup
 
pos - Variable in class org.hedera.io.input.WikiRevisionReader
 
prepareToRead(RecordReader, PigSplit) - Method in class org.hedera.pig.load.ClueWeb09WarcLoader
 
prepareToRead(RecordReader, PigSplit) - Method in class org.hedera.pig.load.FileNameLoader
 
prepareToRead(RecordReader, PigSplit) - Method in class org.hedera.pig.load.LiteWikipediaLoader
 
prepareToRead(RecordReader, PigSplit) - Method in class org.hedera.pig.load.TimeseriesLoader
 
prepareToRead(RecordReader, PigSplit) - Method in class org.hedera.pig.load.WikiPageLoadTest
 
prepareToRead(RecordReader, PigSplit) - Method in class org.hedera.pig.load.WikiRevisionFullTextFilter
 
prepareToRead(RecordReader, PigSplit) - Method in class org.hedera.pig.load.WikiRevisionLoader
 
prepareToRead(RecordReader, PigSplit) - Method in class org.hedera.pig.load.WikiRevisionLoaderTest
 
prepareToRead(RecordReader, PigSplit) - Method in class org.hedera.pig.load.WikiRevisionPairLoader
 
PREPROCESSING - Static variable in class org.hedera.mapreduce.BasicComputeTermStats
 
PREPROCESSING - Static variable in class org.hedera.mapreduce.BuildPForDocVectors
 
PREPROCESSING - Static variable in class org.hedera.mapreduce.BuildVByteDocVectors
 
PREPROCESSING - Static variable in class org.hedera.util.VectorizeAnchorMap
 
processMetaData(DataOutputBuffer, RevisionHeader) - Method in class org.hedera.io.etl.IntervalRevisionETLReader
This method processes after caching the currently visited revision.
processMetaData(DataOutputBuffer, RevisionHeader) - Method in class org.hedera.io.etl.RevisionConcatInputFormat.IntervalRevisionConcatTextReader
 
progress() - Method in class com.twitter.elephantbird.util.TaskHeartbeatThread
This will be called once every periodMillis.
ProxEntityActionAnnot - Class in org.hedera.graph
This tool builds the annotation of entities for each document vector based on the proximity heuristics.
- Input: anchor / entity mapping, seed entities, proximity distance, entity repacked id list, repacked doc id list, dictionary
- Output: csv file [entity ID] TAB [word ID] TAB [timestamp]
- For every doc vector, if the anchor is found, then all target entities are annotated for the doc
- If the seed word is specified and the proximity window is known, the
ProxEntityActionAnnot() - Constructor for class org.hedera.graph.ProxEntityActionAnnot
 

R

Range<T extends Comparable> - Class in com.manning.hip.common
 
Range(T, T) - Constructor for class com.manning.hip.common.Range
 
reader - Variable in class org.hedera.pig.load.ClueWeb09WarcLoader
 
reader - Variable in class org.hedera.pig.load.FileNameLoader
 
reader - Variable in class org.hedera.pig.load.LiteWikipediaLoader
 
reader - Variable in class org.hedera.pig.load.TimeseriesLoader
 
reader - Variable in class org.hedera.pig.load.WikiPageLoadTest
 
reader - Variable in class org.hedera.pig.load.WikiRevisionLoader
 
reader - Variable in class org.hedera.pig.load.WikiRevisionLoaderTest
 
reader - Variable in class org.hedera.pig.load.WikiRevisionPairLoader
 
readFields(DataInput) - Method in class com.manning.hip.common.CommonLogEntry
 
readFields(DataInput) - Method in class org.hedera.io.FullRevision
 
readFields(DataInput) - Method in class org.hedera.io.LinkProfile
 
readFields(DataInput) - Method in class org.hedera.io.Revision
 
readFields(DataInput) - Method in class org.hedera.io.RevisionBOW
 
readFields(DataInput) - Method in class org.hedera.io.RevisionConcatText
 
readFields(DataInput) - Method in class org.hedera.io.RevisionDiff
Deprecated.
 
readFields(DataInput) - Method in class org.hedera.io.RevisionHeader
 
readFields(DataInput) - Method in class org.hedera.io.RevisionSplits
 
readFields(DataInput) - Method in class org.hedera.io.SplitUnitOld
Deprecated.
 
readLong(DataInput) - Static method in class com.manning.hip.common.CommonLogEntry
 
readToNextRevision(DataOutputBuffer, RevisionHeader) - Method in class org.hedera.io.etl.IntervalRevisionETLReader
 
readToNextRevision(DataOutputBuffer, META) - Method in class org.hedera.io.etl.RevisionETLReader
This method reads bytes inside the input stream into the buffer until reaching EOF or the revision close tag.
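The general pattern of copying bytes into a buffer until a closing tag or EOF can be sketched with plain `java.io` streams. The names `ReadUntilTagSketch` and `readUntil` are hypothetical, a `ByteArrayOutputStream` stands in for Hadoop's `DataOutputBuffer`, and the fallback step is only correct for tags whose first byte does not recur inside the tag (true for an XML close tag such as `</revision>`); none of this is taken from the `RevisionETLReader` source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of "read until close tag or EOF"; not the RevisionETLReader source.
final class ReadUntilTagSketch {

    /**
     * Copies bytes from 'in' to 'out' until 'tag' has been read completely or EOF is hit.
     * Returns true if the tag was found, false on EOF. The tag itself is included in 'out'.
     */
    static boolean readUntil(InputStream in, byte[] tag, ByteArrayOutputStream out) {
        int matched = 0; // number of leading tag bytes matched so far
        try {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
                if ((byte) b == tag[matched]) {
                    matched++;
                    if (matched == tag.length) {
                        return true;
                    }
                } else {
                    // Simple restart; valid because the first tag byte does not recur in the tag.
                    matched = ((byte) b == tag[0]) ? 1 : 0;
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "<revision><id>1</id></revision><revision>".getBytes(StandardCharsets.UTF_8);
        byte[] tag = "</revision>".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(readUntil(in, tag, out) + " " + out);
    }
}
```

The boolean result plays the role of a status signal, telling the caller whether a full revision was buffered or the stream ended first.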
readToNextRevision(DataOutputBuffer, RevisionHeader) - Method in class org.hedera.io.etl.RevisionIdsFormat.RevisionIdsReader
 
readToNextRevision(DataOutputBuffer, RevisionHeader) - Method in class org.hedera.io.etl.RevisionLinkInputFormat.RevisionLinkReader
 
readToPageHeader(RevisionHeader) - Method in class org.hedera.io.etl.DefaultRevisionETLReader
 
readToPageHeader(META) - Method in class org.hedera.io.etl.RevisionETLReader
Consume all the tags from page tag till the first revision tag.
readUntilMatch() - Method in class org.hedera.io.input.WikiRevisionDiffInputFormat.DiffReader
 
readUntilMatch() - Method in class org.hedera.io.input.WikiRevisionFullInputFormat.RevisionReader
 
readUntilMatch() - Method in class org.hedera.io.input.WikiRevisionHeaderInputFormat.RevisionReader
 
readUntilMatch() - Method in class org.hedera.io.input.WikiRevisionPageInputFormat.RevisionReader
 
readUntilMatch() - Method in class org.hedera.io.input.WikiRevisionPairInputFormat.RevisionReader
 
readUntilMatch() - Method in class org.hedera.io.input.WikiRevisionReader
What to do when reading till the next relevant tag
readUntilMatch() - Method in class org.hedera.io.input.WikiRevisionTextInputFormat.RevisionReader
 
readUntilMatch() - Method in class org.hedera.io.input.WikiRevisionTimeInputFormat.RevisionReader
 
readUntilMatch(String, DataOutputBuffer, long, Progressable) - Method in class org.hedera.util.ByteMatcher
Tuan (22.05.2014) - changed the visibility of this method to public so that it can be called from other packages
RECURSION_LIMIT - Static variable in class pignlproc.markup.AnnotatingMarkupParser
 
redirect - Variable in class pignlproc.markup.AnnotatingMarkupParser
 
REDIRECT_PATTERNS - Static variable in class pignlproc.markup.AnnotatingMarkupParser
 
redirectPattern - Variable in class pignlproc.markup.AnnotatingMarkupParser
 
REDUCERS_OPTION - Static variable in class org.hedera.mapreduce.BuildPForDocVectors
 
REDUCERS_OPTION - Static variable in class org.hedera.mapreduce.BuildVByteDocVectors
 
Revision - Class in org.hedera.io
Provides a data model for one Wikipedia revision that is exchangeable within Hadoop settings.
Revision() - Constructor for class org.hedera.io.Revision
 
REVISION_BEGIN_TIME - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
REVISION_END_TIME - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
RevisionBOW - Class in org.hedera.io
A revision that
RevisionBOW() - Constructor for class org.hedera.io.RevisionBOW
 
RevisionBOWInputFormat - Class in org.hedera.io.etl
Input format that transforms a set of revisions within one unit interval into a bag of words that appear during the interval
RevisionBOWInputFormat() - Constructor for class org.hedera.io.etl.RevisionBOWInputFormat
 
RevisionBOWInputFormat.RevisionBOWExtractor - Class in org.hedera.io.etl
 
RevisionBOWInputFormat.RevisionBOWExtractor() - Constructor for class org.hedera.io.etl.RevisionBOWInputFormat.RevisionBOWExtractor
 
RevisionBOWInputFormat.RevisionBOWExtractor(List<String>, long[]) - Constructor for class org.hedera.io.etl.RevisionBOWInputFormat.RevisionBOWExtractor
 
RevisionBOWInputFormat.RevisionBOWReader - Class in org.hedera.io.etl
 
RevisionBOWInputFormat.RevisionBOWReader() - Constructor for class org.hedera.io.etl.RevisionBOWInputFormat.RevisionBOWReader
 
RevisionConcatInputFormat - Class in org.hedera.io.etl
 
RevisionConcatInputFormat() - Constructor for class org.hedera.io.etl.RevisionConcatInputFormat
 
RevisionConcatInputFormat.IntervalRevisionConcatTextReader - Class in org.hedera.io.etl
An ETL Reader that reads all revisions of a page within one specific interval, and generates a single Writable object that represents the concatenated bag of words of the page during the entire interval.
RevisionConcatInputFormat.IntervalRevisionConcatTextReader() - Constructor for class org.hedera.io.etl.RevisionConcatInputFormat.IntervalRevisionConcatTextReader
 
RevisionConcatInputFormat.RevisionConcatTextExtractor - Class in org.hedera.io.etl
This extractor compares the last revision with the current one, and updates the patch of the BOW object accordingly
RevisionConcatInputFormat.RevisionConcatTextExtractor() - Constructor for class org.hedera.io.etl.RevisionConcatInputFormat.RevisionConcatTextExtractor
 
RevisionConcatText - Class in org.hedera.io
This represents the concatenated bag of words for a set of Wikipedia revisions belonging to the same page within a specific time interval.
RevisionConcatText() - Constructor for class org.hedera.io.RevisionConcatText
 
RevisionDiff - Class in org.hedera.io
Deprecated.
this object is too cumbersome. Use RevisionDiff instead
RevisionDiff() - Constructor for class org.hedera.io.RevisionDiff
Deprecated.
 
RevisionETLReader<KEYIN,VALUEIN,META extends CloneableObject<META>> - Class in org.hedera.io.etl
 
RevisionETLReader() - Constructor for class org.hedera.io.etl.RevisionETLReader
 
RevisionETLReader.Ack - Enum in org.hedera.io.etl
The acknowledgement signal when invoking one internal consuming method.
RevisionHeader - Class in org.hedera.io
A Wikipedia header that provides APIs to access revision metadata.
RevisionHeader() - Constructor for class org.hedera.io.RevisionHeader
 
RevisionIdsFormat - Class in org.hedera.io.etl
 
RevisionIdsFormat() - Constructor for class org.hedera.io.etl.RevisionIdsFormat
 
RevisionIdsFormat.IdExtractor - Class in org.hedera.io.etl
 
RevisionIdsFormat.IdExtractor() - Constructor for class org.hedera.io.etl.RevisionIdsFormat.IdExtractor
 
RevisionIdsFormat.RevisionIdsReader - Class in org.hedera.io.etl
A lightweight ETL Reader that reads through Wikipedia Revision and extracts revision ids, time stamp and page id / title.
RevisionIdsFormat.RevisionIdsReader() - Constructor for class org.hedera.io.etl.RevisionIdsFormat.RevisionIdsReader
 
RevisionLinkInputFormat - Class in org.hedera.io.etl
The input format that supports ETL reading and extracts link structures from each revision on the go.
RevisionLinkInputFormat() - Constructor for class org.hedera.io.etl.RevisionLinkInputFormat
 
RevisionLinkInputFormat.LinkExtractor - Class in org.hedera.io.etl
 
RevisionLinkInputFormat.LinkExtractor() - Constructor for class org.hedera.io.etl.RevisionLinkInputFormat.LinkExtractor
 
RevisionLinkInputFormat.RevisionLinkReader - Class in org.hedera.io.etl
 
RevisionLinkInputFormat.RevisionLinkReader() - Constructor for class org.hedera.io.etl.RevisionLinkInputFormat.RevisionLinkReader
 
RevisionSplits - Class in org.hedera.io
Represents a list of file splits for a single revision dump file.
RevisionSplits() - Constructor for class org.hedera.io.RevisionSplits
 
run(String[]) - Method in class org.hedera.graph.LinkEntityDocAnnot
 
run(String[]) - Method in class org.hedera.graph.ProxEntityActionAnnot
 
run(String[]) - Method in class org.hedera.mapreduce.BasicComputeTermStats
Runs this tool.
run(String[]) - Method in class org.hedera.mapreduce.BuildDictionary
Runs this tool.
run(String[]) - Method in class org.hedera.mapreduce.BuildPForDocVectors
Runs this tool.
run(String[]) - Method in class org.hedera.mapreduce.BuildVByteDocVectors
Runs this tool.
run(String[]) - Method in class org.hedera.mapreduce.ExtractRevisionIds
 
run(String[]) - Method in class org.hedera.mapreduce.ExtractTemporalAnchorText
 
run(String[]) - Method in class org.hedera.mapreduce.FastExtractTemporalAnchorText
 
run(String[]) - Method in class org.hedera.mapreduce.IndexSplits
 
run(String[]) - Method in class org.hedera.mapreduce.InvertedIndexByBOW
 
run(String[]) - Method in class org.hedera.mapreduce.SampleRevisionPair
 
run(String[]) - Method in class org.hedera.mapreduce.TestFileNullInputFormat
 
run(String[]) - Method in class org.hedera.mapreduce.TestWikipediaPageInputFormat
 
run(String[]) - Method in class org.hedera.mapreduce.WikiRevIndex4NonTemporalSearch
 
run(String[]) - Method in class org.hedera.mapreduce.WikiRevLength
 
run(String[]) - Method in class org.hedera.util.VectorizeAnchorMap
 

S

SampleRevisionPair - Class in org.hedera.mapreduce
A mini patch that extracts the list of (revisionID, parentID) from the revision sets
SampleRevisionPair() - Constructor for class org.hedera.mapreduce.SampleRevisionPair
 
SCALE_OPT - Static variable in class org.hedera.io.etl.IntervalRevisionETLReader
 
schema - Variable in class org.hedera.pig.load.ClueWeb09WarcLoader
 
schema - Variable in class org.hedera.pig.load.FileNameLoader
 
schema - Variable in class org.hedera.pig.load.LiteWikipediaLoader
 
schema - Variable in class org.hedera.pig.load.TimeseriesLoader
 
schema - Variable in class org.hedera.pig.load.WikiPageLoadTest
 
schema - Variable in class org.hedera.pig.load.WikiRevisionFullTextFilter
 
schema - Variable in class org.hedera.pig.load.WikiRevisionLoader
 
schema - Variable in class org.hedera.pig.load.WikiRevisionLoaderTest
 
schema - Variable in class org.hedera.pig.load.WikiRevisionPairLoader
 
SEED_FILE - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
SEED_OPTION - Static variable in class org.hedera.mapreduce.SampleRevisionPair
 
seek(long) - Method in class org.hedera.util.SeekableInputStream
 
SeekableInputStream - Class in org.hedera.util
 
SeekableInputStream(FSDataInputStream) - Constructor for class org.hedera.util.SeekableInputStream
 
SeekableInputStream(SplitCompressionInputStream) - Constructor for class org.hedera.util.SeekableInputStream
 
SeekableInputStream(CompressionInputStream, FSDataInputStream) - Constructor for class org.hedera.util.SeekableInputStream
 
seekToNewSource(long) - Method in class org.hedera.util.SeekableInputStream
 
separatorChar(char) - Method in class com.manning.hip.common.ApacheCommonLogParser
 
setBlockSize(Configuration) - Static method in class org.hedera.io.etl.RevisionETLReader
 
setBlockSize(Configuration) - Static method in class org.hedera.io.input.WikiRevisionReader
 
setComment(String) - Method in class org.hedera.io.FullRevision
 
setDiffs(LinkedList<Delta>) - Method in class org.hedera.io.RevisionDiff
Deprecated.
 
setEpoch(Long) - Method in class com.manning.hip.common.CommonLogEntry
 
setFilePath(String) - Method in class org.hedera.io.RevisionSplits
 
setFilePath(String) - Method in class org.hedera.io.SplitUnitOld
Deprecated.
 
setHosts(String[]) - Method in class org.hedera.io.SplitUnitOld
Deprecated.
 
setLastRevision(List<String>) - Method in class org.hedera.io.RevisionConcatText
 
setLastRevisionId(long) - Method in class org.hedera.io.RevisionBOW
 
setLastTimestamp(long) - Method in class org.hedera.io.RevisionBOW
 
setLength(int) - Method in class org.hedera.io.RevisionHeader
 
setLength(long) - Method in class org.hedera.io.SplitUnitOld
Deprecated.
 
setLocation(String, Job) - Method in class org.hedera.pig.load.ClueWeb09WarcLoader
 
setLocation(String, Job) - Method in class org.hedera.pig.load.FileNameLoader
 
setLocation(String, Job) - Method in class org.hedera.pig.load.LiteWikipediaLoader
 
setLocation(String, Job) - Method in class org.hedera.pig.load.TimeseriesLoader
 
setLocation(String, Job) - Method in class org.hedera.pig.load.WikiPageLoadTest
 
setLocation(String, Job) - Method in class org.hedera.pig.load.WikiRevisionFullTextFilter
 
setLocation(String, Job) - Method in class org.hedera.pig.load.WikiRevisionLoader
 
setLocation(String, Job) - Method in class org.hedera.pig.load.WikiRevisionLoaderTest
 
setLocation(String, Job) - Method in class org.hedera.pig.load.WikiRevisionPairLoader
 
setMethod(String) - Method in class com.manning.hip.common.CommonLogEntry
 
setMinor(boolean) - Method in class org.hedera.io.RevisionHeader
 
setNamespace(int) - Method in class org.hedera.io.RevisionBOW
 
setNamespace(int) - Method in class org.hedera.io.RevisionHeader
 
setObjSize(Long) - Method in class com.manning.hip.common.CommonLogEntry
 
setPageId(long) - Method in class org.hedera.io.RevisionBOW
 
setPageId(long) - Method in class org.hedera.io.RevisionHeader
 
setPageTitle(String) - Method in class org.hedera.io.RevisionHeader
 
setParentId(long) - Method in class org.hedera.io.RevisionHeader
 
setPartitionFilter(Expression) - Method in class org.hedera.pig.load.ClueWeb09WarcLoader
 
setPartitionFilter(Expression) - Method in class org.hedera.pig.load.FileNameLoader
 
setPartitionFilter(Expression) - Method in class org.hedera.pig.load.LiteWikipediaLoader
 
setPartitionFilter(Expression) - Method in class org.hedera.pig.load.TimeseriesLoader
 
setPartitionFilter(Expression) - Method in class org.hedera.pig.load.WikiPageLoadTest
 
setPartitionFilter(Expression) - Method in class org.hedera.pig.load.WikiRevisionFullTextFilter
 
setPartitionFilter(Expression) - Method in class org.hedera.pig.load.WikiRevisionLoader
 
setPartitionFilter(Expression) - Method in class org.hedera.pig.load.WikiRevisionLoaderTest
 
setPartitionFilter(Expression) - Method in class org.hedera.pig.load.WikiRevisionPairLoader
 
setProtocol(String) - Method in class com.manning.hip.common.CommonLogEntry
 
setRemoteAddress(String) - Method in class com.manning.hip.common.CommonLogEntry
 
setRemoteLogname(String) - Method in class com.manning.hip.common.CommonLogEntry
 
setRequestLine(String) - Method in class com.manning.hip.common.CommonLogEntry
 
setResource(String) - Method in class com.manning.hip.common.CommonLogEntry
 
setRevisionId(long) - Method in class org.hedera.io.RevisionBOW
 
setRevisionId(long) - Method in class org.hedera.io.RevisionHeader
 
setStart(long) - Method in class org.hedera.io.SplitUnitOld
Deprecated.
 
setStatus(TaskAttemptContext, String) - Static method in class com.manning.hip.common.HadoopCompat
Invoke setStatus() on TaskAttemptContext.
setStatusCode(Long) - Method in class com.manning.hip.common.CommonLogEntry
 
setTime(String) - Method in class com.manning.hip.common.CommonLogEntry
 
setTimestamp(long) - Method in class org.hedera.io.RevisionBOW
 
setTimestamp(long) - Method in class org.hedera.io.RevisionHeader
 
setUser(String) - Method in class org.hedera.io.FullRevision
 
setUserId(String) - Method in class com.manning.hip.common.CommonLogEntry
 
setUserId(long) - Method in class org.hedera.io.FullRevision
 
skip(long) - Method in class org.hedera.util.ByteMatcher
 
SKIP_NON_ARTICLES - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
SKIP_REDIRECT - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
skipNonArticles - Variable in class org.hedera.io.etl.DefaultRevisionETLReader
 
skipNonArticles - Variable in class org.hedera.io.input.WikiFullRevisionJsonInputFormat.JsonRevisionReader
 
skipNonArticles - Variable in class org.hedera.io.input.WikiRevisionReader
 
skipped - Variable in class org.hedera.io.input.WikiRevisionReader
 
skipRedirect - Variable in class org.hedera.io.etl.DefaultRevisionETLReader
 
split(int) - Method in class org.hedera.io.RevisionSplits
 
SPLIT_INDEX_OPTION - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
SPLIT_MAPFILE_LOC - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
SPLIT_OPTION - Static variable in class org.hedera.mapreduce.IndexSplits
 
splits() - Method in class org.hedera.io.RevisionSplits
 
SplitUnitOld - Class in org.hedera.io
Deprecated.
Please use RevisionSplits instead.
SplitUnitOld(String, long, long, String[]) - Constructor for class org.hedera.io.SplitUnitOld
Deprecated.
 
SplitUnitOld() - Constructor for class org.hedera.io.SplitUnitOld
Deprecated.
 
start() - Method in class com.twitter.elephantbird.util.TaskHeartbeatThread
Keep the task alive until TaskHeartbeatThread.stop() is called
start - Variable in class org.hedera.io.input.WikiRevisionReader
A global variable to maintain the state / position of pointers along the revision files
START_COMMENT - Static variable in class org.hedera.io.input.WikiRevisionFullInputFormat
 
START_CONTRIBUTOR - Static variable in class org.hedera.io.input.WikiRevisionFullInputFormat
 
START_ID - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
START_NAMESPACE - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
START_PAGE - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
START_PAGE_TAG - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
START_PARENT_ID - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
START_REDIRECT - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
START_REVISION - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
START_TEXT - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
START_TIME_OPT - Static variable in class org.hedera.io.etl.IntervalRevisionETLReader
 
START_TIMESTAMP - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
START_TIMESTAMP_TAG - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
START_TITLE - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
stop() - Method in class com.twitter.elephantbird.util.TaskHeartbeatThread
Stop keeping the task alive. Make sure to call this when your slow operation is finished.
StreamToHdfs - Class in com.manning.hip.common
 
StreamToHdfs() - Constructor for class com.manning.hip.common.StreamToHdfs
 

T

TaskHeartbeatThread - Class in com.twitter.elephantbird.util
Keeps a task alive during any slow operations
TaskHeartbeatThread(Progressable, long) - Constructor for class com.twitter.elephantbird.util.TaskHeartbeatThread
Creates a new thread to keep a task alive. Does not start the thread until you call TaskHeartbeatThread.start().
TaskHeartbeatThread(TaskAttemptContext) - Constructor for class com.twitter.elephantbird.util.TaskHeartbeatThread
Same as #TaskHeartbeatThread(TaskAttemptContext, long) but with a default period of 1 minute
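
The start()/stop() contract described for TaskHeartbeatThread can be illustrated with a minimal, self-contained sketch. HeartbeatSketch below is a hypothetical stand-in built on the JDK scheduler, not the elephant-bird class; the Runnable callback and the method beatsDuring are assumptions made for illustration only.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the heartbeat pattern; NOT the elephant-bird API.
final class HeartbeatSketch {
    private final Runnable progress;   // callback that reports liveness
    private final long periodMillis;   // time between two reports
    private ScheduledExecutorService scheduler;

    HeartbeatSketch(Runnable progress, long periodMillis) {
        this.progress = progress;
        this.periodMillis = periodMillis;
    }

    // Nothing runs until start() is called, mirroring the documented contract.
    synchronized void start() {
        if (scheduler != null) {
            return; // already running
        }
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "heartbeat-sketch");
            t.setDaemon(true); // never block JVM shutdown
            return t;
        });
        scheduler.scheduleAtFixedRate(progress, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Call once the slow operation is finished.
    synchronized void stop() {
        if (scheduler != null) {
            scheduler.shutdownNow();
            scheduler = null;
        }
    }

    // Simulates a slow operation and returns how many heartbeats fired meanwhile.
    static int beatsDuring(long workMillis, long periodMillis) {
        AtomicInteger beats = new AtomicInteger();
        HeartbeatSketch heartbeat = new HeartbeatSketch(beats::incrementAndGet, periodMillis);
        heartbeat.start();
        try {
            Thread.sleep(workMillis); // placeholder for the slow operation
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            heartbeat.stop(); // always stop, even if the work fails
        }
        return beats.get();
    }

    public static void main(String[] args) {
        System.out.println("heartbeats: " + beatsDuring(200, 20));
    }
}
```

Wrapping the slow operation in try/finally is the design point: stop() runs even when the operation throws, so the heartbeat does not outlive the work it protects.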
TERMS_DATA - Static variable in class org.hedera.mapreduce.BuildDictionary
 
TERMS_ID_DATA - Static variable in class org.hedera.mapreduce.BuildDictionary
 
TERMS_ID_MAPPING_DATA - Static variable in class org.hedera.mapreduce.BuildDictionary
 
TestFileNullInputFormat - Class in org.hedera.mapreduce
 
TestFileNullInputFormat() - Constructor for class org.hedera.mapreduce.TestFileNullInputFormat
 
TestWikipediaPageInputFormat - Class in org.hedera.mapreduce
Map-Reduce job that tests the WikipediaPageInputFormat
TestWikipediaPageInputFormat() - Constructor for class org.hedera.mapreduce.TestWikipediaPageInputFormat
 
text - Variable in class pignlproc.markup.AnnotatingMarkupParser
 
THRESHOLD - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
TIME_FORMAT - Static variable in class org.hedera.io.input.WikiRevisionInputFormat
 
TIME_SCALE_OPT - Static variable in class org.hedera.io.input.WikiRevisionTimeInputFormat
 
TimeseriesLoader - Class in org.hedera.pig.load
 
TimeseriesLoader() - Constructor for class org.hedera.pig.load.TimeseriesLoader
 
titleToUri(String, String) - Static method in class pignlproc.markup.AnnotatingMarkupParser
 
toIntArrayWritable(IntArrayListWritable, int[], int) - Static method in class org.hedera.util.VectorizeAnchorMap
 
toString() - Method in class com.manning.hip.common.PaddedTable
 
toString() - Method in enum org.hedera.io.input.WikiRevisionTimeInputFormat.TimeScale
 
toString() - Method in class org.hedera.io.LinkProfile.Link
 
toString() - Method in class org.hedera.io.RevisionConcatText
Get the String representation of this BOW.
toString() - Method in class org.hedera.io.RevisionHeader
 
toString() - Method in class org.hedera.mapreduce.ExtractTemporalAnchorText.Link
 
toString() - Method in class org.hedera.util.SeekableInputStream
 
tuples - Variable in class org.hedera.pig.load.ClueWeb09WarcLoader
 
tuples - Variable in class org.hedera.pig.load.FileNameLoader
 
tuples - Variable in class org.hedera.pig.load.LiteWikipediaLoader
 
tuples - Variable in class org.hedera.pig.load.TimeseriesLoader
 
tuples - Variable in class org.hedera.pig.load.WikiPageLoadTest
 
tuples - Variable in class org.hedera.pig.load.WikiRevisionFullTextFilter
 
tuples - Variable in class org.hedera.pig.load.WikiRevisionLoader
 
tuples - Variable in class org.hedera.pig.load.WikiRevisionLoaderTest
 
tuples - Variable in class org.hedera.pig.load.WikiRevisionPairLoader
 

U

UnixToElasticTime - Class in org.hedera.pig.eval
Convert Unix epoch to date time format: YYYY-MM-dd'T'HH:mm:ss
UnixToElasticTime() - Constructor for class org.hedera.pig.eval.UnixToElasticTime
 
UnixToYYYYMMdd - Class in org.hedera.pig.eval
Use Joda-Time to convert the Unix epoch to 'yyyyMMdd'
UnixToYYYYMMdd() - Constructor for class org.hedera.pig.eval.UnixToYYYYMMdd
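
The two epoch conversions named above (UnixToElasticTime and UnixToYYYYMMdd) can be approximated with a small sketch. It uses java.time rather than Joda-Time to stay dependency-free, and it assumes the epoch is given in seconds and rendered in UTC; neither assumption is stated in this index. The class and method names are hypothetical. Note that java.time uses lowercase `yyyy` for the calendar year, whereas uppercase `YYYY` means week-based year there.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; not part of org.hedera.pig.eval.
final class EpochFormatSketch {
    // Assumption: output is rendered in UTC.
    private static final DateTimeFormatter ELASTIC =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss").withZone(ZoneOffset.UTC);
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // Assumption: the input is seconds since 1970-01-01T00:00:00Z.
    static String toElasticTime(long epochSeconds) {
        return ELASTIC.format(Instant.ofEpochSecond(epochSeconds));
    }

    static String toYyyyMMdd(long epochSeconds) {
        return DAY.format(Instant.ofEpochSecond(epochSeconds));
    }

    public static void main(String[] args) {
        System.out.println(toElasticTime(0L)); // 1970-01-01T00:00:00
        System.out.println(toYyyyMMdd(0L));    // 19700101
    }
}
```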
 
updateBOW(String) - Method in class org.hedera.io.RevisionBOW
Update the bag of words with a new word.
updateRevision() - Method in class org.hedera.io.etl.RevisionETLReader
 

V

value - Variable in class org.hedera.io.input.WikiRevisionReader
 
value - Variable in class pignlproc.markup.Annotation
 
valueOf(String) - Static method in enum org.hedera.io.etl.RevisionETLReader.Ack
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.hedera.io.input.WikiRevisionReader.STATE
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.hedera.io.input.WikiRevisionTimeInputFormat.TimeScale
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.hedera.mapreduce.BuildDictionary.Terms
Returns the enum constant of this type with the specified name.
values() - Static method in enum org.hedera.io.etl.RevisionETLReader.Ack
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.hedera.io.input.WikiRevisionReader.STATE
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.hedera.io.input.WikiRevisionTimeInputFormat.TimeScale
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.hedera.mapreduce.BuildDictionary.Terms
Returns an array containing the constants of this enum type, in the order they are declared.
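
The valueOf(String) and values() entries above are the standard methods the Java compiler generates for every enum. A minimal sketch of their behavior follows; the enum Scale and its constants are hypothetical, since the constants of Ack, STATE, TimeScale and Terms are not listed in this index.

```java
import java.util.Arrays;

final class EnumLookupSketch {
    // Hypothetical enum used only for illustration.
    enum Scale { HOUR, DAY, WEEK }

    // valueOf is case-sensitive and throws IllegalArgumentException for an
    // unknown name, so callers that accept user input often wrap it like this.
    static Scale parseOrDefault(String name, Scale fallback) {
        if (name == null) {
            return fallback; // valueOf(null) would throw NullPointerException
        }
        try {
            return Scale.valueOf(name);
        } catch (IllegalArgumentException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // values() returns the constants in declaration order.
        System.out.println(Arrays.toString(Scale.values()));
        System.out.println(parseOrDefault("DAY", Scale.HOUR));
        System.out.println(parseOrDefault("day", Scale.HOUR));
    }
}
```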
VectorizeAnchorMap - Class in org.hedera.util
This tool reads the Wikipedia entity ID to anchor mapping output from Cloud9, checks the continuous entity IDs and IDs of anchor texts, and repacks everything into continuous ID ranges
VectorizeAnchorMap() - Constructor for class org.hedera.util.VectorizeAnchorMap
 

W

WEEK_SCALE_OPT - Static variable in class org.hedera.io.etl.IntervalRevisionETLReader
 
WikiFullRevisionJsonInputFormat - Class in org.hedera.io.input
Provide a converter of JSON revisions to FullRevision objects. The following code is inspired by the source code of the Manning book "Hadoop in Practice", source: https://github.com/alexholmes/hadoop-book
WikiFullRevisionJsonInputFormat() - Constructor for class org.hedera.io.input.WikiFullRevisionJsonInputFormat
 
WikiFullRevisionJsonInputFormat.JsonRevisionReader - Class in org.hedera.io.input
 
WikiFullRevisionJsonInputFormat.JsonRevisionReader() - Constructor for class org.hedera.io.input.WikiFullRevisionJsonInputFormat.JsonRevisionReader
 
WIKILINK_TARGET_ATTR_KEY - Static variable in class pignlproc.markup.AnnotatingMarkupParser
 
WIKILINK_TITLE_ATTR_KEY - Static variable in class pignlproc.markup.AnnotatingMarkupParser
 
wikilinks - Variable in class pignlproc.markup.AnnotatingMarkupParser
 
WIKIOBJECT_ATTR_KEY - Static variable in class pignlproc.markup.AnnotatingMarkupParser
 
WikiPageLoadTest - Class in org.hedera.pig.load
 
WikiPageLoadTest() - Constructor for class org.hedera.pig.load.WikiPageLoadTest
 
WikiRevIndex4NonTemporalSearch - Class in org.hedera.mapreduce
Build a simple inverted index for word-based search without temporal features.
WikiRevIndex4NonTemporalSearch() - Constructor for class org.hedera.mapreduce.WikiRevIndex4NonTemporalSearch
 
WikiRevisionDiffInputFormat - Class in org.hedera.io.input
 
WikiRevisionDiffInputFormat() - Constructor for class org.hedera.io.input.WikiRevisionDiffInputFormat
 
WikiRevisionDiffInputFormat.DiffReader - Class in org.hedera.io.input
Read every pair of consecutive revisions and calculate their diffs using Myers' algorithm.
WikiRevisionDiffInputFormat.DiffReader() - Constructor for class org.hedera.io.input.WikiRevisionDiffInputFormat.DiffReader
 
WikiRevisionFullInputFormat - Class in org.hedera.io.input
 
WikiRevisionFullInputFormat() - Constructor for class org.hedera.io.input.WikiRevisionFullInputFormat
 
WikiRevisionFullInputFormat.RevisionReader - Class in org.hedera.io.input
Read each revision of a Wikipedia page and transform it into a WikipediaRevision object.
WikiRevisionFullInputFormat.RevisionReader() - Constructor for class org.hedera.io.input.WikiRevisionFullInputFormat.RevisionReader
 
WikiRevisionFullTextFilter - Class in org.hedera.pig.load
A Pig UDF loader that filters wiki revision text by keywords
WikiRevisionFullTextFilter(String) - Constructor for class org.hedera.pig.load.WikiRevisionFullTextFilter
 
WikiRevisionHeaderInputFormat - Class in org.hedera.io.input
This is probably the simplest inputformat: It reads the chunks of dump files and extracts only the headers for each revision.
WikiRevisionHeaderInputFormat() - Constructor for class org.hedera.io.input.WikiRevisionHeaderInputFormat
 
WikiRevisionHeaderInputFormat.RevisionReader - Class in org.hedera.io.input
 
WikiRevisionHeaderInputFormat.RevisionReader() - Constructor for class org.hedera.io.input.WikiRevisionHeaderInputFormat.RevisionReader
 
WikiRevisionInputFormat<KEYIN,VALUEIN> - Class in org.hedera.io.input
An InputFormat implementation that splits a Wikipedia revision file into page fragments and outputs them as input records.
WikiRevisionInputFormat() - Constructor for class org.hedera.io.input.WikiRevisionInputFormat
 
WikiRevisionLoader - Class in org.hedera.pig.load
 
WikiRevisionLoader() - Constructor for class org.hedera.pig.load.WikiRevisionLoader
 
WikiRevisionLoaderTest - Class in org.hedera.pig.load
 
WikiRevisionLoaderTest() - Constructor for class org.hedera.pig.load.WikiRevisionLoaderTest
 
WikiRevisionPageInputFormat - Class in org.hedera.io.input
 
WikiRevisionPageInputFormat() - Constructor for class org.hedera.io.input.WikiRevisionPageInputFormat
 
WikiRevisionPageInputFormat.RevisionReader - Class in org.hedera.io.input
Read each revision of a Wikipedia page and transform it into a WikipediaRevision object.
WikiRevisionPageInputFormat.RevisionReader() - Constructor for class org.hedera.io.input.WikiRevisionPageInputFormat.RevisionReader
 
WikiRevisionPairInputFormat - Class in org.hedera.io.input
 
WikiRevisionPairInputFormat() - Constructor for class org.hedera.io.input.WikiRevisionPairInputFormat
 
WikiRevisionPairInputFormat.RevisionReader - Class in org.hedera.io.input
Read a meta-history XML file and output every pair of consecutive revisions as a record.
WikiRevisionPairInputFormat.RevisionReader() - Constructor for class org.hedera.io.input.WikiRevisionPairInputFormat.RevisionReader
 
WikiRevisionPairLoader - Class in org.hedera.pig.load
A UDF loader for WikiRevisionPairInputFormat using RevisionPairRecordReader
WikiRevisionPairLoader() - Constructor for class org.hedera.pig.load.WikiRevisionPairLoader
 
WikiRevisionReader<VALUEIN> - Class in org.hedera.io.input
 
WikiRevisionReader() - Constructor for class org.hedera.io.input.WikiRevisionReader
 
WikiRevisionReader.STATE - Enum in org.hedera.io.input
 
WikiRevisionTextInputFormat - Class in org.hedera.io.input
 
WikiRevisionTextInputFormat() - Constructor for class org.hedera.io.input.WikiRevisionTextInputFormat
 
WikiRevisionTextInputFormat.RevisionReader - Class in org.hedera.io.input
Read a meta-history XML file and output every pair of consecutive revisions as a record.
WikiRevisionTextInputFormat.RevisionReader() - Constructor for class org.hedera.io.input.WikiRevisionTextInputFormat.RevisionReader
 
WikiRevisionTimeInputFormat - Class in org.hedera.io.input
 
WikiRevisionTimeInputFormat() - Constructor for class org.hedera.io.input.WikiRevisionTimeInputFormat
 
WikiRevisionTimeInputFormat(String) - Constructor for class org.hedera.io.input.WikiRevisionTimeInputFormat
 
WikiRevisionTimeInputFormat.RevisionReader - Class in org.hedera.io.input
 
WikiRevisionTimeInputFormat.RevisionReader() - Constructor for class org.hedera.io.input.WikiRevisionTimeInputFormat.RevisionReader
 
WikiRevisionTimeInputFormat.RevisionReader(WikiRevisionTimeInputFormat.TimeScale) - Constructor for class org.hedera.io.input.WikiRevisionTimeInputFormat.RevisionReader
 
WikiRevisionTimeInputFormat.TimeScale - Enum in org.hedera.io.input
 
WikiRevLength - Class in org.hedera.mapreduce
Generate document length data for Okapi-BM25 scores
WikiRevLength() - Constructor for class org.hedera.mapreduce.WikiRevLength
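
To show why document length data is needed for Okapi BM25, here is a minimal sketch of the standard per-term scoring formula. It is not taken from WikiRevLength; the class name, method names, and the choice of the non-negative IDF variant are assumptions, and k1 = 1.2, b = 0.75 are only commonly used defaults.

```java
// Hypothetical illustration of Okapi BM25; not code from org.hedera.mapreduce.
final class Bm25Sketch {
    // Assumption: IDF variant with 1 added inside the log so it never goes negative.
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Term-frequency component, normalized by document length relative to the average.
    // b controls how strongly length is penalized; k1 controls term-frequency saturation.
    static double tfNorm(double tf, double docLen, double avgDocLen, double k1, double b) {
        double lengthRatio = docLen / avgDocLen;
        return tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * lengthRatio));
    }

    // Score contribution of one query term for one document.
    static double score(double tf, double docLen, double avgDocLen,
                        long docCount, long docFreq, double k1, double b) {
        return idf(docCount, docFreq) * tfNorm(tf, docLen, avgDocLen, k1, b);
    }

    public static void main(String[] args) {
        double k1 = 1.2, b = 0.75;
        // Same term frequency, different lengths: the longer document scores lower.
        System.out.println(score(3, 100, 100, 1000, 10, k1, b));
        System.out.println(score(3, 400, 100, 1000, 10, k1, b));
    }
}
```

The per-document length and the collection-wide average length are the only inputs that cannot be read from a plain inverted index, which is why a separate length-generation job is useful.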
 
wrappedBuffer - Variable in class pignlproc.markup.AnnotatingMarkupParser.CountingAppendable
 
write(DataOutput) - Method in class com.manning.hip.common.CommonLogEntry
 
write(DataOutput) - Method in class org.hedera.io.FullRevision
 
write(DataOutput) - Method in class org.hedera.io.LinkProfile
 
write(DataOutput) - Method in class org.hedera.io.Revision
 
write(DataOutput) - Method in class org.hedera.io.RevisionBOW
 
write(DataOutput) - Method in class org.hedera.io.RevisionConcatText
 
write(DataOutput) - Method in class org.hedera.io.RevisionDiff
Deprecated.
 
write(DataOutput) - Method in class org.hedera.io.RevisionHeader
 
write(DataOutput) - Method in class org.hedera.io.RevisionSplits
 
write(DataOutput) - Method in class org.hedera.io.SplitUnitOld
Deprecated.
 
writeLong(DataOutput, Long) - Static method in class com.manning.hip.common.CommonLogEntry
 

Y

YYYYMMddHHToYYYYMMdd - Class in org.hedera.pig.eval
Use Joda-Time to convert a string of format 'YYYYMMddHH' to 'YYYYMMdd'
YYYYMMddHHToYYYYMMdd() - Constructor for class org.hedera.pig.eval.YYYYMMddHHToYYYYMMdd
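
The conversion described for YYYYMMddHHToYYYYMMdd can be sketched as follows. This version uses java.time instead of Joda-Time to remain self-contained, and the class name, method name, and validation rules are assumptions rather than the behavior of the actual UDF.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; not the org.hedera.pig.eval implementation.
final class HourToDaySketch {
    // Converts e.g. "2014051316" (year, month, day, hour) to "20140513".
    static String toDay(String yyyyMMddHH) {
        if (yyyyMMddHH == null || !yyyyMMddHH.matches("\\d{10}")) {
            throw new IllegalArgumentException("expected 10 digits: " + yyyyMMddHH);
        }
        // Parsing validates the calendar date (rejects month 13, Feb 30, ...).
        LocalDate day = LocalDate.parse(yyyyMMddHH.substring(0, 8),
                DateTimeFormatter.BASIC_ISO_DATE);
        int hour = Integer.parseInt(yyyyMMddHH.substring(8));
        if (hour > 23) {
            throw new IllegalArgumentException("hour out of range: " + hour);
        }
        return day.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    public static void main(String[] args) {
        System.out.println(toDay("2014051316")); // 20140513
    }
}
```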
 

Copyright © 2014. All rights reserved.