Hedera

Hedera is a framework to facilitate the rapid development of processing methods on big versioned document collections using the Hadoop framework. It provides customized InputFormat for accessing data in parallel in standard MapReduce workflows. Hedera can be used in Hadoop Streaming to enable rapid development using different languages (Java, Python, C, etc.) , and it also supports Pig with a number of User-defined functions (UDFs). At the moment, the framework has been tested in Hadoop CDH 4.x and Pig 0.11.x. It supports both Hadoop YARN and non-YARN models. Free for research and educational purpose under GNU and Common Creative License.

Learn more »

(documentation is being progressively updated here.)

Hedera is available under Apache License v2.0 and Creative Commons Contribution v3.0 license.

Hedera uses the following library for working with Wikipedia:

The tool is currently pretty much in active development phase, meaning that it can go through several updates per day, with new functionalities added or modified / improved. I am trying to make it stable every day, but if you find any issues regarding installation, running code, please drop me a message to: ttran (at) L3S (dot) de.