Nov 10, 2017

Apache Nutch is one of the most complete open source web crawlers available for Java.

Highly extensible, highly scalable Web crawler
Nutch is a mature, production-ready Web crawler. Nutch 1.x enables fine-grained configuration and relies on Apache Hadoop™ data structures, which are well suited to batch processing.