Solr only indexes and searches text; it does not include a crawler, since crawling is outside the project’s scope. However, take a look at Nutch, which is a crawler and not too hard to set up initially.

Nutch and Solr can be integrated if you need some Solr-specific feature to search the index.

Installing Solr
o) sudo apt-get install python-setuptools
o) Download the latest source tgz (3.6.1) from the Apache Solr website
o) Extract the archive and cd into the Solr source directory
o) sudo apt-get install ant
o) ant ivy-bootstrap (an Ant target in the Solr source tree that downloads Ivy, not an apt package)
o) Install a JDK and JRE (version 6 or later): sudo apt-get install openjdk-6-jdk
o) sudo ant compile
o) sudo ant test
o) sudo ant example


Bake your web search with Sunburnt, Solr

Search is integral to most web projects, and developers have long needed a search solution that is fast, scalable, and robust. Apache Solr is an open-source, community-driven search server that exposes REST-like HTTP APIs. Sunburnt is a Pythonic interface to Solr.
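
Because Solr is queried over HTTP, a search can be sketched with nothing but the Python standard library. The sketch below assumes the Solr example server is running locally on its default port (8983) and uses the standard /select handler with a JSON response. The function names and the query string are illustrative, not part of Solr.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Solr example server is listening on its default port.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(query, rows=10, base=SOLR_BASE):
    """Return the URL for a search request that asks for a JSON response."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base}/select?{params}"


def extract_docs(raw_json):
    """Pull the hit count and matching documents out of a Solr JSON response."""
    body = json.loads(raw_json)["response"]
    return body["numFound"], body["docs"]


def search(query, rows=10):
    """Send the query to Solr over HTTP; requires a running server."""
    with urlopen(build_select_url(query, rows)) as resp:
        return extract_docs(resp.read().decode("utf-8"))
```

With a server running, search("title:solr") would return the number of hits and a list of document dicts. The field name title is an assumption and depends on your schema.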

The problem
• Big Data
• Scalability
• Reliability
• Performance
What is Solr?
• Index
• Search
What it is not
Integrating Solr into your web project
Using Sunburnt to query
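
As a rough illustration of the last two outline items, here is a minimal sketch of indexing and querying through Sunburnt. It assumes Sunburnt is installed, a Solr server is reachable at the local default URL, and the schema accepts the documents you pass in. The helper names (connect, index_documents, search) are hypothetical. Sunburnt was written for Python 2, so a current interpreter may need a port or fork of the library.

```python
def connect(url="http://localhost:8983/solr/"):
    """Open a Sunburnt connection; the URL is an assumed local default."""
    import sunburnt  # imported lazily so the helpers below work without it
    return sunburnt.SolrInterface(url)


def index_documents(si, docs):
    """Add a list of dict documents and commit so they become searchable."""
    si.add(docs)
    si.commit()


def search(si, text, rows=10):
    """Query the default search field and return the matches as a list."""
    response = si.query(text).paginate(start=0, rows=rows).execute()
    return list(response)
```

A typical session would be si = connect(), then index_documents(si, [{"id": "1", "title": "Solr basics"}]), then search(si, "solr"). The id and title fields are assumptions and must exist in your schema.xml.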