Web Crawling
The term Web harvesting is commonly used to describe Web scraping from a multitude of sites. It also refers to an implementation of a Web crawler that uses human expertise or machine guidance to direct the crawler to URLs that make up a specialized collection or body of knowledge.

Web harvesting can be thought of as focused or directed Web crawling.

Purpose
Web harvesting allows Web-based search and retrieval applications, commonly referred to as search engines, to index content that is pertinent to the audience for which the harvest is intended. Such content is thus virtually integrated and made searchable as a separate Web application. General-purpose search engines such as Google and Yahoo!, by contrast, index all possible links they encounter from the origin of their crawl.

Focused web harvesting
Focused web harvesting is similar to a targeted web crawler. Instead of letting a general-purpose crawler harvest the entire Web, the mechanism works under predefined conditions that specify which information to collect. In particular, it is intended to achieve indirect data integration. An implementation of this kind of data integration can be found in the Indonesian Scientific Index, which integrates all information related to science and technology in Indonesia.
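
The idea of crawling under a predefined condition can be illustrated with a minimal Python sketch. It is not the method used by any particular harvester: the names focused_harvest, fetch and is_relevant, the example.org URLs and the in-memory PAGES dictionary are all assumptions made for illustration. The crawler stores a page and follows its links only when the caller-supplied condition accepts the page, so off-topic branches are never expanded.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def focused_harvest(seeds, fetch, is_relevant, max_pages=100):
    """Breadth-first crawl that keeps and expands only relevant pages.

    fetch(url) returns the page HTML or None (hypothetical, caller-supplied).
    is_relevant(url, html) is the predefined harvesting condition.
    """
    queue = deque(seeds)
    seen = set(seeds)
    harvested = {}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        if html is None or not is_relevant(url, html):
            continue  # off-topic pages are neither stored nor expanded
        harvested[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            link, _ = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return harvested


# Hypothetical in-memory "Web" so the sketch runs without network access.
PAGES = {
    "http://example.org/": 'Science portal <a href="/science">a</a> <a href="/sports">b</a>',
    "http://example.org/science": 'Science news <a href="/lab">lab</a>',
    "http://example.org/sports": 'Football scores <a href="/hidden">more</a>',
    "http://example.org/lab": "Science lab report",
    "http://example.org/hidden": "Science page reachable only via sports",
}

result = focused_harvest(
    seeds=["http://example.org/"],
    fetch=PAGES.get,
    is_relevant=lambda url, html: "science" in html.lower(),
)
```

In this toy run the sports page fails the condition, so the page linked only from it is never fetched, even though it would have matched. This is the trade-off of a focused harvest: less crawling effort and a more pertinent index, at the cost of missing relevant pages that are reachable only through irrelevant ones.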