Computer Science – Information Retrieval
Scientific paper
2008-09-03
6 pages, 4 figures, Proceeding of the International Conference on Advanced Computational Intelligence and Its Applications 200
Focused web-harvesting is deployed to build automated and comprehensive index databases as an alternative approach to virtual topical data integration. The web-harvesting is implemented and extended by not only specifying the targeted URLs but also predefining human-edited harvesting parameters to improve speed and accuracy. The harvesting parameter set comprises three main components. First, the depth-scale: the depth, counted from the first page at the targeted URL, of the final pages that contain the desired information. Second, the focus-point number, which determines the exact box containing the relevant information. Last, the combination of keywords used to recognize hyperlinks to relevant images or full texts embedded in those final pages. All parameters are accessible and fully customizable for each target by the administrators of participating institutions through an integrated web interface. A real implementation for the Indonesian Scientific Index, which covers all scientific information across Indonesia, is also briefly introduced.
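The three-parameter set described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `HarvestParams`, `BoxParser`, `parse`, and `harvest` are hypothetical, a "box" is assumed to be the N-th `<div>` in document order, every link on an intermediate page is assumed to be followed, and a hyperlink is assumed to be relevant when any one keyword occurs in its address. Pages are read from an in-memory dictionary rather than from the network.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urljoin


@dataclass
class HarvestParams:
    """Hypothetical container for the three human-edited parameters."""
    depth_scale: int   # link hops from the first page to the final pages
    focus_point: int   # 1-based index of the box holding the content
    keywords: tuple    # substrings that mark relevant hyperlinks


class BoxParser(HTMLParser):
    """Collects text and links, either page-wide or inside one box."""

    def __init__(self, focus_point=None, box_tag="div"):
        super().__init__()
        self.focus_point = focus_point  # None means "whole page"
        self.box_tag = box_tag          # assumption: boxes are <div> elements
        self.seen = 0                   # boxes opened so far
        self.depth = 0                  # nesting depth inside the focused box
        self.text, self.links = [], []

    def _inside(self):
        return self.focus_point is None or self.depth > 0

    def handle_starttag(self, tag, attrs):
        if tag == self.box_tag:
            self.seen += 1
            if self.depth > 0:
                self.depth += 1         # nested box inside the focused one
            elif self.seen == self.focus_point:
                self.depth = 1          # entering the focused box
        if tag == "a" and self._inside():
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == self.box_tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self._inside() and data.strip():
            self.text.append(data.strip())


def parse(html, focus_point=None):
    parser = BoxParser(focus_point)
    parser.feed(html)
    parser.close()
    return parser


def harvest(start_url, params, fetch):
    """Walk depth_scale hops, then read the focused box of each final page."""
    frontier = [start_url]
    for _ in range(params.depth_scale):
        hops = [urljoin(url, href)
                for url in frontier
                for href in parse(fetch(url)).links]
        frontier = list(dict.fromkeys(hops))  # drop duplicates, keep order
    records = []
    for url in frontier:
        box = parse(fetch(url), params.focus_point)
        files = [urljoin(url, href) for href in box.links
                 if any(k.lower() in href.lower() for k in params.keywords)]
        records.append({"url": url, "text": " ".join(box.text), "files": files})
    return records


# Made-up two-page site used only for illustration.
SITE = {
    "http://example.org/": '<div><a href="/paper/1">Paper 1</a></div>',
    "http://example.org/paper/1": (
        '<div>Menu <a href="/about.pdf">About</a></div>'
        '<div>Abstract text <a href="/files/fulltext1.pdf">PDF</a>'
        ' <a href="/contact">Contact</a></div>'
    ),
}
params = HarvestParams(depth_scale=1, focus_point=2, keywords=("fulltext", ".jpg"))
records = harvest("http://example.org/", params, SITE.__getitem__)
```

With these made-up pages, the second box of the final page is read, the navigation box is ignored, and only the link whose address contains a keyword is kept as a full-text file. Passing `fetch` as a callable keeps the sketch independent of any particular HTTP client.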
Akbar Zaenal
Handoko Laksana Tri
A Simple Mechanism for Focused Web-harvesting