Pattern discovery for semi-structured web pages using bar-tree representation

Computer Science – Information Retrieval

Scientific paper

Details

9 pages

Websites backed by a database of structured data are among the richest and densest sources of information for topical data integration. Practical data integration requires sustainable, reliable pattern discovery, both to retrieve content accurately and to recognize when a page's pattern changes over time; yet extraction of structured data from web documents still lacks accuracy. This paper proposes the bar-tree representation, which describes the whole pattern of a web page efficiently and is based on the reverse algorithm. Whereas previous algorithms always trace the pattern and extract the region of interest starting from the top root, the reverse algorithm recognizes the pattern starting from the region of interest and working toward both the top and bottom roots simultaneously. The attributes are then extracted and labeled in reverse, starting from the region of interest that holds the targeted contents. Because running the algorithm on conventional representations would require more computational power, the bar-tree method represents the generated patterns as bar graphs characterized by their depths and widths relative to the document roots. We show that this representation is suitable for extracting data from semi-structured web sources and for detecting template changes in targeted pages. The experimental results show a perfect recognition rate for template changes on several web targets.
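
The abstract does not spell out how the bars are built, so the following is only a rough sketch of the general idea of a depth-and-width profile, not the paper's bar-tree or reverse algorithm. It assumes, purely for illustration, that a page can be summarized by the number of elements found at each nesting depth below the top root, and that a template change shows up as a change in that profile. The names `DepthWidthProfiler`, `bar_profile`, and `template_changed` are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not deepen the tree.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DepthWidthProfiler(HTMLParser):
    """Counts how many elements occur at each nesting depth (assumed 'bar' widths)."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # current nesting depth; the top root sits at depth 0
        self.widths = {}    # depth -> number of elements seen at that depth

    def _count(self):
        self.widths[self.depth] = self.widths.get(self.depth, 0) + 1

    def handle_starttag(self, tag, attrs):
        self._count()
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: counted, but opens no new level.
        self._count()

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def bar_profile(html):
    """Return the list of element counts per depth, from the top root downward."""
    profiler = DepthWidthProfiler()
    profiler.feed(html)
    profiler.close()
    return [profiler.widths[d] for d in sorted(profiler.widths)]


def template_changed(old_html, new_html):
    """Assumed criterion: the template changed if the two profiles differ."""
    return bar_profile(old_html) != bar_profile(new_html)
```

Under these assumptions, two pages that differ only in their text produce the same profile, while wrapping the content in an extra container adds a bar and is flagged as a change. A comparison this coarse would miss changes that leave the per-depth counts intact, which is one reason it should be read as an illustration rather than as the method evaluated in the paper.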

Profile ID: LFWR-SCP-O-167101
