One of the difficulties in building an SQL-like query language for the Web is the absence of a database (1) for this huge, heterogeneous repository of information. However, if we are interested in HTML documents only, we can construct a virtual schema from the implicit structure of these files. Thus, at the highest level of (2), every such document is identified by its Uniform Resource Locator (URL) and has a (3) and a text. Also, Web servers provide some additional information, such as the type, length, and last modification date of a document. So, for data mining purposes, we can consider the set of all HTML documents as a relation:
Document (url, title, text, type, length, modif)
where all the (4) are character strings. In this framework, an individual document is identified with a (5) in this relation. Of course, if some optional information is missing from the HTML document, the associated fields will be left blank, but this is not uncommon in any database.
(1)A、schema
B、platform
C、module
D、relation
(2)A、protocol
B、control
C、abstraction
D、presentation
(3)A、table
B、title
C、driver
D、event
(4)A、type
B、links
C、characteristics
D、attributes
(5)A、relation
B、field
C、script
D、tuple
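
As an illustration of the passage, the Document relation can be sketched as a table that is queried with SQL. This is only a minimal sketch using Python's standard sqlite3 module: the sample URLs, the helper names build_document_table and titles_mentioning, and the choice of url as the key are assumptions made for the example, not part of the original text. Every column is stored as a character string, and missing optional information is left as an empty string, as the passage describes.

```python
import sqlite3

# Hypothetical sample documents; the URLs and contents are invented for illustration.
SAMPLE = [
    ("http://example.org/a.html", "Data Mining Intro",
     "An overview of data mining.", "text/html", "2048", "1996-05-01"),
    # Optional information (title, modification date) is missing, so it is left blank.
    ("http://example.org/b.html", "",
     "Untitled notes on Web queries.", "text/html", "512", ""),
]


def build_document_table(rows):
    """Create an in-memory Document table and load the given rows."""
    conn = sqlite3.connect(":memory:")
    # All columns are character strings, following the passage.
    conn.execute(
        "CREATE TABLE Document ("
        "url TEXT PRIMARY KEY, title TEXT, text TEXT, "
        "type TEXT, length TEXT, modif TEXT)"
    )
    conn.executemany("INSERT INTO Document VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def titles_mentioning(conn, word):
    """Return (url, title) pairs for documents whose text contains the word."""
    cur = conn.execute(
        "SELECT url, title FROM Document WHERE text LIKE ? ORDER BY url",
        (f"%{word}%",),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_document_table(SAMPLE)
    print(titles_mentioning(conn, "mining"))
```

Each HTML document corresponds to one row of the table, and the URL is what tells one row from another, which is why it serves as the key in this sketch.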