Content and Markup
Content: Message (text, images, tables, etc.)
Markup: Information conveyed by the document beyond its content (e.g. metadata, font type and size, text positioning, etc.)
Print era: markup meant writing instructions on the physical page, telling the typesetter how the various parts of the document should be typeset
Markup
Two components:
Structure: Logical breakup of the document (chapters, paragraphs, etc.) and organisation of these parts into a hierarchy. Descriptive or generic markup
Formatting: Presentation of the document (fonts, page breaks, etc.). Procedural or presentational markup
Markup Controversy
Which markup is more important and should be given priority? Descriptive or Procedural?
An average user needs formatted documents, not just the structure.
Problems of Procedural Markup
Impedes the use and reuse of the document if not accompanied by structural markup
Ex.: A portion of text set in Times Roman 12 point, left aligned (we do not know whether this is a title, an author, a paragraph, etc.)
Exporting to another format becomes difficult
We do not know what each portion of text is supposed to be
Exchange of data becomes difficult
Solution
Separate the presentation from structure (content)
Preference is given to descriptive markup
Content and format can be developed and/or modified by different people
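The separation described above can be illustrated with a minimal Python sketch. The role names, the sample text and both style tables are hypothetical and chosen only for illustration; the point is that the content carries descriptive roles only, and each style table can be edited by a different person without touching the content.

```python
# Content carries only descriptive roles; it says nothing about fonts or layout.
document = [
    ("title", "Markup Languages"),
    ("author", "A. Writer"),
    ("paragraph", "Descriptive markup names the parts of a document."),
]

# Two independent, hypothetical style sheets map each role to a presentation.
print_style = {
    "title": "Times Roman 18pt, centered",
    "author": "Times Roman 12pt, centered",
    "paragraph": "Times Roman 12pt, left aligned",
}
screen_style = {
    "title": "Helvetica 24pt, left aligned",
    "author": "Helvetica 10pt, left aligned",
    "paragraph": "Helvetica 12pt, left aligned",
}


def render(doc, style):
    """Pair each piece of content with the formatting its role maps to."""
    return [f"[{style[role]}] {text}" for role, text in doc]


for line in render(document, print_style):
    print(line)
```

Calling render(document, screen_style) reuses the same content with a different presentation, which is not possible when only the formatting instructions are stored.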
Document Styles in WYSIWYG Word Processors
Many word processors support document styles, and creation of new styles
Ex.: MS Word
Support associating some descriptive markup with formatting tags
Provide only a partial solution: they do not really separate content from presentation
Descriptive Markup - Advantages
Strictly defined set of hierarchical descriptive tags
Ensure that the text can be processed automatically
No need to worry about the formatting aspects
Enables document interchange between different systems
Provide for easy extension and modification - maintainability
Enable mapping into a different set of tags - customisation
SGML
Standard Generalized Markup Language
Strictly descriptive
Contains no means to mark up presentational aspects of documents
Can be easily interfaced to external procedural markup systems and style sheets
SGML…
Not a markup language by itself
A metasystem enabling users to create such systems for particular types of documents
Possible to build different markup languages using SGML
HTML is an example
SGML…
Like HTML, SGML is a computer language rather than a data format. SGML files can be created manually, or through SGML editor software tools
SGML Parser
Software that reads and analyzes an SGML document
Validation or transformation
Not much use by itself
Part of a bigger SGML application system or browser
SGML History
More than 10 years of use and growth…
Widely used – aerospace, automotive, defence, software, semiconductor, pharmaceutical, publishing and other industries.
ISO standard (ISO 8879) – adopted by several other standards bodies
SGML: Key Features
Descriptive markup
Document types
Data independence
Descriptive markup
Use of markup codes (names) to categorize parts of a document
Example:
Advantage: Same document can be processed by different software for different purposes
Document Type
Notion of ‘document type’ (hence DTD)
Type of a document is formally defined by its constituent parts and their structure – expressed in a tree structure
Example: A report consists of a title, followed by an optional author, an abstract, and one or more paragraphs
If title is absent, it is not a report
If abstract follows paragraphs, it is not a report
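The report rule above can be written as a small check in Python. This is only a sketch of the idea, not SGML syntax: the element names (title, author, abstract, para) and the function name is_report are assumptions, and a regular expression over the sequence of element names stands in for the content model a DTD would declare.

```python
import re

# Assumed content model for the report type described above:
# title, optional author, abstract, then one or more paragraphs.
REPORT_MODEL = re.compile(r"title (author )?abstract (para )+")


def is_report(elements):
    """Return True if the sequence of element names fits the content model."""
    sequence = "".join(name + " " for name in elements)
    return REPORT_MODEL.fullmatch(sequence) is not None


print(is_report(["title", "author", "abstract", "para", "para"]))  # True
print(is_report(["author", "abstract", "para"]))  # False: title is absent
print(is_report(["title", "para", "abstract"]))  # False: abstract follows a paragraph
```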
Data Independence
Document portability across different HW and SW environments
How to handle character set differences?
Descriptive mapping for non-portable characters
String substitution mechanism (entities): substitution, at processing time, of a particular string of characters by another string of characters
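The entity mechanism can be sketched in Python as a lookup performed while the text is processed. The entity table and the function name expand_entities are hypothetical; the &name; reference form follows the usual SGML convention, and the sketch ignores details such as nested or parameter entities.

```python
import re

# Hypothetical entity table: each name maps to its replacement string.
ENTITIES = {
    "eacute": "\u00e9",
    "sgml": "Standard Generalized Markup Language",
}

# An entity reference is written as &name; in the text.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand_entities(text, entities):
    """Replace each &name; reference; leave unknown references untouched."""
    def substitute(match):
        return entities.get(match.group(1), match.group(0))
    return ENTITY_REF.sub(substitute, text)


print(expand_entities("caf&eacute; uses &sgml;", ENTITIES))
```

Because the document stores only the portable reference, each system can supply its own replacement for a character it represents differently.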
Defining an SGML Application
From SGML view, a document is a hierarchical structure of nested elements (chapters, sections, paragraphs, etc.)
SGML does not specify any presentational aspects of these elements. SGML also does not convey any meaning or role of these elements – meaning is implied by the application.
SGML specifies the contexts and levels of document hierarchy in which an element can or must occur.
All documents that can be marked up with the same hierarchy of elements are said to belong to a certain document type
Defining an SGML Application…
SGML defines the structure of a particular type of documents via the DTD (Document Type Definition)
Some general features of an SGML application are specified in another component called the SGML declaration.
Defining an SGML Application…
SGML Syntax:
SGML statements are enclosed in angle brackets (<>) and contain a keyword or name followed by one or more parameters separated by spaces
The character ‘!’ is inserted between ‘<’ and the statement keyword
Example: <!ELEMENT ... EMPTY -- Embedded image -->
Comments within a statement are delimited by double hyphens (--)
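The statement syntax described above can be taken apart with a few lines of Python. This is a deliberately simplified sketch, not a real SGML parser: it assumes a single statement with space-separated parameters and comments delimited by double hyphens, and the element name figure in the usage line is hypothetical.

```python
import re

# "<!" + keyword + whitespace + everything up to the closing ">".
DECLARATION = re.compile(r"<!([A-Za-z]+)\s+(.*?)>", re.DOTALL)
# A comment inside a statement is delimited by double hyphens.
COMMENT = re.compile(r"--.*?--", re.DOTALL)


def parse_declaration(statement):
    """Split one statement into its keyword, parameters and comments."""
    match = DECLARATION.fullmatch(statement.strip())
    if match is None:
        raise ValueError("not a markup declaration")
    keyword, body = match.groups()
    comments = [c[2:-2].strip() for c in COMMENT.findall(body)]
    parameters = COMMENT.sub(" ", body).split()
    return keyword, parameters, comments


print(parse_declaration("<!ELEMENT figure EMPTY -- Embedded image -->"))
```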
Components of an SGML Document
SGML Declaration:
Character set, syntax (e.g. delimiters), optional features
Usually a single declaration is used for all documents under a particular system
Prolog:
Usually a single document type definition (DTD)
Contains rules to which any document of a given type must conform
Document instance:
The document itself, marked up following the SGML usage conventions specified in the SGML declaration and the DTD.
Example 1: Office Memo
Document type: Office memo (Memo)
The tree structure and structural markup are shown.
SGML form of this document is shown.
SGML has the flexibility to define an infinite set of generic markup languages (articles, books, etc.)
An SGML markup language defines the possible hierarchical structures of documents in this class
A Memo Document
M E M O R A N D U M
To: Comrade Napoleon
From: Snowball
In Animal Farm, George Orwell says: “…the pigs had to expend enormous labour every day upon mysterious things called files, reports, minutes and memoranda. These were large sheets of paper which had to be closely covered with writing, and as soon as they were so covered, they were burnt in the furnace…” Do you think SGML would have helped the pigs?
Comrade Snowball
Memo
  To
  From
  Body
    Paragraph
      Quotation
  Close
Tree Structure of the Memo Document
MEMO    The memo itself
TO      Recipient(s)
FROM    Author(s)
BODY    Main text; contains paragraphs
P       Paragraph; contains text or quotations
Q       Quotation; contains text
CLOSE   Author's signature
Structural markup codes for memos
<MEMO>
<TO>Comrade Napoleon</TO>
<FROM>Snowball</FROM>
<BODY>
<P>In Animal Farm, George Orwell says: <Q>…the pigs had to expend enormous labour every day upon mysterious things called files, reports, minutes and memoranda. These were large sheets of paper which had to be closely covered with writing, and as soon as they were so covered, they were burnt in the furnace…</Q></P>
<P>Do you think SGML would have helped the pigs?</P>
</BODY>
<CLOSE>Comrade Snowball</CLOSE>
</MEMO>
SGML form of the memo
DTD for the memo document type
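A minimal sketch of the kind of rules a memo DTD expresses, written in Python rather than DTD syntax. The containment rules are assumptions read off the structural markup codes for memos (MEMO, TO, FROM, BODY, P, Q, CLOSE); they are not the original DTD, and the sketch checks only which elements may appear inside which, not their order or how often they occur.

```python
# Assumed containment rules, read off the structural markup codes.
# "#TEXT" stands for character data.
ALLOWED_CHILDREN = {
    "MEMO": {"TO", "FROM", "BODY", "CLOSE"},
    "TO": {"#TEXT"},
    "FROM": {"#TEXT"},
    "BODY": {"P"},
    "P": {"#TEXT", "Q"},
    "Q": {"#TEXT"},
    "CLOSE": {"#TEXT"},
}


def conforms(node):
    """Check a (name, children) tree; plain strings are character data."""
    name, children = node
    for child in children:
        child_name = "#TEXT" if isinstance(child, str) else child[0]
        if child_name not in ALLOWED_CHILDREN[name]:
            return False
        if not isinstance(child, str) and not conforms(child):
            return False
    return True


memo = ("MEMO", [
    ("TO", ["Comrade Napoleon"]),
    ("FROM", ["Snowball"]),
    ("BODY", [
        ("P", ["In Animal Farm, George Orwell says: ",
               ("Q", ["the pigs had to expend enormous labour every day"])]),
        ("P", ["Do you think SGML would have helped the pigs?"]),
    ]),
    ("CLOSE", ["Comrade Snowball"]),
])

print(conforms(memo))  # True
print(conforms(("MEMO", [("P", ["stray paragraph"])])))  # False: P must sit inside BODY
```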
DTD Features
[Diagram: document instance, SGML declaration, entity manager, SGML parser, processing program (entry/edit, composition, etc.), program output]
SGML System
Demonstration
SGML Parser
XMetaL – SGML/XML editor/browser
Sample bibliographic data DTDs
SGML Application Examples
Text Encoding Initiative (TEI)
www.uic.edu/orgs/tei/
Encoded Archival Description DTD (EAD)
lcweb.loc.gov/ead/
Electronic Theses and Dissertations (ETD)
SGML Resources
The SGML/XML Web Page
http://www.oasis-open.org/cover/sgml-xml.html
A Gentle Introduction to SGML
http://www-tei.uic.edu/orgs/tei/sgml/teip3sg/index.html
Berkeley Digital Library on SGML
http://sunsite.berkeley.edu/SGML/
The Whirlwind guide to SGML and XML tools and vendors
http://tosca.infotek.no/sgmltool/guide.htm