Tuesday, August 25, 2009

Standard Generalized Markup Language

Two components:

Content
Markup

Content: Message (text, images, tables, etc.)

Markup: Information conveyed by the document beyond its content (e.g. metadata, font type and size, text positioning, etc.)

Print era: instructions written on the physical page told the typesetter how the various parts of the document should be typeset

Markup

Two components:

Structure: Logical breakup of the document (chapters, paragraphs, etc.) and organisation of these parts into a hierarchy. Also called descriptive or generic markup.

Formatting: Presentation of the document (fonts, page breaks, etc.). Also called procedural or presentational markup.

Markup Controversy

Which markup is more important and should be given priority? Descriptive or Procedural?

An average user needs formatted documents, not just the structure.

Problems of Procedural Markup

Impedes the use and reuse of the document if not accompanied by structural markup

Ex.: A portion of text set in Times Roman 12 point, left aligned (we do not know whether this is a title, an author, a paragraph, etc.)

Exporting to another format becomes difficult

We do not know what each portion of text is supposed to be

Exchange of data becomes difficult

Solution

Separate the presentation from structure (content)

Preference is given to descriptive markup

Content and format can be developed and/or modified by different people
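The separation described above can be illustrated with a minimal Python sketch. The document content, the role names, and both style tables are hypothetical and chosen only for illustration; the point is that the same structured content can be formatted by whoever maintains the style, without touching the content.

```python
# Structure only: each part is labelled by its role, not by its appearance.
document = [
    ("title", "Markup Languages"),
    ("author", "A. Writer"),
    ("paragraph", "Descriptive markup names the parts of a text."),
]

# Two independent, hypothetical style tables maintained separately from the content.
PRINT_STYLE = {
    "title": "[Times 18pt, centred] {}",
    "author": "[Times 12pt, centred] {}",
    "paragraph": "[Times 12pt, left] {}",
}
PLAIN_STYLE = {
    "title": "== {} ==",
    "author": "by {}",
    "paragraph": "{}",
}


def render(doc, style):
    """Apply a style table to structured content, one output line per element."""
    return "\n".join(style[role].format(text) for role, text in doc)


print(render(document, PLAIN_STYLE))
print(render(document, PRINT_STYLE))
```

Changing a font or layout means editing only a style table; the list of (role, text) pairs stays the same.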

Document Styles in WYSIWYG Word Processors

Many word processors support document styles, and creation of new styles

Ex.: MS Word

Support association of some descriptive markups with formatting tags

Provide only a partial solution: they do not really separate content from presentation

Descriptive Markup - Advantages

Strictly defined set of hierarchical descriptive tags

Ensure that the text can be processed automatically

No need to worry about the formatting aspects

Enables document interchange between different systems

Provide for easy extension and modification - maintainability

Enable mapping into a different set of tags - customisation

SGML

Standard Generalized Markup Language

Strictly descriptive

Contains no means to mark up presentational aspects of documents

Can be easily interfaced to external procedural markup systems and style sheets

SGML…

Not a markup language by itself

A metasystem enabling users to create such systems for particular types of documents

Possible to build different markup languages using SGML

HTML is an example

SGML…

Like HTML, SGML is a computer language rather than a data format. SGML files can be created manually, or through SGML editor software tools

SGML Parser

Software that reads and analyzes an SGML document

Validation or transformation

Not much use by itself

Part of a bigger SGML application system or browser

SGML History

More than 10 years of use and growth…

Widely used – aerospace, automotive, defence, software, semiconductor, pharmaceutical, publishing and other industries.

ISO standard (ISO 8879) – adopted by several other standards bodies

SGML: Key Features

Descriptive markup
Document types
Data independence

Descriptive markup

Use of markup codes (names) to categorize parts of a document

Example: <P> to identify a paragraph

Advantage: Same document can be processed by different software for different purposes

Document Type

Notion of ‘document type’ (hence DTD)

Type of a document is formally defined by its constituent parts and their structure – expressed in a tree structure

Example: a report consists of a title, followed by an optional author, an abstract, and one or more paragraphs

If title is absent, it is not a report

If abstract follows paragraphs, it is not a report
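The report rule above can be checked mechanically. The following Python sketch is only an illustration, not how an SGML parser is implemented: the element names and the one-letter codes are assumptions, and the content model is expressed as a regular expression over those codes.

```python
import re

# Hypothetical one-letter codes for the elements in the report example.
CODES = {"title": "T", "author": "A", "abstract": "B", "para": "P"}

# title, optional author, abstract, one or more paragraphs
REPORT_MODEL = re.compile(r"TA?BP+")


def is_report(elements):
    """Return True if the sequence of element names matches the report model."""
    try:
        encoded = "".join(CODES[name] for name in elements)
    except KeyError:
        return False  # an element this document type does not define
    return REPORT_MODEL.fullmatch(encoded) is not None


print(is_report(["title", "author", "abstract", "para", "para"]))
print(is_report(["abstract", "para"]))            # title is absent
print(is_report(["title", "para", "abstract"]))   # abstract follows a paragraph
```

The first call describes a valid report; the other two break the rule in the two ways listed above.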

Data Independence

Document portability across different HW and SW environments

How to handle character set differences?

Descriptive mapping for non-portable characters

String substitution mechanism (entities): process-time substitution of a particular string of characters by another string of characters
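A rough Python sketch of this substitution follows. It assumes references are written as &name; and uses a hypothetical entity table; a real SGML system takes its entity definitions from the DTD and handles many more cases.

```python
import re

# Hypothetical entity table: entity name -> replacement string.
ENTITIES = {
    "eacute": "\u00e9",
    "sgml": "Standard Generalized Markup Language",
}

# An entity reference: '&', a name, then ';'.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand_entities(text, entities=ENTITIES):
    """Replace each &name; with its defined string; leave unknown names untouched."""
    return ENTITY_REF.sub(lambda m: entities.get(m.group(1), m.group(0)), text)


print(expand_entities("A r&eacute;sum&eacute; of &sgml;"))
```

The source text stays in portable characters, and the local system decides at processing time what each entity expands to.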

Defining an SGML Application

From SGML view, a document is a hierarchical structure of nested elements (chapters, sections, paragraphs, etc.)

SGML does not specify any presentational aspects of these elements. SGML also does not convey any meaning or role of these elements – meaning is implied by the application.

SGML specifies the contexts and levels of document hierarchy in which an element can or must occur.

All documents that can be marked up with the same hierarchy of elements are said to belong to a certain document type

Defining an SGML Application…

SGML defines the structure of a particular type of document via the DTD (Document Type Definition)

Some general features of an SGML application are specified in another component called the SGML declaration.

Defining an SGML Application…

SGML Syntax:

SGML statements are enclosed in angle brackets (<>) and contain a keyword or name followed by one or more parameters separated by spaces

Character ‘!’ is inserted between ‘<’ and the statement keyword

Example: <!ELEMENT … EMPTY -- Embedded image -->

Comments within a statement are enclosed between pairs of hyphens (--)

Components of an SGML Document

SGML Declaration:

Character set, syntax (e.g. delimiters), optional features

Usually a single declaration is used for all documents under a particular system

Prolog:

Usually a single document type definition (DTD)

Contains rules to which any document of a given type must conform

Document instance:

The document itself, marked up following the SGML usage conventions specified in the SGML declaration and the DTD.

Example 1: Office Memo

Document type: Office memo (Memo)

The tree structure and structural markup of the memo are shown, followed by its SGML form.

SGML has the flexibility to define an infinite set of generic markup languages (articles, books, etc.)

An SGML markup language defines the possible hierarchical structures of documents in this class


A Memo Document
M E M O R A N D U M

To: Comrade Napoleon

From: Snowball

In Animal Farm, George Orwell says: “…the pigs had to expend enormous labour every day upon mysterious things called files, reports, minutes and memoranda. These were large sheets of paper which had to be closely covered with writing, and as soon as they were so covered, they were burnt in the furnace…” Do you think SGML would have helped the pigs?

Comrade Snowball

Memo
  To
  From
  Body
    Paragraph
      Quotation
  Close

Tree Structure of the Memo Document

Codes   Meaning
MEMO    The memo itself
TO      Recipient(s)
FROM    Author(s)
BODY    Main text; contains paragraphs
P       Paragraph; contains text or quotations
Q       Quotation; contains text
CLOSE   Author's signature

Structural markup for memos



<MEMO>
<TO>Comrade Napoleon</TO>
<FROM>Snowball</FROM>
<BODY>
<P>In Animal Farm, George Orwell says: <Q>…the pigs had to expend enormous labour every day upon mysterious things called files, reports, minutes and memoranda. These were large sheets of paper which had to be closely covered with writing, and as soon as they were so covered, they were burnt in the furnace…</Q> Do you think SGML would have helped the pigs?</P>
</BODY>
<CLOSE>Comrade Snowball</CLOSE>
</MEMO>

SGML form of the memo
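To show how software can recover the tree structure from the marked-up memo, here is a small Python sketch. It assumes the memo is tagged with the element names from the structural markup table and with explicit end tags, and it uses Python's html.parser purely as a stand-in tokenizer: it is not an SGML parser, it knows nothing about DTDs, and it does no validation. The quotation is shortened for readability.

```python
from html.parser import HTMLParser

# Assumed tagging of the memo, with explicit end tags and a shortened quotation.
MEMO = (
    "<MEMO><TO>Comrade Napoleon</TO><FROM>Snowball</FROM>"
    "<BODY><P>In Animal Farm, George Orwell says: <Q>...the pigs had to "
    "expend enormous labour...</Q> Do you think SGML would have helped "
    "the pigs?</P></BODY><CLOSE>Comrade Snowball</CLOSE></MEMO>"
)


class OutlineBuilder(HTMLParser):
    """Record every start tag together with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []  # list of (depth, element name)

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names; restore the memo's upper case.
        self.outline.append((self.depth, tag.upper()))
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1


def outline(markup):
    """Return the element hierarchy of the markup as (depth, name) pairs."""
    builder = OutlineBuilder()
    builder.feed(markup)
    builder.close()
    return builder.outline


for depth, name in outline(MEMO):
    print("  " * depth + name)
```

The printed indentation mirrors the tree structure of the memo document: TO, FROM, BODY and CLOSE sit under MEMO, P under BODY, and Q under P.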

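DTD for the memo document type

<!DOCTYPE MEMO [
<!ELEMENT MEMO  - - (TO, FROM, BODY, CLOSE)>
<!ELEMENT TO    - - (#PCDATA)>
<!ELEMENT FROM  - - (#PCDATA)>
<!ELEMENT BODY  - - (P)+>
<!ELEMENT P     - - (#PCDATA | Q)*>
<!ELEMENT Q     - - (#PCDATA)>
<!ELEMENT CLOSE - - (#PCDATA)>
]>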
DTD Features

[Figure: SGML System. Labelled components: SGML declaration, document instance, SGML parser, entity manager, processing program (entry/edit, composition, etc.), program output]

Demonstration

SGML Parser

XMetaL: SGML/XML editor/browser

Sample bibliographic data DTDs

SGML Application Examples

Text Encoding Initiative (TEI)

www.uic.edu/orgs/tei/

Encoded Archival Description DTD (EAD)

lcweb.loc.gov/ead/

Electronic Theses and Dissertations (ETD)

SGML Resources

The SGML/XML Web Page

http://www.oasis-open.org/cover/sgml-xml.html

“A Gentle Introduction to SGML”

http://www-tei.uic.edu/orgs/tei/sgml/teip3sg/index.html

Berkeley Digital Library on SGML

http://sunsite.berkeley.edu/SGML/

The Whirlwind guide to SGML and XML tools and vendors

http://tosca.infotek.no/sgmltool/guide.htm
