Talk:HTML/Archive 1

From Wikipedia, the free encyclopedia

History of HTML (including SGML origins)

(copy of old revision of article and experiments embedding markup examples removed for brevity; see 2001-12-01 revision of Talk:HTML) Why doesn't this work? WHAT??? Grr. I'm off to bed. Message me if you have any ideas. Dave McKee —Preceding unsigned comment added by 194.82.103.xxx (talkcontribs) 1 December 2001


What does not work? While editing the Dutch page about HTML, I discovered the hard way that you should put code examples between PRE (multi-line) or CODE tags.

Also, are you sure HTML started out as an SGML application? IIRC, HTML 1.x was an SGML-like application, with version 2.0 being the first real SGML app. —Preceding unsigned comment added by 194.109.232.xxx (talkcontribs) 17 December 2001
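A minimal Python sketch of the PRE/CODE advice above, assuming the target is a plain HTML page rather than wikitext. The helper name `as_code_block` is hypothetical. In plain HTML the wrapper element alone is not enough: the markup characters also have to be escaped so the browser shows them instead of interpreting them.

```python
import html


def as_code_block(snippet: str, multiline: bool = True) -> str:
    """Escape markup characters and wrap the snippet in PRE or CODE."""
    tag = "pre" if multiline else "code"
    # html.escape replaces &, <, > (and quotes) with character references,
    # so the example is displayed literally instead of being rendered.
    return f"<{tag}>{html.escape(snippet)}</{tag}>"


print(as_code_block("<b>bold</b>", multiline=False))
```

PRE suits multi-line examples because it preserves whitespace and line breaks, while CODE is meant for short inline fragments.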


Berners-Lee conceptually based HTML on SGML. The first versions were not conforming implementations of SGML--for good reasons. Specifically, conformance required far more resources than was necessary to get the functionality he needed, so he simplified the syntax considerably. --LDC


Could someone please add some historical info on HTML before September of 1995? -- Infrogmation (who remembers when the leading web browsers were Mosaic and Lynx) —Preceding unsigned comment added by Infrogmation (talkcontribs) 7 February 2003

Valid and not

In 2001 I wrote a thesis named "How to cope with incorrect HTML", available at http://elsewhat.com/thesis . It deals with the HTML standard and how a browser can parse valid and invalid HTML. Perhaps the most interesting part of this thesis is a validation effort of 2.5 million HTML pages (XHTML was not considered at the time) taken from http://dmoz.org. It showed that only 0.71% of all HTML pages were valid, and describes in some detail the different errors the pages had. You can jump directly to this chapter via http://elsewhat.com/thesis/pages/?nr=81 , which shows each page of the thesis as a separate image. Perhaps there should be a minor mention of the amount of valid HTML and links on how to validate HTML? -DagfinnParnas —Preceding unsigned comment added by 143.97.2.35 (talkcontribs) 27 October 2004

HTML tag (actual), etc.

Should the entries HTML tag (actual) and Block level elements even exist? The former is imprecise, ambiguous and only serves to confuse the usage of the term "tag". The latter uses a title that does not restrict its scope to HTML while not even providing very useful information. It's also not even a complete list. —Preceding unsigned comment added by 24.100.55.42 (talkcontribs) 3 January 2005

Restricted HTML?

Some software, including MediaWiki, only allows restricted use of HTML by default. I cannot see any reference to or explanation of this in the article. Brianjd 03:21, 2005 Mar 6 (UTC)

I have added cross site scripting, however this is only relevant to a single element, script. Brianjd | Why restrict HTML? | 06:42, 2005 Mar 13 (UTC) [strike Brianjd | Why restrict HTML? | 07:09, 2005 Mar 20 (UTC)]

Actually, it is relevant to form and possibly other elements, but the name and the fact that both the examples were scripts gave the impression that it was only relevant to scripts. Brianjd | Why restrict HTML? | 07:09, 2005 Mar 20 (UTC)

The only justification I have seen for restricting HTML is cross site scripting (XSS), but restricting HTML is not going to help there because scams can be pulled off with or without XSS, e.g. on Donations for victims of the 2004 Indian Ocean earthquake. Brianjd | Why restrict HTML? | 13:45, 2005 Mar 28 (UTC)

The biggest reason I see is that wikitext is not HTML; it is a plain text markup that happens to have some things in common with HTML. There are good reasons it's a plain text markup--for one thing, you may want to render it as something other than a web page; for another, people shouldn't have to know HTML (or very much) to maintain it. Demi T/C 02:08, 2005 Mar 31 (UTC)
Some HTML can be used for malicious intent. So as a precaution the things we don't need are disallowed. Any particular tag you want to use? Mgm|(talk) 15:00, Apr 2, 2005 (UTC)

Disabling the "things we don't need" carries a cost, so we would rather have a specific reason for implementing restrictions. Brianjd | Why restrict HTML? | 02:07, 2005 Apr 7 (UTC)

HTML is restricted to prevent cookie stealing and web bugs which would violate a user's privacy, and to prevent malicious scripts which would be disruptive or make a page hard to revert. My favourite example of the latter is the minimalist JavaScript one liner while(1) alert("Hello"); -- Tim Starling 03:21, Apr 4, 2005 (UTC)

"cookie stealing" - keep in mind that any program can read cookies in Internet Explorer in Windows Brianjd | Why restrict HTML? | 02:06, 2005 Apr 7 (UTC)

"web bugs" - huh? Brianjd | Why restrict HTML? | 02:06, 2005 Apr 7 (UTC)

Having read the article web bug, I am unable to see how web bugs can pose a problem on web pages (they can on e-mail). Brianjd | Why restrict HTML? | 02:12, 2005 Apr 7 (UTC)

"malicious scripts" - why not just disable scripts? Brianjd | Why restrict HTML? | 02:06, 2005 Apr 7 (UTC)

there are several good reasons
  • Whitelists are safer than blacklists if the client software adds new features, accepts alternate spellings for existing features etc.
  • The html needs to be understood by the wiki engine so it doesn't do stupid things in the output html (like putting a paragraph marker between cells in table markup).
  • It's easier to make sure the output is well formed html if you control what's acceptable in the input
  • It may be desirable to render to something other than html in future.
I think that list covers the main reasons for only allowing html tags that are approved by the mediawiki developers Plugwash 12:51, 10 Apr 2005 (UTC)
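The whitelist point in the list above can be sketched in Python with the standard library's `html.parser`. This is an illustration under stated assumptions, not MediaWiki's actual sanitizer: the `ALLOWED_TAGS` set and the names `AllowlistSanitizer` and `sanitize` are hypothetical. Anything not explicitly allowed is escaped, so a tag the author never anticipated is harmless by default.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist for illustration; not MediaWiki's real list.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "table", "tr", "td"}


class AllowlistSanitizer(HTMLParser):
    """Re-emit allowlisted tags without attributes; escape everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            # Attributes (onclick, style, href, ...) are dropped entirely.
            self.out.append(f"<{tag}>")
        else:
            # Unknown tag: show its source text instead of passing it through.
            self.out.append(escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")
        else:
            self.out.append(escape(f"</{tag}>"))

    def handle_data(self, data):
        # Text content is always escaped on the way out.
        self.out.append(escape(data))


def sanitize(markup: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(sanitize('<b onclick="x()">hi</b><script>alert(1)</script>'))
```

A blacklist would have to enumerate every dangerous tag and every spelling of it. Here the parser lowercases tag names before the lookup, and anything outside the set falls through to the escaping branch.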

Can't we just allow anything that's valid in a particular version of XHTML? This will be a fixed standard, addressing the first 3 points. This will have exactly 1 way of interpreting it, so if we want to render in something other than HTML, we can write software (a bot?) to convert it.
Also, I note that this last point is already a problem, as it becomes a problem as soon as you let any HTML in. Brianjd | Why restrict HTML? | 05:07, 2005 Apr 17 (UTC)

No, not really. Wiki markup will render to valid HTML no matter what. But what if someone writes invalid XHTML? Is the software supposed to fix it? I don't really understand why you feel we should mix the two markup languages. What's wrong with wiki markup? If we allow HTML, then that reduces the number of people who can edit an article, since not everyone knows HTML. Personally, if I see unnecessary HTML in an article, I remove it.
Also, have you considered what would happen if Wikipedia allowed the img or embed tags? --Sean κ. 05:39, 17 Apr 2005 (UTC)

Response to first paragraph: If someone doesn't know HTML, they can just leave the HTML intact, or experiment with the "preview" button.

Response to second paragraph: It would allow added functionality? Right now we have no way of using Flash, right? Brianjd | Why restrict HTML? | 06:28, 2005 Apr 17 (UTC)

Response to first paragraph: If someone's lack of knowledge about HTML prevents them from editing, I say we are hurting Wikipedia.
Response to second paragraph: Currently, there is no way to embed media outside of Wikipedia into an article. If there were, it would open us up to spamming (people sneaking porn into articles using the img tag, by changing the image it pointed to without changing the source), malware (recently a vulnerability was discovered in JPEG, which is why we need to be able to scan images as they are uploaded), or DoS (embedding an exceptionally large media file into an article, effectively making the article unable to load in a browser). That is just a short list of bad things I can come up with in 30 seconds. --Sean κ. 17:31, 17 Apr 2005 (UTC)
Further, I don't see why this question needs to be addressed on the HTML page. If you have a concern about the wiki markup, you should take it to meta. I notice that you have started the discussion there.. please keep it there. --Sean κ. 17:46, 17 Apr 2005 (UTC)
Allowing flash would open up a huge can of worms. Flash seems to be mainly used to create annoying advertising banners and silly games not things that seem appropriate in an encyclopedia. Also most people are unlikely to have the tools to edit it in any way which goes somewhat against the community editing spirit. Also allowing use of images and media from other servers would break the idea of wikipedia as a largely self contained system (you can for example take the public dumps set them up on a non internet connected server and let people browse wikipedia offline). Plugwash 17:56, 17 Apr 2005 (UTC)

This discussion doesn't really belong here, but... the whole point of the software is to standardise and make things easier for editors. Permitting unrestricted HTML goes against both these principles; as far as I know, a future goal is to eliminate the use of HTML altogether. ··gracefool | 13:53, 24 Jun 2005 (UTC)

I recall a bug in Gecko allowing ... I think it was something like <tr href="mailto:nick@domain.com"> to open instances of the default email client forever. ¦ Reisio 01:46, 2 February 2006 (UTC)

HTML in Wikipedia

How do I use HTML in Wikipedia when I edit/create entries? —Preceding unsigned comment added by Flarn2005 (talkcontribs) 2 April 2005

To add HTML in Wikipedia, do what you would do if you were editing a web page in Notepad. This sentence is bold; if you click "edit" you can see how it was done. Brianjd | Why restrict HTML? | 02:08, 2005 Apr 7 (UTC)

At whom is this comment aimed? Anyone who has ever edited a web page in Notepad will be well aware how to add it anywhere; HTML is HTML. The people for whom simplified markups such as that used by this wiki were invented are the 90% or more of Internet users who have never looked at the source of a page in their life. If for some reason they do, they see "all these funny pointy brackets" and go "where's all my actual text hiding"; this is also why WYSIWYG editing is so popular. This is exactly why "unrestricted" HTML is not the basic markup for MediaWiki pages; I am strongly of the opinion that MediaWiki does not "restrict HTML", but rather borrows selectively from HTML in defining its own, completely unrelated, syntax. Arguably, even MediaWiki's more advanced syntax can be a little daunting to the uninitiated, but at least most of it is designed to be understood by humans, rather than computers.
Personally, I would not be at all sad to see <h1> (etc), <b>, <i> and various other tags removed from the wiki syntax, as it's confusing to have multiple ways to produce the same result, and especially the heading tags have all sorts of weird side-effects.
(on a historical note '' used to produce <em> rather than <i>, but this decision was later reversed from experience of how the markup was actually used) - IMSoP 00:41, 22 Apr 2005 (UTC)
I actually think we should go the other way: wiki markup is a mess where virtually anything could be misinterpreted as markup and where some combinations are probably rendered impossible by the order the parser does things in. A clean tag-based format would be much easier to extend without breaking anything. Plugwash 18:24, 16 December 2005 (UTC)

Unrelated to above discussion--I've been using HTML2Wiki for a lot of stuff I'm doing, and it works great. Although I have had few problems learning wiki formatting syntax, I can't expect all of my contributors to learn it, which puts a serious kibosh on them wanting to contribute. I've tried using the WYSIWYG plugins (FCK and TinyMCE) but have had limited success. It is a tough call. The great thing would be a utility to go the other way--i.e. Wiki2HTML--so that pulling content out and reusing it in other media would be easier. Or straight XML, which seems to be the way things are going. Justinlaine

See meta:Alternative parsers (including a Wiki2XML which may yet become the standard parser), and please stop using this page for discussion not related to Wikipedia's article on HTML. [I'm going to resist getting drawn back in, on principle; fascinating though it is, it belongs elsewhere.] - IMSoP 16:30, 25 January 2006 (UTC)

Removed security risks

Yes, some of the security issues are IE-specific, but others (especially involving scripting) are pretty much by design of the scripting systems used in HTML. Furthermore, even IE-specific issues ARE of concern to anyone involved with putting user-supplied content into HTML pages. Plugwash 12:36, 12 Apr 2005 (UTC)

This is ridiculous.. this is the second time the section on "Security risks" has been deleted. Look, HTML itself has no security holes. If we ignored the fact that ActiveX and JavaScript exist, which are not part of HTML, then all HTML does is describe the presentation of information (please see introduction). Calling HTML insecure is as ridiculous as saying "SMTP is insecure because it causes viruses to come through my Outlook program." --Sean κ. 18:48, 17 Apr 2005 (UTC)

To clarify: there is a security risk involved in embedding media within a web page. However, a browser can render HTML validly by replacing all media with a black rectangle, thus it is not a problem with HTML itself. This does not have any bearing on the discussion of including HTML in Wikipedia, above. --Sean κ. 19:01, 17 Apr 2005 (UTC)
nonetheless they are issues that affect all real world use of html. Should we ignore them completely? should we move them to a separate article and put it in see-also? or what? Plugwash 01:33, 18 Apr 2005 (UTC)
I'll look around. Most likely, they should be under "Security risks of web browsing" or "Security risks of using Internet Explorer/Firefox/whatever". --Sean κ. 01:54, 18 Apr 2005 (UTC)

Programming languages

Does anyone know why HTML is not included in the template that lists major programming languages?...

Since it's the language that the Web was originally built with it seems as major as most the others on that list. PHP is included on that list, but it probably would never have existed were it not for HTML.

--Blackcats 03:45, 4 Mar 2005 (UTC)

It is not included, because HTML is a markup language and not a programming language. --Andreas 26 Mar 2005 —Preceding unsigned comment added by 217.236.119.31 (talkcontribs)

From Programming language:

A programming language or computer language is a standardized communication technique for expressing instructions to a computer.

It looks to me like all markup languages are also programming languages. Brianjd | Why restrict HTML? | 06:37, 2005 Apr 17 (UTC)

No. The markup <em>Important point</em> means "Emphasis on Important point". How the computer interprets this is dependent on the programming. The word "instructions" in the line you cited has a very specific meaning, not the one you are thinking of. The phrases "My document has font size 12pt" and "Make sure to underline links" do not suggest the direct action a computer will take, even though one could think of them as "instructions". If you read the entire article at Programming language, you will see how they differ from Markup languages.--Sean κ. 18:56, 17 Apr 2005 (UTC)
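A small Python sketch of the distinction drawn above, assuming a toy renderer (the class `EmphasisRenderer` and function `render` are hypothetical names). The markup only states that the text is emphasised. What actually happens is decided by the program that reads it, and two programs can legitimately make different choices for the same input.

```python
from html.parser import HTMLParser


class EmphasisRenderer(HTMLParser):
    """Render <em> with whatever delimiters the calling program chooses."""

    def __init__(self, open_mark: str, close_mark: str):
        super().__init__()
        self.open_mark = open_mark
        self.close_mark = close_mark
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "em":
            self.out.append(self.open_mark)

    def handle_endtag(self, tag):
        if tag == "em":
            self.out.append(self.close_mark)

    def handle_data(self, data):
        self.out.append(data)


def render(markup: str, open_mark: str, close_mark: str) -> str:
    renderer = EmphasisRenderer(open_mark, close_mark)
    renderer.feed(markup)
    renderer.close()
    return "".join(renderer.out)


markup = "<em>Important point</em>"
print(render(markup, "*", "*"))                # plain-text convention
print(render(markup, "[stress] ", " [/stress]"))  # e.g. hints for a speech reader
```

The decision logic lives in the Python code; the markup string itself contains no instructions, loops, or conditions.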

HTML doesn't have any loops, conditionals, functions, etc. that are parts of programming languages: in particular, it has no decision mechanisms. It's not Turing-complete: a program in any of the languages you list above can (with effort) be rewritten into any other language, but not into HTML. (Perhaps the output can become HTML, but the code itself is not.) It's kinda like how people say that music is a language, but they wouldn't try to write a five-paragraph essay in music. HTML is a computer language but not a programming language. --Geoffrey 23:44, 17 December 2005 (UTC)

This is also being discussed on [[Talk:Programming_language#HTML_and_Turing-completeness|the talk page for Programming language]]. The argument I made was:

I always read that HTML with scripting (client or server side) was called Dynamic HTML (DHTML). The script tag hasn't always existed.

Pure HTML has no mechanism for instruction; it gains one only with add-ons. HTML isn't a programming language, and DHTML isn't either because it isn't a language. It describes the interaction between a Turing-complete language and the page descriptor (markup) language HTML. Jaxad0127 19:13, 6 June 2006 (UTC)

External links

HTML tutorial/Design information

Does anyone else think that the following links in the page are rather poor?

I vote for deletion of these links, but am not entirely sure because they have survived quite a few edits. -- Patrice Neff 06:18, 11 Feb 2004 (UTC)

Vote for. I would only link a tutorial for valid HTML here, not tag soup. Jor 14:23, 11 Feb 2004 (UTC)
Another vote for deletion, w3schools link is much better -Ayman
Vote for deletion. I agree that w3schools is better. These are good sites, though. On what basis do you make such vague criticisms? Brianjd | Why restrict HTML? | 08:14, 2005 Apr 17 (UTC)

NetDoc

I vote to put this link on: http://www.visiomode.com/docs/ Those CHMs are absolutely handy when there's a need to check something quickly from the spec. -Ilkka ... Comments on this? —Preceding unsigned comment added by 62.78.199.72 (talkcontribs) 12 May 2004

How is this relevant? Brianjd | Why restrict HTML? | 08:14, 2005 Apr 17 (UTC)

Acronym instead of full name.

Why is the name of this article HTML, and not HyperText Markup Language (I'm not certain about capitalization and such, but you get the idea)? See CSS for example. Why does this differ? —Preceding unsigned comment added by 80.230.144.75 (talkcontribs) 22 April 2005

Well, one of the key reasons for CSS being spelt out is that CSS is a disambiguation page - that is, there are multiple meanings of "CSS" which are equally deserving of Wikipedia articles; this is not, of course, true of "HTML". This then leaves us with the overarching convention stated at the top of Wikipedia:Naming conventions:
Generally, article naming should give priority to what the majority of English speakers would most easily recognize, with a reasonable minimum of ambiguity, while at the same time making linking to those articles easy and second nature.
I think "HTML" fits these general principles - it's far more recognisable than the expansion, unambiguous, and most likely to be what people will type as the link. A separate page for Wikipedia:Naming conventions (acronyms) also takes this view - the acronym should be spelt out "unless the term you are naming is almost exclusively known only by its acronyms and is widely known and used in that form"
[BTW, both Hypertext Markup Language and HyperText Markup Language redirect here, with 3 and 1 links to them, respectively]
I hope this seems a reasonable justification. - IMSoP 14:29, 22 Apr 2005 (UTC)
It is reasonable but is contrary to what the HTTP page does, as it uses the expanded version. Not really a showstopper, but a lack of coherency. 194.221.74.7

repetitive

"...HyperText Markup Language (HTML) is a hypertext markup language..."

Sounds a little odd if you ask me.

what about something like "...HTML stands for HyperText markup language..."

--Atomican 16:46, 6 August 2005 (UTC)

http://en.wikipedia.org/wiki/Template:Sofixit
http://en.wikipedia.org/wiki/Wikipedia:Be_bold_in_updating_pages ¦ Reisio 19:27, 2005 August 6 (UTC)

Reverted edit by 24.154.6.44

I reverted the edit by 24.154.6.44 (talk · contribs), since it referred to Notepad (completely unrelated), talked of things at tag level (should talk of enclosing things in elements, not surrounding them with tags), did not mention DTD, and <HTML> by itself was a bit of trivia. However, that got me thinking: should there be a section about the typical structure of an HTML document (<html>, <head>, <body> and all)? Aapo Laitinen 19:42, 20 October 2005 (UTC)
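For reference, the typical structure mentioned above can be sketched as a minimal document held in a Python string and outlined with the standard library parser. The document content, the choice of the HTML 4.01 Strict doctype, and the names `OutlineBuilder` and `outline` are assumptions made for illustration.

```python
from html.parser import HTMLParser

# A minimal document: doctype (which names the DTD), then html > head + body.
MINIMAL_DOCUMENT = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <title>Example</title>
  </head>
  <body>
    <p>Hello, world.</p>
  </body>
</html>
"""


class OutlineBuilder(HTMLParser):
    """Record each element name, indented by its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_decl(self, decl):
        # Called for the <!DOCTYPE ...> declaration.
        self.lines.append("doctype")

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1


def outline(document: str) -> list:
    builder = OutlineBuilder()
    builder.feed(document)
    builder.close()
    return builder.lines


print("\n".join(outline(MINIMAL_DOCUMENT)))
```

The outline makes the element view explicit: `title` is contained in `head`, and `p` in `body`, which is the "enclosing things in elements" wording preferred in the comment above.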

Indexed HTML

Where would I go from here to find out about indexed HTML within wikipedia? HTMLDOC[1] says it makes such a file, but documentation is scarce. Hackwrench 16:36, 16 December 2005 (UTC)

You most likely mean an HTML file with a Table of Contents, since it seems that's what HTMLDOC is used for--generating PDF / HTML files with indexed headings / pages. -- Parasti 23:57, 1 January 2006 (UTC)

Intro picture

I don't disagree with including a picture in the lead section of the article, but the current picture not only shows JavaScript but focuses on it. Aapo Laitinen 21:20, 15 January 2006 (UTC)

  • I've changed it to one without JavaScript. --Tysto 23:37, 15 January 2006 (UTC)
    • I've changed to a more semantic one. --minghong 02:01, 16 January 2006 (UTC)
      • That doesn't look more semantic to me. "<span id="time">&nbsp;</span>"? Just because you saw a table tag in there doesn't mean it's not semantic! :-) In fairness, neither example is, but I think the first one looked nicer. Rufous 13:37, 16 January 2006 (UTC)
        • I think they're all goofy as hell. :) ¦ Reisio 16:07, 17 January 2006 (UTC)

Copying from WebThing

Somebody at 195.195.87.138 just copied and pasted this entire page into the article. I can't see any appropriate licensing terms on that page, and the user hasn't identified themselves as being from WebThing, so I'm reverting it as it's probably copyright infringement. --Bogtha 22:10, 25 April 2006 (UTC)

Abuse

I just excised this addition:

Although many education facilities in the UK are still teaching 1990's web design and will not accept this fact. One example is Billy Taggart from Glasgow, the man has been teaching the same stale HTML from 3 years ago. W3C recommends XHTML, remember that Billy, not Dreamweaver.

Someone seriously abusing Wikipedia there.--SergeiRichard 00:37, 7 May 2006 (UTC)