Talk:Naive Bayes classifier/Archives/2014

From Wikipedia, the free encyclopedia

"Naïve Bayesian classification" moved to "Naive Bayes classifier"

Hello. I have reverted "naïve" to "naive" in the article text, as "naive" is the usual English spelling, and occurs more often than "naïve" in texts (papers, books, web pages, etc). I have also moved naïve Bayesian classification to naive Bayes classifier. For various combinations of terms I find the following:

  • "naive Bayes classifier" yields approx 11,000 Google hits
  • "naive Bayesian classifier" yields approx 5000 Google hits
  • "naïve Bayes classifier" yields approx 1000 Google hits
  • "naïve Bayesian classifier" yields approx 500 Google hits
  • "naive bayesian classification" -wikipedia -encyclopedia yields approx 500 Google hits
  • "naïve bayesian classification" -wikipedia -encyclopedia yields approx 150 Google hits

As this classifier is very common in computer-related texts, it is reasonable to suppose Google is a reliable indication of the currency of different variations of the name. Regards & happy editing, Wile E. Heresiarch 04:51, 27 Dec 2004 (UTC)

But "naïve" is proper English (with the umlaut), so wouldn't that "overrule" the "most common" phrase? WhisperToMe 05:28, 27 Dec 2004 (UTC)

For the benefit of other readers, I'll copy here some comments I put on user talk:WhisperToMe: (1) Re: standard English. I can't find any dictionaries or other sources which state that the correct spelling is "naïve". Every source I have found shows "naive" as the primary spelling, and shows "naïve" as an acceptable variation of "naive". It is clear that both spellings are acceptable. Naïve/naive isn't mentioned at Wikipedia:Manual of Style or American and British English differences. If you have some other sources I'd like to hear about it. (2) Agreed that the Google test only shows what's more common. However, since both spellings are acceptable and "naive" is more common, and much more common in a mathematics context, a crusade to change "naive" to "naïve" seems pointless at best. Wile E. Heresiarch 06:57, 27 Dec 2004 (UTC)
Our university professor taught us that naïve comes from the French language and insisted that it's the only correct spelling even in English. I had to fix two of my LaTeX handins just because of the diaeresis, albeit personally I prefer the naive spelling and after graduation I always wrote naive in my publications unless required otherwise (it did happen to me to be requested to fix a LaTeX paper just because the editor wanted the naïve spelling!). Apparently we should include both spellings in the article. Sofia Koutsouveli (talk) 22:30, 21 March 2014 (UTC)
Searching for "naïve Bayesian classifier" and "naive Bayesian classifier" come up with exactly the same pages (6660 pages each, in the same order). Wouldn't it be best to use the spelling "naïve", since it's easier to read? Otherwise many people will be reading it as "knave Bayes classifier" and getting confused... It's not as bad as trying to use "resume" as a noun, but I think it's better to use an ï, since it becomes easier for some people to read. Does anyone have trouble reading "naïve" but no trouble reading "naive"..? (If so, we can be nice, and make it even easier for them to read, by writing "naıve".) Κσυπ Cyp   15:55, 27 Dec 2004 (UTC)
Searching for "naïve Bayesian classifier" and "naive Bayesian classifier" come up with exactly the same pages (6660 pages each, in the same order). – Could the reason be that Google naively treats "naïve" and "naive" as interchangeable? On what basis do you assert that "naive" is harder to read than "naïve"? How can "naive" be confused with "knave" by someone who's likely to understand the article? Sure, if a child or someone learning English is completely unfamiliar with the word "naive" they may think that it's pronounced like "knave", but then the real problem is that they don't know what "naive" means in the first place. If they decide to look it up in a dictionary, they would also learn about the correct pronunciation. I would say in the case where two forms like "naive" and "naïve" exist and are equally acceptable, it's better to use the form that is easier to type. If we only had "naïve" everywhere with no redirects, someone might try to search for "naive Bayes" (because that's easy to type for lots of people, whereas "naïve" is not, even on many types of European keyboards); they would be unable to find the article, and either give up or start a duplicate article. In the context of this article, I would say that "naive Bayes" is far more frequent than "naïve Bayes", but check for yourself: do a search for "naïve Bayes" on http://scholar.google.com/ and see how many occurrences of "naïve" you actually find. --MarkSweep 01:10, 28 Dec 2004 (UTC)
Ummm, yes, it could be because Google naïvely treats "ï" and "i" as interchangeable. (As searching for either finds the same pages, disregarding whether they use the easy to read or easy to type version.) The scholar.google.com does the same thing, except it doesn't display the diaeresis until I follow the links. (Clicked on a random link that it found, and it used the "ï", not the "i".) I assert that "naive" is harder to read than "naïve", because "naive" looks rather strange and distracting to me. It's obvious what it meant, after spending an extra second reading it, but why make people spend an extra second reading it to understand it? I would say that in the case where two forms like "naïve" and "naive" exist and are equally acceptable, it's better to use the form that is easier to read. We did not only have "naïve" everywhere with no redirects, and problems arising from not having any redirects will remain purely hypothetical. Κσυπ Cyp   04:59, 28 Dec 2004 (UTC)
Let's not make decisions based on a single random link. Furthermore, while I don't have any evidence that "naive" won't cause any additional confusion (except for people who don't know the concept in the first place), you don't seem to have any evidence that "naïve" is easier to read either. I would say the burden of proof is on you here: can you demonstrate empirically that "naive" actually causes confusion? Significant confusion? Utterly hopeless cannot-make-heads-or-tails-of-it confusion? The other issue is with instances of "naïve" that do not occur in an article title. Does the new Mediawiki search facility treat "naive" and "naïve" as equivalent? (I don't know.) I suspect (without proof) that "naive" is a more frequent search query than "naïve", since it's easier to type for just about anyone. Unless both terms are treated as equal, searching for "naive" will miss pages that only have "naïve" in them (on second thought, this is turning into an argument in favor of inconsistent spelling, using all variants of a relevant word in an article).
Empirically, I find "naïve" easier to read than "naive". I do not, and have not claimed, that it is significant confusion, just that "naïve" is easier for me to read. If some people find both equally easy to read, and some find "naïve" easier to read, then it seems that "naïve" is easier to read on average. (As far as I can tell, no one has claimed that they find "naive" easier to read than "naïve".) The Mediawiki search seems to be disabled at the moment, although I would guess that it wouldn't treat them the same. I also suspect (also without proof) that "naive" is a more frequent search query than "naïve", since it's easier for many/most people to type. Since (hopefully) no one is going round deleting redirects between "naïve" and "naive", searching should find both spellings. (I certainly think that searching for "naive" should find the articles, as well as searching for "naïve"...) Κσυπ Cyp   17:40, 28 Dec 2004 (UTC)
I'm sorry but "empirically, I find" just doesn't make sense: you're not stating an empirical observation, you're only stating your own opinion, which you are certainly entitled to. But since you have a stake in the outcome, you cannot count your own preferences in an empirical study. I could claim that I find "naive" easier to read (since it has fewer dots and looks more normal to me), but I would have to discount that as my own biased opinion, which isn't empirical evidence. Regarding the issue of full text search, I was referring to articles that do not have "naive" in the title and which can only be found by a full text search. However, my argument is not particularly good: by the same token, someone might search for "colour" and not find a relevant article that mentions "color" in the body text but not in the title. So all we have now in terms of arguments is (1) your opinion that "naïve" is easier to read; (2) the fact that in a non-random sample of 16 relevant publications (see below) "naïve" occurs in 2, but "naive" in 14; and (3) opinions from several editors that "naive Bayes" is more common. For all I know the conjunction of these three propositions is not a contradiction: it could be the case that "naïve" is in fact easier to read (though I will remain skeptical) and that "naive" is more common (for which I believe there is sufficient empirical evidence). In that case, we still need to make a decision which form we should pick, and there is precedent for choosing the more common form. --MarkSweep 19:52, 28 Dec 2004 (UTC)
After looking up "empirically" in dictionary.com, I'm not sure that I was using the word correctly. I meant, subjectively/personally, I find "naïve" a bit easier to read than "naive". I hadn't thought of full-text searches, before. If you do actually find "naive" easier for you to read, not just easier to write, then I'm fine with it being left as "naive". (I got the impression that no one here actually found "naive" easier to read, just thought it should be used because of being easier to type or more common.) (wɛn wɪl piːpl ɑːfɪʃəliː swɪtʃ tuː juzɪŋ ʌ fənɛtɪk əlfəbɛt fɔː ɪŋgɫɪʃ..?) Κσυπ Cyp   02:00, 29 Dec 2004 (UTC)
Some data points: The first two references cited in the present article both use "naive", not "naïve" (I was unable to check the third reference). Russell and Norvig use "naive", not "naïve". Among the first ten results returned by scholar.google.com, 8 use "naive" and 2 use "naïve". Added later: Mitchell's Machine Learning textbook (ISBN 0070428077), Data Mining by Han and Kamber (ISBN 1558604898), and Data Mining by Witten and Frank (ISBN 1558605525) all use "naive" exclusively. Score: "naive" 14, "naïve" 2. --MarkSweep 19:52, 28 Dec 2004 (UTC)
Finally, the insidious slippery slope argument: would you be in favor of writing "coördinate" and "reëlect" as well? How about "reärmed", since that could easily be confused with "rear med"? --MarkSweep 07:06, 28 Dec 2004 (UTC)
I wouldn't support or oppose a diaeresis on those words. The "ö" in "coördinate" seems slightly more appropriate than the "ä" in "reärmed", although I'm not sure why. Possibly because reading the "ä" as an umlaut would make the pronunciation completely wrong. (I think that pronouncing "naïve" without a diaeresis would sound much worse than pronouncing the other three words without a diaeresis.) Κσυπ Cyp   17:40, 28 Dec 2004 (UTC)
User:Cyp, I can't tell what you're talking about. Searching for "naïve Bayesian classifier" and "naive Bayesian classifier" come up with exactly the same pages (6660 pages each, in the same order). Googling for the exact phrase (with quote marks, and with -wikipedia -encyclopedia) I get 5000 for "naive" [1] and 500 for "naïve" [2] as reported above. Without quote marks (and with -wikipedia -encyclopedia) I get about 21,000 for "naive" [3] and 6000 for "naïve" [4]. So on what basis are you trying to claim "naïve Bayesian classifier" and "naive Bayesian classifier" are equally common? -- In any event, if you want to claim "naïve" is easier for some people to read you're going to have to come up with some evidence for that; "naive" looks rather strange and distracting to me simply doesn't count. -- We did not only have "naïve" everywhere with no redirects, and problems arising from not having any redirects will remain purely hypothetical. -- I'm sorry, I simply don't understand what you're getting at here. Wile E. Heresiarch 05:27, 28 Dec 2004 (UTC)
When I search with google, it treats "ï" and "i" as completely identical. I have no idea why it behaves differently, when you search. Last time I checked, I was a person, so I have already come up with evidence that some (at least one) people find "naïve" easier to read than "naive". Κσυπ Cyp   17:40, 28 Dec 2004 (UTC)
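
For readers wondering how a search could treat "ï" and "i" as identical, as described in the thread above, here is a minimal sketch of diacritic folding. It illustrates one common technique only; it is not a description of how Google or the MediaWiki search actually work. The function names `fold_diacritics` and `matches` are hypothetical and chosen for illustration.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip combining marks, so that 'naïve' becomes 'naive'."""
    # NFD splits 'ï' into the base letter 'i' plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query: str, document: str) -> bool:
    """Substring match that ignores case and diacritics (illustrative only)."""
    return fold_diacritics(query).casefold() in fold_diacritics(document).casefold()


if __name__ == "__main__":
    print(fold_diacritics("naïve Bayes classifier"))
    print(matches("naive bayes", "The naïve Bayes classifier"))
```

With folding of this kind applied to both the query and the indexed text, either spelling finds pages that use the other. Without it, a full-text search for "naive" would miss pages that contain only "naïve", which is the concern raised above.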

Naive or naïve

At university I was only taught the naïve spelling and I was told that naive is wrong, in fact our professor refused to accept our answers if we failed to add the diaeresis over the i, it happened twice to me and I had to re-write my LaTeX handins. I personally prefer the naive spelling and I hate the diaeresis as I've to switch to the French keyboard to add it, and actually after I finished from university I always wrote naive in my publications unless specifically required otherwise. Nevertheless, I feel we should surely note in the article, preferably in the lead section, that the term is written both with naive and naïve. Is there anyone feeling bad about including both spellings in the lead and what do you think? Sofia Koutsouveli (talk) 22:23, 21 March 2014 (UTC)

Requested move 17 August 2014

The following discussion is an archived discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a move review. No further edits should be made to this section.

The result of the move request was: not moved. Jenks24 (talk) 14:42, 25 August 2014 (UTC)



Naive Bayes classifier → Naive Bayes – Current page title is a bit of a tautology, since naive Bayes models are always classifiers. – QVVERTYVS (hm?) 14:32, 17 August 2014 (UTC)

This is a contested technical request (permalink). Anthony Appleyard (talk) 16:00, 17 August 2014 (UTC)
Anthony Appleyard, can you explain why this would be controversial? (Note that Naive Bayes redirects here already.) QVVERTYVS (hm?) 16:14, 17 August 2014 (UTC)
  • "Naive Bayes" without "classifier" is unclear, it could mean a story about a naive man called Bayes, or various things. Anthony Appleyard (talk) 16:28, 17 August 2014 (UTC)
    • That would be a WP:DICDEF unless you have a source to establish such a man's notability ;) QVVERTYVS (hm?) 16:34, 17 August 2014 (UTC)
  • Oppose per WP:NOUN. Dicklyon (talk) 05:04, 18 August 2014 (UTC)
  • Oppose: A great example of why excessive conciseness is not a goal here. It's almost always better to use noun phrases than shorter adjectival ones here, unless there's something adjectival about the topic. Which is rare, very rare.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  13:53, 24 August 2014 (UTC) PS: Cf. Fast Fourier transform and similar cases; we don't truncate them to things like Fast Fourier except as redirects, despite the propensity for some specialist sources to do so.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  10:56, 25 August 2014 (UTC)
    Dicklyon, SMcCandlish, "naive Bayes" is a noun:
    • "Naive Bayes is often used as a baseline in text classification because it is fast and easy to implement" ([5], abstract)
    • "we investigate the optimality of naive Bayes under the Gaussian distribution. We present and prove a sufficient condition for the optimality of naive Bayes" ([6], abstract)
    • "We also demonstrate that naive Bayes works well" ([7], abstract)
    • "naive Bayes often performs classification very well." ([8], introduction)
    ... and I could go on. QVVERTYVS (hm?) 08:46, 25 August 2014 (UTC)
That's shorthand "nouning" of an adjectival phrase. I don't see how it's different from advertising and PR people using "creative" as a noun (e.g. "our firm specializes in social-media-based creative and messaging"). This sort of "nouning" is a specialist, WP:JARGON usage, and will confuse people outside the specialty in question, because it's not a standard English usage pattern. A more exact comparison would be using "fast Fourier" instead of "fast Fourier transform" in a bunch of phrases like that (e.g. "A fast Fourier computes the DFT and produces exactly the same result as evaluating the DFT definition directly"). You can definitely find it used that way in specialist literature, but almost never in general-audience sources (like WP itself) because it "does not compute" for anyone but specialists in the fields in which that shorthand truncation is used. The "classifier" in the name of this article is only redundant to people who use naive Bayes classifiers; it's not redundant to the average reader, which is who WP is written for.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  10:56, 25 August 2014 (UTC)
Ok, point taken. Textbooks actually tend to spell it out on first use, AFAICT. QVVERTYVS (hm?) 11:16, 25 August 2014 (UTC)

The above discussion is preserved as an archive of a requested move. Please do not modify it. Subsequent comments should be made in a new section on this talk page or in a move review. No further edits should be made to this section.