Talk:Cumulative density function

From Wikipedia, the free encyclopedia
WikiProject Disambiguation
This disambiguation page is within the scope of WikiProject Disambiguation, an attempt to structure and organize all disambiguation pages on Wikipedia. If you wish to help, you can edit the page attached to this talk page, or visit the project page, where you can join the project or contribute to the discussion.

Adding the probability mass function as another concept mixed up in this confusion

Michael Hardy, I'd be grateful for your input (and anyone else's) on the addition I've made. This followed our discussion at Wikipedia:Articles for deletion/Cumulative density function.

I'm grateful you corrected my inaccurate text. The result is much better. I still feel, however, that the reader who accidentally lands up at this page is going to need more detailed help to understand what they're getting wrong. I know Wikipedia isn't a text-book, so it's high-risk trying to add an explanation, but equally there's no point in having articles that are only meaningful to people who already know the material.

I've added the discrete distribution because a lot of those who use statistics practically, rather than care about the theory of statistics, will have a very un-rigorous approach to the difference between the discrete and continuous cases. When we use the normal distribution to describe a measured variable, we know in our heads that we've measured it with fixed precision, and therefore our values are discrete, even though the distribution is continuous, so we blur the boundary between the mass function and the density function. That's why I've mentioned them as equivalents, even though fundamentally they can also be viewed as very different, because the y-value in a mass function plot is already probability, while the y-value in a density function plot is a probability-per-unit-x, and must be integrated. In effect, the mass-function can be seen as a density function that's been pre-integrated in slices corresponding to the discrete values we care to measure.

The main thing I wanted to do was (1) make sure the reader knows what the words "cumulative" and "density" actually mean, as this is the root of the confusion this disambig is supposed to address, and (2) make sure they find their way to the correct article, which might actually be the probability mass function.

I hope this makes sense? Elemimele (talk) 17:33, 10 June 2023 (UTC)
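The "pre-integrated in slices" picture described above can be sketched numerically. This is only an illustration, not text proposed for the page: the function names `normal_cdf` and `binned_mass`, the standard normal parameters, and the measurement precision of 0.1 are all assumptions chosen for the example. It integrates a normal density over one slice per recordable reading, which gives a mass function whose values are already probabilities.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def binned_mass(lo, hi, step, mu=0.0, sigma=1.0):
    """Probability of each reading when values are recorded to the nearest `step`.

    Each reading r collects the density integrated over
    [r - step/2, r + step/2], i.e. a difference of two CDF values.
    """
    n = int(round((hi - lo) / step)) + 1
    readings = [lo + i * step for i in range(n)]
    masses = [
        normal_cdf(r + step / 2, mu, sigma) - normal_cdf(r - step / 2, mu, sigma)
        for r in readings
    ]
    return readings, masses

# Hypothetical instrument that records to one decimal place (assumption).
readings, masses = binned_mass(-6.0, 6.0, 0.1)
total = sum(masses)  # the masses add up to (very nearly) 1
```

Each entry of `masses` is a probability in its own right, whereas the density it came from is a probability per unit x and only becomes a probability once integrated over a slice.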

Nature of discrete probability distributions

I have rephrased the part that referred to "discrete values" because there is no such thing as a discrete value. Any real number at all can be a value of a discrete random variable. A discrete probability distribution assigns probabilities to individual points, and assigns all of the probability to individual points, i.e. the sum of the probabilities assigned to those points is 1. Michael Hardy (talk) 20:14, 13 June 2023 (UTC)

@Michael Hardy:, I'm fine about the discrete bit, but I'm less happy with removing the description of sigmoidal as what we've ended up with is a text that may be completely correct, but is also likely to be completely incomprehensible to the sort of readers who'll end up at this page (biologists, for example, are likely to be using these distributions in their statistical tests, they'll suddenly have to choose whether they want the density function or the cumulative function when they're using Excel, but they may have no mathematical training over age 16; many chose biology because they're rubbish at maths, and they have no idea what is a strictly increasing function). We also have the following sentence in our article on Sigmoid function:

"Sigmoid curves are also common in statistics as cumulative distribution functions (which go from 0 to 1), such as the integrals of the logistic density, the normal density, and Student's t probability density functions."

I'm not fully au fait with the exact requirements to be a sigmoid curve, but I wanted here to describe visually, in a fairly imprecise way, what shape the cumulative distribution function of the normal distribution has, because this is a distribution with which most readers will be familiar. Is there a better technical term for "sigmoidal" that does fit this cumulative distribution, and yet is still a word that a non-mathematical biologist will know, or is there a distribution that they'll know that has a genuinely sigmoidal cumulative distribution? Elemimele (talk) 16:29, 14 June 2023 (UTC)
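For anyone wanting to see the shape under discussion, here is a rough sketch that tabulates the cumulative distribution function of the standard normal distribution and draws a crude text plot. The grid from -4 to 4 in steps of 0.5 and the helper name `normal_cdf` are assumptions made for the illustration. It does not settle whether "sigmoidal" is the right word; it only shows the properties being described: the values rise steadily from near 0 to near 1, pass through 0.5 at the mean, and climb fastest around the mean.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Evaluation grid: -4.0, -3.5, ..., 4.0 (an arbitrary choice for the sketch).
xs = [i * 0.5 for i in range(-8, 9)]
ys = [normal_cdf(x) for x in xs]

# Average slope over each interval between neighbouring grid points.
slopes = [(ys[i + 1] - ys[i]) / 0.5 for i in range(len(ys) - 1)]

# Crude text plot: bar length is proportional to the cumulative probability.
for x, y in zip(xs, ys):
    print(f"{x:5.1f} {y:6.4f} " + "#" * round(40 * y))
```

The bars lengthen slowly at first, quickly through the middle, and slowly again at the top, which is the S-shaped profile the removed wording was trying to convey.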

Disambiguation pages and common errors

I have a couple of concerns regarding the current resolution of this issue after the AfD.

First, disambiguation pages are just navigational tools. Their purpose is to help readers find the article they're looking for, not to correct misconceptions they might have. They usually don't have references, because they don't need them; they have no substantive content to support.

This page has no references, but does make substantive claims, namely that the title phrase is self-contradictory and results from a confusion.

It's possible that that problem could be fixed, if there are references directly on-point, which there might be. But at that point the page is no longer a disambig page, but an article, and in particular the disambig template should be removed. Are we really going to have an article about a single phrase found to be problematic? I would think you would need a very notable controversy about the phrase for that to stand. For comparison, people have written reams about the phrase could care less, but that is still fortunately a redlink.

Finally, in my opinion it's just not generally part of the remit of an encyclopedia to call out things that people might be mistaken about, except possibly as an inline aside during exposition of something else. Search for "misconceptions" in the archives of talk:mathematics to find what I've expressed on this previously. --Trovatore (talk) 18:07, 14 June 2023 (UTC)[reply]

@Trovatore: Maybe it should be called something other than a disambiguation page, and maybe it should have references. But there used to be a problem of people linking to "cumulative density function" in Wikipedia articles and otherwise using the term in articles, when they meant "cumulative distribution function", so Wikipedia has a problem that this page helps solve. Michael Hardy (talk) 18:29, 14 June 2023 (UTC)[reply]
Hmm, I'll allow that that's a problem, but I'm not convinced this is a good solution.
I had thought you were making a righting great wrongs argument; that is, that you wanted to help reduce uses of this phrase "out in the world". That I think is clearly not an acceptable justification for the page — that's just not our function.
The argument you give above, that it's meant to solve an in-wiki problem, is a bit different, and just conceivably you could try a WP:IAR-type argument for it. But how big a problem is it really? Could you just search for it from time to time, say once a month, and address the three or four hits you might get? --Trovatore (talk) 18:36, 14 June 2023 (UTC)[reply]
I share the concern about having an article about a single phrase found to be problematic. I might go a little further and say "claimed to be problematic", since without one or more sources actually making the point, this is just a gripe or a pet peeve. Plenty of mathematical terminology and notation is confusing. Heck, I've been complaining about the superscripts in and since high school. And sub- versus supermartingales, anyone? But we don't create articles for every pet peeve of mine or every grievance that gets aired in forum threads about "your least favorite math notation". As I said at the AfD, without sources this is all disallowed synthesis, and even with sources it doesn't stand alone as an encyclopedic topic. If supported by documentation, it could be slotted in to the cumulative distribution function page, maybe at the end of the "Definition" section.
For solving an in-wiki problem, the right place would be MOS:MATH. XOR'easter (talk) 18:45, 14 June 2023 (UTC)
Let's look at the options. We can convert this back to a very simple disambig page that tells the reader they've been silly but doesn't tell them why (how is that conceivably helpful to anyone?), or we can delete it, and one of the keep-voters will quite reasonably complain that we've gone against a very clear consensus at AfD. Or we can leave it with an explanation that is fully backed by sources in the articles to which the links point, if anyone takes the minimal trouble to click on them, which is exactly what a disambig page should do. It would be sort of nice if the needs of our readers featured a little more highly in the evaluation of our maths content. Elemimele (talk) 20:51, 14 June 2023 (UTC)
Actually I don't think the consensus was super-clear; the AfD was rather sparsely attended. But in any case you've left out a couple options.
  • We could redirect to cumulative distribution function, almost certainly the overwhelming intent of the search. That is not a deletion.
  • We could make it a dab page that doesn't say the reader has done anything silly, but simply lists possible targets without comment.
As for the "needs of the readers", you have to be a little careful what you mean by that. You might think that readers "need" to be disabused of a misconception, but this is not our role, not in that sort of active sense. --Trovatore (talk) 21:16, 14 June 2023 (UTC)[reply]
@Trovatore:, we're going round in circles. My very first reply at the AfD was:

"Yes, I'd suggest converting it to a simple redirect to Cumulative distribution function because that's almost certainly what the reader is looking for, and if it isn't, they'll find enough information there to sort themselves out. "

But others, notably Michael Hardy disagreed vehemently, and felt that readers are genuinely confused:

"@Elemimele: The phrase "almost certainly" is certainly wrong. Perhaps you don't realize how confused students can be sometimes. "

Having seen biologists doing maths, and in fact having a foot in that category myself, I felt that Michael Hardy was probably making a good point.
In terms of the needs of our readers, I'm not here to right wrongs or make a statement, but Wikipedia is here to provide information to those looking for it. The extent to which we need to consider their needs is (1) to provide articles that contain information likely to be useful to as broad a spectrum of readers as possible, and (2) to help our readers find the articles that they are (probably) looking for. I no longer like the redirect because Michael Hardy's got a valid point that some readers might be looking for the density function. I'd accept just the links to the two (or three) relevant articles, but since anyone arriving at this page by definition is confused about the difference between density and cumulative plots, it follows that if the function of a dab page is to help them find the right thing, we should help them. Elemimele (talk) 06:13, 15 June 2023 (UTC)
Incidentally, there's an additional problem in this case. Here we have an article that is probably most valuable to users of statistics rather than statisticians, and many users come from fields that have a very strong technical meaning for words like "sample". Their technical meaning may be quite different to the statistician's meaning, used in the lead of the article to which we're redirecting/disambiguating. Therefore merely hovering over the lead might not help the reader to find which article they need. That's why I think we're justified in providing more than just the links.
Small print: the sort of confusion I'm talking about with "sample" is that in Probability density function we use the word "sample" to mean a value taken from sample space, which means all values taken by the variable, and we define the function by saying "can be interpreted as providing a relative likelihood that the value of the random variable would be equal to that sample". This will completely confuse a biologist. If they're interested in the sugar level of oranges, they will have measured sugar in various replicate extracts of orange, which they will call "samples". They imagine that these samples come from an infinitely large collection of extracts that could have been made, and that the probability function they're using describes that collection (they hypothesise!), so to them there is no meaning in the idea of the random variable being equal to a sample. They'll be asking "what do you mean, the random variable is equal to this? Are you trying to say the mean of the population is equal to this?", because to them, the "variable", sugar-content, is something that varies between all the oranges in the world, and has no particular value. It's a population, and a population cannot have a single value. So to them they'll be saying "if the infinite population of samples that I could have made truly conforms to the model of this distribution, then the probability density function, appropriately integrated, will tell me the likelihood that I would find a set of samples with the values that I've measured". Their "samples" are not values derived from the function, they are values derived from oranges, which may behave differently to the expected function, and the chance of them behaving differently is very high in the biologist's mind, because that is what they are testing. Elemimele (talk) 15:27, 15 June 2023 (UTC)[reply]
Elemimele, students may well be confused. That's not our bloody job!!!1!1!! That's not what we do. This is the core of the reason I object to this sort of attempted hand-holding.
Even if I didn't, there would remain the problem that (1) the text is completely unreferenced, and (2) referencing it would turn the page into an article that shouldn't exist because it's too trivial for an article. No, you cannot defer to references in the linked articles; even if that were a legitimate method, which it isn't, almost certainly none of those references say directly that the phrase "cumulative density function" is an error — and you do very specifically need that, otherwise as XOR'easter points out, it's original synthesis. --Trovatore (talk) 18:33, 15 June 2023 (UTC)[reply]
Trovatore, individual reply: this article survived an AfD, so for the moment there is no question whatsoever of its being deleted. You can take it to deletion review if you honestly believe that the closure was wrong, but disagreeing with the outcome isn't a valid reason for overturning an AfD result.
More generally, I think Trovatore has put their finger on precisely why the maths side of Wikipedia is so utterly pointless, and this disambig is right in the middle of it: A traditional encyclopaedia is an over-view book opening a window for the normal person in the street into a wide range of subjects, sharing information and enlarging knowledge. But our maths articles aren't encyclopaedic in that sense; they're not there to help an ordinary man learn about combine harvesters, the economy of Ghana, musicians of the 17th C or the excitement of all the places in technology where Fourier transforms are useful, and how the FFT algorithm revolutionised technology, or anything like that. They are mostly written in a way that only makes sense to a reader who already knows what the article is trying to say. As a result, no one learns anything from the maths articles; 99% of readers can't, and 1% already know. Basically, most of our maths content is a vanity project being used by its editors to express how clever they are with notation; it's not an encyclopaedia, which is why I prefer not to waste much time on it. The really sad thing is that the articles on probability distributions could have been good ones, of wide interest. But they're not. Elemimele (talk) 06:13, 16 June 2023 (UTC)[reply]
Just a note, Elemimele: Redirection is not deletion. The only thing that is deletion is making the search term a redlink. So surviving an AfD does not in fact bar turning the term into a redirect. --Trovatore (talk) 06:35, 16 June 2023 (UTC)
@Trovatore, true, but I think if you convert it into a redirect in the face of an AfD that specifically discussed and rejected a redirect, then some of the keep-voters are likely to object. My original !vote was indeed redirect. Sorry about my rant above; I feel strongly our maths articles could be more generally understandable, and get over-passionate about it. Elemimele (talk) 10:09, 16 June 2023 (UTC)[reply]
OK, here's the thing. The unreferenced content cannot stand; it must be removed. And disambig pages should not contain expository content. So the options are: Redirect, disambig without comment, or find references (that specifically say the phrase is an error) to make a trivial article which really shouldn't be there. --Trovatore (talk) 17:13, 16 June 2023 (UTC)[reply]
I've raised this with the admin who closed it, because frankly I agree that we're now in a very silly situation that the outcome of the AfD simply cannot be implemented without flying in the face of WP's sourcing rules. We can't have a disambig that doesn't say it's an error, because otherwise we claim the existence of a term that doesn't exist. We can't have a disambig that says it's an error because no one's got a reliable source that it's an error. We can't have a redirect because it's a portmanteau of two different titles, so which do you point at? And anyway, several experienced editors rejected that option explicitly at AfD. We can't delete it because the AfD result was keep. And so there's no possible outcome. You will have been alerted to my post at Star Mississippi's talk-page, I hope I've summarised your viewpoint accurately and apologise if I haven't. In many ways it would help if the keep !voters would reconsider, but I don't want to try to canvass them to change their point of view, that's not really encouraged behaviour. Elemimele (talk) 20:10, 16 June 2023 (UTC)

It will take me a while to get up to date with this discussion. Let us note that:

  • Wikipedia once had a whole bunch of links to cumulative density function. I cleaned them up.
  • The phrase occurs in well over fifty thousand scholarly articles by otherwise respectable writers.

I don't insist that this must be labeled a disambiguation page, but it should be here in some form. Michael Hardy (talk) 21:08, 16 June 2023 (UTC)

Thank you @Elemimele: for the note on my Talk. Responding here to keep the discussion centralized. As far as "The unreferenced content cannot stand; it must be removed" goes, that's not strictly true as this isn't a BLP. However if you're challenging the material's veracity, @Trovatore:, you're correct to require sourcing. As far as whether a dab can have explanatory content, that I'm not sure on and I apologize if my close means something that's out of another policy. I'm actually not sure how to proceed on this. With my editor hat on, reading Cumulative density function, I feel the text is needed for a lay reader on why it results from the confusion between the two primary topics, even if it's not an error, strictly speaking. I think given the inability to proceed, DRV for more eyes makes sense. You're reviewing the close not because you disagree (for the reasons you noted here, on my Talk and in your discussion above) but because we're stuck. I'd support this, unless you think there's an active project that might have some thoughts on what's next. Thoughts? (Note, I'm intermittently online this evening so pardon any delay). Star Mississippi 21:42, 16 June 2023 (UTC)

Fair point about it not being a BLP. Am I challenging the veracity?
I suppose, in a very mild sense, I am. I understand Michael's objection to "cumulative density function" and for those reasons would not use the phrase myself.
But what do you get when you accumulate a density function? To me, "a cumulative density function" seems like a not-totally-stupid answer. It's true that that function isn't itself a density function, but English — even mathematical English — is not necessarily so tidy as for that to be a fatal objection.
Given that reasoning, I'm uncomfortable calling all the "otherwise respectable writers" wrong in Wikivoice. --Trovatore (talk) 22:51, 16 June 2023 (UTC)[reply]
Sorry, I haven't been around for a few days. So can I check, Trovatore, do you feel that our original concern (that this term is a mistake) is actually wrong, and the fact that it's been used by so many people means we have to regard it as a valid name for the function, unless we can find a couple of reliable people saying specifically that it's an error? If so, Michael Hardy, do you know of a source that says so? I appreciate this can be very difficult; it's often hard to find a source that says something that most people in a field believe is obvious. Elemimele (talk) 06:01, 21 June 2023 (UTC)[reply]
I'm not willing to say the term is a mistake or that it's not a mistake. I agree with Michael that it's an unfortunate term. But if I squint I can see a colorable case for it, and I think that anyone claiming actively that it's an error has the burden of proof. --Trovatore (talk) 06:04, 21 June 2023 (UTC)[reply]

@Trovatore: Two points:

  • You ask what you get when you accumulate a density function. There are such things as mass density, energy density, probability density, and some others. When you accumulate a mass density function you get a mass; when you accumulate an energy density function you get energy; when you accumulate a probability density function you get a probability. The values of cumulative distribution functions are probabilities.
  • You say clearing up confusions is not our job. I wonder why not. Wikipedia is supposed to provide information. That includes clearing up confusions.

Michael Hardy (talk) 02:43, 16 July 2023 (UTC)

@Trovatore: In one way this remains akin to a disambiguation page: the authoritative sources for the information in it are in the pages to which it links. That is acceptable for disambiguation pages. A question is: For which sorts of pages is it acceptable, and why? Michael Hardy (talk) 02:48, 16 July 2023 (UTC)
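The first of the two points above (accumulating a probability density gives a probability) can be checked numerically. The following is a minimal sketch under stated assumptions: the helper names `normal_pdf` and `accumulate`, the choice of a normal distribution with standard deviation 0.1, and the midpoint rule with 20000 steps are all illustrative choices, not anything taken from the linked articles.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density of a normal distribution (probability per unit x)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def accumulate(pdf, lo, hi, n=20000):
    """Midpoint-rule approximation to the integral of `pdf` from lo to hi."""
    width = (hi - lo) / n
    return sum(pdf(lo + (i + 0.5) * width) for i in range(n)) * width

sigma = 0.1  # a narrow distribution, chosen so the density exceeds 1 (assumption)

# The density itself is not a probability: at the peak it is well above 1.
peak_density = normal_pdf(0.0, sigma=sigma)

# Accumulating the density over an interval does give a probability.
# The lower limit -1.0 is ten standard deviations below the mean, so the
# part of the distribution left out is negligible.
prob_below_mean = accumulate(lambda x: normal_pdf(x, sigma=sigma), -1.0, 0.0)
prob_total = accumulate(lambda x: normal_pdf(x, sigma=sigma), -1.0, 1.0)
```

Here `peak_density` is greater than 1, which a probability can never be, while `prob_below_mean` and `prob_total` come out close to 0.5 and 1 respectively, the values the cumulative distribution function takes at the mean and far to the right of it.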