Talk:Web mining

From Wikipedia, the free encyclopedia

External links and conferences?

How can we clean up this page so that there is not a constant argument about links going on? The edit history looks very bad.

Can we de-commercialize this page, or at least allow everyone to list their conferences there? (Please note I am someone who works at Wessex Inst.) --Curuxz (talk) 09:42, 2 April 2008 (UTC)

What is web mining?

To me, "web mining" is the act of extracting machine-readable information from the Web (or any web, for that matter), as web search engines such as Google do. That covers both content (the general information web pages provide) and structure (the relations between web pages, which hyperlinks provide).

This page also includes, e.g., SGML document structure, which belongs on other pages such as DOM, SGML or Markup. It also includes things like Web usage tracking, which should either be moved to Internet tracking or given its own page. IMHO these subjects don't fall under "Web mining". —Preceding unsigned comment added by SvartMan (talk • contribs) 18:20, 4 June 2009 (UTC)

How to do?

I've read a few things on vertical search engines, i.e. specialised search engines built on the APIs of general-purpose search engines such as Google. Could anyone add some information on this? Many thanks in advance. —Preceding unsigned comment added by 81.240.53.252 (talk) 08:09, 6 May 2010 (UTC)

Web content mining section looks wrong

The sentences are incomplete and incoherent. This is likely due to an erroneous cut-and-paste operation. Someone should fix it. Wictator (talk) 03:50, 17 April 2012 (UTC)

I tried to fix it up a bit (added citations, linked to the TF-IDF wiki page), but it still says "when the length of the words in a document goes to" and there are some mathematical symbols that do not render correctly. Does anybody know what the author was trying to say? Ctorchia87 (talk) 03:14, 10 October 2012 (UTC)

Data mining and Data collection are two different things

Data mining is the trend-analysis side, whereas data collection is the gathering, harvesting and structuring of raw data into a usable form. The article doesn't reflect this distinction at all, so it lacks clarity.

If it is correct that data collection is a subset of data mining, rather than a separate entity, then this should also be mentioned. See here http://blog.import.io/post/data-mining-vs-data-collection

— Preceding unsigned comment added by 213.104.36.112 (talk) 13:19, 22 April 2014 (UTC)

Web mining vs. Web analytics

I think the article should explain the difference between the two. The only thing I could find on it was: http://www.analyticbridge.com/forum/topics/2004291:Topic:20302 --Fixuture (talk) 00:25, 13 June 2015 (UTC)

Cleaning up this article (should it even exist?)

I monitor a bunch of computer-related topics, and noticed an edit war going on here. When I looked, I immediately noticed a poorly constructed article with at least one spammy external link. That link was removed by another editor before I could do it, so it's gone (and should stay gone). I'm going to look for more spammy/promo links and remove them, too (see WP:ELNO).

If an editor believes this article should be deleted (and redirected to Data mining), they should propose that through the well-documented AfD process. Regardless, the article right now consists of nearly 100% unsourced statements. I believe these—especially in a technical article—are damaging to the encyclopedia, so I'm going to start either removing the ones that are not encyclopedic, or citing the ones that I can match to a reference. It will be great if the two warring editors assist with this effort. Yeah, it's a lot more work than just hitting Delete or Revert, but it makes for a lasting improvement.

And, please, use this page to discuss what you want to do before taking drastic action. It's the way of things here. — UncleBubba ( T @ C ) 19:58, 20 October 2020 (UTC)

@UncleBubba, do you (and other editors) think there is a benefit for this article to exist at all?
The term is fairly rare, and while Google Scholar finds some uses, especially in 1998-2005, they seem to basically talk about combining a Web crawler with Pattern recognition and Data mining. There isn't much that doesn't belong in one of those articles, which is demonstrated by the history of this page: it appears to have only one source that mentions "web mining" and is not self-published. My impression is that, as the web matured, the subject split into a few specialised areas and isn't discussed as a whole in serious publications anymore. I don't see how this page can be improved or what the point of it would be.
Google Trends shows that use of the term has plummeted in the English-speaking world. In the rest of the world, it seems to refer mostly to mining cryptocurrency in the browser using JavaScript (note the spikes in 2017 and 2021, as well as the related queries and related topics).
Instead of improving or deleting this article, I'd support converting it to a disambiguation page that links to higher-quality articles on the concepts that make up "web mining". PaulT2022 (talk) 07:37, 31 July 2022 (UTC)
Pinging @TenPoundHammer as well. PaulT2022 (talk) 07:45, 31 July 2022 (UTC)
@PaulT2022, I agree with your assertions. (Nice research, BTW.) I have worked in IT for a few decades (and still do) and cannot recall ever running across the term web mining. "Data mining" is another story, and is frequently used. I can't (and don't have time to) prove it, but I suspect a few people were using this article to get citations indexed by certain search engines, perhaps to give some scholarly work a patina of respectability (which is a misuse of Wikipedia). Regardless, I would support just pointing this article to Data mining. If you have the time to do the disambig page, that would work, too, though it kinda sounds like overkill to me (I certainly could be wrong about that!).
Cheers! — UncleBubba ( T @ C ) 02:29, 4 August 2022 (UTC)
@UncleBubba thanks, made a disambiguation page. I think it's more useful than a redirect. PaulT2022 (talk) 03:42, 4 August 2022 (UTC)