Status: Unknown

my sandbox

location

This user comes from Taiwan.

Babel

zh

This user's native language is Chinese.

en-2

This user can contribute with an intermediate level of English.

39Y

This Wikipedian was born on 3 July 1984 and is 39 years, 10 months, and 0 days old.

day

Today is Friday.

Today is 3 May 2024

Programming language

prog-2

This user is an intermediate programmer.

Java-2

This user is an intermediate Java programmer.

C++-2

This user is an intermediate C++ programmer.

PHP

This user can code PHP.

<html>

This user can write HTML.

mysql

This user writes programs that access MySQL.

SQL

This user uses SQL queries to locate their car keys.

States and ethnicities

This user is a descendant of Shennong and Huang Di.

This user is a Descendant of the Dragon.

This user has ancestral roots in Taiwan.

Games

This user plays mahjong.

Introduction

Hello everyone, I am a postgraduate student at the Institute of Technology Management, Tsing-Hua University, Taiwan. My areas of expertise are information management, knowledge management, information retrieval, and data mining.


My Thesis

My thesis topic is distributed knowledge management. Wikipedia is a good data source of distributed knowledge, which is why I am here, looking forward to some exciting discoveries.

More precisely, the topic is Constructing a Knowledge Evolution Map System on Wikipedia. I presented my thesis proposal on 1/14. An important reason to build such a system on Wikipedia is that its knowledge resources are rich and of good quality.

User List

I need a list of users whose edit histories can serve as a good data source, because the knowledge evolution map system takes individual users as its subjects: one map per user.

The criteria for selecting a user as a subject are as follows:

The user must have been editing Wikipedia for a certain period of time.
The user must have contributed a rich body of knowledge.
The user must have edited in the past year.
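The three criteria above can be expressed as a simple filter. This is a minimal sketch only: the page gives no concrete values, so the thresholds (`MIN_ACTIVE_DAYS`, `MIN_EDIT_COUNT`) and the use of edit count as a stand-in for a "rich knowledge resource" are assumptions, and `UserProfile`, `ExampleA` and `ExampleB` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class UserProfile:
    """Hypothetical summary of one editor's history."""
    name: str
    first_edit: date   # date of the user's first edit
    last_edit: date    # date of the user's most recent edit
    edit_count: int    # total edits; assumed proxy for a "rich knowledge resource"


# Assumed thresholds; the selection criteria do not specify numbers.
MIN_ACTIVE_DAYS = 2 * 365            # criterion 1: edited over a certain period
MIN_EDIT_COUNT = 1000                # criterion 2: rich knowledge resource (proxy)
RECENT_WINDOW = timedelta(days=365)  # criterion 3: edited in the past year


def is_candidate(user: UserProfile, today: date) -> bool:
    """Return True if the user satisfies all three selection criteria."""
    long_enough = (user.last_edit - user.first_edit).days >= MIN_ACTIVE_DAYS
    rich_enough = user.edit_count >= MIN_EDIT_COUNT
    recent = today - user.last_edit <= RECENT_WINDOW
    return long_enough and rich_enough and recent


users = [
    UserProfile("ExampleA", date(2004, 5, 1), date(2008, 3, 30), 25000),
    UserProfile("ExampleB", date(2007, 11, 2), date(2008, 1, 15), 120),
]
subjects = [u.name for u in users if is_candidate(u, date(2008, 4, 22))]
print(subjects)  # ['ExampleA']
```

ExampleB is rejected because its editing history spans only a few weeks, well under the assumed two-year minimum.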

User	Contributions	Note
Ronz	Special:Contributions/Ronz
Fmccown	Special:Contributions/Fmccown	See also User:Fmccown, which lists the topics this user has mainly contributed to.
JackyR	Special:Contributions/JackyR
Qwfp	Special:Contributions/Qwfp
Michael Hardy	Special:Contributions/Michael_Hardy
Angelo.romano	Special:Contributions/Angelo.romano
Warut	Special:Contributions/Warut
Mav	Special:Contributions/Mav
Acalamari	Special:Contributions/Acalamari
Hoary	Special:Contributions/Hoary
Greekboy	Special:Contributions/Greekboy
El_Greco	Special:Contributions/El_Greco
Grk1011	Special:Contributions/Grk1011
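The table above links to each user's Special:Contributions page. One way to collect the same edit histories programmatically is the MediaWiki API's `list=usercontribs` query; the page does not say how the data was actually gathered, so this is an assumed approach. The sketch builds request URLs and follows the API's `continue` values until the history is exhausted; the helper names and the `User-Agent` string are placeholders.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def usercontribs_url(username: str, limit: int = 500,
                     cont: Optional[dict] = None) -> str:
    """Build a MediaWiki API URL that lists one user's contributions."""
    params = {
        "action": "query",
        "list": "usercontribs",
        "ucuser": username,
        "uclimit": limit,  # 500 is the usual per-request maximum for normal clients
        "ucprop": "title|timestamp|comment|size",
        "format": "json",
    }
    if cont:
        params.update(cont)  # continuation values returned by the previous request
    return API_ENDPOINT + "?" + urlencode(params)


def fetch_all_contribs(username: str) -> list:
    """Download a user's full contribution list page by page (needs network access)."""
    contribs, cont = [], None
    while True:
        req = Request(usercontribs_url(username, cont=cont),
                      headers={"User-Agent": "KnowledgeMapSketch/0.1 (placeholder)"})
        with urlopen(req) as resp:
            data = json.load(resp)
        contribs.extend(data["query"]["usercontribs"])
        if "continue" not in data:
            return contribs
        cont = data["continue"]


print(usercontribs_url("Michael Hardy", limit=50))
```

Each returned item carries the page `title` and `timestamp`, which is enough to group a user's edits by date.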

22)

The idea is to combine n-grams with hierarchical agglomerative clustering (HAC). N-gram clustering by date finds the periods during which the user edited similar pages, while hierarchical clustering groups similar periods that need not be consecutive in time.

The experimental results look promising: the method identifies distinct knowledge periods along the timeline.
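The page does not define exactly how the n-gram step works. The sketch below is one possible interpretation, assumed for illustration: each editing day is represented by the character n-grams of the page titles edited that day, and consecutive days are merged into one period while their n-gram profiles stay similar (Jaccard similarity at or above a hypothetical `threshold`). The function names and sample titles are hypothetical.

```python
from __future__ import annotations

from datetime import date


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Character n-grams of a lower-cased title, padded with spaces."""
    s = f" {text.lower()} "
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def segment_by_date(edits_by_day, n: int = 3, threshold: float = 0.3):
    """Split (day, [titles]) records into periods of similar editing.

    A day joins the current period when its n-gram profile is at least
    `threshold` Jaccard-similar to the previous day's profile; otherwise
    it starts a new period.
    """
    periods = []
    previous = None
    for day, titles in sorted(edits_by_day):
        grams = set().union(*(char_ngrams(t, n) for t in titles))
        if periods and jaccard(grams, previous) >= threshold:
            periods[-1]["days"].append(day)
            periods[-1]["titles"].extend(titles)
        else:
            periods.append({"days": [day], "titles": list(titles)})
        previous = grams
    return periods


history = [
    (date(2008, 1, 1), ["Data mining"]),
    (date(2008, 1, 2), ["Data mining software"]),
    (date(2008, 1, 5), ["Byzantine Empire"]),
]
for period in segment_by_date(history):
    print(period["days"][0], "to", period["days"][-1], period["titles"])
```

Comparing each day only with the previous day keeps periods contiguous in time; grouping non-consecutive but similar periods is left to the hierarchical clustering step.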

Updates (2008/4/22)

We have now collected the data mentioned above and clustered it by date. Several problems have come up:

The clusters have not been grouped by knowledge domain, so the knowledge structure within a cluster is ambiguous. Using Wikipedia's categories for this does not seem like a good idea, because the categories are also defined by users and are mixed with categories that are poorly defined or unrelated to domain knowledge;
We use bottom-up hierarchical clustering to group the data. The threshold for merging two clusters is the same at every level of the hierarchy, namely 0.8. I wonder whether this can work for hierarchical clustering, since clusters should be allowed to be less similar as the hierarchy gets higher;
With TF×IDF weighting implemented, the computing time increases exponentially as the hierarchy gets higher.

JnW ^talk 12:41, 22 April 2008 (UTC)
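The TF×IDF weighting and the bottom-up merging described in this update can be sketched together as follows. This is an illustrative sketch, not the implementation used here: it assumes smoothed IDF (`log((1 + N) / (1 + df)) + 1`, a common variant that keeps terms shared by every period from getting zero weight), cosine similarity, average linkage, and a merge threshold that starts at 0.8 and shrinks by a hypothetical `decay` factor at each level, so that clusters higher in the hierarchy may be less similar.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each document (a list of terms) to a sparse TF x IDF vector."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (assumed variant): never zero, even for terms in every document.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def average_link(a, b, sim):
    """Mean pairwise similarity between two clusters of item indices."""
    return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))


def hac_levels(sim, start=0.8, decay=0.8, floor=0.1):
    """Bottom-up clustering with a merge threshold that shrinks at each level.

    Returns a list of (threshold, partition) pairs, one per level.
    """
    clusters = [[i] for i in range(len(sim))]
    levels = []
    threshold = start
    while len(clusters) > 1 and threshold >= floor:
        while len(clusters) > 1:
            # Find the most similar pair of clusters (naive all-pairs search).
            best, pair = -1.0, None
            for x in range(len(clusters)):
                for y in range(x + 1, len(clusters)):
                    s = average_link(clusters[x], clusters[y], sim)
                    if s > best:
                        best, pair = s, (x, y)
            if best < threshold:
                break
            x, y = pair
            clusters[x] = clusters[x] + clusters[y]
            del clusters[y]  # y > x, so index x is unaffected
        levels.append((threshold, [sorted(c) for c in clusters]))
        threshold *= decay
    return levels


periods = [
    ["data", "mining", "clustering"],
    ["data", "mining", "classification"],
    ["byzantine", "empire", "history"],
    ["byzantine", "greek", "history"],
]
vecs = tfidf_vectors(periods)
sim = [[cosine(u, v) for v in vecs] for u in vecs]
for threshold, partition in hac_levels(sim):
    print(f"{threshold:.3f}", partition)
```

This naive version recomputes every cluster-pair linkage after each merge, so its cost grows roughly cubically with the number of periods; caching pairwise linkages, or a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` on distances like 1 minus cosine, is the usual remedy.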

Updates (2008/5/27)

It is now the end of May, and work on the thesis is still in progress. We have developed the mechanism for deriving the knowledge evolution map. We argue that a user may work on similar topics on Wikipedia at different times, so we first apply an n-gram algorithm to identify periods with similar knowledge structure and then use HAC to cluster these periods. To decide on a good clustering result, we use Minmax to determine the final clustering in HAC.
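The page does not specify what "Minmax" computes. One plausible reading, assumed here, is the min-max cut criterion of Ding et al., which scores a partition by summing, over clusters, the between-cluster similarity divided by the within-cluster similarity (lower is better). The sketch scores each candidate partition (for example, one per HAC level) and keeps the lowest-scoring one with at least two clusters. Counting self-similarity in the within-cluster sum and comparing scores across levels with different numbers of clusters are both assumptions; the function names and the similarity matrix are hypothetical.

```python
def minmax_cut(partition, sim):
    """Sum over clusters of (similarity to other clusters) / (similarity within the cluster)."""
    items = [i for cluster in partition for i in cluster]
    total = 0.0
    for cluster in partition:
        inside = set(cluster)
        # Within-cluster similarity, including each item's similarity to itself (assumed).
        within = sum(sim[i][j] for i in cluster for j in cluster)
        between = sum(sim[i][j] for i in cluster for j in items if j not in inside)
        total += between / within if within > 0 else float("inf")
    return total


def choose_partition(partitions, sim):
    """Return the candidate partition with at least two clusters and the lowest score."""
    candidates = [p for p in partitions if len(p) >= 2]
    return min(candidates, key=lambda p: minmax_cut(p, sim))


# Hand-made similarity matrix: items 0-1 and 2-3 are alike, the two pairs are unrelated.
sim = [
    [1.0, 0.55, 0.0, 0.0],
    [0.55, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.55],
    [0.0, 0.0, 0.55, 1.0],
]
levels = [
    [[0], [1], [2], [3]],
    [[0, 1], [2], [3]],
    [[0, 1], [2, 3]],
    [[0, 1, 2, 3]],
]
print(choose_partition(levels, sim))  # [[0, 1], [2, 3]]
```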

The problem now is how to visualize the clustering result. A classmate suggested a Java-based charting library, JFreeChart, and I have produced some charts with it, but they do not look very user-friendly yet.

JnW ^talk 08:40, 27 May 2008 (UTC)