Saturday, February 06, 2010

Debug Log

- C string (char array) was not long enough to hold the sequence

Monday, November 06, 2006

Boundary Detection

I devised a new way to determine boundaries and tried out a simple version today. The idea is to assign a weight to each possible boundary by multiplying the number of heads by the number of tails around that point, after weighting each head and tail by its distance from the point. A preliminary test looks satisfactory. Possible refinements include tuning the window size and the Poisson distribution parameter (lambda).
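For my own reference, a minimal sketch of the weighting idea in Python. Several things here are assumptions and not the exact version I tried: heads and tails are given as integer positions, a candidate boundary p sits between positions p-1 and p, the distance weight is the Poisson probability mass at that distance, and the names `boundary_scores`, `window` and `lam` are made up for illustration.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of the value k when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def boundary_scores(heads, tails, n_positions, window=5, lam=1.0):
    """Score every candidate boundary p (the gap between positions p-1 and p).

    heads: positions where something starts (expected just after a boundary)
    tails: positions where something ends (expected just before a boundary)
    Assumption: each head/tail is weighted by the Poisson pmf of its distance
    from the boundary, and only distances smaller than `window` count.
    """
    scores = []
    for p in range(n_positions + 1):
        # Heads at p, p+1, ... are at distance 0, 1, ... from the boundary.
        head_weight = sum(
            poisson_pmf(h - p, lam) for h in heads if 0 <= h - p < window
        )
        # Tails at p-1, p-2, ... are at distance 0, 1, ... from the boundary.
        tail_weight = sum(
            poisson_pmf(p - 1 - t, lam) for t in tails if 0 <= p - 1 - t < window
        )
        # The product is large only when both heads and tails cluster here.
        scores.append(head_weight * tail_weight)
    return scores

# Toy data: several tails just before position 5 and several heads from 5 on.
scores = boundary_scores(heads=[0, 5, 5, 6], tails=[3, 4, 4, 9], n_positions=10)
best = scores.index(max(scores))
```

Because the two weights are multiplied, a point with many heads but no tails nearby (or the reverse) scores zero, which is the behaviour I want from a boundary detector.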

Fisher's Exact Test

After mulling over this statistical test for days, and although I still haven't completely understood the details, I decided to run a preliminary test in Matlab.

Fortunately Octave can run the Matlab file I downloaded. Now waiting for the results...
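To help myself with the details, here is a small sketch of a one-sided Fisher's exact test on a 2x2 table, written in Python and not the Matlab file I downloaded. The cell layout (word-pair counts for association testing) and the function name `fisher_exact_right` are my own assumptions for illustration.

```python
from math import comb

def fisher_exact_right(a, b, c, d):
    """One-sided (right-tail) Fisher's exact test for the 2x2 table

            w2      not w2
    w1      a       b
    not w1  c       d

    Returns P(X >= a) under the hypergeometric null hypothesis that the
    rows and columns are independent, with all margins held fixed.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    total = comb(n, col1)  # ways to choose the first column from all n items
    tail = 0
    # Sum over every table at least as extreme as the observed one.
    for x in range(a, min(row1, col1) + 1):
        tail += comb(row1, x) * comb(n - row1, col1 - x)
    return tail / total

# Classic tea-tasting table: 3 of 4 cups guessed correctly.
p_value = fisher_exact_right(3, 1, 1, 3)
```

A small p-value means a co-occurrence count this high would be unlikely if the two words were independent. `math.comb` needs Python 3.8 or later.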

Friday, October 27, 2006

Odyssey of a Research Student, Episode I

Long lost in the direction and topic of my research, now I gotta work on something concrete. For at least the coming month, I will focus on LC with the HowNet dictionary (for my first paper, hopefully) and sentence segmentation (for the course project). Today I kicked off with some tests on measuring word association. Not until today did I discover that the G-square test done by Nicola Stokes was based on Ted Pedersen's "Fishing for Exactness". Another useful (and frequently quoted) paper is "Acquiring Collocations for Lexical Choice between Near-Synonyms" by Inkpen and Hirst. I'll work on these statistical tests to find out which one is the best, accompanied by some preliminary evaluation. Good luck Kelvin!
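As a note to self, the G-square (log-likelihood ratio) statistic for a 2x2 word-pair table can be sketched as below. This is the textbook formula, not necessarily the exact variant used in the papers above, and the function name `g_square` and the cell layout are assumptions.

```python
import math

def g_square(a, b, c, d):
    """G^2 = 2 * sum(observed * ln(observed / expected)) for a 2x2 table.

    a: w1 and w2 together    b: w1 without w2
    c: w2 without w1         d: neither
    Expected counts come from the row and column totals under independence.
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    # Cells with a zero observed count contribute nothing to the sum.
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )

# A strongly associated pair should score higher than a weakly associated one.
strong = g_square(20, 5, 5, 20)
weak = g_square(15, 10, 10, 15)
```

The larger the value, the further the observed counts are from what independence would predict.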

Thursday, October 05, 2006

Keith van Rijsbergen

I should start this story from my Wikipedia search for "SIGIR".
From the search results I discovered that there is this "Salton Award" thing at the SIGIR conference. The first thing that caught my eye was the paper by the SIGIR '06 awardee. The title "Quantum Haystacks" is really attractive, in the midst of the quantum computing fervor. A glance at the paper excited me, although much of it was obscure to me, and urged me to find out more about this guy van Rijsbergen. Surprisingly, his newest book "The Geometry of Information Retrieval" is in the CUHK library collection. Without a single second of hesitation I darted to the library to grab the book. Well... flipping through the first few pages and the math about Hilbert space didn't bring me new insights into my research, but I am pretty sure his theory is gonna have an impact on the world of IR. If I am fortunate enough, here is a chance for me to get to know this guy: ESSIR 2007:
http://www.dcs.gla.ac.uk/essir2007/index.html

Wednesday, October 04, 2006

Finally back

I should have resumed writing as soon as I got back from Beijing. Forgetful as I am, I neglected the existence of the blog completely. Now I'm back. I believe I'm the sole reader of this blog. Anyway, it is just for the purpose of keeping track of my work. It is also a good idea to keep track of the papers that I have read, am reading, or have just finished.

Highlight of some current work:
1) Lexical Cohesion - Based on the thesis of Nicola Stokes, Dr. Xie and I are working on a Mandarin Chinese corpus of VOA. I finished a preliminary draft (v0.2) yesterday. We had an interesting discussion yesterday, and the next step will be to play with these ideas. Details are coming.

2) Today's reading: Ma, B., Li, H., A Phonotactic-Semantic Paradigm for Automatic Spoken Document Classification, In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 369-376.
Wow, Sinkee dudes, cool!

Friday, July 14, 2006

July 14

It was the last day of my duty and I have, so to speak, finalized my work. Honestly speaking, not everything has been sorted out, but at least I've completed a comprehensive manual that will enable anyone to pick up where I left off. Hope what I have done will give rise to something useful. Well...

Friday, June 30, 2006

June 30

This blog has been deserted for a whole week. My progress is as sluggish as the rate at which this blog gets updated.

The only constructive thing I did was to build an interface that depicts how much a bunch of songs sound like a number of reference songs. This idea was sparked by Edward and I knocked it out quite efficiently (this was in fact motivated by hearing the news of a friend getting into MS... and I did feel I should stop procrastinating, work like a cheetah and achieve something at least; I don't wanna screw up my life). It runs pretty smoothly. Now I have changed my mind and decided to train HMMs only on the reference points; the rest of the data set will then be compared to the reference points by log probability. This should be much more efficient once I have a large set of data.
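A rough sketch of the "compare by log probability" step, assuming each reference song already has a trained HMM. Everything concrete here is made up for illustration: the models use discrete symbols (real audio features would more likely need continuous emissions), the parameters are toy numbers, and `log_likelihood` and `closest_reference` are hypothetical names. Training is not shown.

```python
import math

NEG_INF = float("-inf")

def safe_log(x):
    """log(x), with log(0) defined as -inf instead of raising an error."""
    return math.log(x) if x > 0 else NEG_INF

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(obs, start, trans, emit):
    """Forward algorithm in log space for a discrete-observation HMM.

    start[i]    : P(first state = i)
    trans[i][j] : P(next state = j | current state = i)
    emit[i][o]  : P(symbol o | state i)
    Returns log P(obs | model).
    """
    n = len(start)
    alpha = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][o])
            for j in range(n)
        ]
    return logsumexp(alpha)

def closest_reference(obs, references):
    """Name of the reference model that gives obs the highest log probability."""
    return max(references, key=lambda name: log_likelihood(obs, *references[name]))

# Two toy reference models over the symbols {0, 1} (made-up parameters).
references = {
    "ref_a": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]]),
    "ref_b": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]]),
}
best = closest_reference([0, 0, 0, 1, 0], references)
```

With this setup only the reference songs need training; every other song just costs one forward pass per reference model.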

For the MP3 player part, I don't know what is going on. I just keep failing to extract the duration of a song. Is it really that difficult to do so? Well...