ZhiguangCao: March 2013

Saturday, March 9, 2013

clustering when user's data is nonnumeric

In data mining, we often need to cluster users with methods such as k-means or hierarchical clustering, using a measure such as Euclidean distance or Pearson correlation. If the user's data is numeric, this is very easy, e.g.

         height  weight  age  salary
user a:  176     70      20   5110
user b:  172     65      23   5300

We just compute the distance or the Pearson correlation based on these numeric values.
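
As a minimal sketch of that computation, the snippet below applies Euclidean distance and Pearson correlation to the two users in the table above. The function names are my own, not from any library. Note that the salary column dominates both measures here, so in practice one would probably rescale each column first.

```python
import math

def euclidean_distance(x, y):
    # Square root of the summed squared differences, feature by feature.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_correlation(x, y):
    # Covariance of x and y divided by the product of their spreads.
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# height, weight, age, salary (values from the table above)
user_a = [176, 70, 20, 5110]
user_b = [172, 65, 23, 5300]

distance = euclidean_distance(user_a, user_b)
correlation = pearson_correlation(user_a, user_b)
print(distance, correlation)
```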

But nowadays we often run into nonnumeric values on social networks, e.g.

for a Facebook app, "what do you want":

user a: girl friend, money, car, job
user b: car, house

One method we may employ is to turn each item into a 0/1 column:

         girl friend  money  car  job  house
user a   1            1      1    1    0
user b   0            0      1    0    1

However, if user a lists a very large number of items (which is quite possible), then user b's row becomes sparse, and we need much more space to store the data.
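
To make the 0/1 encoding concrete, here is a small sketch that builds the column list and the rows from the two wish lists above. The helper names (`build_vocabulary`, `to_binary_vector`) are hypothetical and only used for illustration.

```python
def build_vocabulary(users):
    # Collect every distinct item, in the order it is first seen.
    vocabulary = []
    for items in users.values():
        for item in items:
            if item not in vocabulary:
                vocabulary.append(item)
    return vocabulary

def to_binary_vector(items, vocabulary):
    # 1 if the user listed the item, 0 otherwise.
    return [1 if item in items else 0 for item in vocabulary]

users = {
    "user a": ["girl friend", "money", "car", "job"],
    "user b": ["car", "house"],
}

vocabulary = build_vocabulary(users)
vectors = {name: to_binary_vector(items, vocabulary)
           for name, items in users.items()}
print(vocabulary)
print(vectors)
```

Every new item that any user mentions adds one more column to every row, which is where the extra storage comes from.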

Another method is to use the Tanimoto Coefficient:

The Tanimoto Coefficient uses the ratio of the intersecting set to the union set as the measure of similarity. Represented as a mathematical equation:

T(a, b) = Nc / (Na + Nb - Nc)

In this equation, Na and Nb are the numbers of attributes in objects a and b, and Nc is the number of attributes in their intersection set.

If we use Python (in fact, we usually do use Python in data mining), we can implement this as follows:

  
# Inputs: two lists
# Output: the Tanimoto Coefficient
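def tanimoto(list1, list2):
  # items that appear in both lists
  intersection = [common_item for common_item in list1 if common_item in list2]
  # |intersection| / |union|, where |union| = |list1| + |list2| - |intersection|
  return float(len(intersection)) / (len(list1) + len(list2) - len(intersection))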

then the value can be used to cluster users.
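
As a quick check, here is a sketch of the same coefficient written with Python sets, applied to the two users from the example. The function name `tanimoto_sets` and the choice to return 0.0 for two empty lists are my own assumptions; sets also ignore duplicate items, which the list version does not.

```python
def tanimoto_sets(items_a, items_b):
    # Ratio of shared items to all distinct items of the two users.
    set_a, set_b = set(items_a), set(items_b)
    union_size = len(set_a | set_b)
    if union_size == 0:
        return 0.0  # assumption: two empty lists count as "not similar"
    return len(set_a & set_b) / union_size

user_a = ["girl friend", "money", "car", "job"]
user_b = ["car", "house"]

similarity = tanimoto_sets(user_a, user_b)
print(similarity)  # 1 shared item out of 5 distinct items
```

A clustering method that needs a distance rather than a similarity could use 1 - similarity.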

Wednesday, March 6, 2013

develop web version facebook app using heroku

There is an official document for this, and it applies to both Linux and Windows.

The link is:

https://devcenter.heroku.com/articles/facebook

It is not very difficult, although sometimes you do not know what a step means. Heroku saves us the time of setting up a server, so we can focus on designing the app code.

But I think most beginners will run into the following problems the first time they follow the above link.

For Heroku
1. We use the Heroku Toolbelt and the git version control system to interact with the server from our local machine. So please remember to add them to your %PATH%.

2. If you still run into:

fatal: Not a git repository (or any of the parent directories): .git

please remember to do the following:

git init
git add . # the '.' after add must be typed
git commit -m 'Initial commit'

3. If you need to change the git configuration, you can find it in the .git folder, which is a hidden subfolder of your app folder.

4. If your local OS is Linux, it will redirect you to https://toolbelt.heroku.com/debian to install the Heroku Toolbelt.
You should paste this into your terminal and run it:
wget -qO- https://toolbelt.heroku.com/install-ubuntu.sh | sh

In the middle of the installation, you will have to agree to a license term where it shows you <ok>. If you press return directly, it will fail, so please press "tab" first, then press return.

5. When you run into:

The authenticity of host 'heroku.com (50.19.85.132)' can't be established.
RSA key fingerprint is *some.random.fingerprint.
some people suggest:
in .git/config, change the project name in [remote "heroku"] to the name given by Heroku

I do not know exactly what this means. I just skipped it without modifying anything, and so far the app still works fine.

6. If you have modified the code and want to run these two commands:
$ git commit -am "changed greeting"
$ git push heroku
make sure you have cd'ed into your app folder first. Otherwise, git may not detect any modification of your app.

For Localhost
If you want to develop the app locally, this post worked fine for me:
http://www.7tech.co.in/php/how-to-create-a-facebook-application-using-php/

You should set up your local machine as a server first; a normal WAMP installation is OK.

Then, when you choose to create a new app, remember the app ID and app secret, and do not select Heroku as the server.

For the site URL, you should input: http://localhost/myapp/
For the canvas page URL, you can use the same address.
For the secure canvas URL, you can use https://localhost/myapp/ instead.

Sunday, March 3, 2013

one 'interface' between C and matlab

A friend asked me how to mix MATLAB and C code: for one of his projects, he wants to finish one task in C and then call MATLAB as soon as possible to do another task (related to matrix processing), while both the C code and the MATLAB code keep running all the time.

I know there are many posts on how to implement the interface between C and MATLAB. It requires configuring MATLAB in many steps, and sometimes it also depends on the MATLAB version.
This way is feasible, but I do not like the configuration work, so I have another idea.

His requirement is not very demanding. The general situation is:
repeat: task 1 in C ---> task 2 in MATLAB
task 1 in C ---> task 2 in MATLAB ...

So I came up with a solution: send a signal to MATLAB as soon as task 1 in C is finished.

In C/C++, Java, MATLAB, Python, PHP, and C#, it is very easy to create a txt file, no matter whether this txt file already exists or not.

So, we can create a txt file with the same filename every time the task in C is done. We can also choose how this file is generated: if the txt file already exists, we just overwrite it instead of creating a new one.

Then we can judge whether the file is new or old through its date attribute, and a new date triggers task 2. In MATLAB, we can write this:

   time_old = '0';
   while true
      files = dir(fullfile('D:\Matlab_Code', '*.txt'));  % list the txt files in the folder
      if length(files) == 1  % we suppose there is at most one txt file at a time
         time_new = files.date;
         if ~strcmp(time_new, time_old)
            % add task 2 code here
            time_old = time_new;  % do not forget this line
         end
      end
      pause(0.1);  % short sleep so the loop does not keep one CPU core busy
   end
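
The same idea can be sketched in Python for both sides. This is only an illustration under my own assumptions: the function names `send_signal` and `signal_changed` are hypothetical, and the file's modification time stands in for the date attribute used in the MATLAB loop above.

```python
import os
import tempfile
import time

def send_signal(path):
    # Producer side (the C program in the post): overwrite the same file
    # every time task 1 finishes, which refreshes its modification time.
    with open(path, "w") as f:
        f.write(str(time.time()))

def signal_changed(path, last_mtime):
    # Consumer side (the MATLAB loop in the post): compare the file's
    # modification time with the one seen last time.
    # Returns (changed, mtime_to_remember).
    try:
        mtime = os.stat(path).st_mtime_ns
    except FileNotFoundError:
        return False, last_mtime
    return mtime != last_mtime, mtime

def demo():
    path = os.path.join(tempfile.mkdtemp(), "signal.txt")
    last_seen = None
    send_signal(path)                       # task 1 finished
    changed, last_seen = signal_changed(path, last_seen)
    if changed:
        print("new signal, run task 2 here")
    changed, last_seen = signal_changed(path, last_seen)
    print("changed again without a new signal?", changed)

demo()
```

In a real watcher, `signal_changed` would be called in a loop with a short `time.sleep` between polls. Two signals that arrive within the timestamp resolution of the file system would look like one.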

For a simple task, I think this way works well. For a very demanding task, you had better try to use only C. In fact, there are already many matrix processing libraries developed in C; you can just download, configure, and call them.

communication between 3 PCs using UDP on Matlab

Years ago, I tried communication between PCs using VC++ based on Windows sockets. It was not very difficult. But I never thought MATLAB could also do this. Today I tried, and it works:

For Server:

clear;
clc;
close all;

u=udp('172.21.168.xxx', 4013,'LocalPort', 14012);
fopen(u)
u1=udp('172.21.170.yyy', 4013,'LocalPort', 14013);
fopen(u1)
fwrite(u1, 1);
fwrite(u, 1);
fclose(u)
fclose(u1)

Above, '172.21.168.xxx' is the client1 IP and 4013 is the client1 port number, while 14012 is the server's local port for that connection.
'172.21.170.yyy' is the client2 IP and 4013 (any port will do if it is open to the user) is the client2 port number, while 14013 is the server's local port for that connection.

For Client1:

u = udp('172.21.170.zzz',14012,'LocalPort',4013); % local port must match the port the server sends to
u.Timeout=1000;
fopen(u)
A = fread(u, 10);
fclose(u)

Above, '172.21.170.zzz' is the server IP, and the port number corresponds to the server's local port. u.Timeout=1000 means it will wait at most 1000 s if it does not receive anything. However, as soon as it receives the data from the server, it jumps to the next command directly.
Never forget to close the connection with fclose(u) when finished.

For Client2:

u = udp('172.21.170.zzz',14013,'LocalPort',4013);
u.Timeout=1000;
fopen(u)
A = fread(u, 10);
fclose(u)
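
For comparison, the same one-server, two-client exchange can be sketched with Python's standard socket module. To keep it runnable on a single machine, this sketch assumes all three endpoints live on 127.0.0.1 and lets the OS pick free ports, instead of the fixed IPs and ports used above. The helper names are hypothetical.

```python
import socket

def make_receiver(host="127.0.0.1", timeout=5.0):
    # A "client": bind a UDP socket and wait for data, with a timeout.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, 0))          # port 0 lets the OS choose a free port
    sock.settimeout(timeout)
    return sock

def send_to_all(payload, addresses):
    # The "server": send the same datagram to every client address.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        for address in addresses:
            sender.sendto(payload, address)

client1 = make_receiver()
client2 = make_receiver()

# One byte with value 1, like fwrite(u, 1) in the MATLAB code.
send_to_all(b"\x01", [client1.getsockname(), client2.getsockname()])

data1, _ = client1.recvfrom(10)   # read at most 10 bytes
data2, _ = client2.recvfrom(10)

client1.close()                   # same rule as fclose(u): always close
client2.close()
print(data1, data2)
```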

Saturday, March 2, 2013

abnormal characters when using matlab to read txt files

When I read Chinese txt files using the fopen function in MATLAB, many abnormal characters appeared, while the same code had no problem on another PC. I know this is caused by the default system encoding and decoding setting. It can be fixed by changing that setting through the Control Panel.

However, you can also solve the problem from the MATLAB side: still use fopen, and just add one or two parameters that we seldom pay attention to.

fileID = fopen(filename, permission, machineformat, encoding)

You can choose one of the following options for the 4th parameter, encoding, in the fopen call above, according to your situation.

'Big5'         'ISO-8859-1'   'windows-932'
'EUC-JP'       'ISO-8859-2'   'windows-936'
'GBK'          'ISO-8859-3'   'windows-949'
'Macintosh'    'ISO-8859-4'   'windows-950'
'Shift_JIS'    'ISO-8859-9'   'windows-1250'
'US-ASCII'     'ISO-8859-13'  'windows-1251'
'UTF-8'        'ISO-8859-15'  'windows-1252'
                              'windows-1253'
                              'windows-1254'
                              'windows-1257'

In my case, I chose the last one in the first column, 'UTF-8', and it works!
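
The same effect is easy to reproduce in Python, where open also takes an explicit encoding. This is a small sketch with a made-up sample string and a temporary file. It writes Chinese text as UTF-8, then reads it back once with the right encoding and once with a wrong one.

```python
import os
import tempfile

text = "中文测试"  # sample text, chosen only for this illustration
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Write the file as UTF-8.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading with the matching encoding gives the original text back.
with open(path, "r", encoding="utf-8") as f:
    decoded_ok = f.read()

# Reading with a different encoding gives abnormal characters instead.
with open(path, "r", encoding="latin-1") as f:
    decoded_wrong = f.read()

print(decoded_ok == text, decoded_wrong == text)
```

If the encoding argument is left out, Python, like MATLAB, falls back to a system-dependent default, which is why the same code can behave differently on two PCs.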
