Mining Unstructured Data

The information that many people post as “tweets” in applications such as Twitter and LinkedIn is unstructured. Tweets posted to such applications resemble our own thought processes. Traditional data mining techniques, by contrast, work on data that is precisely defined. For example, a product survey contains fixed questions such as “Which color do you like the most?” and “Which feature do you like the most?”

By writing some standard OLAP processing logic, one can derive the reports needed for critical business intelligence. This approach, however, demands considerable effort in data definition, data entry and data analysis.
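To make the contrast concrete, here is a minimal sketch of the kind of roll-up that becomes easy once survey answers are precisely defined. The records, field names and the rollup helper are hypothetical and serve only as illustration; a real OLAP tool would do far more than this.

```python
from collections import Counter

# Hypothetical survey responses: every record has the same predefined fields.
responses = [
    {"color": "red", "feature": "battery"},
    {"color": "blue", "feature": "camera"},
    {"color": "red", "feature": "camera"},
    {"color": "red", "feature": "battery"},
]

def rollup(records, field):
    """Count how often each answer was given for one survey question."""
    return Counter(record[field] for record in records)

color_counts = rollup(responses, "color")
feature_counts = rollup(responses, "feature")

# Show the most frequent answer to the color question.
print(color_counts.most_common(1))
```

Because every answer falls into a known field with a known set of values, the report is a simple count. Nothing comparable exists for free-form text until it has been mapped to values.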

Tweets contain a lot of unstructured information. Rather than setting up a review committee to review movies, products, packaged food, services and so on, one can construct a review system based on information posted to Twitter, MouthShut.com, LinkedIn, Facebook and similar sites. The challenge, obviously, is to construct a mining system based on both likelihood and statistics.

User responses, or tweets, are mapped to a range of possible values. For example, a tweet such as “Oh great I had a good time at the coffee shop” could indicate any value from 7 to 10 on a rating system. When consolidating unstructured information, statistical inferences are combined with likelihood, so the same tweet can be used to infer two similar situations or viewpoints.
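One way such a mapping might look is sketched below. Everything in it is an assumption made for illustration: the cue-word lexicon, the rating ranges, and the choice to treat every value in a range as equally likely (so that the midpoint stands in for the tweet) are not taken from any existing product.

```python
import re
from statistics import mean

# Hypothetical cue-word lexicon: each word maps to a (low, high) rating range
# on a 1-10 scale. The words and ranges are illustrative assumptions.
CUE_RANGES = {
    "great": (8, 10),
    "good": (7, 9),
    "okay": (5, 6),
    "bad": (2, 4),
    "terrible": (1, 2),
}

def rating_range(tweet):
    """Return the (low, high) rating range implied by cue words, or None."""
    words = re.findall(r"[a-z']+", tweet.lower())
    ranges = [CUE_RANGES[w] for w in words if w in CUE_RANGES]
    if not ranges:
        return None
    # Widen the range to cover every cue word found in the tweet.
    return (min(low for low, _ in ranges), max(high for _, high in ranges))

def consolidated_rating(tweets):
    """Average the range midpoints of all tweets that contain a cue word."""
    midpoints = []
    for tweet in tweets:
        found = rating_range(tweet)
        if found is not None:
            midpoints.append((found[0] + found[1]) / 2)
    return mean(midpoints) if midpoints else None

print(rating_range("Oh great I had a good time at the coffee shop"))
```

With this lexicon, the coffee shop tweet yields a range rather than a single score, and many such ranges are then consolidated statistically. A word list like this cannot tell a sincere “Oh great” from a sarcastic one, which is one reason the same tweet may support more than one viewpoint.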

Some applications already provide product reviews based on tweets on Twitter. It is now up to developers to build more applications that can effectively consolidate responses in the form of tweets and derive business intelligence from them.

10 Very Interesting People (VIP) in Data Mining

During 2009 I was impressed and influenced by a lot of people in the data mining field. From among them, I have picked 10 data miners who I think did impressive work in 2009, whether by raising interesting discussions, proposing solutions, promoting data mining, etc. I have also added, when known, their website/blog and Twitter account. Note that this list is completely subjective and is in no specific order:

Gregory Piatetsky: Author of the most popular newsletter in the data mining community, he has recently updated his website with new content. You can now subscribe with RSS and you can find KDnuggets on Twitter. Gregory does an amazing job of collecting data mining-related information, analyzing it and distributing it to data miners (website).
Bruce Ratner: He is an author, and his website contains several articles about data mining. He has recently been very active on social networks such as LinkedIn (website).
Ajay Ohri: I think he is the most active blogger in the data mining field. He is very active on many social networks and has an excellent collection of interviews with key people in data mining and related fields (blog).
Vincent Granville: As the creator of AnalyticBridge, Vincent has done a great job of building a community of people specializing in analytics fields. His network links more than 6600 members. So, it’s time to subscribe! (website).
Matthew Hurst: He is the author of the very famous blog “Data Mining: Text Mining, Visualization and Social Media”. He is very active on his blog on topics such as social media and data mining the blogosphere (blog, twitter).
Dean Abbott & Will Dwinnell: I put them together since they are co-bloggers. Abbott’s Analytics is an excellent blog (one of my favorites) related to data mining. When reading the posts, you can really feel the experience of the authors (blog).
Greg Linden: His famous blog – Geeking with Greg – has been well known for a while now. He writes very informative posts about personalization-related topics (blog).
Matt Cutts: He mainly writes about Google and SEO. However, he is also well known in the data mining world since several of his posts are directly or indirectly related to this field (blog).
Themos Kalafatis: He writes a lot about text mining (social network mining, etc.) and his posts are very practical. It is always a pleasure to read his blog (blog, twitter).
Randall Matignon: He is the author of very comprehensive books on SAS Enterprise Miner. You can find all information about his books on his webpage (website).
