Tuesday, October 30, 2007

Business applications of unstructured text

Interesting article in Communications of the ACM.

A widely touted IT factoid states that 80% of the information produced by and contained in most organizations is stored in the form of unstructured data. Most of it is text (such as memoranda, internal documents, email, organizational Web pages, and comments from customers and from internal service personnel), and most of the applications that reflect the value of unstructured data are able to process it. Although unstructured data takes other forms, including images and audio, here I focus on the applications, technologies, and architectures for unstructured text acquisition and analysis (UTAA).

1 comment:

Anonymous said...

There is no such thing as "unstructured data". If data is unstructured, it is usually called "chaos" or "random noise", not a "document" or an "image". An image file has a lot of structure in fact...