Multimedia Information Systems Lecture 1, 2

About CSE 408

Description: the course deals with the design, use, and applications of multimedia information systems. An introduction to acquisition, processing, compression, storage, retrieval, and presentation of data from different media types such as images, text, voice, graphics, and alphanumeric.
• Learning objectives: master fundamental concepts, specific domain knowledge, applications, and current research topics.

Topics To Be Covered

• Introduction to multimedia systems
• Web search
• Natural language processing
• Sound informatics
• Image & video
• Compression algorithms and standards (JPEG, MPEG)
• Multimedia Information System design
• Data mining and artificial intelligence
• Data visualization
• Computer vision algorithms and applications
• Multimedia Information System performance evaluation

Multimedia elements

Audio, video, animation, interactive content, graphics, text

What is Multimedia?

Multimedia is content that uses a combination of different content forms such as text, audio, images, animation, video, and interactive content.

Ex: a web-based video editor that lets anyone create a new video by editing, annotating, and remixing videos in the cloud

  • Digital media: usually recorded, displayed, or accessed using electronic devices

Multimedia research topics and projects

  1. Multimedia processing and coding
  2. Multimedia system support and networking
  3. Multimedia tools, end-systems, and applications
  4. Multimodal interaction and integration

Lecture 2

Assumption 1: A hyperlink between pages denotes author perceived relevance (quality signal)

Assumption 2: The author of the hyperlink describes the target page (textual context)

Q: How hard is it to go from one page to another?
Over 75% of the time there is no directed path from one random web page to another.

When a directed path exists, its average length is about 16 clicks.
When an undirected path exists, its average length is about 7 clicks.
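The "number of clicks" between two pages is the shortest-path length in the link graph, which breadth-first search computes. The sketch below runs on a tiny hypothetical graph (the names `click_distances`, `undirected`, and `web` are illustrative, not from the lecture) and contrasts directed reachability with undirected reachability.

```python
from collections import deque

def click_distances(graph, start):
    """Minimum number of clicks from `start` to every reachable page (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in dist:          # first visit = shortest click distance
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist

def undirected(graph):
    """Ignore link direction by adding a reverse edge for every hyperlink."""
    und = {}
    for src, targets in graph.items():
        for dst in targets:
            und.setdefault(src, set()).add(dst)
            und.setdefault(dst, set()).add(src)
    return und

# Hypothetical four-page web: A -> B -> C and D -> C
web = {"A": ["B"], "B": ["C"], "D": ["C"]}
print(click_distances(web, "A"))                   # {'A': 0, 'B': 1, 'C': 2}
print("D" in click_distances(web, "A"))            # False: no directed path from A to D
print(click_distances(undirected(web), "A")["D"])  # 3
```

Averaging these distances over many random page pairs is, in spirit, how path-length figures like the ones above are measured.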

Six degrees of separation: there are no more than six people between you and any stranger.

Information retrieval

  • IR is the science of searching for documents, for information within documents…

Web search vs. information retrieval

  • The web contains a lot of duplication
  • The scale of web search is far beyond that of traditional information retrieval
  • The web is very dynamic
  • The quality of web pages is not uniform
  • The range of topics is wide, so the engine has to figure out which pages belong to the query's topic

Three types of queries
Navigational: the user wants to reach a particular site, e.g., Facebook
Informational: the user wants specific information, e.g., "What is the age of Clinton?"
Transactional: searches of a commercial nature, e.g., get an insurance quote

There are more than 3.5B Google searches every day
76% of global searches take place on Google
60% are from mobile devices
80% of searches were informational

The Anatomy of a Large-Scale Hypertextual Web Search Engine

Web searching: architecture
Web crawler: the first algorithm
A way to collect large amounts of data

Retrieve the page, extract its URLs, update the list of pages to visit, and repeat
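The crawl loop above can be sketched as a breadth-first traversal. This is a minimal sketch assuming a hypothetical `fetch` function (here an in-memory dictionary stands in for real HTTP requests) and URLs that are already canonical, so no URL normalization, politeness delays, or robots.txt handling are shown.

```python
from collections import deque
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl: retrieve a page, extract URLs, update the frontier, repeat."""
    frontier = deque([seed])
    seen = {seed}
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)                 # retrieve the page
        if html is None:                  # unreachable page: skip it
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)                 # extract URLs (used as-is, no normalization)
        for link in parser.links:         # update the frontier with unseen URLs
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages

# Hypothetical in-memory "web" standing in for real HTTP fetches
FAKE_WEB = {
    "a.html": '<a href="b.html">B</a> <a href="c.html">C</a>',
    "b.html": '<a href="a.html">A</a>',
    "c.html": '<a href="d.html">D</a>',
}
print(sorted(crawl("a.html", FAKE_WEB.get)))  # ['a.html', 'b.html', 'c.html']
```

The `seen` set keeps the crawler from revisiting pages that link back to each other, and `max_pages` caps the crawl, since a real web frontier never empties.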

Web searching: Architecture

  • Documents stored on many Web servers are indexed in a single central index
  • The central index is implemented as a single system on a very large number of computers

Concept of relevance and importance
Relevance

  1. Similarity between the terms in the query and each document
  2. Location information, used for proximity in multi-word search
  3. Occurrence in the page title and page URL is important
  4. Font and color are also considered
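A toy relevance score can combine factor 1 (query/document term similarity) with factor 3 (title and URL matches). This sketch assumes cosine similarity over raw term counts and uses hypothetical `title_boost` and `url_boost` weights. Proximity and font/color are not modeled here.

```python
import math
from collections import Counter

def cosine_similarity(query_terms, doc_terms):
    """Cosine similarity between raw term-frequency vectors."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def relevance(query, doc, title_boost=0.5, url_boost=0.25):
    """Body similarity plus hypothetical bonuses for query terms in the title or URL."""
    terms = query.lower().split()
    score = cosine_similarity(terms, doc["body"].lower().split())
    score += title_boost * sum(t in doc["title"].lower().split() for t in terms)
    score += url_boost * sum(t in doc["url"].lower() for t in terms)
    return score

# Hypothetical documents
doc1 = {"url": "example.com/jaguar", "title": "Jaguar facts", "body": "the jaguar is a big cat"}
doc2 = {"url": "example.com/cars", "title": "Car reviews", "body": "a jaguar car review"}
print(relevance("jaguar", doc1) > relevance("jaguar", doc2))  # True
```

Note that doc2 has the higher body similarity (0.5 vs. about 0.41), but the title and URL matches push doc1 ahead.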

Importance: ranks documents by their likelihood of being useful to a variety of users, i.e., their popularity.

Inverted index
• For each word: the set of documents where it occurs
• Popular (frequently occurring) words in the documents are taken as terms
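A minimal inverted index maps each term to the set of documents containing it. This sketch assumes simple lowercase whitespace tokenization (no stemming or stop-word removal) and hypothetical document ids.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids in which it occurs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

docs = {1: "web search engine", 2: "image search", 3: "video compression"}
index = build_inverted_index(docs)
print(sorted(index["search"]))         # [1, 2]
# A conjunctive (AND) query intersects the posting sets
print(index["web"] & index["search"])  # {1}
```

Answering a query then touches only the posting sets of the query terms instead of scanning every document.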

PageRank algorithm

  • Used to estimate the popularity of documents
  • The more links point to a document, the more findable it is

Intuitive model
A user:

  1. starts at a random page on the web
  2. selects a random hyperlink from the current page and jumps to the corresponding page
  3. repeats step 2 a very large number of times.

Pages are ranked according to the relative frequency with which they are visited.
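The intuitive model can be simulated directly by counting visits. This is a sketch on a hypothetical three-page graph. The restart on pages with no outgoing links is an assumption added so the walk never gets stuck, and is not part of the three steps above.

```python
import random
from collections import Counter

def random_surfer(graph, steps=100_000, seed=0):
    """Estimate visit frequencies by simulating the random-surfer model."""
    rng = random.Random(seed)
    pages = list(graph)
    page = rng.choice(pages)            # 1. start at a random page
    visits = Counter()
    for _ in range(steps):              # 3. repeat step 2 many times
        visits[page] += 1
        links = graph[page]
        # 2. follow a random hyperlink; if the page has none (an assumption
        # not covered in the notes), restart at a random page
        page = rng.choice(links) if links else rng.choice(pages)
    return {p: visits[p] / steps for p in pages}

# Hypothetical three-page web
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
freq = random_surfer(graph)
print(sorted(freq, key=freq.get, reverse=True))  # most-visited pages first
```

For this graph the frequencies settle near A ≈ 0.4, B ≈ 0.2, C ≈ 0.4: C receives links from both other pages, and A inherits all of C's traffic.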

Basic algorithm
• Matrix representation of the link graph
• Normalize by the number of links from each page
• Weighting pages: initially all pages get weight 1/n, then recalculate weights
• Iterate until the weights converge
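The basic algorithm amounts to power iteration on a column-normalized link matrix. The sketch below follows the four steps above in plain Python. It assumes every link target also appears as a key in `graph` and that no page is a dead end. The tolerance `tol` is a hypothetical stopping parameter.

```python
def pagerank_basic(graph, iterations=100, tol=1e-10):
    """Basic PageRank by power iteration, without random teleports."""
    pages = sorted(graph)
    n = len(pages)
    idx = {p: i for i, p in enumerate(pages)}
    # Matrix representation: M[i][j] = 1/outdeg(j) when page j links to page i
    M = [[0.0] * n for _ in range(n)]
    for src, targets in graph.items():
        for dst in targets:
            M[idx[dst]][idx[src]] += 1.0 / len(targets)  # normalize by number of links
    r = [1.0 / n] * n                                    # initially all pages 1/n
    for _ in range(iterations):                          # iterate
        new_r = [sum(M[i][j] * r[j] for j in range(n)) for i in range(n)]
        converged = sum(abs(a - b) for a, b in zip(new_r, r)) < tol
        r = new_r                                        # recalculated weights
        if converged:
            break
    return dict(zip(pages, r))

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(pagerank_basic(graph))  # approximately {'A': 0.4, 'B': 0.2, 'C': 0.4}
```

The converged weights match the visit frequencies of the random-surfer model, which is the point of the intuitive model.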

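Random teleports
The Google solution for spider traps: at each step, with some probability, the surfer jumps to a random page instead of following a link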
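With teleports, each iteration becomes r = beta * M r + (1 - beta) / n. This is a sketch assuming beta = 0.85 (a commonly used value) and assuming every page has at least one outgoing link, so dead ends are not handled. The example graph is a hypothetical spider trap.

```python
def pagerank(graph, beta=0.85, iterations=100):
    """PageRank with random teleports: r = beta * M r + (1 - beta) / n."""
    pages = sorted(graph)
    n = len(pages)
    r = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_r = {p: (1.0 - beta) / n for p in pages}        # teleport to a random page
        for src, targets in graph.items():
            for dst in targets:
                new_r[dst] += beta * r[src] / len(targets)  # follow a random link
        r = new_r
    return r

# Hypothetical spider trap: C links only to itself
trap = {"A": ["B"], "B": ["C"], "C": ["C"]}
print(pagerank(trap, beta=1.0))  # no teleports: C absorbs all the weight
print(pagerank(trap))            # approximately {'A': 0.05, 'B': 0.0925, 'C': 0.8575}
```

Without teleports, A and B end up with zero weight. The teleport term guarantees every page at least (1 - beta) / n.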

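Single-word query ranking
• Assign weights to each matching document based on where and how the word occurs (e.g., title, URL, font), and combine them with the page's importance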
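One way to assign weights is to give each kind of hit (title, URL, large font, plain text) a type-weight and let repeated hits count less and less. This sketch makes several assumptions: the `TYPE_WEIGHTS` values, the `log1p` tapering, and the linear mix parameter `alpha` are all hypothetical choices, not the lecture's formula.

```python
import math

# Hypothetical type-weights: a hit in the title counts more than one in plain text
TYPE_WEIGHTS = {"title": 5.0, "url": 4.0, "large_font": 2.0, "plain": 1.0}

def ir_score(hits):
    """Dot product of type-weights with count-weights that taper off (log1p) as counts grow."""
    return sum(TYPE_WEIGHTS[kind] * math.log1p(count) for kind, count in hits.items())

def rank_single_word(candidates, importance, alpha=0.5):
    """Order documents by a hypothetical linear mix of relevance (IR score) and importance."""
    score = {doc: alpha * ir_score(hits) + (1 - alpha) * importance[doc]
             for doc, hits in candidates.items()}
    return sorted(score, key=score.get, reverse=True)

# Hypothetical hit counts for one query word, per document and hit type
candidates = {"d1": {"title": 1, "plain": 2}, "d2": {"plain": 10}}
importance = {"d1": 0.2, "d2": 0.3}  # e.g. PageRank values
print(rank_single_word(candidates, importance))  # ['d1', 'd2']
```

The tapering keeps d2's ten plain-text hits from outweighing d1's single title hit.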

Multi-word query ranking
Similar to single-word ranking, but the proximity of the query words within the document must also be taken into account
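A simple proximity measure is the length of the shortest window of tokens that contains every query word. The shorter the window, the closer together the words are. This sketch finds that window with a sliding window. Scoring proximity as `1 / span` is a hypothetical choice.

```python
def min_span(text, query):
    """Length of the shortest token window containing every query word (None if any is missing)."""
    tokens = text.lower().split()
    terms = set(query.lower().split())
    if not terms:
        return None
    counts, best, left = {}, None, 0
    for right, tok in enumerate(tokens):
        if tok in terms:
            counts[tok] = counts.get(tok, 0) + 1
        while len(counts) == len(terms):          # window covers all query words
            window = right - left + 1
            if best is None or window < best:
                best = window
            lt = tokens[left]                     # shrink the window from the left
            if lt in terms:
                counts[lt] -= 1
                if counts[lt] == 0:
                    del counts[lt]
            left += 1
    return best

def proximity_score(text, query):
    """Hypothetical proximity score: closer query words give a higher score."""
    span = min_span(text, query)
    return 0.0 if span is None else 1.0 / span

near = "information retrieval on the web"
far = "the web has a lot of duplication and retrieval is hard"
print(min_span(near, "web retrieval"), min_span(far, "web retrieval"))  # 4 8
```

Both documents contain both words, but only proximity separates them: `near` scores 0.25 and `far` scores 0.125.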

Goals of indexing the web: precision

Short queries applied to very large numbers of items lead to a large number of hits

  • The goal is that the first 10-100 hits should satisfy users
  • Recall is not an important criterion

Completeness of the index is not an important factor; comprehensive crawling is unnecessary

Precision and Recall

  • Precision (also called positive predictive value) is the fraction of retrieved instances that are relevant
  • Recall (also called sensitivity) is the fraction of relevant instances that are retrieved
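Both definitions reduce to set arithmetic. The sketch below also includes precision at k, which measures the "first 10-100 hits" goal from the previous section. Document ids are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision = |retrieved ∩ relevant| / |retrieved|; recall = |retrieved ∩ relevant| / |relevant|."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k ranked results that are relevant."""
    return sum(doc in relevant for doc in ranking[:k]) / k

ranking = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5", "d6", "d7"}
print(precision_recall(ranking, relevant))     # (0.5, 0.4)
print(precision_at_k(ranking, relevant, k=2))  # 0.5
```

Here half of the retrieved documents are relevant (precision 0.5), but only two of the five relevant documents were found (recall 0.4).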

Relevance and Importance

  • Relevance is treated as binary (relevant or not relevant); it is usually estimated by the similarity between the terms in the query and each document
  • Importance measure