1. Imagine that you are asked to build a search engine for finding relevant tweets. Describe the methods you plan to use to (i) build the indexes; (ii) rank the tweets; and (iii) evaluate the results. Justify your choices and explain their potential limitations.

Hint: think about whether the methods you learned in class can be applied to this problem. If not, how could they be improved?

2. Outline a method to determine synonyms based on search engine logs. That is, you are given many queries and, for each query, a list of clicked (assumed relevant) documents.
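
To make the input format concrete, the log can be pictured as a mapping from each query string to the documents clicked for it. The sketch below is only an illustration of that shape: the queries, the document IDs, and the helper name `group_clicks` are all hypothetical, and the actual log format is not specified in the question.

```python
from collections import defaultdict

# Hypothetical click records as (query, clicked document ID) pairs.
# Every value here is made up for illustration.
click_records = [
    ("cheap flights", "doc_17"),
    ("cheap flights", "doc_42"),
    ("low cost airfare", "doc_42"),
    ("river bank erosion", "doc_88"),
]


def group_clicks(records):
    """Map each query to the list of documents clicked for it, in log order."""
    clicks_by_query = defaultdict(list)
    for query, doc_id in records:
        clicks_by_query[query].append(doc_id)
    return dict(clicks_by_query)


# query_log has the shape described above: query -> list of clicked documents.
query_log = group_clicks(click_records)
```

Any method outlined in the answer would take a structure like `query_log` as its input.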

3. Outline a method to disambiguate homographs (words that are spelled the same but have different meanings) based on search engine logs. For example, how can we distinguish "bank" as a financial institution from "bank" as the side of a river based on these logs?

4. Feature selection is the process of reducing the dimensionality of the feature space to increase performance and decrease running time (since there are fewer features). Outline a feature selection method for the unigram word feature representation using word relations.

