Semantic Web - Sharif

Semantic Search
Spring 2007, Computer Engineering Department, Sharif University of Technology
Semantic web - Computer Engineering Dept. - Spring 2007

Outline
- Traditional search concepts
- Semantic Search

Traditional search
- Originated from Information Retrieval research
- Enhanced for the Web
  - Crawling and indexing
  - Web specific ranking

- An information need is represented by a set of keywords
  - Very simple interface
  - Users do not have to be experts
- Similarity of each document in the collection with the query is estimated
- A ranking is applied to sort the results before they are shown to the users

Representation of documents
[Figure: text operations applied to documents (structure, accents and spacing, stopwords, noun groups, stemming, manual indexing), taking the logical view from full text to index terms]

Retrieval process
[Figure: retrieval pipeline with components user interface, text operations, query operations, searching, indexing, ranking, DB manager module, index (inverted file) and text database; labelled flows: user need, logical view, user feedback, query, retrieved docs, ranked docs]

Indexing
- Documents to be indexed: Friends, Romans, countrymen.
- Tokenizer produces the token stream: Friends, Romans, Countrymen
- Linguistic modules produce the modified tokens: friend, roman, countryman
- Indexer produces the inverted index:
    friend      2, 4
    roman       1, 2
    countryman  13, 16
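The indexing pipeline above (tokenizer, linguistic modules, indexer) can be sketched in a few lines of Python. This is only an illustration: the `normalize` function is a toy stand-in for real linguistic modules, and the sample `docs` are invented so that the resulting postings match the slide.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Tokenizer: split raw text into a stream of alphabetic tokens."""
    return re.findall(r"[A-Za-z]+", text)

def normalize(token):
    """Toy linguistic module (assumption): lowercase and crude singularization."""
    token = token.lower()
    if token.endswith("men"):
        return token[:-3] + "man"
    if token.endswith("s"):
        return token[:-1]
    return token

def build_index(docs):
    """Indexer: map each normalized term to a sorted list of document ids."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            postings[normalize(token)].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}

# Hypothetical documents, chosen only to reproduce the postings on the slide.
docs = {1: "Romans", 2: "Friends, Romans", 4: "Friends",
        13: "countrymen", 16: "Countrymen"}
index = build_index(docs)
for term, doc_ids in index.items():
    print(term, doc_ids)
```

A production system would replace `normalize` with a real stemmer or lemmatizer and would keep postings on disk rather than in a dictionary.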

Retrieval models
- A retrieval model specifies how the similarity of a document to a query is estimated.
- Three basic retrieval models:
  - Boolean model
  - Vector model
  - Probabilistic model

Boolean model
- Query is specified using logical operators: AND, OR and NOT
- Merge of the posting lists is the basic operation
- Consider processing the query: Brutus AND Caesar
  - Locate Brutus in the Dictionary; retrieve its postings.
  - Locate Caesar in the Dictionary; retrieve its postings.
  - Merge the two postings:
      Brutus: 2 4 8 16 32 64 128
      Caesar: 1 2 3 5 8 13 21 34

Boolean queries: Exact match
- The Boolean retrieval model lets us ask a query that is a Boolean expression:
  - Boolean queries use AND, OR and NOT to join query terms
  - Views each document as a set of words
  - Is precise: document matches condition or not.
- Primary commercial retrieval tool for 3 decades.
- Professional searchers (e.g., lawyers) still like Boolean queries:
  - You know exactly what you're getting.

Example: WestLaw http://www.westlaw.com/
- Largest commercial (paying subscribers) legal search service (started 1975; ranking added 1992)
- Tens of terabytes of data; 700,000 users

- Majority of users still use Boolean queries
- Example query:
  - What is the statute of limitations in cases involving the federal tort claims act?
  - LIMIT! /3 STATUTE ACTION /S FEDERAL /2 TORT /3 CLAIM
  - /3 = within 3 words, /S = in same sentence

Ranking search results
- Boolean queries give inclusion or exclusion of docs.
- Often we want to rank/group results
  - Need to measure proximity from query to each doc.
  - Need to decide whether docs presented to user are singletons, or a group of docs covering various aspects of the query.
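Looking back at the Boolean model, the postings merge for a query such as Brutus AND Caesar can be sketched as a linear walk over two sorted lists. This is a minimal sketch; the function name `intersect` and the two sample lists are illustrative choices, not part of any particular system.

```python
def intersect(p1, p2):
    """Intersect two sorted posting lists in O(len(p1) + len(p2)) time."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # Document contains both terms: keep it and advance both lists.
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer

# Illustrative sorted posting lists for the two query terms.
brutus = [2, 4, 8, 16, 32, 64, 128]
caesar = [1, 2, 3, 5, 8, 13, 21, 34]
print(intersect(brutus, caesar))
```

The walk relies on both lists being sorted by document id, which is why inverted indexes keep postings in that order.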

Spell correction
- Two principal uses
  - Correcting document(s) being indexed
  - Retrieving matching documents when the query contains a spelling error
- Two main flavors:
  - Isolated word
    - Check each word on its own for misspelling
    - Will not catch typos resulting in correctly spelled words, e.g., from → form
  - Context-sensitive
    - Look at surrounding words, e.g., I flew form Heathrow to Narita.

Isolated word correction
- Fundamental premise: there is a lexicon from which the correct spellings come
- Two basic choices for this
  - A standard lexicon such as
    - Webster's English Dictionary
    - An industry-specific lexicon, hand-maintained
  - The lexicon of the indexed corpus
    - E.g., all words on the web
    - All names, acronyms etc.
    - (Including the mis-spellings)

Isolated word correction
- Given a lexicon and a character sequence Q, return the words in the lexicon closest to Q
- What's "closest"?
- We have several alternatives
  - Edit distance
  - Weighted edit distance
  - n-gram overlap

Edit distance
- Given two strings S1 and S2, the minimum number of basic operations to convert one to the other
- Basic operations are typically character-level
  - Insert
  - Delete
  - Replace

- E.g., the edit distance from cat to dog is 3.
- Generally found by dynamic programming.

n-gram overlap
- Enumerate all the n-grams in the query string as well as in the lexicon
- Use the n-gram index (recall wild-card search) to retrieve all lexicon terms matching any of the query n-grams
- Threshold by number of matching n-grams

Example with trigrams
- Suppose the text is november
  - Trigrams are nov, ove, vem, emb, mbe, ber.
- The query is december
  - Trigrams are dec, ece, cem, emb, mbe, ber.
- So 3 trigrams overlap (of 6 in each term)
- How can we turn this into a normalized measure of overlap?

One option: Jaccard coefficient
- A commonly-used measure of overlap
- Let X and Y be two sets; then the J.C. is $|X \cap Y| / |X \cup Y|$
- Equals 1 when X and Y have the same elements and zero when they are disjoint
- X and Y don't have to be of the same size
- Always assigns a number between 0 and 1
  - Now threshold to decide if you have a match
  - E.g., if J.C. > 0.8, declare a match
- For november and december: 3 shared trigrams out of 9 distinct ones, so J.C. = 3/9 ≈ 0.33

Phrase queries
- Want to answer queries such as "stanford university" as a phrase
- Thus the sentence "I went to university at Stanford" is not a match.
  - The concept of phrase queries has proven easily understood by users; about 10% of web queries are phrase queries
- No longer suffices to store only <term : docs> entries

Biword indexes
- Index every consecutive pair of terms in the text as a phrase
- For example the text "Friends, Romans, Countrymen" would generate the biwords
  - friends romans
  - romans countrymen
- Each of these biwords is now a dictionary term
- Two-word phrase query-processing is now immediate.
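A biword index of the kind just described can be sketched as follows. The helper names (`biwords`, `build_biword_index`, `phrase_candidates`) and the two sample documents are assumptions made for illustration only.

```python
import re
from collections import defaultdict

def biwords(text):
    """Return every consecutive pair of lowercased terms as one dictionary term."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]

def build_biword_index(docs):
    """Map each biword to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for bw in biwords(text):
            index[bw].add(doc_id)
    return index

def phrase_candidates(index, phrase):
    """AND together the postings of all biwords in the phrase."""
    terms = biwords(phrase)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical two-document collection.
docs = {1: "I went to Stanford University",
        2: "I went to university at Stanford"}
index = build_biword_index(docs)
print(biwords("Friends, Romans, Countrymen"))
print(phrase_candidates(index, "stanford university"))
```

For a two-word phrase the lookup is exact; for longer phrases the result is only a candidate set, since each biword may occur in a different place in the document.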

Longer phrase queries
- "stanford university palo alto" can be broken into the Boolean query on biwords: stanford university AND university palo AND palo alto
- Without the docs, we cannot verify that the docs matching the above Boolean query do contain the phrase.
  - Can have false positives!

Solution 2: Positional indexes
- Store, for each term, entries of the form:
  <number of docs containing term; doc1: position1, position2 ... ; doc2: position1, position2 ... ; etc.>

Positional index example
- Can compress position values/offsets
- Nevertheless, this expands postings storage substantially

Processing a phrase query
- Extract inverted index entries for each distinct term: to, be, or, not.
- Merge their doc:position lists to enumerate all positions with "to be or not to be".
    to: 2:1,17,74,222,551; 4:8,16,190,429,433; 7:13,23,191; ...
    be: 1:17,19; 4:17,191,291,430,434; 5:14,19,101; ...
- Same general method for proximity searches
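A phrase lookup over a positional index can be sketched as below, using the partial postings for "to" and "be" from the slide. Note the simplification: instead of a true linear merge of the position lists, this sketch checks membership of each expected position, which is easier to read but slower on long lists. The function name and the nested-dictionary layout are assumptions.

```python
def phrase_positions(positional_index, phrase_terms):
    """Return {doc_id: [start positions]} where the terms occur consecutively.

    positional_index maps term -> {doc_id: sorted list of positions}.
    """
    first, rest = phrase_terms[0], phrase_terms[1:]
    hits = {}
    for doc_id, starts in positional_index.get(first, {}).items():
        matches = [
            p for p in starts
            # The k-th following term must appear exactly k positions later.
            if all(p + k in positional_index.get(term, {}).get(doc_id, ())
                   for k, term in enumerate(rest, start=1))
        ]
        if matches:
            hits[doc_id] = matches
    return hits

# Partial postings copied from the slide (the "..." tails are omitted).
positional_index = {
    "to": {2: [1, 17, 74, 222, 551], 4: [8, 16, 190, 429, 433], 7: [13, 23, 191]},
    "be": {1: [17, 19], 4: [17, 191, 291, 430, 434], 5: [14, 19, 101]},
}
print(phrase_positions(positional_index, ["to", "be"]))
```

Relaxing the test from "exactly k positions later" to "within k positions" gives the proximity search mentioned on the slide.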

Vector model of retrieval
- Documents are represented as vectors of terms
- In each entry a weight is considered.
- The weight is tf x idf:
  - term frequency (tf), or wf: some measure of term density in a doc
  - inverse document frequency (idf): measure of informativeness of a term, its rarity across the whole corpus
    - could just be based on the raw count of documents the term occurs in ($\mathrm{idf}_i = 1/\mathrm{df}_i$)
    - but by far the most commonly used version is $\mathrm{idf}_i = \log\left(\frac{n}{\mathrm{df}_i}\right)$

Why turn docs into vectors?
- First application: Query-by-example
  - Given a doc d, find others "like" it.
- Now that d is a vector, find vectors (docs) "near" it.

Intuition
[Figure: documents d1 to d5 drawn as vectors in a space with term axes t1, t2, t3]
- Postulate: Documents that are "close together" in the vector space talk about the same things.

Cosine similarity
- Distance between vectors d1 and d2 captured by the cosine of the angle x between them.
- Note this is similarity, not distance
  - No triangle inequality for similarity.
[Figure: vectors d1 and d2 in the space with term axes t1, t2, t3]

Cosine similarity
$$ \mathrm{sim}(d_j, d_k) = \frac{\vec{d_j} \cdot \vec{d_k}}{|\vec{d_j}|\,|\vec{d_k}|} = \frac{\sum_{i=1}^{n} w_{i,j}\, w_{i,k}}{\sqrt{\sum_{i=1}^{n} w_{i,j}^{2}}\,\sqrt{\sum_{i=1}^{n} w_{i,k}^{2}}} $$
- Cosine of angle between two vectors
- The denominator involves the lengths of the vectors (normalization).

Measures for a search engine
- How fast does it index
  - Number of documents/hour
  - (Average document size)
- How fast does it search
  - Latency as a function of index size
- Expressiveness of query language

  - Ability to express complex information needs
  - Speed on complex queries

Measures for a search engine
- All of the preceding criteria are measurable: we can quantify speed/size; we can make expressiveness precise
- The key measure: user happiness
  - What is this?
  - Speed of response/size of index are factors
  - But blindingly fast, useless answers won't make a user happy
- Need a way of quantifying user happiness

Unranked retrieval evaluation: Precision and Recall
- Precision: fraction of retrieved docs that are relevant = P(relevant|retrieved)
- Recall: fraction of relevant docs that are retrieved = P(retrieved|relevant)

                  Relevant   Not relevant
  Retrieved       tp         fp
  Not retrieved   fn         tn

- Precision P = tp/(tp + fp)
- Recall R = tp/(tp + fn)

Precision/Recall
- You can get high recall (but low precision) by retrieving all docs for all queries!
- Recall is a non-decreasing function of the number of docs retrieved
- In a good system, precision decreases as either number of docs retrieved or recall increases
  - A fact with strong empirical confirmation
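The two measures can be computed directly from the sets of retrieved and relevant document ids. The sketch below uses invented document ids and also illustrates the remark that retrieving everything gives perfect recall at the cost of precision.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) from collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)   # relevant and retrieved
    fp = len(retrieved - relevant)   # retrieved but not relevant
    fn = len(relevant - retrieved)   # relevant but missed
    precision = tp / (tp + fp) if retrieved else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall

# Hypothetical collection of 10 documents, 5 of them relevant.
collection = range(1, 11)
relevant = [2, 4, 6, 8, 10]

print(precision_recall([1, 2, 3, 4], relevant))   # a selective result set
print(precision_recall(collection, relevant))     # retrieve everything
```

Returning 0.0 for an empty retrieved or relevant set is a convention chosen here to avoid division by zero; other conventions are possible.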

Typical (good) 11 point precisions
[Figure: precision (vertical axis, 0 to 1) plotted against recall (horizontal axis, 0 to 1)]

Query expansion

Relevance Feedback
- Relevance feedback: user feedback on relevance of docs in initial set of results
  - User issues a (short, simple) query
  - The user marks returned documents as relevant or non-relevant.
  - The system computes a better representation of the information need based on feedback.
  - Relevance feedback can go through one or more iterations.
- Idea: it may be difficult to formulate a good query when you don't know the collection well, so iterate

Relevance Feedback: Example
- Image search engine http://nayana.ece.ucsb.edu/imsearch/imsearch.html
[Figure: Results for Initial Query]
[Figure: Relevance Feedback]

[Figure: Results after Relevance Feedback]

Rocchio Algorithm
- The Rocchio algorithm incorporates relevance feedback information into the vector space model.
- Want to maximize sim(Q, Cr) - sim(Q, Cnr)
- The optimal query vector for separating relevant and non-relevant documents (with cosine sim.):
  $$ \vec{Q}_{opt} = \frac{1}{|C_r|} \sum_{\vec{d_j} \in C_r} \vec{d_j} \;-\; \frac{1}{N - |C_r|} \sum_{\vec{d_j} \notin C_r} \vec{d_j} $$
- Qopt = optimal query; Cr = set of rel. doc vectors; N = collection size
- Unrealistic: we don't know relevant documents.

Rocchio 1971 Algorithm (SMART)
- Used in practice:
  $$ \vec{q}_m = \alpha\, \vec{q}_0 + \beta\, \frac{1}{|D_r|} \sum_{\vec{d_j} \in D_r} \vec{d_j} \;-\; \gamma\, \frac{1}{|D_{nr}|} \sum_{\vec{d_j} \in D_{nr}} \vec{d_j} $$
- qm = modified query vector; q0 = original query vector; α, β, γ: weights (hand-chosen or set empirically); Dr = set of known relevant doc vectors; Dnr = set of known irrelevant doc vectors
- New query moves toward relevant documents and away from irrelevant documents
- Tradeoff α vs. β/γ: If we have a lot of judged documents, we want a higher β/γ.
- Term weight can go negative
  - Negative term weights are ignored (set to 0)

Types of Query Expansion
- Global Analysis: (static; of all documents in collection)
  - Controlled vocabulary
    - Maintained by editors (e.g., medline)
  - Manual thesaurus

    - E.g. MedLine: physician, syn: doc, doctor, MD, medico
  - Automatically derived thesaurus
    - (co-occurrence statistics)
  - Refinements based on query log mining
    - Common on the web
- Local Analysis: (dynamic)
  - Analysis of documents in result set

Probabilistic relevance feedback
- Rather than reweighting in a vector space...
- If user has told us some relevant and some irrelevant documents, then we can proceed to build a probabilistic classifier, such as a Naive Bayes model:
  - P(tk|R) = |Drk| / |Dr|
  - P(tk|NR) = |Dnrk| / |Dnr|
  - tk is a term; Dr is the set of known relevant documents; Drk is the subset that contain tk; Dnr is the set of known irrelevant documents; Dnrk is the subset that contain tk.

Binary Independence Model
$$ O(R \mid q, d) = O(R \mid q) \cdot \prod_{i=1}^{n} \frac{p(x_i \mid R, q)}{p(x_i \mid NR, q)} $$
- Since xi is either 0 or 1:
$$ O(R \mid q, d) = O(R \mid q) \cdot \prod_{x_i = 1} \frac{p(x_i = 1 \mid R, q)}{p(x_i = 1 \mid NR, q)} \cdot \prod_{x_i = 0} \frac{p(x_i = 0 \mid R, q)}{p(x_i = 0 \mid NR, q)} $$

Iteratively estimating pi
1. Assume that pi is constant over all xi in query
   - pi = 0.5 (even odds) for any given doc
2. Determine guess of relevant document set:
   - V is fixed size set of highest ranked documents on this model (note: now a bit like tf.idf!)

3. We need to improve our guesses for pi and ri, so
   - Use distribution of xi in docs in V. Let Vi be set of documents containing xi
     - pi = |Vi| / |V|
   - Assume if not retrieved then not relevant
     - ri = (ni - |Vi|) / (N - |V|)
4. Go to 2 until converges, then return ranking

Bayesian Networks for Text Retrieval (Turtle and Croft 1990)
- Standard probabilistic model assumes you can't estimate P(R|D,Q)
  - Instead assume independence and use P(D|R)

- But maybe you can with a Bayesian network
- What is a Bayesian network?
  - A directed acyclic graph
  - Nodes
    - Events or variables
    - Assume values.
    - For our purposes, all Boolean
  - Links
    - model direct dependencies between nodes

Bayesian Networks
- a, b, c - propositions (events).
- Bayesian networks model causal relations between events
[Figure: nodes a and b, with probabilities p(a) and p(b), both point to node c; the conditional dependence is given by p(c|ab) for all values for a, b, c]
- Inference in Bayesian Nets:
  - Given probability distributions for roots and conditional probabilities, can compute a priori probability of any instance
  - Fixing assumptions (e.g., b was observed) will cause recomputation of probabilities

Bayesian Nets for IR: Idea
[Figure: layered network with document nodes d1 ... dn, representation nodes t1 ... tn, concept nodes r1 ... rk, query concept nodes c1 ... cm, high-level concept nodes q1, q2 and goal node I]
- Document Network (large, but compute once for each document collection)
  - di - documents
  - ti - document representations
  - ri - "concepts"
- Query Network (small, compute once for every query)
  - ci - query concepts
  - qi - high-level concepts
  - I - goal node

Web search basics
[Figure: components of a web search engine (user, search, web spider, indexer, the Web, indexes, ad indexes) around a sample results page for the query "miele" showing sponsored links and web results]

Semantic Search

Ontology Meta Search Engines
- This group does retrieval by putting a system on top of an existing search engine
- There are two types of these systems
  - Using the filetype feature of search engines
  - Swangling

Filetype Feature
- Google started indexing RDF documents some time in late 2003
- In the first type, there is a search engine that only searches specific file types (e.g. RSS, RDF, OWL)
- In fact we just forward the keywords of the queries, together with the filetype feature, to Google
- The main concern of such systems is the visualization and browsing of results

OntoSearch
- A basic system with Google at its heart
- Abilities:
  - The ability to specify the types of file(s) to be returned (OWL, RDFS, all)
  - The ability to specify the types of entities to be matched by each keyword (concept, attribute, values, comments, all)
  - The ability to specify partial or exact matches on entities.
  - Sub-graph matching, e.g. concept animal with concept pig within 3 links; concepts with particular attributes

Ontology Meta Search Engines
- In the second type we use traditional search engines again
- But since semantic tags are ignored by the underlying search engine, an intermediate format for documents and user queries is used
- A technique named Swangle is used for this purpose

- With this technique RDF triples are translated into strings suitable for the underlying search engine

Swangling
- Swangling turns a SW triple into 7 word-like terms
  - One for each non-empty subset of the three components, with the missing elements replaced by the special "don't care" URI
  - Terms generated by a hashing function (e.g., SHA1)
- Swangling an RDF document means adding in triples with swangle terms.
  - This can be indexed and retrieved via conventional search engines like Google
- Allows one to search for a SWD with a triple that claims "Ossama bin Laden is located at X"
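The idea of producing seven hashed terms per triple can be sketched as below. Several details are assumptions made purely for illustration and are not taken from the original Swangler: the `DONT_CARE` URI, joining the three components with spaces before hashing, and truncating the SHA-1 digest to 128 bits so that the base32 text is 26 characters long. The terms it prints will therefore not match those of any real swangler.

```python
import base64
import hashlib
from itertools import product

# Hypothetical placeholder for the special "don't care" URI.
DONT_CARE = "http://example.org/swangle#dontCare"

def swangle_term(subject, predicate, obj):
    """Hash one (possibly wildcarded) triple into a word-like base32 string."""
    digest = hashlib.sha1(f"{subject} {predicate} {obj}".encode("utf-8")).digest()
    # Assumption: keep 128 bits and drop base32 padding to get 26 characters.
    return base64.b32encode(digest[:16]).decode("ascii").rstrip("=")

def swangle(triple):
    """Return one term per non-empty subset of the triple's three components."""
    terms = []
    for keep_mask in product([True, False], repeat=3):
        if not any(keep_mask):
            continue  # skip the empty subset (all components wildcarded)
        parts = [c if keep else DONT_CARE for c, keep in zip(triple, keep_mask)]
        terms.append(swangle_term(*parts))
    return terms

triple = ("http://www.xfront.com/owl/ontologies/camera/#Camera",
          "http://www.w3.org/2000/01/rdf-schema#subClassOf",
          "http://www.xfront.com/owl/ontologies/camera/#PurchaseableItem")
terms = swangle(triple)
for term in terms:
    print(term)
```

Because a query triple with unknown components is swangled the same way, a keyword search for the term that wildcards those components retrieves documents containing any matching triple.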

A Swangled Triple
Swangled text for
[http://www.xfront.com/owl/ontologies/camera/#Camera,
 http://www.w3.org/2000/01/rdf-schema#subClassOf,
 http://www.xfront.com/owl/ontologies/camera/#PurchaseableItem]

  N656WNTZ36KQ5PX6RFUGVKQ63A
  M6IMWPWIH4YQI4IMGZYBGPYKEI
  HO2H3FOPAEM53AQIZ6YVPFQ2XI
  2AQEUJOYPMXWKHZTENIJS6PQ6M
  IIVQRXOAYRH6GGRZDFXKEEB4PY
  75Q5Z3BYAKRPLZDLFNS5KKMTOY
  2FQ2YI7SNJ7OMXOXIDEEE2WOZU

Swangler Architecture
[Figure: components: Semantic Web query, local KB, inference engine, encoder (swangler), semantic markup, encoded markup, web search engine, filters, semantic markup extractor, ranked pages]

What's the point?
- We'd like to get our documents into Google
  - Swangle terms look like words to Google and other search engines.
  - On the other side, this translation is done for user queries too.
- Add rules to the web server so that, when a search spider asks for document X, the document swangled(X) is returned
- We could also use Swanglish: hashing each triple into N of the 50K most common English words

Crawler Based Search Engines
- They have a crawler and ranking of their own

Swoogle Architecture
[Figure: Swoogle architecture with four parts (data analysis, interface, metadata creation, SWD discovery) and components IR analyzer, SWD analyzer, web server, web service, agent service, SWD cache, SWD metadata, candidate URLs, SWD reader, web crawler, the Web]
- Swoogle 2 (4/05): 340K SWDs, 48M triples, 5K SWOs, 97K classes, 55K properties, 7M individuals
- Swoogle 3 (11/05): 700K SWDs, 135M triples, 7.7K SWOs

Crawler Based Ontology Search Engines
- Discovery
  - Crawling of SW documents is different from crawling HTML documents
  - In the SW we express knowledge using URIs in RDF triples. Unlike HTML hyperlinks, URIs in RDF may point to a non-existing entity
  - Also, RDF may be embedded in HTML documents or be stored in a separate file.

Semantic Web Crawler
- Such crawlers should have the following properties
  - Should crawl heterogeneous web resources (owl, oil, daml, rdf, xml, html)
  - Avoid circular links
  - Completing RDF holes
  - Aggregating RDF chunks

Metadata Creation
- Web document metadata
  - When/how discovered/fetched
  - Suffix of URL
  - Last modified time
  - Document size
- SWD metadata
  - Language features
    - OWL species
    - RDF encoding
  - Statistical features
    - Defined/used terms
    - Declared/used namespaces
    - Ontology Ratio
    - Ontology Rank
  - Ontology annotation
    - Label
    - Version
    - Comment
- Related Relational Metadata
  - Links to other SWDs
    - Imported SWDs
    - Referenced SWDs
    - Extended SWDs
    - Prior version
  - Links to terms
    - Classes/Properties defined/used

Digesting
- Digest
- But the main point is that the count, type and meaning of relations in the SW are more complete than in the current web

Semantic Web Navigation Model
[Figure: navigation paths numbered 1 to 7 between Term Search, Document Search, the Web, the RDF graph (resource, literal), SWT, SWD and SWO, with relations sameNamespace, sameLocalname, extends, class-property bond, uses, populates, isUsedBy, isPopulatedBy, defines, isDefinedBy, officialOnto, rdfs:subClassOf, rdfs:seeAlso, rdfs:isDefinedBy, owl:imports]
- Navigating the HTML web is simple; there's just one kind of link.
- The SW has more kinds of links and hence more navigation

An Example
[Figure: the documents http://xmlns.com/foaf/0.1/index.rdf, http://www.w3.org/2002/07/owl, http://www.cs.umbc.edu/~finin/foaf.rdf and http://www.cs.umbc.edu/~dingli1/foaf.rdf, connected through the terms foaf:Person, foaf:Agent, foaf:mbox, owl:Class, owl:Thing and owl:InverseFunctionalProperty by rdf:type, rdfs:subClassOf, rdfs:domain, rdfs:range, owl:imports and rdfs:seeAlso links]
- We navigate the Semantic Web via links in the physical layer of RDF documents and also via links in the "logical" layer defined by the semantics of RDF and OWL.

Rank has its privilege
- Google introduced a new approach to ranking query results using a simple "popularity" metric.
  - It was a big improvement!
- Swoogle ranks its query results also
  - When searching for an ontology, class or property, wouldn't one want to see the most used ones first?
- Ranking SW content requires different algorithms for different kinds of SW objects
  - For SWDs, SWTs, individuals, "assertions", molecules, etc.

Ranking SWDs
- For offline ranking it is possible to use the references idea of PageRank.
- In OntoRank the value for each ontology is calculated very similarly to PageRank in traditional search engines like Google
- Ranking based on referencing
  - Identity and rank of referrer
  - Number of citations by others
  - Distance of reference from origin to target
- Types of links:

  - Import
  - Extend
  - Instantiate
  - Prior version
  - ...

An Example
[Figure: four documents connected by links labelled TM and EX]

  Document                                   wPR    OntoRank
  http://www.w3.org/2000/01/rdf-schema       300    403
  http://xmlns.com/wordnet/1.6/              3      103
  http://xmlns.com/foaf/1.0/                 100    100
  http://www.cs.umbc.edu/~finin/foaf.rdf     0.2    0.2

Crawler Based Ontology Search Engines
- Service
  - User interface
  - Services to application systems

Demo 1: Find "Time" Ontology
- We can use a set of keywords to search for an ontology. For example, "time, before, after" are basic concepts for a "Time" ontology.

Demo 2(a): Digest "Time" Ontology (document view)

Summary
- 2004
  - Swoogle (Mar, 2004)
    - Automated SWD discovery
    - SWD metadata creation and search
    - Ontology rank (rational surfer model)
    - Swoogle watch
    - Web Interface
  - Swoogle2 (Sep, 2004)
    - Ontology dictionary
    - Swoogle statistics
    - Web service interface (WSDL)
    - Bag of URIref IR search
    - Triple shopping cart
- 2005
  - Swoogle3 (July 2005)
    - Better (re-)crawling strategies
    - Better navigation models
    - Index instance data
    - More metadata (ontology mapping and OWL-S services)
    - Better web service interfaces
    - IR component for string literals

Applications and use cases
- Supporting Semantic Web developers, e.g.,
  - Ontology designers
  - Vocabulary discovery
  - Who's using my ontologies or data?

  - Etc.
- Searching specialized collections, e.g.,
  - Proofs in Inference Web
  - Text Meaning Representations of news stories in SemNews
- Supporting SW tools, e.g.,
  - Discovering mappings between ontologies

Semantic Search Engines
- There are some restrictions for current search engines
  - One interesting example: Matrix
  - Another example is java
- The Semantic Web is introduced to overcome this problem.
- The most important tool in the Semantic Web for improving search results is the concept of context and its correspondence with ontologies.
- This type of search engine uses such ontological definitions

Two Levels of the Semantic Web
- Deep Semantic Web:
  - Intelligent agents performing inference
  - Semantic Web as distributed AI
  - Small problem: the AI problem is not yet solved
- Shallow Semantic Web: using SW/Knowledge Representation techniques for
  - Data integration
  - Search
  - Is starting to see traction in industry

Problems with current search engines
- Current search engines = keywords:
  - high recall, low precision
  - sensitive to vocabulary
  - insensitive to implicit content

Semantic Search Engines

- These search engines can be categorized into three groups.
  - Context Based Search Engines
    - They are the largest group; the aim is to add semantic operations for better results.
  - Evolutionary Search Engines
    - Use facilities of the semantic web to accumulate information on a topic we are researching.
  - Semantic Association Discovery Engines
    - They try to find semantic relations between two or more terms.

Context Based Search Engines
1) Crawling the semantic web:
- There is not much difference between these crawlers and ordinary web crawlers
- Many of the implemented systems use an existing web crawler as the underlying system.
- It is better to develop a crawler that understands special semantic tags.
- One of the important features of these crawlers should be the exploration of ontologies that are referred to from existing web pages

Annotation Methods
- Annotation is a prerequisite of search in the semantic web.
- There are different approaches, which span a broad spectrum from completely manual to fully automatic methods.
- Selection of an appropriate method depends on the domain of interest
- In general, meta-data generation for structured data is simpler

Annotation Methods
- Annotations can be categorized based on the following aspects:
  - Type of meta-data
    - Structural: non-contextual information about content is expressed (e.g. language and format)
    - Semantic: the main concern is the detailed content of the information, which is usually stored as RDF triples

Annotation Methods
  - Generation approach
    - A simple approach is to generate meta-data without considering the overall theme of the page (without an ontology).
    - A better approach is to use an ontology in the generation process.
      - Using a previously specified ontology for that type of page, generate meta-data that instantiates concepts and relations of the ontology for that page
    - The main advantage of this method is the usage of contextual information.

Annotation Methods
  - Source of generation
    - The ordinary source of meta-data generation is the page itself
    - Sometimes it is beneficial to use other complementary sources, such as resources available on the network, to accumulate more information for a page
    - For example, for a movie it might be possible to use IMDB to extract additional information like director, genre, etc.

Evolutionary Search Engines
- The advanced type of search is something like research
- Here we aim at gathering information about a specific topic
- It can be something like search with the Teoma search engine
- For example, if we give the name of a singer to the search engine, it should be able to find data related to this singer, like biography, posters, albums and so on.

Evolutionary Search Engines
- These engines usually use one of the commercial search engines as their base component for searching, and they augment the results returned by these base engines.
- This augmented information is gathered from some data-intensive web resources.

Evolutionary Search Engines Architecture

Evolutionary Search Engines
- It has some similarities with the previous category's architecture
- Here we crawl and generate annotations just for some well-known informational web pages, e.g. CDNow, Amazon, IMDB
- After this phase we collect the annotations in a repository.

Whenever a user poses a query, two processes must be performed:
First, we give the query to an ordinary search engine (usually Google) to obtain raw results.
Second, the system attempts to detect the context of the user's request and its corresponding ontology in order to extract some key concepts.
Later we use these concepts to fetch information from our metadata repository.
The last step in this architecture is combining and displaying the results.

The main problems and challenges in these types of engines are:
Concept extraction from the user's request
Selecting the proper annotations to show, and their order

Concept extraction from the user's request
Some problems lead the system to misunderstand the input query:
Inherent ambiguity in the query specified by the user

Complex terms that must be decomposed to be understood

Selecting the proper annotations to show, and their order
Often we find a huge amount of potential metadata related to the initial request, and we should choose the items that are most useful for the user.
A simple approach is to use the other concepts around the core concept (extracted earlier) in the base ontology.
If we have more than one core concept, we should focus on the concepts that lie on the path between them.

Displaying the Results
Results are displayed using a set of templates.
Each class of object has an associated set of templates.
Each template specifies the class, the properties, and an HTML template.
A template is identified for each node in the ordered list, and the HTML is generated.
The HTML is included in the results page.

W3C Search

W3C Semantic Search has five different data sources: People, Activities, Working Groups, Documents, and News.
Both ABS (Activity Based Search) and W3C Semantic Search have a basic ontology about people, places, events, organizations, vocabulary terms, etc.
The plan is to augment a traditional search with data from the Semantic Web.

Base Ontology
[Figure: a segment of the Semantic Web pertaining to Eric Miller]
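The selection rule given earlier for evolutionary search engines (use the concepts around a core concept, and with more than one core concept prefer those on the path between them) can be tried on a small ontology segment. The sketch below is a hedged illustration: the graph contents are invented placeholders, the graph is treated as undirected, "path" is read as the shortest path found by breadth-first search, and only the first two core concepts are considered.

```python
from collections import deque

# Hypothetical ontology fragment as an undirected adjacency map.
GRAPH = {
    "Person A": ["Organization X", "Document D"],
    "Organization X": ["Person A", "Working Group W"],
    "Working Group W": ["Organization X", "Person B"],
    "Person B": ["Working Group W"],
    "Document D": ["Person A"],
}


def neighbours(graph, core):
    """Concepts directly connected to a single core concept."""
    return list(graph.get(core, []))


def path_between(graph, start, goal):
    """Shortest path between two core concepts (breadth-first search)."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return []  # the two concepts are not connected


def select_concepts(graph, cores):
    """One core concept: its neighbours. Two: the path between them."""
    if len(cores) == 1:
        return neighbours(graph, cores[0])
    return path_between(graph, cores[0], cores[1])


print(select_concepts(GRAPH, ["Person A"]))
print(select_concepts(GRAPH, ["Person A", "Person B"]))
```

With a single core concept the function returns its direct neighbours; with two it returns the chain of concepts linking them, which is the part of the graph most likely to explain how the two are related.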

[Figure: Sample Applications-W3C Search]

Activity Based Search
ABS contains data from many sites, such as AllMusic, Ebay, Amazon, AOL Shopping, TicketMaster, Weather.com and Mapquest.
There are millions of triples in the ABS Semantic Web.
The TAP knowledge base covers a broad range of domains, including people, places, organizations, and products.
Resources have an rdf:type and an rdfs:label.
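To make the role of rdf:type and rdfs:label concrete, the sketch below stores a few triples as plain tuples and matches a search term against labels. The `ex:` identifiers, the `ex:hasAlbum` property, and all values are hypothetical, and exact case-insensitive label matching is an assumption; this is not a description of how ABS or TAP actually resolve terms.

```python
# Hypothetical triples; the identifiers and values are illustrative only.
TRIPLES = [
    ("ex:singer1", "rdf:type", "ex:Musician"),
    ("ex:singer1", "rdfs:label", "Some Singer"),
    ("ex:singer1", "ex:hasAlbum", "ex:album1"),
    ("ex:album1", "rdf:type", "ex:Album"),
    ("ex:album1", "rdfs:label", "First Album"),
]


def find_by_label(triples, term):
    """Resources whose rdfs:label equals the search term, ignoring case."""
    term = term.strip().lower()
    return [s for s, p, o in triples
            if p == "rdfs:label" and o.lower() == term]


def describe(triples, resource):
    """All property/value pairs recorded for a resource."""
    return [(p, o) for s, p, o in triples if s == resource]


def augment(triples, query):
    """Map each resource matching the query to its description."""
    return {r: describe(triples, r) for r in find_by_label(triples, query)}


print(augment(TRIPLES, "some singer"))
```

The label gives the system something to match the user's keywords against, while the type tells it which kind of object was found and therefore which properties are worth showing.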

[Figure: Sample Applications-ABS]

References

T. Finin, J. Mayfield, C. Fink, A. Joshi, and R. S. Cost, Information retrieval and the semantic web, in Proceedings of the 38th International Conference on System Sciences, Hawaii, United States of America, 2005.
T. Finin, L. Ding, R. Pan, A. Joshi, P. Kolari, A. Java, and Y. Peng, Swoogle: Searching for knowledge on the semantic web, in Proceedings of the AAAI 05, 2005.
R. Guha, R. McCool, and E. Miller, Semantic search, in Proc. of the 12th International Conference on World Wide Web, New Orleans, 2003, pp. 700-709.
Y. Zhang, W. Vasconcelos, and D. Sleeman, OntoSearch: An ontology search engine, in The Twenty-fourth SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence, Cambridge, 2004.
