Search-Based Applications: the Maturation of Search
Gregory Grefenstette, Exalead
Exalead S.A. 2009

www.exalead.com/search
8 billion URLs, 2 billion images, 200 million videos
Wikipedia, cloud tags
Also: Labs.exalead.com

Two ways to find information: DATABASES VS SEARCH ENGINES

Recent Past

DATABASES: structured data, transaction, precise, all tuples, SQL, slow
SEARCH ENGINES: text, similarity ranking, intuitive, fast, partial

More Recent

DATABASES: structured data, transaction, precise, all tuples, SQL, slow
SEARCH ENGINES: text, similarity ranking
Top-K, column store, MapReduce, data cube
Connectors, facets, MapReduce, tables


Search based Application

An application which uses a search engine component, but whose final purpose is not searching for a document, but rather a domain-oriented process result.

Examples:
  Custom response management
  Logistic tracking and tracing
  Contextual advertising
  Database reporting after offloading

Current situation

Databases are the backbone of search in information systems
  Data Warehouse: BI reports
  Database: business processes
  DataMart: front-office users

Search-enabled application

Optimized solution for information access
  Data Warehouse: BI reports
  Database: business processes
  Search Engine: front-office users

Drawbacks of Using Database Search As a Component

Standard Architecture

Search Based Architecture

How does a Search Based Application work?

Database converted to Business Items

Stored as structured documents
Business items are concrete objects directly understandable by end users: product, customer, purchase order, technical support call
Each business item becomes a document
The straightforward, simple format of the document index gives performance and ease of use
The search engine can offer a rich and powerful query language that allows queries as complex and advanced as SQL, despite the flat data model
The search engine must support typed fields, intra-field scope search, and categories/facets

Database into structured documents

Product_ID  Product_Name       Manufacturer_Names
123         control switch     ACME Inc ; The Control Switch Company ; Karl GmbH
124         red warning light

Scope Search

Source tables:

Product_ID  Product_Name
123         control switch
124         red warning light

Product_ID  Manufacturer_ID
123         345
123         8574
123         4483

Manufacturer_ID  Manufacturer_NAME
345              ACME Inc.
8574             The Control Switch Company
4483             Karl GmbH

Resulting document:

Product_ID  Product_Name       Manufacturer_Names
123         control switch     ACME Inc ; The Control Switch Company ; Karl GmbH
124         red warning light

The manufacturer names are stored in a single field, but they can still be searched as individual records with scope search ("ACME GmbH" does not match the document here).
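The conversion and the scope search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular search engine implements it: the function names `build_documents` and `scope_search` are hypothetical, the tables are held as in-memory rows, and "scope" is approximated as "all query terms must occur inside one value of a multi-valued field".

```python
from collections import defaultdict

# Source tables from the example above, as plain Python rows.
# Manufacturer names are written as in the document table ("ACME Inc").
products = [(123, "control switch"), (124, "red warning light")]
product_manufacturer = [(123, 345), (123, 8574), (123, 4483)]
manufacturers = {345: "ACME Inc", 8574: "The Control Switch Company", 4483: "Karl GmbH"}


def build_documents(products, links, manufacturers):
    """Flatten the three tables into one document per product (hypothetical helper)."""
    names_by_product = defaultdict(list)
    for product_id, manufacturer_id in links:
        names_by_product[product_id].append(manufacturers[manufacturer_id])
    return [
        {
            "Product_ID": product_id,
            "Product_Name": name,
            # Multi-valued field: each manufacturer name stays a separate value.
            "Manufacturer_Names": names_by_product.get(product_id, []),
        }
        for product_id, name in products
    ]


def scope_search(documents, field, terms):
    """Return documents where ALL terms occur inside ONE value of the field."""
    wanted = [t.lower() for t in terms]
    hits = []
    for doc in documents:
        for value in doc[field]:
            words = value.lower().split()
            if all(t in words for t in wanted):
                hits.append(doc)
                break  # one matching value is enough for this document
    return hits


documents = build_documents(products, product_manufacturer, manufacturers)
# Both terms sit in the same value "ACME Inc": product 123 is returned.
print(scope_search(documents, "Manufacturer_Names", ["ACME", "Inc"]))
# "ACME" and "GmbH" belong to different manufacturers: nothing is returned.
print(scope_search(documents, "Manufacturer_Names", ["ACME", "GmbH"]))
```

Keeping the manufacturer names as separate values inside one field is what lets the flat document behave like the original one-to-many relation at query time.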

Hierarchical categories

Product_ID  Color  Brand  Fragile  Nb of wheels  Wheel type
123         Red    ACME   Y        3             2

Multiple kinds of attributes can be mixed in the same category field. The hierarchical tree structure of the categories preserves the differences between attribute types.

Product_ID  Country
123         France
123         UK
123         Germany

Multi-valued attributes can also be represented by categories. A single category field can be used to store hundreds or thousands of attribute columns.

Product_ID  Attributes
123         Color/Red ; Brand/ACME ; Fragile/Y ; Nb_wheels/3 ; Wheel_type/2 ; Country/France ; Country/UK ; Country/Germany
124
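As a rough sketch of how such a single category field might be built and then aggregated, the Python below encodes attributes as `Type/Value` paths and counts documents per path. The helper names `to_category_paths` and `facet_counts` are hypothetical, and the attributes given to product 124 are an assumption made only so that the counts have something to aggregate (the example above leaves that row empty).

```python
from collections import Counter

# Attribute rows for product 123, taken from the tables above.
single_valued = {"Color": "Red", "Brand": "ACME", "Fragile": "Y",
                 "Nb_wheels": "3", "Wheel_type": "2"}
countries = ["France", "UK", "Germany"]


def to_category_paths(attributes, multi_valued):
    """Encode attributes as 'Type/Value' paths stored in one category field."""
    paths = [f"{name}/{value}" for name, value in attributes.items()]
    for name, values in multi_valued.items():
        paths.extend(f"{name}/{value}" for value in values)
    return paths


def facet_counts(documents):
    """Count documents per value, grouped by top-level attribute type.

    Only attribute types present in the given documents appear in the
    result, so the facets follow whatever result list is passed in.
    """
    counts = {}
    for doc in documents:
        for path in set(doc["Attributes"]):
            attribute_type, value = path.split("/", 1)
            counts.setdefault(attribute_type, Counter())[value] += 1
    return counts


results = [
    {"Product_ID": 123,
     "Attributes": to_category_paths(single_valued, {"Country": countries})},
    # Assumed attributes for product 124, for illustration only.
    {"Product_ID": 124, "Attributes": ["Color/Red"]},
]

for attribute_type, values in facet_counts(results).items():
    print(attribute_type, dict(values))
```

One pass over the result list yields a count per attribute type, which is roughly what a separate GROUP BY per column would produce in SQL.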

Multi-dimensional facets

Search results facets provide aggregate values computed on the fly with the search results list
One single search query can return the equivalent of dozens of GROUP BY SQL clauses
Numerical values associated with facets (count, score, ...) can be used to perform complex computations on the results list
Search performance is not affected by the size of the category tree
Thousands of attribute types can be represented by categories
Facets are dynamically selected by the search results: the displayed attributes are always consistent with the search query (e.g. color and engine type when searching for a car, screen size and CPU speed when searching for a laptop)

CASE STUDY: LOGISTICS TRACK & TRACE

Gefco overview

A subsidiary of French car maker PSA (Peugeot, Citroën)
Now does most of its business outside of PSA
Logistics operator
  Carries cars from factories to dealers (road, rail)
  Carries freight (parcels; originally spare parts)
  Supply chain and logistic platform design
3.5B, 10 000 employees, 100 countries

The original pain

Classical multi-criteria search over Oracle, 2 million rows
Poor performance despite 2 years of optimization
Response times measured in minutes
Users were asked to run only simple queries, preferably at certain hours

From forms to a search box

New application

With operational reporting

Partner: French Post Office

Tracing of incidents
Real-time system
Used as an internal audit tool for the mail
Suggestion of addresses for customers
Search in file numbers, addresses, names, etc.

Case Study: RightMove

Rightmove: Reduce Costs and Improve Performance through Database

Advantages of Search Based Applications

Conclusions

Search engines mature: structured data, high volume, high speed
Search based Applications offer:
  Usage: search interface familiar to user
  Performance: search engine geared to search, eases load on database platform
  Agility: original database design untouched, reconfiguring output lightweight

