Part of the document Mining the Social Web, 2nd edition (pages 357 - 362)

Part I. A Guided Tour of the Social Web Prelude

8. Mining the Semantically Marked-Up Web: Extracting Microformats, Inferencing over RDF, and More

8.2. Microformats: Easy-to-Implement Metadata

8.2.2. Using Recipe Data to Improve Online Matchmaking

Since Google’s rich snippets initiative took off, there’s been an ever-increasing awareness of microformats, and many of the most popular foodie websites have made solid progress in exposing recipes and reviews with hRecipe and hReview. Consider the potential for a fictitious online dating service that crawls blogs and other social hubs, attempting to pair people together for dinner dates. One could reasonably expect that having access to enough geo and hRecipe information linked to specific people would make a profound difference in the “success rate” of first dates.

People could be paired according to two criteria: how close they live to each other and what kinds of foods they eat. For example, you might expect a dinner date between two individuals who prefer to cook vegetarian meals with organic ingredients to go a lot better than a date between a BBQ lover and a vegan. Dining preferences and whether specific types of allergens or organic ingredients are used could be useful clues to power the right business idea. While we won’t be trying to launch a new online dating service, we’ll get the ball rolling in case you decide to take this idea and move forward with it.
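The two pairing criteria can be sketched in a few lines. This is a minimal illustration, not a production matcher, written for Python 3 with only the standard library. The profile layout (`lat`, `lon`, `foods`), the `match_score` function, the 50 km cutoff, and the sample people are all assumptions made for this sketch. It uses the haversine formula for distance and Jaccard similarity for overlap between food-preference tags.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def jaccard(a, b):
    """Overlap of two tag sets: |a & b| / |a | b|, or 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_score(person_a, person_b, max_km=50.0):
    """Hypothetical score in [0, 1]: food overlap, zeroed beyond max_km."""
    dist = haversine_km(person_a["lat"], person_a["lon"],
                        person_b["lat"], person_b["lon"])
    if dist > max_km:
        return 0.0
    return jaccard(person_a["foods"], person_b["foods"])


# Made-up profiles for illustration only.
alice = {"lat": 36.16, "lon": -86.78, "foods": {"vegetarian", "organic"}}
bob = {"lat": 36.17, "lon": -86.80, "foods": {"vegetarian", "organic", "thai"}}
carl = {"lat": 36.15, "lon": -86.79, "foods": {"bbq"}}

print(match_score(alice, bob))   # 2 shared tags out of 3 distinct
print(match_score(alice, carl))  # no shared tags
```

A hard distance cutoff keeps the sketch simple; a real service would more likely blend distance and food overlap into a weighted score.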

About.com is one of the more prevalent online sites that’s really embracing microformat initiatives for the betterment of the entire Web, exposing recipe information in the hRecipe microformat and using the hReview microformat for reviews of the recipes; epicurious and many other popular sites have followed suit, due to the benefits afforded by Schema.org initiatives that take advantage of this information for web searches. This section briefly demonstrates how search engines (or you) might parse out the structured data from recipes and reviews contained in web pages for indexing or analyzing. An adaptation of Example 8-1 that parses out hRecipe-formatted data is shown in Example 8-3.

Although the spec is well defined, microformat implementations may vary subtly. Consider the following code samples that parse web pages more of a starting template than a robust, full-spec parser. A microformats parser implemented in Node.js, however, emerged on GitHub in early 2013 and may be worthy of consideration if you are seeking a more robust solution for parsing web pages with microformats.

Example 8-3. Extracting hRecipe data from a web page

import requests
import json
import BeautifulSoup

# Pass in a URL containing hRecipe...

URL = 'http://britishfood.about.com/od/recipeindex/r/applepie.htm'

# Parse out some of the pertinent information for a recipe.
# See http://microformats.org/wiki/hrecipe.

def parse_hrecipe(url):
    # Fetch the page passed in by the caller
    req = requests.get(url)

    soup = BeautifulSoup.BeautifulSoup(req.text)

    # The first element carrying the hrecipe class is the recipe root
    hrecipe = soup.find(True, 'hrecipe')

    if hrecipe and len(hrecipe) > 1:
        fn = hrecipe.find(True, 'fn').string
        author = hrecipe.find(True, 'author').find(text=True)
        ingredients = [i.string
                       for i in hrecipe.findAll(True, 'ingredient')
                       if i.string is not None]

        # Instructions may be nested tags or bare text nodes
        instructions = []
        for i in hrecipe.find(True, 'instructions'):
            if type(i) == BeautifulSoup.Tag:
                s = ''.join(i.findAll(text=True)).strip()
            elif type(i) == BeautifulSoup.NavigableString:
                s = i.string.strip()
            else:
                continue

            if s != '':
                instructions += [s]

        return {
            'name': fn,
            'author': author,
            'ingredients': ingredients,
            'instructions': instructions,
        }
    else:
        return {}

recipe = parse_hrecipe(URL)
print json.dumps(recipe, indent=4)

For a sample URL such as a popular apple pie recipe, you should get something like the following (abbreviated) results:

{
    "instructions": [
        "Method",
        "Place the flour, butter and salt into a large clean bowl...",
        "The dough can also be made in a food processor by mixing the flour...",
        "Heat the oven to 425°F/220°C/gas 7.",
        "Meanwhile simmer the apples with the lemon juice and water..."
    ],
    "ingredients": [
        "Pastry",
        "7 oz/200g all purpose/plain flour",
        "pinch of salt",
        "1 stick/ 110g butter, cubed or an equal mix of butter and lard",
        "2-3 tbsp cold water",
        "Filling",
        "1 ½ lbs/700g cooking apples, peeled, cored and quartered",
        "2 tbsp lemon juice",
        "½ cup/ 100g sugar",
        "4 - 6 tbsp cold water",
        "1 level tsp ground cinnamon ",
        "¼ stick/25g butter",
        "Milk to glaze"
    ],
    "name": "\t\t\t\t\t\t\tTraditional Apple Pie Recipe\t\t\t\t\t",
    "author": "Elaine Lemm"
}
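Notice that the name value in the sample results is padded with tab characters, and one ingredient carries a trailing space, because the text is taken from the page verbatim. A small cleanup pass is one way to deal with this before indexing or comparing recipes. The following is a sketch written for Python 3; `clean_text` and `clean_recipe` are hypothetical helpers, not part of Example 8-3.

```python
def clean_text(value):
    """Collapse runs of whitespace (tabs, newlines, spaces) into single spaces."""
    return " ".join(value.split())


def clean_recipe(recipe):
    """Return a copy of a parsed-recipe dict with whitespace normalized."""
    cleaned = {}
    for key, value in recipe.items():
        if isinstance(value, str):
            cleaned[key] = clean_text(value)
        elif isinstance(value, list):
            # Lists are assumed to hold strings (ingredients, instructions)
            cleaned[key] = [clean_text(v) for v in value]
        else:
            cleaned[key] = value
    return cleaned


# Values copied from the sample results above
sample = {
    "name": "\t\t\t\t\t\t\tTraditional Apple Pie Recipe\t\t\t\t\t",
    "author": "Elaine Lemm",
    "ingredients": ["1 level tsp ground cinnamon ", "½ cup/ 100g sugar"],
}
print(clean_recipe(sample)["name"])  # Traditional Apple Pie Recipe
```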

Aside from space and time, food may be the next most fundamental thing that brings people together, and exploring the opportunities for social analysis and data analytics involving people, food, space, and time could really be quite interesting and lucrative.

For example, you might analyze variations of the same recipe to see whether there are any correlations between the appearance or lack of certain ingredients and ratings/reviews for the recipes. You could then try to use this as the basis for better reaching a particular target audience with recommendations for products and services, or possibly even for prototyping that dating site that hypothesizes that a successful first date might highly correlate with a successful first meal together.

Pull down a few different apple pie recipes to determine which ingredients are common to all recipes and which are less common. Can you correlate the appearance or lack of different ingredients to a particular geographic region? Do British apple pies typically contain ingredients that apple pies cooked in the southeast United States do not, and vice versa? How might you use food preferences and geographic information to pair people?
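As a starting point for this exercise, set operations and a counter are enough to separate shared ingredients from rarer ones. The sketch below is written for Python 3 and assumes the ingredient strings have already been reduced to bare names by hand (stripping quantities such as "7 oz/200g" is a separate problem). The three recipes and their contents are made up for illustration.

```python
from collections import Counter

# Hypothetical, hand-normalized ingredient names for three apple pie recipes.
recipes = {
    "recipe_a": {"flour", "butter", "salt", "apples", "sugar", "cinnamon", "lemon juice"},
    "recipe_b": {"flour", "butter", "salt", "apples", "sugar", "cinnamon", "nutmeg"},
    "recipe_c": {"flour", "butter", "apples", "sugar", "cinnamon", "raisins"},
}

# Ingredients present in every recipe.
common = set.intersection(*recipes.values())

# How many recipes each ingredient appears in.
counts = Counter(name for ingredients in recipes.values() for name in ingredients)

# Ingredients missing from at least one recipe.
less_common = sorted(name for name, n in counts.items() if n < len(recipes))

print(sorted(common))  # ['apples', 'butter', 'cinnamon', 'flour', 'sugar']
print(less_common)     # ['lemon juice', 'nutmeg', 'raisins', 'salt']
```

Tagging each recipe with a region would let you group the same counts by geography and look for regional differences.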

The next section introduces an additional consideration for constructing an online matchmaking service like the one we’ve discussed.

8.2.2.1. Retrieving recipe reviews

This section concludes our all-too-short survey of microformats by briefly introducing hReview-aggregate, a variation of the hReview microformat that exposes the aggregate rating about something through structured data that’s easily machine parseable.

About.com’s recipes implement hReview-aggregate so that the ratings for recipes can be used to prioritize search results and offer a better experience for users of the site. Example 8-4 demonstrates how to extract hReview information.

Example 8-4. Parsing hReview-aggregate microformat data for a recipe

import requests
import json
from BeautifulSoup import BeautifulSoup

# Pass in a URL that contains hReview-aggregate info...

URL = 'http://britishfood.about.com/od/recipeindex/r/applepie.htm'

def parse_hreview_aggregate(url, item_type):
    # Fetch the page passed in by the caller
    req = requests.get(url)

    soup = BeautifulSoup(req.text)

    # Find the hRecipe or whatever other kind of parent item encapsulates
    # the hReview (a required field).

    item_element = soup.find(True, item_type)
    item = item_element.find(True, 'item').find(True, 'fn').text

    # And now parse out the hReview

    hreview = soup.find(True, 'hreview-aggregate')

    # Required field

    rating = hreview.find(True, 'rating').find(True, 'value-title')['title']

    # Optional fields

    try:
        count = hreview.find(True, 'count').text
    except AttributeError: # optional
        count = None

    try:
        votes = hreview.find(True, 'votes').text
    except AttributeError: # optional
        votes = None

    try:
        summary = hreview.find(True, 'summary').text
    except AttributeError: # optional
        summary = None

    return {
        'item': item,
        'rating': rating,
        'count': count,
        'votes': votes,
        'summary' : summary
    }

# Find hReview aggregate information for an hRecipe

reviews = parse_hreview_aggregate(URL, 'hrecipe')
print json.dumps(reviews, indent=4)

Here are truncated sample results for Example 8-4:

{
    "count": "7",
    "item": "Traditional Apple Pie Recipe",
    "votes": null,
    "summary": null,
    "rating": "4"
}
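Note that the rating and count come back as strings, and optional fields may be null. To use aggregate ratings for prioritizing results, as described earlier, the values first need to be converted to numbers. The following Python 3 sketch shows one way to do that; `as_number` and `rank_reviews` are hypothetical helpers, the first entry mirrors the sample results, and the other two entries are made up.

```python
def as_number(value, default=0.0):
    """Convert a scraped string such as "4" or "4.5" to float; fall back on failure."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


def rank_reviews(reviews):
    """Sort hReview-aggregate dicts by rating, then by review count, descending."""
    return sorted(
        reviews,
        key=lambda r: (as_number(r.get("rating")), as_number(r.get("count"))),
        reverse=True,
    )


# The first entry mirrors the sample results; the others are made up.
reviews = [
    {"item": "Traditional Apple Pie Recipe", "rating": "4", "count": "7"},
    {"item": "Hypothetical Pie A", "rating": "4.5", "count": None},
    {"item": "Hypothetical Pie B", "rating": "4", "count": "12"},
]
for r in rank_reviews(reviews):
    print(r["item"])
```

Treating a missing count as zero is a simplification; a rating backed by very few reviews may deserve less weight than this ordering gives it.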

There’s no limit to the innovation that can happen when you combine geeks and food data, as evidenced by the popularity of the much-acclaimed Cooking for Geeks, also from O’Reilly. As the capabilities of food sites evolve to provide additional APIs, so will the innovations that we see in this space. Figure 8-4 displays a screenshot of the underlying HTML source for a sample web page that displays its hReview-aggregate implementation for those who might be interested in viewing the source of a page.

Figure 8-4. You can view the source of a web page if you’re interested in seeing the (often) gory details of its microformats implementation

Most modern browsers now implement CSS query selectors natively, and you can use document.querySelectorAll to poke around in the developer console for your particular browser to review microformats in JavaScript. For example, run document.querySelectorAll(".hrecipe") to query for any nodes that have the hrecipe class applied, per the specification.
