
Vendors other than Walmart and BestBuy on the list were included in the Walmart data. Unfortunately, since I pulled the data that I was working with on October 15th, Walmart seems to have changed their web pages so that the W3C Microdata to RDF Distiller doesn't find the data in them anymore. I still see microdata in the source of a page like this Walmart page for an external hard drive, but I guess it's arranged differently. Perhaps they didn't want people using standards-based technology to automate the process of finding out that BestBuy's external hard drives usually cost less, or at least did in mid-October.


Because of some slight differences in how they treated certain bits of data, I was tempted to clean up the aggregated data before querying it, but I really wanted to write queries that would work on the data in its native form, so I put the cleanup steps right in the queries. The various queries that I wrote led up to this one, which lists all the products by model number and price for easy comparison:

SELECT ?productName ?modelNumber ?price ?sellerName
BIND(str(?productNameVal) AS ?productName)
BIND(str(?modelNumberVal) AS ?modelNumber)
BIND(xsd:decimal(replace(?priceVal,"\\$","")) AS ?price)
# In case there's a level of indirection for seller name
BIND(str(coalesce(?sellerSchemaName,?seller)) AS ?sellerName )

Each comment in the query describes how it accounts for some difference between the Walmart microdata and the BestBuy microdata; for example, the BestBuy data included a dollar sign with prices, but the Walmart data did not.

After running the query, requesting XML output, and then running a little XSLT on that output, I ended up with the table shown below.

Product Name
Buffalo - DriveStation Axis Velocity 2TB External USB 3.0/2.0 Hard Drive
Buffalo Technology DriveStation Axis Velocity 2TB USB 3.0 External Hard Drive with Hardware Encryption, Black
Toshiba Canvio Basics 2TB USB 3.0 External Hard Drive
Toshiba 1TB Canvio Basics USB 3.0 External Hard Drive
Toshiba - Canvio Basics 1 TB External Hard Drive
Seagate Backup Plus 1TB Slim Portable External Hard Drive, Black
Seagate - Backup Plus Slim 1TB External USB 3.0/2.0 Portable Hard Drive - Black
WD My Book 4TB USB 3.0 External Hard Drive
WD - My Book 4TB External USB 3.0 Hard Drive - Black
WD My Book 3TB USB 3.0 External Hard Drive
WD - My Book 3TB External USB 3.0 Hard Drive - Black
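For readers who think more easily in a scripting language, the two cleanup steps in the query (dropping the dollar sign before casting the price, and falling back from one seller value to another) can be sketched in plain Python. This is only an illustration of the logic, not part of the original workflow: the record layout, the field names, and the sample values below are all made up for the example.

```python
from decimal import Decimal

# Hypothetical records standing in for query solutions; none of these
# values come from the real Walmart or BestBuy data.
SAMPLE_RECORDS = [
    {"name": "Example 1TB Drive", "model": "EX-1000", "price": "$64.99",
     "seller_schema_name": None, "seller": "Store A"},
    {"name": "Example - 1TB Drive - Black", "model": "EX-1000", "price": "59.99",
     "seller_schema_name": "Store B", "seller": "_:b0"},
]


def clean_price(raw):
    """Rough analogue of xsd:decimal(replace(?priceVal, "\\$", ""))."""
    return Decimal(raw.replace("$", ""))


def seller_name(seller_schema_name, seller):
    """Rough analogue of str(coalesce(?sellerSchemaName, ?seller))."""
    chosen = seller_schema_name if seller_schema_name is not None else seller
    return str(chosen)


def comparison_rows(records):
    """Return (model, price, seller, name) rows sorted by model, then price."""
    rows = [
        (
            r["model"],
            clean_price(r["price"]),
            seller_name(r.get("seller_schema_name"), r.get("seller")),
            r["name"],
        )
        for r in records
    ]
    return sorted(rows, key=lambda row: (row[0], row[1]))


if __name__ == "__main__":
    for model, price, seller, name in comparison_rows(SAMPLE_RECORDS):
        print(model, price, seller, name, sep="\t")
```

Keeping the cleanup inside small functions plays the same role as keeping it inside the BIND expressions: the source data stays in its native form and the normalization happens at query time.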

The combination of microdata and schema.org seems to have hit a sweet spot that has helped both to get a lot of traction. I've been learning more about microdata recently, but even before I did, I found that the W3C's Microdata to RDF Distiller, written by Ivan Herman, would convert microdata stored in web pages into RDF triples, making it possible to query this data with SPARQL. With major retailers such as Walmart and BestBuy making such data available on, as far as I can tell, every single product's web page, this makes some interesting queries possible to compare prices and other information from the two vendors.

I extracted the data describing six external USB drives from both retailers' sites, limiting myself to models that were available on both. (Instead of pulling it separately from the twelve individual web pages, it would have been nice to automate this a bit more. I did sign up for Walmart's API program, which was easy to try out, but the part of the API that lets you query products by category is "restricted, and is available on a request basis" according to their Data Feed API home page, so I didn't bother. If I was going to pursue this further I would enroll in BestBuy's Developer Program as well.) After using the Distiller form to do this several times, I downloaded its Python script from the pymicrodata github page and found it easy to run locally.

You can see a Turtle file of aggregated Walmart plus BestBuy data here.
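To give a rough feel for what "microdata stored in web pages" means, here is a minimal sketch that pulls itemprop name/value pairs out of a snippet of HTML using only Python's standard library. It is not the Distiller's algorithm and does not produce RDF; it ignores nesting of items, and the sample markup, product name, and model number are invented for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked as open.
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}

# Invented markup in the general style of schema.org product microdata.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Product">
  <h1 itemprop="name">Example 1TB External Hard Drive</h1>
  <span itemprop="model">EX-1000</span>
  <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    <span itemprop="price">$59.99</span>
    <meta itemprop="priceCurrency" content="USD">
  </div>
</div>
"""


class ItempropCollector(HTMLParser):
    """Collect (itemprop, value) pairs, ignoring how items are nested."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = []  # stack of [tag, itemprop or None, text parts]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop and "itemscope" in attrs:
            prop = None  # the value is a nested item, not text
        elif prop and "content" in attrs:
            self.pairs.append((prop, attrs["content"]))
            prop = None
        if tag not in VOID_TAGS:
            self._open.append([tag, prop, []])

    def handle_data(self, data):
        # Text belongs to every open element that carries an itemprop.
        for entry in self._open:
            if entry[1]:
                entry[2].append(data)

    def handle_endtag(self, tag):
        if tag not in [entry[0] for entry in self._open]:
            return  # stray or void end tag; nothing to close
        while self._open:
            open_tag, prop, parts = self._open.pop()
            if prop:
                self.pairs.append((prop, "".join(parts).strip()))
            if open_tag == tag:
                break


def extract_itemprops(html):
    """Return the itemprop pairs found in an HTML string."""
    parser = ItempropCollector()
    parser.feed(html)
    parser.close()
    return parser.pairs


if __name__ == "__main__":
    for prop, value in extract_itemprops(SAMPLE_HTML):
        print(prop, "=", value)
```

A real converter also has to track which item each property belongs to and map the item types and property names to URIs, which is the part that makes the result queryable as RDF.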
