
The Most Popular Pub Names

Source: 懂视网 Editor: 小采 Time: 2020-11-09 13:21:08

By Ross Lawley, MongoEngine maintainer and Scala Engineer at 10gen

Earlier in the year I gave a talk at MongoDB London about the different aggregation options with MongoDB. The topic recently came up again in conversation at a user group, so I thought it deserved a blog post.

Gathering ideas for the talk

I wanted to give a more interesting aggregation talk than the standard “counting words in text”, and as the aggregation framework gained shiny 2dsphere geo support in 2.4, I figured I’d use that. I just needed a topic…

What is top of mind for us Brits?

Two things immediately sprang to mind: weather and beer.

I opted to focus on something close to my heart: beer :) But what to aggregate about beer? Then I remembered an old pub quiz favourite…

What is the most popular pub name in the UK?

I knew there was some great open data, including a wealth of information on pubs, available from the awesome OpenStreetMap project. I just needed to get at it, and happily the Overpass API provides a simple “xapi” interface for OSM data. All I needed was anything tagged with amenity=pub within the bounds of the UK, and with the xapi interface this is as simple as a wget:

http://www.overpass-api.de/api/xapi?*[amenity=pub][bbox=-10.5,49.78,1.78,59]
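The URL above packs two filters into square brackets: a tag filter and a bounding box. As a minimal sketch, the same URL can be assembled from its parts. The helper name `xapiUrl` and the `west`/`south`/`east`/`north` field names are illustrative assumptions, and the ordering is simply read off the URL above (minimum longitude, minimum latitude, maximum longitude, maximum latitude):

```javascript
// Hypothetical helper: builds an Overpass xapi URL for one tag inside a bounding box.
// Bounding-box order follows the URL in the post: west, south, east, north.
function xapiUrl(key, value, bbox) {
  const { west, south, east, north } = bbox;
  return "http://www.overpass-api.de/api/xapi?*" +
    "[" + key + "=" + value + "]" +
    "[bbox=" + [west, south, east, north].join(",") + "]";
}

// Rough bounds of the UK, as used in the post.
const ukPubsUrl = xapiUrl("amenity", "pub", {
  west: -10.5, south: 49.78, east: 1.78, north: 59
});
console.log(ukPubsUrl);
```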

Once I had an OSM file, I used the imposm Python library to parse the XML and then converted each pub to the following GeoJSON format:

{
  "_id" : 451152,
  "amenity" : "pub",
  "name" : "The Dignity",
  "addr:housenumber" : "363",
  "addr:street" : "Regents Park Road",
  "addr:city" : "London",
  "addr:postcode" : "N3 1DH",
  "toilets" : "yes",
  "toilets:access" : "customers",
  "location" : {
    "type" : "Point",
    "coordinates" : [-0.1945732, 51.6008172]
  }
}
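The shape of that document can be sketched as a small conversion function. This is not the loader from the post (that one is written in Python with imposm); it is an illustrative JavaScript sketch, and the input shape `{ id, lat, lon, tags }` and the name `toPubDocument` are assumptions made for the example. The point worth noticing is that GeoJSON stores coordinates as longitude first, then latitude:

```javascript
// Hypothetical input shape: { id, lat, lon, tags } for one parsed OSM node.
function toPubDocument(node) {
  // Copy the OSM tags (name, addr:street, ...) straight onto the document.
  const doc = Object.assign({ _id: node.id }, node.tags);
  // GeoJSON points are [longitude, latitude], the reverse of the usual "lat, lon".
  doc.location = { type: "Point", coordinates: [node.lon, node.lat] };
  return doc;
}

const pub = toPubDocument({
  id: 451152,
  lat: 51.6008172,
  lon: -0.1945732,
  tags: { amenity: "pub", name: "The Dignity" }
});
console.log(JSON.stringify(pub, null, 2));
```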

Then it was a case of simply inserting it as a document into MongoDB. I quickly noticed that the data needed a little cleaning, as I was seeing duplicate pub names, for example: “The Red Lion” and “Red Lion”. Because I wanted to make a wordle I normalised all the pub names.
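The exact normalisation rules live in the loading code and are not spelled out here, so the following is only a guess at the idea: trim the name, collapse stray whitespace, and make sure every name carries a leading "The". The function name `normalisePubName` and the rule itself are assumptions for illustration:

```javascript
// Assumed rule (not taken from osm2mongo.py): trim, collapse whitespace,
// and make sure the name starts with "The ".
function normalisePubName(name) {
  const cleaned = name.trim().replace(/\s+/g, " ");
  // "The Red Lion" and "the Red Lion" keep their remainder; "Red Lion" gains the prefix.
  return /^the\s/i.test(cleaned)
    ? "The " + cleaned.slice(4)
    : "The " + cleaned;
}

console.log(normalisePubName("Red Lion"));
console.log(normalisePubName("  The  Red Lion "));
```

With a rule like this, both spellings end up under one key, so they are counted together when grouping by name.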

If you want to know more about the importing process, the full loading code is available on GitHub: osm2mongo.py

Top pub names

It turns out finding the most popular pub names is very simple with the aggregation framework. Just group by the name and then sum up all the occurrences. To get the top five most popular pub names we sort by the summed value and then limit to 5:

db.pubs.aggregate([
  // Count how many pubs share each name
  {"$group": {"_id": "$name", "value": {"$sum": 1}}},
  // Most common names first
  {"$sort": {"value": -1}},
  // Keep only the top five
  {"$limit": 5}
]);

For the whole of the UK this returns:

  1. The Red Lion
  2. The Royal Oak
  3. The Crown
  4. The White Hart
  5. The White Horse
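To make the three pipeline stages concrete, here is a plain JavaScript sketch that does the same group, sort and limit over an in-memory array. It is only an illustration of what the stages compute, not how MongoDB executes them, and the sample data is made up:

```javascript
// Plain-JavaScript equivalent of $group + $sort + $limit, for illustration only.
function topNames(pubs, limit) {
  const counts = new Map();
  for (const pub of pubs) {
    // Same effect as {"$sum": 1}: add one per document with this name.
    counts.set(pub.name, (counts.get(pub.name) || 0) + 1);
  }
  return Array.from(counts, ([name, value]) => ({ _id: name, value }))
    .sort((a, b) => b.value - a.value) // {"$sort": {"value": -1}}
    .slice(0, limit);                  // {"$limit": limit}
}

// Made-up sample data, not taken from the real data set.
const sample = [
  { name: "The Red Lion" }, { name: "The Crown" }, { name: "The Red Lion" },
  { name: "The Dignity" }, { name: "The Crown" }, { name: "The Red Lion" }
];
console.log(topNames(sample, 2));
```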

Top pub names near you

At MongoDB London I thought that was too easy, so I filtered to find the top pub names near the conference, showing off some of the geo functionality that became available in MongoDB 2.4. To limit the result set, add a $match stage that ensures the location is within a 2 mile radius by using $centerSphere. Just provide the coordinates [longitude, latitude] and a radius of roughly 2 miles expressed in radians (3959 miles is approximately the radius of the earth, so divide 2 by it):

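db.pubs.aggregate([
  // Keep only pubs within roughly 2 miles of [longitude, latitude];
  // $centerSphere takes its radius in radians (miles / 3959)
  {"$match": {"location":
    {"$within": {"$centerSphere": [[-0.12, 51.516], 2 / 3959]}}}},
  // Count how many of those pubs share each name
  {"$group": {"_id": "$name", "value": {"$sum": 1}}},
  {"$sort": {"value": -1}},
  {"$limit": 5}
]);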
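The radius arithmetic is easy to get backwards, so here is a small sketch of it in plain JavaScript. It converts miles to radians using the same approximate earth radius of 3959 miles, then checks a point against the circle with the haversine formula. This is a stand-in for illustration, not what MongoDB does internally, and the helper names are assumptions:

```javascript
const EARTH_RADIUS_MILES = 3959; // approximate figure used in the post

// $centerSphere expects radians: distance along the surface / sphere radius.
function milesToRadians(miles) {
  return miles / EARTH_RADIUS_MILES;
}

// Great-circle distance in miles (haversine) between two [lon, lat] points in degrees.
function distanceMiles(a, b) {
  const toRad = (deg) => deg * Math.PI / 180;
  const dLat = toRad(b[1] - a[1]);
  const dLon = toRad(b[0] - a[0]);
  const h = Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a[1])) * Math.cos(toRad(b[1])) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(h));
}

// True when the point lies inside the circle of the given radius around the centre.
function withinRadius(centre, point, miles) {
  return distanceMiles(centre, point) / EARTH_RADIUS_MILES <= milesToRadians(miles);
}

const conference = [-0.12, 51.516];       // centre used in the query above
const dignity = [-0.1945732, 51.6008172]; // the sample pub document
console.log(milesToRadians(2));
console.log(withinRadius(conference, dignity, 2));
```

The sample pub from earlier sits several miles north of the centre point, so it falls outside the 2 mile circle and would not be counted by the query.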

What about where I live?

At the conference I looked at the most popular pub names near the venue. That’s great if you happen to live in the centre of London, but what about everyone else in the UK? So for this blog post I decided to update the demo code and make it dynamic, based on where you live.

See: pubnames.rosslawley.co.uk

Apologies to those outside the UK: the demo app doesn’t have data for the whole world, though it’s surely possible to add.

Cheers

All the code is available in my repo on GitHub, including the BSON file of the pubs and the wordle code, so fork it and start playing with MongoDB’s great geo features!
