Tag Archives: MongoDBWorld

#MongoDBWorld Innovation Awards

Analytics: eBay, Genentech

Big Apple for New York Companies: SumAll – aggregating data at scale on MongoDB on ObjectRocket at Rackspace

Cool Data: UK Met Office Space Weather Project

Data Science: eHarmony generating more than 3B potential matches per day. MongoDB helped reduce time to match to minutes.

Gaming: EA runs FIFA on MongoDB

Education: LinkedIn and their internal LearnIn platform (see my earlier post)

Internet of Things: Bosch

MongoDB + Hadoop: United Health Group – Optum Insight (see my earlier post)

Open Source: 3D Repo

Scale: Adobe, Lockheed Martin

Startup: Twine (Health)

Tools: JSON Studio, Meteor

[tag health cloud BigData MongoDB MongoDBWorld NoSQL]

Mark Scrimshire
Health & Cloud Technology Consultant

Mark is available for challenging assignments at the intersection of Health and Technology using Big Data, Mobile and Cloud Technologies. If you need help to move, or create, your health applications in the cloud let’s talk.
Blog: http://blog.ekivemark.com
email: mark@ekivemark.com
Stay up-to-date: Twitter @ekivemark
Disclosure: I began as a Patient Engagement Advisor and am now CTO to Personiform, Inc. and their Medyear.com platform. Medyear is a powerful free tool that helps you collect, organize and securely share health information, however you want. Manage your own health records today. Medyear: The Power Grid for your Health.

#MongoDBWorld IBM / Cloudant – Adam K

http://world.mongodb.com/content/keynote-dr-angel-luis-diaz-ibm

Remarks from Dr. Angel Luis Diaz, VP Open Technology and Cloud Performance Solutions, IBM.

What is IBM doing to push innovation?

Cloudant has been committing to CouchDB and MongoDB.

IBM BlueMix – Platform as a Service built on Cloud Foundry.

It is fascinating to see the evolution of IBM. No longer hardware. Services are the future.

Cloudant JSON doc store delivered as DB as a Service (DBaaS).
Cloudant Query – Implements MongoDB style DB find.

Cloudant Query is taking MongoDB style syntax and delivering via http.
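
To give a feel for what "MongoDB style syntax delivered via http" looks like, here is a minimal sketch that builds (but does not send) a Cloudant Query request. It assumes the `_find` endpoint and `selector` body that Cloudant Query uses; the account URL, database name and field names are made up for illustration.

```python
import json
from urllib.request import Request

def build_find_request(account_url, database, selector, limit=10):
    """Build a POST to /<database>/_find carrying a MongoDB-style selector."""
    body = json.dumps({"selector": selector, "limit": limit}).encode("utf-8")
    return Request(
        url=f"{account_url}/{database}/_find",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical account and database; the selector uses the familiar $gt operator.
req = build_find_request(
    "https://example.cloudant.com", "movies", {"year": {"$gt": 2010}}
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

The request is only constructed here, so the sketch runs without a Cloudant account.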

MongoDB style querying is becoming a de facto standard.

More common interfaces can help NoSQL, just as SQL did when relational databases were introduced.

#MongoDBworld CharityMajors (@mipsytipsy) @Parse closing keynote

Charity Majors @Parse / Facebook

http://world.mongodb.com/content/keynote-charity-majors-parsefacebook

Parse handles the backend – push notifications, analytics and a ton of other server-side services – and delivers it at scale.

Parse has 270,000 mobile apps running on it – all hosted on MongoDB.

All Software is a pain!

MongoDB + Ops

Reliability

Reliability – MongoDB is not immune to crashes. The key to resiliency is the Replica Set. You only have to be concerned about the service.

Horizontally scalable services means no Pets. You are dealing with a herd. Cattle not pets. No hand crafted server pets because pets will always die.

Charity – A Battlestar Geek – Your life is without meaning without BG!

Design for High Availability from the outset.

You can’t design in high availability AFTER the fact.

Ops people hate software because they have to plan for failure. It is going to happen.

Flexibility

When you change a schema EVERYTHING breaks! Why do you want a schema????

Data model Flexibility is critical

Workload flexibility is also critical.

When you have hundreds of thousands of apps you have everything. You can’t optimize for a specific load.

Every App must be performant AND must be able to scale.

ONE re-usable solution is better than multiple platforms optimized for specific systems. Engineering workload is a limiting factor.

Choose ONE SINGLE Reusable solution.

Automation

Make repetitive, annoying tasks easy.

Scalability is about more than handling tasks really quickly.

The replica set allows you to take out nodes and work on them.

Parse is dealing with hundreds of terabytes every month.

MongoDB works for Parse:
– Flexible
– Resilient
– Automation friendly

Automation needs operations best practices to be shared. Operations is still young.
Parse has published open source tools.

Parse launched these tools today:
– Mongo Proxy github.com/facebookgo/dvara

Allows the replay of workload profiles. Replay in line with original snapshots, or as fast as possible.

Both tools are written in Go.

#MongoDBWorld Genomics and the Connectivity Map (A presentation from the Broad Institute)

More from #MongoDBWorld.

Presentation by the Broad Institute:

# MongoDB and the Connectivity Map: Making Connections Between Genetics and Disease

“The Broad Institute has developed a novel high-throughput gene-expression profiling technology and has used it to build an open-source catalog of over a million profiles that captures the functional states of cells when treated with drugs and other types of perturbations. Referred to as the Connectivity Map (or CMap), these data when paired with pattern matching algorithms, facilitate the discovery of connections between drugs, genes and diseases. We wished to expose this resource to scientists around the world via an API that is easily accessible to programmers and biologists alike. We required a database solution that could handle a variety of data types and handle frequent changes to the schema. We realized that a relational database did not fit our needs, and gravitated towards MongoDB for its ease of use, support for dynamic schema, complex data structures and expressive query syntax. In this talk, we’ll walk through how we built the CMap library. We’ll discuss why we chose MongoDB, the various schema design iterations and tradeoffs we’ve made, how people are using the API, and what we’re planning for the next generation of biomedical data.”

https://world.mongodb.com/mongodb-world/session/mongodb-and-connectivity-map-making-connections-between-genetics-and-disease

The Connectivity Map began as a pilot project in 2006.

7,000 experiments
19,000 registered users
1,200 Scientific Reports

One Gene expression signature is expensive – thousands of dollars.

As cost drops the number of experiments can increase.

This has grown to 1.5 million experiments.

MongoDB came into play because they didn’t know what the data structures needed to be.

The CMap LINCS dataset is a library of 1.4M gene expression profiles covering 12,488 compounds.

The Connectivity Map is easy to describe but difficult to model.

1.4M profiles times 22,000 genes yields roughly 30B data points.
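
As a quick check of that arithmetic, using the rounded figures quoted in the talk (the variable names are mine):

```python
profiles = 1_400_000  # gene expression profiles in the library
genes = 22_000        # genes per profile, as quoted in the talk

data_points = profiles * genes
print(f"{data_points:,} data points (~{data_points / 1e9:.1f}B)")
# prints "30,800,000,000 data points (~30.8B)"
```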

This is further complicated by the diversity of use cases and users.

Annotation is complex and may be partial. The data is also frequently updated.

The Agile approach:
– Store just what’s needed
– Test and use daily
– Refactor frequently

The initial data model was simply an inventory of signatures.

4-5 fields in a json data packet.
This evolved from a simple signature_info block to cell_info and Treatment_info.

They then added computed fields and external meta-data to signature_info and cell_info. This is easy to do in MongoDB.

APIs are awesome! Life Sciences need more of them.

Functionality in the API won out over convention, so they used the /siginfo?q={"cell":"A"} query style rather than the folder convention /siginfo/cell/A.
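
A minimal sketch of that query style: a MongoDB-style query document is serialized to JSON and passed in the `q` parameter. The host name and helper function are hypothetical, not the real CMap API.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://api.example.org"  # hypothetical host, for illustration only

def build_query_url(resource, query):
    """Encode a MongoDB-style query document into the ?q= parameter."""
    q = json.dumps(query, separators=(",", ":"))
    return f"{BASE_URL}/{resource}?{urlencode({'q': q})}"

url = build_query_url("siginfo", {"cell": "A"})
print(url)

# Round trip: the server side can decode q straight back into a query document.
decoded = json.loads(parse_qs(urlsplit(url).query)["q"][0])
print(decoded)
```

The appeal of this style is that any query the database understands can be passed through unchanged, whereas a folder-style path needs a new route for each combination of fields.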

Node.js and Mongoose (as noted in the earlier LinkedIn session) came into play for easy API creation.

Compute API running on AWS performs message queuing via a capped collection.
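
A capped collection is fixed-size and keeps documents in insertion order, which is what makes it usable as a simple queue. The sketch below imitates that behaviour in memory with a bounded deque; it is an illustration of the idea only, not Broad’s implementation, and the class and job names are invented.

```python
from collections import deque

class CappedQueue:
    """In-memory stand-in for a capped collection: fixed size, insertion order."""

    def __init__(self, max_docs):
        self._docs = deque(maxlen=max_docs)  # oldest entries fall off when full
        self._next_id = 0

    def publish(self, payload):
        self._docs.append({"_id": self._next_id, "payload": payload})
        self._next_id += 1

    def read_after(self, last_seen_id):
        """Return messages newer than last_seen_id, like a consumer resuming a tail."""
        return [d for d in self._docs if d["_id"] > last_seen_id]

queue = CappedQueue(max_docs=3)
for job in ["job-a", "job-b", "job-c", "job-d"]:
    queue.publish(job)

# job-a has been overwritten because the queue only holds three documents.
pending = [d["payload"] for d in queue.read_after(last_seen_id=0)]
print(pending)
```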

HDF5 (Hierarchical Data Format) complements MongoDB for numerical analysis

GCTX is a binary format based on HDF5, cross platform with multiple language bindings.

Broad’s platform is Lincscloud – targeted to researchers: Lincscloud.org
This is free for academic use.

Uses of Broad’s tools:

  • Predicting Drug Function
  • Drug Re-purposing (failed drugs – new uses)
    e.g. Phase 2 trials are where results often don’t live up to expectations but the DRUG IS SAFE! So can the drug be re-mapped to new targets?
  • Pushing from single patient application to two patients and on to population applications.


#MongoDBWorld Hidden gems in the new 2.6 version of @mongoDB

More from #MongoDBWorld.

Hidden Gems in the 2.6 Release

Everyone using MongoDB is familiar with the big features of the 2.6 release (and if you’re not, here’s a link) — text search, $out, user-defined roles, X509 authentication, etc. But what about the little guys? Our VP of Engineering, Daniel Pasette, will take you on a tour of five small but mighty features from the 2.6 release that make your MongoDB experience more productive.

Dan Pasette

VP of Core Engineering at MongoDB

Dan is the VP of Core Engineering at MongoDB. Prior to joining MongoDB, Dan was a Development Manager at LimeWire where he led a team working on content ingestion for an (unreleased) digital music service called Grapevine. Past employment includes MTV Networks, Sonicnet, iXL, and Electronic Book Technologies. Dan holds a degree in Computer Science from Brown University.

http://world.mongodb.com/mongodb-world/session/hidden-gems-26-release

The Technical sessions are packed. I was hoping to look at Memory Management but the room was full to overflowing. So I dropped in to the session on the latest release of MongoDB – Version 2.6.

Power of 2 – Now default allocation Strategy

The Power of 2 feature allocates extra space when saving records. It is on by default in the latest release. It is best suited to workloads that update existing records. What typically happens is that an update grows a record so that it no longer fits in its existing space and has to be moved. The extra space allocated by Power of 2 makes it more likely that records can be written back to the space they came from.

By adding space to records it reduces the amount of data movement because as data grows inside records the records still fit.
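
A minimal sketch of the idea, assuming each record’s allocation is simply rounded up to the next power of two. The 32-byte minimum and the function name are my assumptions, and any special handling of very large documents is ignored.

```python
def power_of_2_allocation(record_bytes, minimum=32):
    """Round a record size up to the next power of two (assumed minimum of 32)."""
    size = minimum
    while size < record_bytes:
        size *= 2
    return size

original_size = 700
allocated = power_of_2_allocation(original_size)  # 1024 bytes reserved

# The record later grows to 900 bytes; it still fits, so it does not move.
grown_size = 900
fits_in_place = grown_size <= allocated
print(allocated, fits_in_place)
```

The trade-off is extra disk space per record in exchange for fewer record moves on update.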

Server Side Timeouts

For example, a collection was indexed in staging but the index was forgotten in production. This can cause table scans, which lead users to retry or re-scan and create socket timeouts, which in turn can impact other users on the system. The new feature is maxTimeMS. It lets you set a maximum time for how long an operation can run in the database, from milliseconds to minutes depending on the operation.
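
To illustrate the idea of a server-side time budget, here is a small pure-Python sketch of a scan that gives up once its budget is spent. It is not how MongoDB implements maxTimeMS; the function, exception and parameter names are all invented for the example.

```python
import time

class OperationTimedOut(Exception):
    """Raised when a scan exceeds its time budget."""

def scan_with_budget(docs, predicate, max_time_ms, clock=time.monotonic):
    """Scan docs, but abort once max_time_ms has elapsed."""
    deadline = clock() + max_time_ms / 1000.0
    matches = []
    for doc in docs:
        # Check the deadline as we go, so a runaway scan stops itself.
        if clock() > deadline:
            raise OperationTimedOut(f"exceeded {max_time_ms} ms")
        if predicate(doc):
            matches.append(doc)
    return matches

docs = [{"n": i} for i in range(10)]
evens = scan_with_budget(docs, lambda d: d["n"] % 2 == 0, max_time_ms=1000)
print([d["n"] for d in evens])
```

The point is that the operation stops itself on the server rather than carrying on after the client has already timed out and retried.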

Query Engine Introspection

This works in conjunction with maxTimeMS. It allows you to delve into queries to resolve problems. The query execution framework was completely rewritten in 2.6. Prior to 2.6 the query path was opaque to users. This changed in 2.6.

The Query Planner chooses the best index for a given query.

The Query Parser sends the query to the Query Planner. It is then passed to the Plan Cache, which passes it to the Plan Runner.

The Plan Enumerator passes all the candidate plans to the multi-plan runner, which runs them for a limited amount of time and then chooses the most efficient.

On subsequent execution of the same query the query goes straight to the Plan Cache.

The Plan Cache can end up holding a sub-optimal plan.
Plans are dropped after indexing and other major changes.
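
The flow described above can be sketched roughly as follows. This is a toy model, not MongoDB’s code: the class, the notion of a "query shape" as the sorted field names, and the cost numbers are all assumptions made for illustration.

```python
def query_shape(query):
    """Queries on the same fields share a shape, whatever the values are."""
    return tuple(sorted(query))

class PlanCache:
    def __init__(self, candidate_plans):
        # candidate_plans maps a plan name to a function returning a made-up cost.
        self.candidate_plans = candidate_plans
        self._cache = {}
        self.trials_run = 0

    def choose_plan(self, query):
        shape = query_shape(query)
        if shape not in self._cache:
            # Cache miss: trial every candidate plan and keep the cheapest.
            self.trials_run += 1
            costs = {name: cost(query) for name, cost in self.candidate_plans.items()}
            self._cache[shape] = min(costs, key=costs.get)
        return self._cache[shape]

    def clear(self):
        """Drop cached plans, e.g. after an index is added or removed."""
        self._cache.clear()

plans = {
    "collection_scan": lambda q: 10_000,
    "index_on_cell": lambda q: 40 if "cell" in q else 10_000,
}
cache = PlanCache(plans)
first = cache.choose_plan({"cell": "A"})   # trial run, winner is cached
second = cache.choose_plan({"cell": "B"})  # same shape, served from the cache
print(first, second, cache.trials_run)
```

Only the first query of a given shape pays for the trial; clearing the cache forces the next query to be trialled again.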

getPlanCache

A set of Plan Cache tools to view and manipulate the cache.

Background indexing on Secondaries

This has existed but the feature has been rounded out.

Pre-2.6 background index builds became foreground index builds when replicated to secondaries.

2.6 keeps background index builds in the background on secondaries.
Note: Background indexing isn’t as fast and is less tightly packed.

User Driven Enhancements

All of these features came about as a result of user feedback that goes through jira.mongodb.com

Limits on Replica sets

Limit of 12 nodes in a replica set with 7 voting members
