This is my third session looking at the use of MongoDB in a health setting. See my earlier posts from today:
The Genentech session information:
The Best of Both Worlds: Speeding Up Drug Research with MongoDB & Oracle (Genentech)
Genentech Research and Early Development (gRED) develops drugs for significant unmet medical needs. Key to this effort is providing Investigators with new genetic strains of animals needed to understand disease causes and test new drugs. While these genetic strains have increased greatly in complexity, technology improvements have increased accuracy and throughput while reducing the cost of genetic testing. This has led to an effort to redevelop the Genetic Analysis Lab system to reduce the time needed to introduce new lab instruments from months to weeks or even days. Important to this initiative has been the introduction of MongoDB to capture the variety of data generated by genetic tests and integrate it with the existing Oracle RDBMS environment. Not only has it proved fairly easy to integrate the two, but we have been able to take advantage of the strengths of MongoDB to provide a flexible schema and Oracle to provide transaction management and integration with the existing information system.
Software Engineer in Research at Genentech
Doug Garrett, Software Engineer in Research at Genentech, has been developing software for over 20 years. Most recently he has worked on systems that support the processes and lab instruments used to develop the genetic murine models needed for Genentech's disease and drug research. Before Genentech, Doug worked for a number of companies, including Nokia, McKesson and Kaiser Medical. Doug holds a B.A. in Physics from Occidental College and an M.S. in Computer Information Systems from Boston University.
Genentech – Speeding Drug Research
The challenge was to integrate MongoDB with Oracle relational databases.
Bioinformatics is different from traditional IT.
- The flexibility of the schema is a big benefit
- MongoDB can also integrate easily with a traditional RDBMS
- Saving time is critical when human lives are at stake
Every new lab instrument drove a change to the Oracle RDBMS schema. This created a time lag and slowed genetic testing.
The development process
- What causes the disease? Is it genetic?
- Develop a new mouse model
- Does it lead to a new drug? Is it safe and effective?
- Then move to clinical trials
With the Oracle schema, it took six months of schema changes to add a new genetic test. The follow-on test took a further three months. A more flexible solution was needed, one that didn't add to the complexity of the database.
This led to the selection of MongoDB.
1 million rows in Oracle became 4,000 documents in MongoDB.
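The session did not show the actual data model, but a reduction like this usually comes from collapsing many narrow rows (one per measurement) into a single nested document per parent entity. The following is a minimal sketch under that assumption; the `sample_id`, `assay` and `value` fields are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def rows_to_documents(rows):
    """Group flat (sample_id, assay, value) rows into one document per sample.

    Field names are hypothetical; they only show how many relational rows
    can collapse into a single nested document.
    """
    grouped = defaultdict(dict)
    for row in rows:
        # Each row becomes one key/value pair inside its sample's document.
        grouped[row["sample_id"]][row["assay"]] = row["value"]
    return [
        {"_id": sample_id, "results": results}
        for sample_id, results in grouped.items()
    ]


rows = [
    {"sample_id": "S1", "assay": "geneA", "value": "het"},
    {"sample_id": "S1", "assay": "geneB", "value": "wt"},
    {"sample_id": "S2", "assay": "geneA", "value": "hom"},
]
docs = rows_to_documents(rows)
print(len(rows), "rows ->", len(docs), "documents")
```

The more rows that share a parent, the larger the reduction in record count, while the data itself stays the same.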
The instrument-specific configuration is concentrated in the ingestion process.
A generic data loader then stores the results. The schema complexity lives in the MongoDB document and is maintained in one place.
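One way to picture the split between per-instrument configuration and a generic loader is the minimal sketch below. It is not Genentech's code: the instrument name, column names and field mapping are all hypothetical, and the instrument output is assumed to be CSV.

```python
import csv
import io

# Hypothetical per-instrument configuration: all instrument-specific
# knowledge (column names, target fields, type conversions) lives here.
INSTRUMENT_CONFIGS = {
    "plate_reader_x": {
        "id_column": "Well",
        "fields": {"Sample": ("sample_id", str), "Signal": ("signal", float)},
    },
}


def load(instrument, raw_text):
    """Generic loader: turn raw CSV output into documents using only the config."""
    config = INSTRUMENT_CONFIGS[instrument]
    documents = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        doc = {"instrument": instrument, "well": record[config["id_column"]]}
        for column, (field, convert) in config["fields"].items():
            doc[field] = convert(record[column])
        documents.append(doc)
    return documents


raw = "Well,Sample,Signal\nA1,S1,0.82\nA2,S2,1.5\n"
docs = load("plate_reader_x", raw)
```

Under this design, supporting a new instrument means adding a configuration entry rather than changing loader code or a database schema. In a real pipeline the resulting documents might be written with a driver call such as PyMongo's `insert_many`, which is outside this sketch.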
A Java program presents users with a single window that combines the Oracle record view with the content of the relevant MongoDB document(s).
The database schema is now unaffected by the introduction of new instruments.
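The actual application is written in Java; the sketch below shows only the merging idea, in Python, under stated assumptions. An in-memory `sqlite3` database stands in for Oracle, a plain dict stands in for a MongoDB collection, and the `samples` table and its columns are hypothetical.

```python
import sqlite3


def combined_view(conn, documents_by_sample, sample_id):
    """Merge a sample's relational record with its document(s) into one view."""
    row = conn.execute(
        "SELECT sample_id, strain, status FROM samples WHERE sample_id = ?",
        (sample_id,),
    ).fetchone()
    if row is None:
        return None
    view = {"sample_id": row[0], "strain": row[1], "status": row[2]}
    # The flexible, instrument-specific results come from the document store.
    view["results"] = documents_by_sample.get(sample_id, [])
    return view


# sqlite3 stands in for Oracle; the dict stands in for a MongoDB collection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (sample_id TEXT PRIMARY KEY, strain TEXT, status TEXT)")
conn.execute("INSERT INTO samples VALUES ('S1', 'strainA', 'genotyped')")
documents_by_sample = {"S1": [{"instrument": "plate_reader_x", "signal": 0.82}]}
view = combined_view(conn, documents_by_sample, "S1")
```

The relational side keeps the transactional record, while the document side carries whatever structure each instrument produces.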
Issues – A Disaster Recovery copy
Oracle replication is challenging. It is not a built-in function.
With MongoDB, a replica was set up at the DR site within a couple of hours.
MongoDB release 2.6 changed the Aggregation Framework: the 16MB limit on result sets was removed.
Other uses for MongoDB
Import from CSV, JSON, XML and other sources.
MongoDB works well as a data import service, which makes it great for building data pipelines.
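As a rough illustration of the import step in such a pipeline, the minimal sketch below normalises CSV, JSON and XML input into a common list of dicts that could then be stored as documents. The function names are hypothetical, and the XML parser assumes a flat `<records><record>...</record></records>` layout.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def parse_csv(text):
    return list(csv.DictReader(io.StringIO(text)))


def parse_json(text):
    data = json.loads(text)
    # Accept either a single object or a list of objects.
    return data if isinstance(data, list) else [data]


def parse_xml(text):
    # Assumes a flat layout: <records><record><field>value</field>...</record></records>
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in record} for record in root]


PARSERS = {"csv": parse_csv, "json": parse_json, "xml": parse_xml}


def import_records(fmt, text):
    """Normalise any supported input format into a list of dicts (documents)."""
    return PARSERS[fmt](text)


sources = [
    ("csv", "sample_id,result\nS1,het\n"),
    ("json", '[{"sample_id": "S2", "result": "wt"}]'),
    ("xml", "<records><record><sample_id>S3</sample_id><result>hom</result></record></records>"),
]
documents = [doc for fmt, text in sources for doc in import_records(fmt, text)]
```

Because every source ends up in the same shape, later pipeline stages do not need to know which format a record came from.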
Health & Cloud Technology Consultant
Mark is available for challenging assignments at the intersection of Health and Technology using Big Data, Mobile and Cloud Technologies. If you need help to move, or create, your health applications in the cloud let’s talk.
Stay up-to-date: Twitter @ekivemark
Disclosure: I began as a Patient Engagement Advisor and am now CTO to Personiform, Inc. and their Medyear.com platform. Medyear is a powerful free tool that helps you collect, organize and securely share health information, however you want. Manage your own health records today. Medyear: The Power Grid for your Health.