Category Archives: BigData

#MongoDBworld @MikeOlson of Cloudera – A new way to think about data

More from #MongoDBWorld.

Mike Olson – Founder and Chief Strategy Officer – Cloudera @mikeolson

“Machines are much better at generating data than people.”

Data is growing faster than Moore’s Law

The interactions of devices make this growth exponential.

Classic data management is old, big, centralized and expensive. This is not good for machine data.

Google developed tools for handling machine-generated data.
Amazon has promoted the cloud. They are one of the most innovative companies.
The Cloud is fundamental. It is a participant, a generator and a recipient.

Data will be generated in lots of places and will live where it was born.

The Data Center of the future will span firewalls.

Book Recommendations: Small Pieces Loosely Joined: A Unified Theory Of The Web – David Weinberger

Nonzero: The Logic of Human Destiny – Robert Wright

Enterprise Data Hub

The initial idea was one system to rule them all but Cloudera has discovered that this doesn’t work. Systems have to talk to other systems. Systems loosely joined for a non-zero sum game.

Example: Teradata, MongoDB and Cloudera have created an enterprise data fabric that stimulates new value.

Collaboration and Connections will win out.

Disruption

New obliterates old. After the hype they co-exist and are targeted for purpose.

Disruption is not the future. Transformation is the future.
Using Data in ways we hadn’t predicted.

“The future of data is that data is the future. It will infuse how we do everything.”

[tag health cloud BigData MongoDB MongoDBWorld NoSQL]

Mark Scrimshire
Health & Cloud Technology Consultant

Mark is available for challenging assignments at the intersection of Health and Technology using Big Data, Mobile and Cloud Technologies. If you need help to move, or create, your health applications in the cloud let’s talk.
Blog: http://blog.ekivemark.com
email: mark@ekivemark.com
Stay up-to-date: Twitter @ekivemark
Disclosure: I began as a Patient Engagement Advisor and am now CTO to Personiform, Inc. and their Medyear.com platform. Medyear is a powerful free tool that helps you collect, organize and securely share health information, however you want. Manage your own health records today. Medyear: The Power Grid for your Health.

#MongoDBWorld kicks off

I am here at the Sheraton in Times Square New York for MongoDBWorld 2014. I will be blogging from the conference. There are some great topics. I am particularly interested in some of the sessions on Genetics and the NPI data source.

The conference runs for the next two days (June 24-25th) and has a heavily packed agenda. See the full schedule here: http://world.mongodb.com/schedule

There are around 2,000 people here for this event from across the world.

It is going to be a busy day because I also have to fit in meetings for Medyear.

# Max Shireson – Keynote @mshireson

The last 44 years of data were all neat and tidy.

The changing world:

PC to Mobile
Ads to Social

Data is different today. 90% of the world’s data is less than 2 years old.
80% of enterprise data is unstructured.
Unstructured data growing at 2x the rate of structured data.

2/3 of Fortune 500 are challenged by unstructured data.

6*% Data Variety
15% Data Volume
17% Other Data

A decade ago Google emerged as one of the most valuable companies. It took visible data and organized it. That spawned Bigtable and other tools.
Open Source was critical because licensing software for the machines to index the Internet would kill any data model.

There have been 7M downloads of MongoDB
20,000 deployments

Major vendors are supporting MongoDB, e.g. IBM, Cloudera, SAP.

Some examples of MongoDB users:

Mailbox – Scaled to millions of users in weeks. Subsequently acquired by Dropbox.

Bosch – The Internet of Things will outnumber people on the Internet. They are building the infrastructure for the car. They don’t know what will be required in terms of data structures. MongoDB gives them the flexibility.

MetLife – Data in 70 different systems. They tried to pull it together in a relational database but couldn’t keep up with the change. MongoDB was implemented to integrate these data streams in a call center in 90 days.

City of Chicago – Starting from the CTO’s laptop, the project grew to MongoDB with Hadoop to power emergency communications rooms and more critical projects across the city.

Citi – Building a shop window to deploy computers and clusters quickly.

## The future of MongoDB

Developing features: Concurrency, Storage and Management.

[tag health cloud BigData MongoDB MongoDBWorld NoSQL]


In NYC for #MongoDBWorld and for @Medyears meetings

This week I am in New York City to take part in MongoDB World. You can still grab yourself a ticket to the event on Tuesday and Wednesday at the Sheraton in Times Square. Check out the link here: http://www.eventbrite.com/e/mongodb-world-tickets-11255679039?aff=es2&rank=1 You will have to be quick. Ticket sales close at 10am ET today (Monday 23rd).

Today is a series of training sessions at MongoDB HQ in NYC. Unfortunately I am not getting to go to those sessions but I will be blogging from the sessions on Tuesday and Wednesday.

I am also here to take part in a number of meetings for Medyear. There is a lot happening with Medyear. The ability to import BlueButton data and share it securely is attracting a lot of attention. Meanwhile we continue to refine both the iOS App and the web site. Our approach breaks down your health timeline into a series of posts that can be discretely and securely shared, which gives a lot of flexibility. Using hashtags for public and private chronicles has created a rich platform that gives our members a simple set of tools.

I keep finding new ways to build my own health timeline on Medyear.com. For example, I have been able to use IFTTT.com to create workflows that pipe my Fitbit data to my Medyear timeline. We are working on relaunching our corporate presence as Medyear.org. That will include a blog and I want to publish some ideas about how people are using, or could use, their Medyear.com account to gather their health data into one place.

Anyone can sign up for a free account on Medyear.com and start aggregating their health data. If you do, let me know. I would love to learn about how you are using Medyear. Medyear is totally patient/member-centric so we want to hear from you about what you need to help you manage your health data.

It is early on Monday morning. After fueling with some caffeine after a 3am start I am heading to Blueprint Health until early afternoon. If you are in the SoHo area why don’t we connect?

[tag health cloud BigData MongoDB NoSQL MongoDBWorld]


Heading to #MongoDB World next week and planning to hang out @BPHealth with @Medyears outside of the conference

Next week I am attending MongoDB World on June 23-25th. I plan to blog from the event, so watch out for a stream of posts over the course of next week.

Recently I gave a presentation to various MongoDB executives on the opportunities for MongoDB in Healthcare. Since I am heading to MongoDB World I thought I should share my thoughts on the opportunities for NoSQL in the $2.75 Trillion Healthcare industry.

Here is my presentation, which I have uploaded to my ekivemark account on Slideshare:

If you are attending the conference in New York City tweet a message to @ekivemark and we can connect.

You may also find me sporting one of my Walking Gallery of Healthcare jackets.

Outside of the conference you will probably find me hanging out at Blueprint Health in Soho with the Medyear Team. Let’s connect!


@swiftstack workshop and #wd demonstrating a server running immersed in fluid

HGST, a subsidiary of Western Digital, is demonstrating a server running while immersed in 3M Engineered Fluid.

HGST is headquartered in San Jose, CA.

Fascinating Ultrastar drive that is hermetically sealed. The air is removed and replaced with helium. It allows closer packing of drives because there is less jitter as a result of air resistance.

HGST Active Archiving is designed for sub-second response to rarely accessed and never modified data. They are creating 6TB drives that will push to 10TB by the end of 2014. This is ideal for petabyte object storage environments.

A public test cluster running Swift is built on Atom servers running with HGST Ultrastar drives with a Red Hat or Ubuntu OS for OpenStack.

Benefits of the drives come from more storage per server, at lower cost per TB given fewer servers and less power consumption than traditional drive storage solutions.

Mark Scrimshire
Health and Cloud Technologist
http://ekivemark.com

#OpenStack Design Workshop – Working with Swift. Presented by @SwiftStack and @Racktop Systems

Today I am at a design workshop for Swift – NOT the Apple programming language announced this week but rather the Object storage solution that is a core part of the OpenStack cloud platform.

The Training is being given by SwiftStack – http://www.swiftstack.com and Racktop Systems – http://www.racktop.com.

This is a One day workshop and I will be publishing real-time notes from the session. So here goes…

Swift and SwiftStack

Swift is the object storage platform for OpenStack.
SwiftStack is the leading contributor to Swift and provides a wrapper of services for Swift to make the service easier to implement.

SwiftStack is a Venture-backed company that was established in 2011.

SwiftStack provides the operational and management layer for Swift.

There are more than 2,000 contributors to the Swift platform.

Rackspace originated Swift but now IBM, AT&T, HP, Comcast and Time Warner Cable have all built on the Swift/SwiftStack platform.

Swift runs on commodity hardware, as does OpenStack.

 Storage in OpenStack includes:

  • Cinder (Block)
  • Swift (Object)
  • Manila (Shared File System)

Swift is the OpenStack equivalent of Amazon S3.
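
To make the S3 comparison concrete, here is a minimal sketch of how a client addresses a single object over Swift's HTTP API. The `/v1/<account>/<container>/<object>` path and the `X-Auth-Token` header are part of the Swift API; the endpoint, account, container, object name and token below are hypothetical placeholders, and the request is only built, not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def swift_object_request(endpoint, account, container, obj, token,
                         method="GET", data=None):
    """Build (but do not send) an HTTP request for one Swift object."""
    # Swift addresses objects as /v1/<account>/<container>/<object>.
    path = "/v1/{}/{}/{}".format(quote(account), quote(container), quote(obj))
    return Request(
        endpoint.rstrip("/") + path,
        data=data,
        method=method,
        headers={"X-Auth-Token": token},  # token issued by the auth system
    )


# Hypothetical values, for illustration only.
req = swift_object_request(
    "https://swift.example.com", "AUTH_demo", "scans", "xray-001.png",
    token="hypothetical-token", method="PUT", data=b"image bytes",
)
print(req.get_method(), req.full_url)
```

Uploading is a PUT to that URL and reading is a GET on the same URL, which is why Swift is described as an API rather than a filesystem.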

Design Goals:
Swift is an API.

Goals:
– Reliable
– Highly Scalable
– Hardware Proof – it assumes unreliable hardware.

Swift runs on any Linux-based architecture.

Load Balancers are outside swift. SwiftStack includes load balancing.

The request path runs through these layers, each talking to the next:
– Load Balancer (includes SSL and Authentication)
– Proxy
– Account / Container / Object
– A replication and consistency layer
– Standard servers with disks

Authentication can use OpenStack Keystone but you can integrate other standards such as LDAP or Active Directory.
Guidance is to not use Keystone since it is designed for lighter loads and doesn’t scale well to support heavily used and large scale environments.
Keystone can also create a single point of failure for Swift. A better option would be a robust Active Directory or LDAP service that will probably already exist in the environment.
Swift also offers a simple hashed user/password Auth function for quick setup.

 Proxy

– Enforces the default 3x replication
– Enforces a quorum
– Enforces User set ACLs
– Uses fastest available copy for reads (single read only required)
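
As a rough illustration of the replication and quorum points above, the sketch below assumes a simple-majority quorum: a write counts as successful once more than half of the replicas acknowledge it. The function names are hypothetical, and Swift's own quorum calculation may differ by version and storage policy.

```python
def write_quorum(replicas):
    """Simple-majority quorum: more than half of the replicas must succeed."""
    return replicas // 2 + 1


def write_succeeds(replica_results):
    """replica_results holds one boolean per replica write attempt."""
    return sum(replica_results) >= write_quorum(len(replica_results))


# With the default 3x replication, 2 acknowledgements are enough.
print(write_quorum(3))                       # 2
print(write_succeeds([True, True, False]))   # True
print(write_succeeds([True, False, False]))  # False
```

Under this assumption one failed node does not block a write, and the replication layer can bring the missing copy up to date later.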
Swift doesn’t want RAID. Data protection is done by replication.
Failure points:
– Node
– Zone
– Region
Swift places each copy as uniquely as possible, so that it avoids putting a copy in a location shared with another copy.
Account servers keep track of containers, and Container servers keep track of objects.
Objects are stored by Object Servers. Metadata is stored with the data using a standard filesystem (XFS).

Disk Storage:

– No RAID
– Use SATA or SAS drives
– You can use SSD for Read heavy Caching

Nodes:

– Ubuntu
– CentOS
– RHEL
Now for the Hands on workshop…. The Swiftstack platform I am working with is being provided out of Rackspace (http://www.rackspace.com/cloud/openstack/)
We are working with two things:
– Web Browser access to SwiftStack.com
– SSH to the node(s) we are creating.
NTP is an important service.
If your nodes use Active Directory for authentication, you should specify your AD servers’ hostnames or IPs for the NTP server settings.
Partition Power – Err on the side of over sizing. This is harder to change once set. Replication happens on a partition basis (regardless of data content in a partition). Replication has a performance overhead.
A Partition Power of 16 gives a pool of up to nearly 2,000 (1,966) drives in a cluster. With 3TB drives, that yields usable storage of 1.97PB.
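
The numbers above can be reproduced with a little arithmetic. This sketch assumes 3 replicas and a rule of thumb of roughly 100 partitions per drive; that per-drive figure is my assumption (it happens to reproduce the 1,966 figure) and was not stated in the session.

```python
def max_drives(part_power, replicas=3, parts_per_drive=100):
    """Drives supported if each drive holds about parts_per_drive partitions."""
    total_partition_replicas = (2 ** part_power) * replicas
    return total_partition_replicas // parts_per_drive


def usable_pb(drives, drive_tb=3, replicas=3):
    """Usable capacity in PB: raw capacity divided by the replica count."""
    return drives * drive_tb / replicas / 1000


drives = max_drives(16)
print(drives, round(usable_pb(drives), 2))  # 1966 1.97
```

Each extra unit of partition power doubles the partition count, which is why it pays to err on the side of over-sizing before the cluster holds data.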
A node can only belong to one cluster.
If nodes have multiple interfaces you can assign one interface to the proxy and load balancer and the other interface to inter-node communications.
Note: Linux uses Partition for a section of a drive
Swift uses Partition as a folder or bucket.

Building Big Data HealthCare Solutions in the cloud

I have just finished giving a presentation with Datastax on building HealthCare Big Data Solutions in the cloud.

Here are the slides from the session:

Travis Price also rounded out the session with a high-level tutorial on Cassandra and a really cool demonstration of the robustness of the Datastax platform. He used a stack of Raspberry Pi computers running a version of Debian Linux and Datastax Enterprise, configured as two 5-node datacenters in a cluster.

The Raspberry Pis were configured with 512MB RAM, an SD Card as a drive and a USB-connected WiFi. It is a great way to show how Datastax works around and recovers from hardware failures.

#Health2stat – all about data: experts from #nih talk about #health and #bigdata

WebMD reaches 174M unique visitors every month

Translational science – Rosemary Filiart

Advancing research to improve human health.
Integrating multiple disciplines.
Research and clinical data sets, applying appropriate privacy protections and using big data to reveal understanding.

Re-purposing data in a predictable and reproducible way and advancing towards personalized medicine and providing actionable and informed decisions at the point of care.

Repurposing yesterday’s data – Lisa Federer

http://www.libraryinthecity.com
Radical re-use: an example is shipping containers becoming hotel rooms http://www.sleepingaround.eu

Facilitating re-use

Translate expertise from analog to digital – look to librarians
1. Description: Standardized metadata e.g. PubMed
2. Discoverability: data catalogs – these build on metadata
3. Dissemination: facilitating sharing while protecting privacy and intellectual property
4. Digital infrastructure: cyber infrastructure
5. Data Literacy: equipping people with tools and knowledge to be able to access data

How do we re-think the future of data? We don’t throw data away so how do we prepare it for future use?

Using clinical data at NIH – BTRIS data mining – Jim Cimino

The National Institutes of Health consists of 27 institutes and centers, many of which conduct clinical research. Research data are collected in the NIH Clinical Center’s electronic health record (EHR) and institute and laboratory systems. The Biomedical Translational Research Information System (BTRIS) is a repository that collects data from these sources to provide unified tools to support researchers in the analysis of their data. BTRIS is available to any NIH researcher who wishes to obtain data for secondary analyses to reexamine old questions or ask new ones. Non-NIH researchers can collaborate with NIH researchers in the analysis of BTRIS data.

50 data sources – mostly live daily feeds
Half a million patients
140,000 clinical concepts

They have a de-identified data set that goes back to 1976 that can be queried.

The self service query tool is incredibly powerful.

Discovering medical knowledge using BTRIS – Vojtech Huser

Vojtech works with BTRIS using R and a few other tools.
MetaMap is an internal NIH tool.

There are challenges in de-identifying data. Search and replace has to be very precise because there are numerous conditions that are named after people. E.g. removing the name Parkinson could also remove references to Parkinson’s disease.
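
To illustrate the problem, here is a minimal sketch comparing a naive search and replace with one that protects known condition names. It is not how BTRIS does it; the condition list, function names and sample sentence are hypothetical, and real de-identification needs far more than this.

```python
import re

# Hypothetical list of condition names that must survive scrubbing.
CONDITION_TERMS = ["Parkinson's disease", "Parkinson disease"]


def naive_scrub(text, surname):
    """Replace every occurrence of the surname, wherever it appears."""
    return re.sub(re.escape(surname), "[NAME]", text)


def careful_scrub(text, surname, protected=CONDITION_TERMS):
    """Replace the surname unless it is part of a protected condition name."""
    # Try the longer protected terms first, then the bare surname.
    alternatives = [re.escape(t) for t in sorted(protected, key=len, reverse=True)]
    alternatives.append(r"\b" + re.escape(surname) + r"\b")
    pattern = re.compile("|".join(alternatives))

    def replace(match):
        found = match.group(0)
        return found if found in protected else "[NAME]"

    return pattern.sub(replace, text)


note = "Mr. Parkinson was evaluated for Parkinson's disease."
print(naive_scrub(note, "Parkinson"))
print(careful_scrub(note, "Parkinson"))
```

The naive version damages the diagnosis as well as the patient name, while the careful version only works for conditions someone has remembered to list, which is part of why this is hard.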

One interesting request: don’t take your EHR data to heaven – donate it to science.
This is something that consumer-mediated exchanges like MedYear.com could facilitate.

TB a world health problem – Stefan Jaeger

How do we detect TB in remote populations?

Some challenges: HIV populations have weakened immune systems making them susceptible to TB.
Some strains of TB are drug resistant.

USAID – AMPATH partnership working in Africa. Developing a portable X-ray machine that can be taken to villages to test people.

The next step is to use automatic image processing to identify TB in X-rays since there aren’t enough radiologists in Africa to review X-rays manually.

New computational tools and models for data mining – Jim DeLeo

John Von Neumann – “machines can think” 1955

Moved from what can the machine do to “what do we want the machine to do”

Jim’s area of expertise is computational intelligence. This subsumes artificial intelligence.

Machine Learning – machines learn just like humans, by looking at lots of data and clustering and classifying it.

Jim’s latest focus is extreme multi-disciplinary teaming.
Every team member is passionate about the work. Short term. Clear deliverables.

Deep learning – new computational tools for biomedical learning – Jonathan Simon

Deep neural networks – a new technique for analyzing large volumes of data.
Machine learning by using existing data

Neural networks – simple inter-connected computational units. Modeled on the way the brain works.

Deep neural networks – based on neural networks but with very many hidden layers of computation. These have emerged since 2006 as we gained new tools to analyze data and computational power became more affordable.

Biomedical deep learning is co-opting tools like image processing and adapting to medical applications. These tools are limited by the amount of data that is available. Fortunately more and more biomedical data is being made available online every year.

Now for the Q&A…


Caring and Sharing in HealthCare – Medyear’s Personal #Health Network platform

I have just completed a webinar with Sqrrl to present the Medyear Personal Health Network platform. We built Medyear using the highly secure Sqrrl big data platform. This enables us to give our members granular control over their health data. BlueButton and Direct Project standards allow members to upload their clinical data and add their own chronicles into their health timeline.

Medyear has been building this platform for some time. The industry is now starting to recognize the value of putting the patient at the center. How do we know this? Because they have now got a label for the platform that Medyear has built: The Consumer-Mediated Exchange. Basically a Health Record Platform that puts the Patient in control.

Check out the deck from the webinar on my Slideshare account:

Go to Vimeo for a quick video tour of the Medyear platform: https://vimeo.com/90151239 or go to Medyear.com to get your own account and try out the beta platform for yourself.

Using Big Data to Wow the Customer

The flexibility of Big Data platforms enables organizations to quickly consolidate information from multiple sources in order to WOW! the customer with excellent customer service. Think about it. Don’t you feel more positive when you are dealing with a Customer Service Agent who seems to have all the information at their fingertips? It is so much better than being bounced around between departments, or waiting for them to access multiple systems to get the information they need to help you.

Here is an example of how some organizations have used MongoDB to deliver the Wow!: