The 2016 Data Science Summit just wrapped up in San Francisco, and it was bigger and better than ever. With more than 1,300 attendees across two days, the conference brought together business and academic leaders from a broad mix of machine learning areas – pairing the latest research with the state of the art in industry. Many of the speakers are both leaders at key technology companies and involved with top research institutions in the U.S.
Carlos Guestrin, of both Turi (previously Dato) and the University of Washington, framed the world of intelligent applications, including the opportunities for automating machine learning processes, creating online, closed-loop systems, and increasing trust in machine learning applications.
Pedro Domingos, author of The Master Algorithm and also a UW professor, outlined the five schools of machine learning, their underlying philosophical approaches and the types of problems they best address.
Jeff Dean from Google highlighted TensorFlow, the company's powerful new open-source machine learning library, along with its rapid adoption and the independent forks it has spawned in the open-source community. Jeff emphasized that TensorFlow has potential beyond deep learning as an end-to-end system for machine learning applications.
While Jeff highlighted several Google ML use cases, Robin Glinton from Salesforce.com and Jure Leskovec from Pinterest (and Stanford University) impressed the audience with detailed examples of how to build and continually improve intelligent applications.
Stepping back, several observations from this conference generally confirm and expand upon lessons from Madrona’s recent AI/ML Summit in Seattle.
- Deep learning is both real and overhyped. Deep learning is very well suited to image recognition problems and is growing in areas like speech recognition and translation. However, deep learning is only one branch of machine learning and is not the best approach for many intelligent application needs.
- Greater agility is required for intelligent applications in production. Agility comes in many forms, including automating development processes like data munging and feature engineering. It also applies to model training and ongoing model iteration for deployed intelligent apps. Automated, end-to-end pipelines that continually update production applications are rapidly becoming a requirement. These applications, like the Netflix and Spotify recommendations consumers experience, are increasingly referred to as “online” applications because of their agility in both making real-time recommendations and feeding data back to update models; a minimal sketch of such a loop appears after this list.
- “Closed” loops and “humans-in-the-loop” coexist. Many intelligent applications become business solutions by involving humans to verify, enhance, or act on machine outputs. These “human-in-the-loop” cases are expected to persist for many years. However, intelligent applications increasingly require automated, closed-loop systems to meet stringent business requirements for performance and results. For example, product recommendations, fraud predictions and search results are expected to be more accurate and relevant than ever – and delivered in milliseconds!
- The strategic value of differentiated data grows by the day. Intelligent applications are dependent on data, metadata and the models this data trains. Companies are increasingly strategic about the data they collect, the additional data they seek and the technologies they use to more rapidly train and deploy data models. Google’s internal use cases that leverage such data, like RankBrain, are expanding. And their decision to open source image and speech recognition models built on TensorFlow is a leading example of engaging the outside world to enhance a model’s training data.
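To make the “online,” closed-loop idea above concrete, here is a minimal, hypothetical sketch (not drawn from any of the talks): a model scores live events in real time and then folds the observed outcomes back in with an incremental update. It uses scikit-learn’s SGDClassifier and partial_fit as a stand-in for whatever streaming learner a production pipeline would actually use.

```python
# Hypothetical sketch of an "online" closed-loop update: serve predictions,
# collect feedback, and incrementally fold it back into the model.
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier()  # incremental linear classifier (hinge loss by default)

# Initial training on a small historical batch (synthetic data here).
X_hist = np.random.rand(500, 4)
y_hist = (X_hist[:, 0] + X_hist[:, 1] > 1.0).astype(int)
model.partial_fit(X_hist, y_hist, classes=[0, 1])

def serve_and_learn(events):
    """Score a batch of live events, then update the model once the
    true outcomes (clicks, purchases, fraud labels) come back."""
    X_live = np.array([e["features"] for e in events])
    scores = model.decision_function(X_live)   # real-time relevance scores
    y_feedback = np.array([e["outcome"] for e in events])
    model.partial_fit(X_live, y_feedback)      # close the loop
    return scores

# Simulated stream of events with their (delayed) outcomes attached.
events = [{"features": np.random.rand(4), "outcome": np.random.randint(2)}
          for _ in range(32)]
print(serve_and_learn(events)[:5])
```

In a real deployment the two halves of that loop would be separate services connected by a feature store and an event stream, but the core pattern – predict, observe, update – is the same.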
Overall, I found the conference extremely energizing. There was substantial depth and a diversity of backgrounds, ideas and experiences among the participants. And the conference furthered the momentum of the move from academic data science to deployed intelligent applications.