As day one of the Deep Learning Summit draws to a close, we’re taking a look at some of the highlights. What did we learn, and what did you miss? With over 30 speakers discussing cutting-edge technologies and research, we’ve seen a series of varied and insightful presentations.
Sampriti Bhattacharyya kicked off this morning’s discussion by introducing the influential companies, researchers and innovative startups involved. Named in Forbes’ 30 Under 30 last year, the founder of Hydroswarm had, by 28, created an underwater drone that maps ocean floors and explores the deep sea, and she spoke about the impact of deep learning on technology and how it’s affecting our everyday lives.
Deep learning allows computers to learn from experience and understand the world in terms of hierarchical concepts, connecting each concept back to a simpler one beneath it. With these technologies being implemented so rapidly, discussions covered the impact of deep learning as a disruptive trend in business and industry. How will you be impacted?
Facebook is at the forefront of these implementations, and with ‘⅕ of the population using Facebook, the kind of complexities they experience are unimaginable’. We heard from research engineer Andrew Tulloch, who explained how millions of accounts are optimised to ‘receive the best user experience by running ML models, computing trillions of predictions every day.’ He explained that to ‘surface the right content at the right time presents an event prediction problem’. We heard about timeline prioritisation, where Facebook can ‘go through the history of your photos and select what we think to be semantically pleasing posts’, as well as the explosion of video content over the past two years, with the same classification methods applying to both photos and video. The discussion also covered natural language processing in translation: Facebook runs billions of translations every day that need to be as accurate as possible, and we heard how they’re overcoming these complexities to deliver accurate translations. He also drew on the barriers previously faced in implementing machine learning across devices and its impact on mobile. ‘Over a billion people use Facebook on mobile only’, and on mobile the ‘baseline computation unit is challenging to get good performance’ from, so building implementations and specifications for mobile is very important.
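As a rough illustration of what framing the feed as an ‘event prediction problem’ means, the sketch below scores candidate posts by the predicted probability that a user will engage with them and ranks them accordingly. The features, model and scale are hypothetical stand-ins rather than Facebook’s actual system.

```python
# A minimal, illustrative sketch of feed ranking as event prediction:
# score each candidate post by the predicted probability that the user
# engages with it, then surface the highest-scoring posts first.
# The features and model here are hypothetical, not Facebook's system.
import torch
import torch.nn as nn

class EngagementPredictor(nn.Module):
    def __init__(self, n_features: int):
        super().__init__()
        # A small MLP standing in for the production-scale models
        # that compute "trillions of predictions every day".
        self.net = nn.Sequential(
            nn.Linear(n_features, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Probability that the event (click, like, comment...) occurs.
        return torch.sigmoid(self.net(x)).squeeze(-1)

# Rank 5 candidate posts for one user, each described by 8 features
# (e.g. recency, author affinity, content type) -- all made up here.
model = EngagementPredictor(n_features=8)
candidates = torch.randn(5, 8)
scores = model(candidates)
ranking = torch.argsort(scores, descending=True)
print("feed order:", ranking.tolist())
```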
@dmurga: Cool to hear Andrew Tulloch of @facebook talk about #DeepLearning on mobile for better privacy, latency, and offline performance. #reworkDL
We next heard from Sangram Ganguly from the NASA Earth Exchange Platform, who continued the discussion of image processing. The vision of the NASA-EEP is ‘to provide science as a service to the Earth science community addressing global environmental challenges’ and to ‘improve efficiency and expand the scope of NASA earth science tech, research and application programs’. Satellites capture images that can be used to ‘create high resolution maps to predict climate changes and make projections for climate impact studies’. One problem Ganguly faced in his research, however, was the reliance on physics-based models: as the datasets grow, it’s important to blend these models with deep learning and machine learning to optimise performance and speed and to create the most successful models. This fusion of physics and machine learning is the driving force of high resolution airborne image analysis and classification.
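To make the idea of blending physics-based models with learning a little more concrete, here is a minimal sketch of one common pattern: keep the physics prediction and train a small network to correct its residual. The stand-in physics_model and network below are illustrative placeholders, not Ganguly’s actual pipeline.

```python
# Illustrative sketch: fuse a physics-based model with a learned residual.
# physics_model() is a stand-in for any process-based simulation; the
# network only learns the correction the physics misses.
import torch
import torch.nn as nn

def physics_model(x: torch.Tensor) -> torch.Tensor:
    # Placeholder for a physics-based prediction; here just a fixed map.
    return 0.5 * x.sum(dim=1, keepdim=True)

class ResidualCorrector(nn.Module):
    def __init__(self, n_inputs: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_inputs, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, x):
        return self.net(x)

corrector = ResidualCorrector(n_inputs=4)
optimizer = torch.optim.Adam(corrector.parameters(), lr=1e-3)

x = torch.randn(64, 4)                           # input features (e.g. band reflectances)
y = physics_model(x) + 0.1 * torch.randn(64, 1)  # observations the physics alone misses

for _ in range(100):
    pred = physics_model(x) + corrector(x)       # hybrid physics + ML prediction
    loss = nn.functional.mse_loss(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```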
Next up was Dilip Krishnan from Google, who went on to explore new approaches to unsupervised domain adaptation. The goal is to ‘train a machine learning model on a source dataset and apply this on a target dataset, where it’s assumed that there are no labels at all in unsupervised domain adaptation’. He discussed the difficulties in implementation and shared two approaches to the problem. The first approach, ‘mapping source domain features to target domain features’, focuses on learning a shared representation between the two domains: the model explicitly learns to extract image representations that are partitioned into two subspaces, one private to each domain and one shared across domains. The second approach, which has been more popular and effective, is ‘end-to-end learning of domain invariant features with a similarity loss.’ Krishnan proposes a new model that learns, in an unsupervised manner, a transformation in pixel space from one domain to the other. This generative adversarial network (GAN)-based method adapts synthetic images to make them appear more realistic, and is one ‘that improves upon state of the art feature level unsupervised domain recognition and adaptation’.
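For readers curious what the shared/private partition looks like in practice, below is a heavily simplified sketch in the spirit of that first approach: separate encoders for the private and shared components, a classifier trained on labelled source data only, and losses that keep the subspaces apart while pulling the shared features of the two domains together. The encoders, losses and weightings are placeholders rather than Krishnan’s published architecture.

```python
# Simplified sketch of a shared/private feature partition for unsupervised
# domain adaptation. Encoders, losses and weights are illustrative only.
import torch
import torch.nn as nn

def encoder(dim_in: int, dim_out: int) -> nn.Module:
    return nn.Sequential(nn.Linear(dim_in, 64), nn.ReLU(), nn.Linear(64, dim_out))

shared_enc = encoder(256, 32)          # component shared across domains
private_src = encoder(256, 32)         # component private to the source domain
private_tgt = encoder(256, 32)         # component private to the target domain
classifier = nn.Linear(32, 10)         # trained on labelled source data only

def orthogonality_loss(shared: torch.Tensor, private: torch.Tensor) -> torch.Tensor:
    # Push shared and private subspaces apart by penalising their correlation.
    return (shared.t() @ private).pow(2).mean()

src, src_labels = torch.randn(16, 256), torch.randint(0, 10, (16,))
tgt = torch.randn(16, 256)             # no labels for the target domain

shared_s, shared_t = shared_enc(src), shared_enc(tgt)
task_loss = nn.functional.cross_entropy(classifier(shared_s), src_labels)
separation = orthogonality_loss(shared_s, private_src(src)) + \
             orthogonality_loss(shared_t, private_tgt(tgt))
# A similarity / domain-confusion term on shared_s vs shared_t is also
# needed; a crude proxy is matching their mean embeddings.
similarity = (shared_s.mean(0) - shared_t.mean(0)).pow(2).sum()
total_loss = task_loss + 0.1 * separation + 0.1 * similarity
```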
@dmurga: Nice @Google #DeepLearning arch reflecting task intuition: separate private & shared info for interpretable #DomainAdaptation. #reworkDL
After a coffee break and plenty of networking, Leonid Sigal from Disney Research expanded on the difficulties of relying on large-scale annotated datasets and explained how they are currently implementing a class of semantic manifold embedding approaches designed to perform well when the necessary data is unavailable. To accurately classify an image, you typically need more than 1,000 similar images to train against, but very few specific categories offer that much data: ‘zebras climbing trees’, for example, ‘only has one or two images to sample against’. Disney also needs to localise objects within images and attach linguistic descriptions to them, and this is where it becomes much more complicated. Sigal explained how they are currently working with embedding methods that use weak or no supervision, so that algorithms can work out how to classify each image far more efficiently. Sigal’s work in deep learning has helped his problem solving not only in image classification, but in character animation and retargeting, and he is currently researching action recognition and object detection and categorisation.
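As a concrete, simplified picture of what a semantic embedding approach buys you, the snippet below classifies an image by projecting it into a shared semantic space and picking the nearest class-description embedding, so rare classes like ‘zebras climbing trees’ can be scored without thousands of examples. The embeddings and projection here are random placeholders, not Disney’s models.

```python
# Illustrative zero-shot-style classification via a shared semantic space:
# map image features near the embedding of their class description, then
# classify new images by nearest class embedding -- even for classes with
# few or no training images. Embeddings below are random placeholders.
import torch
import torch.nn as nn

image_to_semantic = nn.Linear(512, 300)   # learned projection of image features
class_names = ["zebra", "tree", "zebra climbing a tree"]
class_embeddings = torch.randn(len(class_names), 300)  # e.g. word vectors

def classify(image_feature: torch.Tensor) -> str:
    projected = image_to_semantic(image_feature)
    # Cosine similarity to every class embedding.
    sims = nn.functional.cosine_similarity(projected.unsqueeze(0), class_embeddings)
    return class_names[int(sims.argmax())]

print(classify(torch.randn(512)))
```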
@joeddav: Interesting approach to incorporating semantics into object detection and localization from @DisneyResearch #reworkdl #deeplearning
For all the abundance of applications for visual recognition, vision is not the only sense available to us, and SoundNet’s Carl Vondrick discussed his work in training machines to understand sound through image tagging. Whilst sources for mapping images tend to be readily available, ‘it’s difficult to get lots of data for specific sounds as there isn’t the same availability as there is in image recognition.’ To overcome this, Carl explained how SoundNet can ‘take advantage of the natural synchronisation between vision and sound in videos to train machines to recognise sound.’ He explained how they can take vision models already trained to recognise images and ‘synchronise it with sounds and use it as a teacher.’ After testing the audience by asking us to identify specific sounds and running our results against SoundNet, it materialised that SoundNet’s analysis was far superior to the humans’. Where SoundNet immediately identified a combination of bubbling water, breathing, and splashing as scuba diving, the audience were unable to pick apart the types of sound and draw this conclusion as easily as the system.
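The teacher-student idea is easy to sketch: a pretrained vision network ‘labels’ the video frames, and the audio network is trained to match that label distribution from the raw waveform alone. The toy code below shows the shape of that training step, with tiny stand-in networks rather than the real SoundNet architecture.

```python
# Toy sketch of the vision-to-sound teacher/student transfer described above:
# a pretrained vision network labels video frames, and an audio network is
# trained to match that label distribution from the raw waveform alone.
# Both networks are tiny stand-ins, not the actual SoundNet models.
import torch
import torch.nn as nn
import torch.nn.functional as F

vision_teacher = nn.Linear(2048, 1000)      # stand-in for a pretrained image CNN head
audio_student = nn.Sequential(              # stand-in for a 1-D conv net on waveforms
    nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, 1000),
)

frames = torch.randn(8, 2048)               # visual features from 8 video frames
waveforms = torch.randn(8, 1, 22050)        # the synchronised second of audio

with torch.no_grad():
    teacher_probs = F.softmax(vision_teacher(frames), dim=1)

student_log_probs = F.log_softmax(audio_student(waveforms), dim=1)
# KL divergence pulls the sound network's predictions toward the teacher's.
loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
loss.backward()
```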
@datatrell: Some may think sound learning is a solved problem, but there’s much that can be done. Excited for @cvondrick‘s talk! #reworkdl
For autonomous systems to be successful, they not only need to understand these sounds and the visual world, but also to communicate that understanding with humans. To expand on this and wrap up this morning’s presentations, we heard from Sanja Fidler from the University of Toronto, who spoke about progress towards automatically understanding stories and creating complex image descriptions from videos. Fidler is currently exploiting the alignment between movies and books in order to build more descriptive captioning systems, and she spoke about how it is possible for a machine to automatically caption an image by mapping its story to a book, and then to teach the machine to assign combinations of these descriptions to new images. The end goal of this work is automatic understanding of stories from long and complex videos. This data can then be used to help robots gain a more in-depth understanding of humans, and to ‘build conversation models in robots’.
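One simplified way to picture the movie-book alignment is as nearest-neighbour matching in a joint embedding space, as in the rough sketch below; the clip and sentence encodings here are random placeholders rather than Fidler’s actual models.

```python
# Rough sketch of aligning video clips to book sentences in a shared
# embedding space -- the general mechanism behind borrowing rich
# descriptions from books. Encodings are random placeholders here.
import torch
import torch.nn.functional as F

clip_embeddings = torch.randn(10, 128)       # 10 movie clips, encoded
sentence_embeddings = torch.randn(500, 128)  # 500 book sentences, encoded

# Cosine similarity between every clip and every sentence.
sims = F.normalize(clip_embeddings, dim=1) @ F.normalize(sentence_embeddings, dim=1).t()
best_sentence = sims.argmax(dim=1)           # most similar sentence per clip
print(best_sentence.tolist())                # candidate captions drawn from the book
```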
After a busy morning attendees chatted over lunch, visited our exhibitors, and had the opportunity to write ‘deep learning’ in their own language on our RE•WORK world map.
The summit continued this afternoon with presentations from Helen Greiner, CyPhy Works, Drones Need to Learn; Stefanie Tellex, Brown University; Lex Fridman, MIT, Deep Learning for Self-Driving Cars; Maithra Raghu, Cornell University/Google Brain, Deep Understanding: Steps towards Interpreting the Internals of Neural Networks; Anatoly Gorchechnikov, Neurala, AI and the Bio Brain; Ben Klein, eBay, Finding Similar Listings at eBay Using Visual Similarity; and many more.
We’ll be back again with Deep Learning Boston tomorrow, covering the applications of deep learning in industry and hearing from the likes of Sam Zimmerman, Freebird, Deep Learning and Real-Time Flight Prediction; Anatoly Gorchechnikov, Neurala, AI and the Bio Brain; David Murgatroyd, Spotify, Agile Deep Learning; and Ben Klein, eBay, Finding Similar Listings at eBay Using Visual Similarity.
View the schedule for the remainder of the summit here.
Couldn’t make it to Boston? If you want to hear more from the Deep Learning Summit and Deep Learning In Healthcare Summit you can register here for on-demand video access.
To continue our Global Deep Learning Summit Series, find out about our upcoming events in London, Montreal, and San Francisco.