Democratizing access to artificial intelligence

The benefits of AI can spread widely through a new approach from Neuromation, which uses the decentralized hardware infrastructure of the blockchain to generate synthetic data for deep learning applications. The recent explosion of concrete, widespread applications of artificial intelligence is due to the availability of ever more powerful hardware platforms and sophisticated algorithms, together with large amounts of data. The effectiveness of machine learning, a subset of artificial intelligence, and of artificial neural networks and deep learning, themselves subsets of machine learning, has surprised even the experts.

The relationship between AI, ML, and deep learning. (Image from alltechbuzz.net)

Artificial intelligence has existed since the birth of digital computers in the 1950s, and machine learning and neural networks came to prominence in the 1980s. How much the algorithms had improved, and how large an advantage they offered, became evident only in the early 2010s, when hardware powerful enough to experiment with them became available. In particular, the same specialized hardware used to create the advanced graphics of video games, GPUs (Graphics Processing Units), turned out to be suited to the parallel processing that machine learning requires, dramatically accelerating results.

Software development, in turn, has come to rely more and more on decentralized collaboration: open source and repositories of reusable components have become so important that teams and companies unwilling to participate feel less and less able to compete. Today it is natural to take for granted that the most advanced approaches will be published in scientific papers, together with the algorithms to implement them and sample code to run them.

This leaves data in a crucial role, especially in commercial applications, which cannot rely on academic repositories. Large companies have a decisive advantage, because they can devote resources to collecting and curating data from the physical world, which they keep as a competitive asset. Often, as in the case of Google and Facebook, the data is not only collected directly by the companies but also provided voluntarily by the users of their applications.

Neuromation democratizes developers' access to advanced artificial intelligence approaches by using the widely distributed GPU network of blockchain mining to create synthetic data for training neural networks. Blockchain security depends on computations that solve cryptographic challenges but do not themselves produce useful results, beyond the important and concrete effect of securing blockchain transactions. The novelty of Neuromation's approach is to achieve the same result while performing useful computation. Importantly, the synthetic data it generates is correctly labeled by definition, since the computer generating an image knows in advance what it is drawing. Networks must be trained on properly labeled data sets, and producing them is not always feasible, even when real data can be collected.
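To make the idea of data that is "labeled by construction" concrete, here is a minimal Python sketch. It is a toy, not Neuromation's pipeline: it assumes the "scene" is a single white rectangle on a black 16×16 canvas, and the names `render_rectangle` and `make_dataset` are hypothetical. Because the program chooses the rectangle's coordinates before drawing it, the bounding-box label comes for free.

```python
import random


def render_rectangle(width, height, rng):
    """Draw one filled white rectangle on a black grayscale canvas.

    The generator picks the rectangle's corners itself, so the bounding-box
    label is exact and needs no human annotation.
    """
    x0 = rng.randrange(0, width - 2)
    y0 = rng.randrange(0, height - 2)
    x1 = rng.randrange(x0 + 1, width)
    y1 = rng.randrange(y0 + 1, height)
    image = [[0] * width for _ in range(height)]
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            image[y][x] = 255
    # The label is a by-product of drawing, not something inferred afterwards.
    label = {"class": "rectangle", "bbox": (x0, y0, x1, y1)}
    return image, label


def make_dataset(n, width=16, height=16, seed=0):
    """Generate n (image, label) pairs reproducibly from a fixed seed."""
    rng = random.Random(seed)
    return [render_rectangle(width, height, rng) for _ in range(n)]


dataset = make_dataset(1000)
image, label = dataset[0]
print(label)
```

A real generator would render 3D scenes instead of rectangles, but the principle is the same: every label is known exactly because it was an input to the renderer.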

In the video below you can watch a conversation with Andrew Rabinovich, advisor to Neuromation and Director of Deep Learning at Magic Leap, about why deep learning matters and how these new approaches can spread its knowledge and benefits, democratizing access to artificial intelligence.

Here is a transcript of the conversation:

– [David] Welcome, everybody, and welcome to Andrew Rabinovich, Director of Deep Learning at Magic Leap and an advisor to Neuromation. Hello Andrew, how are you?

– [Andrew] Good morning to you.

– [David] We will be talking about exciting topics that a lot of people talk about but few practice at the level you do: artificial intelligence, deep learning, and more exotic subjects, and we will try to make them comprehensible. So, when did you start in AI?

– So the story goes back almost 20 years, to when I was still an undergraduate at the University of California, San Diego, and started building computerized microscopes, or, as we called them back then, cytometers, with which we were trying to detect cancer in tissue samples. At first we used basic image processing techniques, but I quickly realized that those weren’t sufficient and that much more research in what was then machine learning and computer vision was needed, and that’s when I started my graduate studies in computer vision and machine learning. Since 2002 I’ve been studying theoretical and applied machine learning and computer vision, what we nowadays call classical vision and machine learning, and I did that until 2012, when the deep learning revolution occurred. At that time I was at Google, working on all sorts of things related to photo annotation and computer vision, and there I quickly realized that deep learning could solve problems that classical vision had never dreamed of. Almost overnight I switched to deep learning altogether and started spending all of my time on the theory of deep computation as well as its applications to computer vision.

– In artificial intelligence, since the 80s or even before, there were two kinds of approaches. On one hand, a top-down approach using expert systems, rule-based classification, and all kinds of ways in which we tried to teach computers common sense and how to make decisions based on our understanding of how we reason. On the other hand, there were bottom-up approaches, spearheaded by artificial neural networks, that tried to abstract the rules of reasoning without making them explicit, as if the computers were able to discover these rules by themselves. At the time, the neural network approach appeared to have very severe limits and was the losing side of the AI debate. What brought it back to the forefront? You said that in 2012 something suddenly happened?

– That’s absolutely right. Neural networks and artificial intelligence as a field have been around since the 1950s, from the days of Marvin Minsky and John McCarthy. People thought about artificial intelligence from the standpoints of psychology, abstract math, and philosophy, but of course computation was the limiting factor for any kind of experimentation and proof. When talking about artificial intelligence we have to be very clear that it’s not a fundamental science whose mission is to describe and understand nature, but rather an engineering discipline tasked with solving practical problems. To solve problems, two critical components must come together. The first is computing power, the engine that processes information, and the other is data, the gasoline for that internal combustion engine. This revolution took place in 2012 because these two components came together: fast compute, mainly GPUs, and the abundance of images on the internet, along with the wide presence of mobile devices able to capture information, whether it’s images, speech, or text. Ironically, the underlying math and theory of artificial neural networks, or deep neural networks as we call them nowadays, haven’t changed very much. The model of a neuron introduced by Frank Rosenblatt in the late 1950s is still very much current, with one small modification: the activation function, or non-linearity, has been simplified even further, from a logistic function like the sigmoid or the hyperbolic tangent to something even simpler, the rectified linear unit. Otherwise, the general structure of a neural network has remained the same.
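The neuron model described here fits in a few lines. This Python sketch is illustrative only: the inputs, weights, and bias are arbitrary numbers chosen for the example. It computes a weighted sum of the inputs plus a bias and passes the result through one of the three non-linearities mentioned: the sigmoid, the hyperbolic tangent, and the rectified linear unit (ReLU).

```python
import math


def sigmoid(z):
    # Logistic function: squashes any input into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    # Hyperbolic tangent: squashes any input into (-1, 1).
    return math.tanh(z)


def relu(z):
    # Rectified linear unit: passes positive values, zeroes out negatives.
    return max(0.0, z)


def neuron(inputs, weights, bias, activation):
    """A single artificial neuron: weighted sum plus bias, then a non-linearity."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


x = [1.0, -2.0, 0.5]
w = [0.4, 0.3, -0.2]
b = 0.1
for act in (sigmoid, tanh, relu):
    print(act.__name__, round(neuron(x, w, b, act), 4))
```

Here the weighted sum is negative (about -0.2), so the ReLU outputs exactly zero while the sigmoid and tanh still produce small graded responses; the ReLU's simplicity (no exponentials) is part of why it became the default.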
One interesting development, brought forth by Yann LeCun in the late 80s, is the notion of convolutions. It was inspired by the biological experiments of Hubel and Wiesel, which indicated a hierarchical structure in the visual cortex, with simple cells feeding complex cells that pool their responses; this is what the original multi-layer perceptron was missing. Convolutional features of this kind had first appeared in a neural network called the Neocognitron, introduced by Fukushima in 1980. These networks are really the models we’re using today. Of course, the models are now deeper and wider and run across multiple machines, but the essence is pretty much the same. The learning algorithm, backpropagation with stochastic gradient descent, was popularized by Rumelhart, Hinton, and Williams in 1986, so for the last 30 years the guts of this technology haven’t really changed. The pillars of success, effectively, are the data and the compute.
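As a rough illustration of the two ideas, local convolutional filters (loosely analogous to simple cells) and pooling (loosely analogous to complex cells), here is a pure-Python sketch. The edge-detecting kernel is hand-picked for the example; real networks learn their kernels with backpropagation, and libraries use far faster implementations. The function names are hypothetical.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, which deep learning libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + u][j + v] * kernel[u][v] for u in range(kh) for v in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the strongest response in each size x size block."""
    return [
        [
            max(feature_map[i + u][j + v] for u in range(size) for v in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# A 5x5 image with a vertical edge: dark columns 0-1, bright columns 2-4.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-written vertical-edge detector; in a trained CNN these weights are learned.
kernel = [[-1, 1],
          [-1, 1]]
features = conv2d(image, kernel)  # responds only where brightness jumps left to right
pooled = max_pool(features)       # coarser map that still records "an edge is here"
print(features)
print(pooled)
```

The same small kernel is slid over every position, so the network detects a pattern wherever it appears, and pooling makes the response tolerant to small shifts.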

– And indeed, the self-fulfilling prophecy of Moore’s Law is what has enabled computers, over the past 50 years, to become ever more powerful. Fifty years of exponential growth really make a big difference. So, as large amounts of data and very powerful computational platforms became available, neural networks and deep learning started to perform. There was, and there still is, a large data set for objectively testing the performance of neural networks on vision tasks, and if I am not mistaken, at first humans beat machines on that test, but machine image recognition got better and better, and today machines are as good as or better than humans at the recognition tasks on that data set. Is that right?

– That’s correct. One of my colleagues, Professor Fei-Fei Li, from Stanford and now at Google, made a monumental effort in putting together the data set you’re talking about, called ImageNet. It is a collection of more than 14 million images, and its standard benchmark covers 1,000 categories, with the task of identifying the prominent category shown in a given image. The state of the art today, the best-performing deep neural network, yields an accuracy of about 97% for top-five classification, while humans achieve only about 95%. Having said that, the largest and deepest networks that perform best on these data sets have about as many neurons as a tiny rodent, like a mouse or a rat. What these approaches are good at is recognizing patterns. Unlike humans, they never forget and never get distracted, and there’s never ambiguity between two types of bridges or two types of airplanes, something humans aren’t very good at reasoning about. However, I want to point out right away that these tasks are very loosely related to any kind of general intelligence, decision-making, or reasoning. They are primarily focused on pattern matching and detection of observed phenomena. These systems are very good at memorizing, with some amount of generalization to unseen observations, while humans are made to infer from very limited examples. That’s why this particular data set requires hundreds or thousands of training examples per category for these networks to get really good at what they do; as soon as you reduce the number of examples to tens, human performance suffers only slightly, while the degradation in accuracy for the machines is very significant.
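The "top-five" metric quoted here counts a prediction as correct if the true category is among the model's five highest-scoring guesses. A minimal Python sketch of the computation, using made-up scores for a toy six-class problem (the function name `top_k_accuracy` is hypothetical):

```python
def top_k_accuracy(scores, true_labels, k=5):
    """Fraction of examples whose true label is among the k highest-scoring classes."""
    hits = 0
    for class_scores, label in zip(scores, true_labels):
        ranked = sorted(range(len(class_scores)), key=lambda c: class_scores[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(true_labels)


# Toy scores for 3 images over 6 classes (the ImageNet benchmark uses 1,000 classes).
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],  # true class 1 ranks first
    [0.30, 0.25, 0.20, 0.15, 0.06, 0.04],  # true class 4 ranks fifth
    [0.30, 0.25, 0.20, 0.15, 0.06, 0.04],  # true class 5 ranks sixth: a top-5 miss
]
labels = [1, 4, 5]
print(top_k_accuracy(scores, labels, k=1), top_k_accuracy(scores, labels, k=5))
```

With 1,000 fine-grained categories, allowing five guesses forgives confusions between near-identical classes, which is why top-five figures are much higher than top-one figures.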

– When you talk about the deepest deep learning systems, you are referring to the layers of analysis and abstraction, is that correct? And how many layers do the deepest deep learning systems have today?

– The most widely used of the deepest networks is the residual network, or ResNet, from Microsoft Research, whose largest standard version has 152 layers, each often made up of sub-components. To my knowledge, people in academic circles have pushed these networks to about 1,000 layers, but the question isn’t just the number of layers; it is also the width of each layer. In fact, there are theoretical results, the universal approximation theorems, showing that a two-layer neural network, basically two linear layers with a non-linear activation between them, can approximate any continuous function on a bounded domain to arbitrary accuracy, provided it is wide enough. This suggests that such a network could, in principle, solve any machine learning task. Of course, the fewer layers you have, the harder it is to learn; that’s why people build these ginormous networks, because training becomes much simpler.
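The approximation claim can be illustrated with a hand-built two-layer ReLU network. This Python sketch is a constructive toy, not a trained model: it sets the weights directly so that the network linearly interpolates a target function at evenly spaced points, and the names `build_two_layer_relu` and `max_error` are hypothetical. Making the hidden layer wider shrinks the error, as the theorem promises; nothing here says gradient descent would find such weights easily.

```python
import math


def relu(z):
    return max(0.0, z)


def build_two_layer_relu(f, a, b, hidden_units):
    """Construct (not train) a 1-hidden-layer ReLU network that linearly
    interpolates f at hidden_units + 1 evenly spaced points on [a, b]."""
    knots = [a + (b - a) * i / hidden_units for i in range(hidden_units + 1)]
    values = [f(x) for x in knots]
    slopes = [(values[i + 1] - values[i]) / (knots[i + 1] - knots[i]) for i in range(hidden_units)]
    # Layer 1 (weights all 1): hidden unit i computes relu(x - knots[i]).
    hidden_biases = [-k for k in knots[:-1]]
    # Layer 2: each hidden unit contributes the change in slope at its knot.
    out_weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, hidden_units)]
    out_bias = values[0]

    def network(x):
        hidden = [relu(x + bias) for bias in hidden_biases]
        return out_bias + sum(w * h for w, h in zip(out_weights, hidden))

    return network


def max_error(f, g, a, b, samples=1000):
    """Largest absolute difference between f and g on a uniform sample grid."""
    return max(abs(f(x) - g(x)) for x in (a + (b - a) * i / samples for i in range(samples + 1)))


for width in (4, 16, 64):
    net = build_two_layer_relu(math.sin, 0.0, 2 * math.pi, width)
    print(width, max_error(math.sin, net, 0.0, 2 * math.pi))
```

The error for sin on [0, 2π] falls roughly with the square of the knot spacing, so a wider hidden layer buys accuracy, which is the "width" the answer refers to.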

– Real-world data sets are hard to collect, and the more varied the areas you want neural networks to work on, the larger the task of collecting separate, well-designed data sets for each of them becomes. It is no coincidence that large corporations like Facebook, Microsoft, and Google are very busy doing that, whether with cars that photograph the world’s streets or by analyzing the billions of photos uploaded to social networks. However, this really is a limiting factor for start-ups, teams of passionate and creative individuals who want to take advantage of deep learning approaches.

– That’s absolutely true. In fact, since the emergence of deep learning, the protection of intellectual property has shifted from algorithms, which everyone now shares freely (you can’t publish a scientific achievement without publishing the algorithm and the code along with it), to the protection of data. Rather than being the most powerful team because you have the best algorithms, you are now the most powerful enterprise because you have the best data, and that’s exactly why Google and Facebook are ahead of most other companies: not because they have more computers or smarter researchers, but simply because they have the most data. Data comes in two flavors. First, raw data, which one can acquire just by going around and recording, whether it’s speech, text, or any other modality. More important are the annotations of the data, and that’s where it becomes very, very difficult to scale. Accurate and detailed annotations, or, as we call them in scientific circles, ground truth, are very hard to obtain. One option is to collect the ground truth at the time of recording: for instance, if you take pictures of cars, you have to record the location, position, make, and model of each vehicle as you take each picture. The second is that, once the data has been collected, it goes through a manual, very laborious, expensive, and slow labeling effort. As you know, there are services such as Mechanical Turk, CrowdFlower, and many others that let you submit your data so that human labelers go through these tedious labeling tasks.
Aside from being expensive and slow, the problem is that (a) humans make mistakes; (b) oftentimes the questions being asked are ambiguous. It’s not trivial: if I show you a picture of a certain animal and ask whether it is this kind of cat or that kind of cat, unless you’re a true cat expert you would at best guess, and that guessing would then translate into mistakes in the model you train from that data; and finally (c) humans are not able to do certain tasks at all. For instance, if I show you an image of a gallery or a church and ask which direction the light is coming from, or how many light sources there are in the room, that is something humans certainly can’t do, because people are very good at relative estimates but not at precise ones. Or, for example, if I give you an image of a street with a car on it and ask how far the car is from the traffic light, you might say three, four meters, maybe five; I might look at it and say two meters, maybe 10. It’s impossible to say exactly what the absolute metric distances are, yet that’s exactly what self-driving cars, or autonomous navigation by any kind of robot, require. So those things are extremely difficult. That is exactly why Google’s cars and those of all the other self-driving companies carry the crazy sensors you see as they drive around: as they drive and take pictures of the world, they want to measure everything as precisely as possible, because they know that humans can’t label such things. These two kinds of data, labeled and unlabeled, translate directly into two kinds of machine learning algorithms: supervised learning, where you learn from examples with annotations, and unsupervised learning, where you try to gain an understanding without any real supervision.
A great example of the latter is generative adversarial networks; as you’ve probably seen, they are able to render these crazy images of cats, dogs, and people. But that’s still not enough, so the approach many people have taken, including the teams at OpenAI and DeepMind, is to go the synthetic data route and create synthetic environments where everything is perfectly labeled by construction. As you build a 3D world with cars, people, trees, streets, and so forth, by creating all of it virtually in computer graphics software, you automatically know the locations and positions of everything. You know where the light comes from. You know the direction of the surface normals. You know the reflectance properties of all the materials. So effectively you have everything you want. The problem with that... go ahead, sorry.
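Here is a small Python sketch of why synthetic scenes carry labels that humans cannot provide, such as the direction of the light. It is a deliberately simplified example, not a production renderer: it assumes a single sphere seen head-on, one distant light, and purely diffuse (Lambertian) shading, and the function names are hypothetical. Because the program chooses the light direction before shading, both the light direction and the per-pixel surface normals are exact labels.

```python
import math
import random


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def render_lit_sphere(size, light_dir):
    """Render a unit sphere with Lambertian shading; return the image and exact labels."""
    light = normalize(light_dir)
    image = [[0.0] * size for _ in range(size)]
    normals = {}
    for row in range(size):
        for col in range(size):
            # Map the pixel centre to [-1, 1] x [-1, 1], with y pointing up.
            x = 2 * (col + 0.5) / size - 1
            y = 1 - 2 * (row + 0.5) / size
            r2 = x * x + y * y
            if r2 > 1:
                continue  # background pixel stays black
            n = (x, y, math.sqrt(1 - r2))  # unit normal on the visible (+z) hemisphere
            normals[(row, col)] = n
            # Lambertian reflectance: brightness = cos(angle between normal and light).
            image[row][col] = max(0.0, sum(a * b for a, b in zip(n, light)))
    return image, {"light_dir": light, "normals": normals}


rng = random.Random(42)
dataset = []
for _ in range(3):
    # Random light direction, kept in front of the sphere (positive z).
    light = (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(0.5, 1))
    dataset.append(render_lit_sphere(32, light))
print(dataset[0][1]["light_dir"])
```

A network trained on many such images could learn to regress the light direction, a quantity no human labeler could annotate precisely.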

– So you started to mention the breakthrough in the approach of Neuromation, which you are advising: rather than relying on data sets collected from the physical world, it creates data sets, and these are called synthetic data. One of the reasons synthetic data is so interesting is exactly that it opens up the possibility for teams that are not at Google, Microsoft, or Facebook to take advantage of deep learning and neural networks and to apply AI techniques to the problems they are passionate about. You were describing some additional features of synthetic data; what are the consequences of this approach?

– As you correctly point out, since it’s very expensive to obtain real data with proper annotations, and since unsupervised learning techniques are not yet powerful enough, I think creating synthetic data and training models on those data sets is a practical path forward. There are still many challenges, such as how to create this data and whether it resembles the real-world environment, but I think these problems are no harder than the problems in unsupervised learning. So I think tackling AI from that perspective is an excellent endeavor.

– And to describe it concretely in terms of hardware: the synthetic images of artificial three-dimensional worlds, which can be photographed through virtual cameras, are created by graphics cards that are practically the same as, or used to be the same as, those used for gaming, made by NVIDIA and other chip and card manufacturers. The computational power needed to render the ever more beautiful and detailed images of artificial worlds in computer games comes from the same hardware that can be used to synthesize images for training these neural networks.

– That’s completely right. Most photo-realistic renderers used to run on CPUs rather than GPUs, things like Maya and V-Ray, but recently, maybe two years ago, Unreal Engine started to be used for this kind of rendering on GPUs, and it is becoming fairly common practice in the deep learning community to use GPUs and video cards, mainly from NVIDIA, together with Unreal Engine to render these synthetic examples, effectively in real time. Traditional CPU renderers, although of higher visual fidelity, took much longer, up to 15 or 20 minutes per image for a good rendering. At speeds that slow, one could argue that it’s not feasible to produce the millions of examples needed to train deep neural networks, but with Unreal running on a GPU practically in real time, this solution suddenly becomes attractive, because you can create a tremendous amount of data in a reasonable amount of time and train these large deep networks. So again, it’s this confluence of technology for producing synthetic data with an abundance of really fast and powerful graphics cards.
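A back-of-the-envelope calculation shows why this speed-up matters. The Python snippet below assumes a data set of one million images (an illustrative figure for "millions of examples"), the lower 15-minute figure for offline CPU rendering, and 30 frames per second as a stand-in for "real time"; all three numbers are assumptions for illustration, and the calculation ignores parallelism across machines.

```python
SECONDS_PER_MINUTE = 60
SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365 * 24 * SECONDS_PER_HOUR


def rendering_time_seconds(num_images, seconds_per_image):
    """Total wall-clock time on one machine rendering images one after another."""
    return num_images * seconds_per_image


n_images = 1_000_000  # assumed data set size
cpu_offline = rendering_time_seconds(n_images, 15 * SECONDS_PER_MINUTE)  # 15 min per image
gpu_realtime = rendering_time_seconds(n_images, 1 / 30)                   # assumed 30 frames/s

print(f"CPU, 15 min/image: {cpu_offline / SECONDS_PER_YEAR:.1f} years")
print(f"GPU, 30 fps:       {gpu_realtime / SECONDS_PER_HOUR:.1f} hours")
```

Under these assumptions, a single machine would need about 28.5 years at 15 minutes per image, versus roughly 9 hours at 30 frames per second.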

– And I don’t know whether it’s a coincidence or a beautiful synchronicity, but one additional reason these cards are more and more widely deployed, not only in gamers’ personal computers but also in the server racks of specialized yet still widely distributed set-ups, is that the same type of hardware, the same type of GPU-based computation, is also used to mine Ethereum and certain other cryptocurrencies. Since we are now jumping from the field of AI into the field of blockchain, it’s worth repeating very briefly that cryptocurrency mining is just a metaphor: nobody is mining any kind of rare metal. Actually, I think it is a somewhat unfortunate metaphor. I personally prefer the metaphor of weaving, where a pattern emerges from the collaborative effort of intricate cryptographic work. In any case, whichever metaphor we use, these operations are necessary to ensure the robustness of the trust network that blockchains implement, making it impossible, or rather computationally unfeasible, to falsify transactions and operations over the network, because of the effort expended in cryptographic operations by all who participate. And Neuromation put the two together. On one hand, there is the challenge of democratizing access to the powerful approaches of neural networks, deep learning, and artificial intelligence through synthetic data created on GPUs; on the other hand, there is this widely available network of GPUs deployed for blockchain operations. So describe a little how that works and why it is so powerful.

– It’s quite remarkable that everything involved in what, in my mind, is the next breakthrough in AI, namely training deep neural networks as well as creating the data for training them, can be done on identical hardware. In the past you had to have some crazy supercomputer to do your modeling, then graphics render farms made of entirely different hardware to create the data for simulations, and then you had to put them together. Nowadays, on a single GPU you can both create the training data and train. To add to the coincidence, there’s been this explosion of cryptocurrency mining that also uses these GPUs. Now that the cryptocurrency craze has somewhat settled down, people are starting to realize the biggest flaw in the whole design: in the mining process people perform a significant amount of computation, and at the end of it they are rewarded with Bitcoins or other cryptocurrencies, but the result of the computation itself is completely discarded. It’s almost like learning French and getting a piece of chocolate at the end, while everything you learned is completely forgotten. It seems like a waste. So the idea behind Neuromation is that instead of mining just to find a key that earns more Bitcoins or other cryptocurrencies, you mine by solving an actual AI task, whether training a deep neural network or creating synthetic data for training neural networks; at the end of the day you still get your cryptocurrency, but the result of your computation is something useful that can be applied further down the chain in a practical application.
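To see what "the result of computation is completely discarded" means, here is a minimal Python sketch of a hashcash-style proof-of-work puzzle. It is a simplification, not any real network's protocol: Bitcoin applies SHA-256 twice to a block header against a network-adjusted target, Ethereum's mining at the time used a different algorithm (Ethash), and the block data and 16-bit difficulty here are arbitrary. The miner tries nonce after nonce until a hash falls below a target; the winning nonce proves that work was done, but the many hashes computed along the way have no other use.

```python
import hashlib


def meets_target(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """True if SHA-256(block_data + nonce) is below 2**(256 - difficulty_bits)."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


def proof_of_work(block_data: bytes, difficulty_bits: int) -> int:
    """Brute-force search for a valid nonce (simplified hashcash-style puzzle)."""
    nonce = 0
    while not meets_target(block_data, nonce, difficulty_bits):
        nonce += 1  # each failed hash is simply thrown away
    return nonce


def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a claimed solution costs a single hash."""
    return meets_target(block_data, nonce, difficulty_bits)


nonce = proof_of_work(b"example block", difficulty_bits=16)
print(nonce, verify(b"example block", nonce, 16))
```

Finding the nonce takes on the order of 2^16 attempts at this difficulty, while verifying it takes one; that asymmetry is what secures the network, and it is this discarded search effort that the approach described here aims to redirect toward useful work.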

– This is a bit similar to those squiggly puzzles often displayed when you set up a new account on an online platform. They don’t just keep out scammers and spammers: when you look at the details, for example in the reCAPTCHA system, it turns out that while verifying that you are human, because you can recognize the words displayed, you are actually digitizing books or solving other kinds of labeling tasks. In much the same way that reCAPTCHA puts humans to work on a task that is useful rather than useless, Neuromation uses the computational power of the planetary blockchain computer to solve the tasks the cryptographic network needs in order to be secure, but rather than spending that power on useless tasks, it applies it to building knowledge, to useful computation.

– That’s absolutely right. I think of a similar paradigm: remember, back in the 90s there was a project called SETI@home, where people were searching for extraterrestrial intelligence. They didn’t have a single computer to do all the work, so they spread it across all the PCs available in the world, but all you got was bragging rights, saying that you helped find E.T., and that was a reward in itself. Now, with blockchain and cryptocurrency, it has become very natural to spread computation across all the owners and users of GPUs to run these massive processes, and that’s why I think democratizing AI is a good description of Neuromation, because at the highest level that is really what it does. It gives people an opportunity (a) to gather training data by synthesizing it, and (b) to train deep neural networks on the much-needed GPU processors. As you said, these are abundant in the world, among miners, gamers, and the general public, but the standard mode of operation is that if you are Google, Facebook, or Amazon, you build a ginormous warehouse full of GPUs, and that’s how you succeed. At conferences you often hear engineers from Google present some scientific result and say, we spent enough energy in these GPU data centers to power 5,000 single-family homes for three months. They get some 3% improvement on the ImageNet data set, but in reality (a) it is a tremendously unfair advantage over smaller players, and (b) it is a ridiculously irresponsible waste of energy that could otherwise be conserved. Through this democratized approach of decentralized model training and synthetic data generation, I think these limitations will hopefully soon vanish, and everybody else will be able to achieve the same results as the big guys.

– So Andrew, thank you very much for this conversation. Your work in the field of artificial intelligence and deep learning is certainly a great inspiration, and we are both very excited and looking forward to Neuromation building and delivering its platform so that access to the advanced tools of AI can be democratized. We all love and use Google, Facebook, and the tools that large corporations make available, but even they realize that talent is everywhere, creativity is everywhere, and we need to empower groups all around the world to express their ideas; that is what I am personally looking forward to seeing soon. Thank you very much for the conversation today.

– Thank you. I’m very excited about seeing Neuromation make these strides forward, and I’m very passionate about making these tools available to the broad public, because I think only through large participation from around the world will we be able to make the next leaps in AI and in intelligence overall.

(Disclaimer: I’m an advisor to Neuromation.)
