
About a year ago, Boston Dynamics released a research version of its Spot quadruped robot, which comes with a low-level application programming interface (API) that allows direct control of Spot's joints. Even at the time, the rumor was that this API unlocked some significant performance improvements, including much faster running speeds. Those rumors came from the Robotics and AI Institute (RAI Institute), formerly the Boston Dynamics AI Institute, and if you saw Marc Raibert speak at ICRA@40 in Rotterdam last fall, you already know that they were not rumors at all.
Today, we are able to share some of the RAI Institute's work applying reinforcement learning techniques, grounded in real-world data, to enable much higher performance from Spot. The same techniques can also help highly dynamic robots operate robustly, and there's a new hardware platform to show this off: an autonomous bicycle that can jump.
See Spot Run
https://www.youtube.com/watch?
This video shows Spot running at a sustained speed of 5.2 meters per second (11.6 miles per hour). Out of the box, Spot's top speed is 1.6 m/s, meaning that the RAI Institute's Spot has more than tripled (!) the quadruped's factory top speed.
If Spot running this quickly looks a little strange, that's probably because it is strange, in the sense that the way this robot dog's legs and body move as it runs doesn't look very much like how a real dog runs at all. “The gait is not biological, but the robot is not biological,” explains Farbod Farshidian, a roboticist at the RAI Institute. “Spot's actuators are different from muscles, and its kinematics are different, so a gait that is suitable for a dog to run fast is not necessarily the best for this robot.”
The best way Farshidian can categorize how Spot moves is that it's somewhat similar to a trotting gait, except with an added flight phase (with all four feet off the ground at once) that technically turns it into a run. This flight phase is necessary, Farshidian says, because the robot needs that time to successively pull its feet forward fast enough to maintain its speed. This is a “discovered behavior,” in that the robot was not explicitly programmed to “run”; it was simply required to find the best way of moving as fast as possible.
Reinforcement Learning Versus Model Predictive Control
The Spot controller that ships with the robot when you buy it from Boston Dynamics is based on model predictive control (MPC), which involves creating a software model that approximates the dynamics of the robot as well as possible, and then solving an optimization problem for whatever task you want the robot to do, in real time. It's a very predictable and reliable method for controlling a robot, but it's also somewhat rigid, because that original software model won't be close enough to reality to let you really push the robot's limits. And if you try to say, “Okay, I'll just make a highly detailed software model of my robot and push the limits that way,” you get stuck, because the optimization problem has to be solved for whatever you want the robot to do, in real time, and the more complex the model, the harder it is to solve it quickly enough to be useful. Reinforcement learning (RL), on the other hand, learns offline. You can use as complex a model as you want, and then take all the time you need in simulation to train a control policy that can then be run very efficiently on the robot.
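The offline-versus-online distinction above can be sketched in a few lines. This is a toy illustration, not the Institute's actual pipeline: a 1D point mass stands in for the robot, a two-gain linear controller stands in for the policy, and simple random-search policy optimization stands in for whatever RL algorithm they actually use. The point is the workflow: training can be as slow as it likes, because deployment is just a cheap function evaluation.

```python
import random

# Toy stand-in for a high-fidelity simulator: a 1D point mass we want to
# drive to the origin. All dynamics and numbers here are illustrative.
def simulate(kp, kd, steps=50, dt=0.1):
    pos, vel, cost = 1.0, 0.0, 0.0
    for _ in range(steps):
        force = -kp * pos - kd * vel  # the "policy": a linear controller
        vel += force * dt
        pos += vel * dt
        cost += pos * pos * dt  # accumulate tracking error
    return cost

# "Offline" training via random-search policy optimization. Slow is fine:
# nothing here has to run in real time on the robot.
random.seed(0)
best = (1.0, 1.0)
best_cost = simulate(*best)
for _ in range(200):
    cand = (best[0] + random.gauss(0, 0.5), best[1] + random.gauss(0, 0.5))
    c = simulate(*cand)
    if c < best_cost:
        best, best_cost = cand, c

# "Deployment": evaluating the trained policy is trivially fast,
# unlike re-solving an MPC optimization at every control step.
print(f"trained gains ({best[0]:.2f}, {best[1]:.2f}), cost {best_cost:.3f}")
```

With MPC, the equivalent of the inner optimization would have to finish within one control cycle; here, all the expensive search happens before the policy ever touches hardware.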
In simulation, a few Spots (or hundreds of Spots) can be trained in parallel for robust real-world performance.Robotics and AI Institute
In the case of Spot's top speed, it's simply not possible to model every last detail of all the robot's actuators within a model-based control system that runs in real time on the robot. So instead, simplified (and typically very conservative) assumptions are made about what the actuators are actually doing, so that you can expect safe and reliable performance.
Farshidian explains that these assumptions make it difficult to develop a useful understanding of where the performance limits actually are. “Many people in robotics know that one of the limitations of running fast is that you're going to hit the torque and velocity maximums of your actuation system. So, people try to model that using the data sheets of the actuators. For us, the question we wanted to answer was whether there might exist some other phenomena that was actually limiting performance.”
Searching for these other phenomena involved bringing new data into the reinforcement learning pipeline, like detailed actuator models learned from the real-world performance of the robot. In Spot's case, that provided the answer to high-speed running. It turned out that what was limiting Spot's speed was not the actuators themselves, nor any of the robot's kinematics: It was simply the batteries not being able to supply enough power. “This was a surprise for me,” Farshidian says, “because I thought we were going to hit the actuator limits first.”
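One way a supply-side limit like this can be folded into a simulator, so the learned policy experiences the same constraint as the real robot, is to cap total power demand at each step. This is a minimal sketch under invented numbers; the limit value, the function name, and the use of mechanical power as a proxy for electrical draw are all assumptions, not details from the RAI Institute's actual models.

```python
# Hypothetical pack limit in watts; a real model would come from measured
# battery data, not a constant.
BATTERY_POWER_LIMIT_W = 3000.0

def apply_battery_limit(joint_torques, joint_velocities):
    """Scale commanded torques down when total power demand exceeds
    what the battery can supply (|torque * velocity| summed over joints,
    used here as a crude proxy for electrical power draw)."""
    demand = sum(abs(t * v) for t, v in zip(joint_torques, joint_velocities))
    if demand <= BATTERY_POWER_LIMIT_W:
        return joint_torques
    scale = BATTERY_POWER_LIMIT_W / demand
    return [t * scale for t in joint_torques]

# Example: a 6000 W demand (2 joints, 60 N·m at 50 rad/s each) gets
# scaled down to fit the 3000 W budget.
limited = apply_battery_limit([60.0, 60.0], [50.0, 50.0])
```

A policy trained against a step function like this learns gaits that respect the power budget, rather than discovering speeds the real battery could never sustain.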
Spot's power system is complex enough that there's likely some additional wiggle room, and Farshidian says the only thing that kept them from pushing Spot's top speed past 5.2 m/s is that they didn't have access to the battery voltages, so they couldn't incorporate that real-world data into their RL model. “If we had beefier batteries on there, we could have run faster. And if you model that phenomena as well in our simulator, I'm sure that we can push this boundary further.”
Farshidian emphasizes that the RAI Institute's technique is about much more than just getting Spot to run fast: it could also be applied to making Spot move more efficiently to maximize battery life, or more quietly to work better in an office or home environment. Essentially, this is a generalizable tool that can find new ways of expanding the capabilities of any robotic system. And when real-world data is used to make a simulated robot better, you can ask the simulation to do more, with confidence that those simulated skills will successfully transfer back onto the real robot.
Ultra Mobility Vehicle: Teaching Robot Bikes to Jump
Reinforcement learning isn't just good for maximizing robot performance; it can also make that performance more reliable. The RAI Institute has been experimenting with a completely new kind of robot that it invented in-house: a little jumping bicycle called the Ultra Mobility Vehicle, or UMV, which was trained to do parkour using essentially the same RL pipeline for balancing and driving as was used for Spot's high-speed running.
https://www.youtube.com/watch?
There's no independent physical stabilization system (such as a gyroscope) keeping the UMV from falling over; it's just a normal bike that can move forward and backward and turn its front wheel. As much mass as possible is crammed into the top part, which the actuators can rapidly accelerate up and down. “We're showing two things in this video,” says Marco Hutter, director of the RAI Institute's Zurich office. “One is how reinforcement learning helps make the UMV very robust in its driving capabilities in diverse situations. And second, how understanding the robot's dynamic capabilities allows us to do new things, like jumping onto a table that is higher than the robot itself.”
“The key of RL in all of this is to discover new behavior and make it robust and reliable under difficult conditions. This is where RL really shines.” —Marco Hutter, RAI Institute
As impressive as the jumping is, for Hutter, it's just as difficult (if not more difficult) to do maneuvers that may seem fairly simple, like riding backwards. “Going backwards is highly unstable,” Hutter explains. “At least for us, it was not really possible to do that with a classical [MPC] controller, particularly over rough terrain or with disturbances.”
Getting this robot out of the lab and onto terrain to do proper bike parkour is a work in progress that the RAI Institute says it will be able to demonstrate in the near future, but it's really not about what this particular hardware platform can do; it's about what any robot can do through RL and other learning-based methods, says Hutter. “The bigger picture here is that the hardware of such robotic systems can in theory do a lot more than we were able to achieve with our classic control algorithms. Understanding these hidden limits in hardware systems lets us improve performance and keep pushing the boundaries on control.”
Teaching the UMV to drive itself down stairs in simulation results in a real robot that can handle stairs at any angle.Robotics and AI Institute
Reinforcement Learning for Robots Everywhere
Just a few weeks ago, the RAI Institute announced a new partnership with Boston Dynamics “to advance humanoid robots through reinforcement learning.” Humanoids are just another kind of robotic platform, albeit a significantly more complicated one, with many more degrees of freedom and far more to model and simulate. But when considering the limitations of model predictive control at this level of complexity, a reinforcement learning approach seems almost inevitable, especially when such an approach is already streamlined due to its ability to generalize.
“One of the ambitions we have as an institute is to have solutions which span across all kinds of different platforms,” says Hutter. “It's about building tools, about building infrastructure, building the basis for this to be done in a broader context. So not only humanoids, but driving vehicles, quadrupeds, you name it. But doing RL research and showcasing some nice first proof of concept is one thing; pushing it to work in the real world under all conditions, while pushing the boundaries in performance, is something else.”
Transferring skills into the real world has always been a challenge for robots trained in simulation, precisely because simulation is so friendly to robots. “If you spend enough time,” Farshidian explains, “you can come up with a reward function where eventually the robot will do what you want. What often fails is when you want to transfer that sim behavior to the hardware, because reinforcement learning is very good at finding glitches in your simulator and leveraging them to do the task.”
Simulation has been getting much better, with new tools, more accurate dynamics, and plenty of computing power to throw at the problem. “It's a hugely powerful ability that we can simulate so many things, and generate so much data almost for free,” Hutter says. But the usefulness of that data lies in its connection to reality, in making sure that what you're simulating is accurate enough that a reinforcement learning approach will in fact solve for reality. Bringing physical data collected on real hardware back into the simulation, Hutter believes, is a very promising approach, whether it's applied to running quadrupeds, jumping bicycles, or humanoids. “The combination of the two, of simulation and reality, that's what I would hypothesize is the right direction.”
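A standard technique for keeping RL from exploiting one imperfect simulator, and a natural place to feed real-hardware measurements back in, is domain randomization: training across many perturbed copies of the simulation. The sketch below is illustrative only; the parameter names, ranges, and the idea that these specific quantities are what the Institute randomizes are all assumptions.

```python
import random

# Each training episode would configure the simulator with one sampled
# parameter set, so the policy cannot overfit to a single (inevitably
# imperfect) model of the world. Names and ranges are invented.
def sample_sim_params(rng):
    return {
        "ground_friction": rng.uniform(0.4, 1.0),
        "payload_mass_kg": rng.uniform(0.0, 2.0),
        "motor_torque_scale": rng.uniform(0.85, 1.15),  # actuator mismatch
        "control_latency_s": rng.uniform(0.0, 0.02),    # comms/compute delay
    }

rng = random.Random(42)
batch = [sample_sim_params(rng) for _ in range(1000)]
# Measurements from real hardware can then tighten these ranges, keeping
# the randomization centered on reality rather than on guesswork.
```

This is one concrete reading of "the combination of simulation and reality": real data narrows the randomization, and the randomization covers whatever error the real data can't eliminate.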