Facebook AI Research (FAIR) has built AI training hardware that is among the best in the world. We have done this through a combination of hardware expertise, close partnerships with vendors, and a significant strategic investment in AI research.
FAIR is more than tripling its investment in GPU
hardware as we focus even more on research and enable other teams across
the company to use neural networks in our products and services.
As part of our ongoing commitment to open source and open standards, we
plan to contribute our innovations in GPU hardware to the Open Compute
Project so others can benefit from them.
Although
machine learning (ML) and artificial intelligence (AI) have been around
for decades, most of the recent advances in these fields have been
enabled by two trends: larger publicly available research data sets and
the availability of more powerful computers — specifically ones powered
by GPUs. Most of the major advances in these areas move forward in
lockstep with our computational ability, as faster hardware and software
allow us to explore deeper and more complex systems.
At
Facebook, we've made great progress thus far with off-the-shelf
infrastructure components and design. We've developed software that can read stories, answer questions about scenes, play games and even learn unspecified tasks
through observing some examples. But we realized that truly tackling
these problems at scale would require us to design our own systems.
Today, we're unveiling our next-generation GPU-based systems for
training neural networks, which we've code-named “Big Sur.”
Faster, more versatile, and efficient neural network training
Big
Sur is our newest Open Rack-compatible hardware designed for AI
computing at a large scale. In collaboration with partners, we've built
Big Sur to incorporate eight high-performance GPUs of up to 300 watts
each, with the flexibility to configure between multiple PCI-e
topologies. Leveraging NVIDIA's Tesla Accelerated Computing Platform,
Big Sur is twice as fast as our previous generation, which means we can
train twice as fast and explore networks twice as large. And
distributing training across eight GPUs allows us to scale the size and
speed of our networks by another factor of two.
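As a rough illustration of that data-parallel scaling, here is a minimal sketch that splits each training batch across a node's eight GPUs. It uses PyTorch's nn.DataParallel purely as an example; it is not the training stack described in this post.

    # Minimal data-parallel sketch (illustrative, not Facebook's stack):
    # replicate a small model across 8 GPUs and split each batch among them.
    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(1024, 4096), nn.ReLU(),
        nn.Linear(4096, 10),
    )
    # Replicate the model across all eight visible GPUs on the node.
    model = nn.DataParallel(model, device_ids=list(range(8))).cuda()

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    inputs = torch.randn(256, 1024).cuda()    # split 8 ways: 32 examples per GPU
    targets = torch.randint(0, 10, (256,)).cuda()

    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)    # forward runs on all 8 GPUs in parallel
    loss.backward()                           # gradients flow back to the GPU-0 replica
    optimizer.step()

Each GPU sees an eighth of the batch, so, communication costs aside, effective training throughput grows roughly with the number of GPUs a node can hold.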
In
addition to the improved performance, Big Sur is far more versatile and
efficient than the off-the-shelf solutions in our previous generation.
While many high-performance computing systems require special cooling
and other unique infrastructure to operate, we have optimized these new
servers for thermal and power efficiency, allowing us to operate them
even in our own free-air cooled, Open Compute standard data centers. Big
Sur was built with the NVIDIA Tesla M40 in mind but is qualified to
support a wide range of PCI-e cards. We also anticipate this will
achieve efficiencies in production and manufacturing, meaning we'll get a
lot more computational power per dollar we invest.
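To give a concrete sense of the thermal and power envelope being managed here, the sketch below polls each GPU's temperature and power draw through NVIDIA's NVML library. The pynvml bindings are an illustrative assumption, not tooling described in this post.

    # Hedged monitoring sketch: report per-GPU temperature and power draw
    # via NVML (pynvml bindings assumed; not part of Big Sur's tooling).
    import pynvml

    pynvml.nvmlInit()
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            temp_c = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
            power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # NVML reports milliwatts
            print("GPU %d: %d C, %.0f W" % (i, temp_c, power_w))
    finally:
        pynvml.nvmlShutdown()

On a fully loaded node, keeping eight 300-watt cards inside a free-air-cooled thermal budget is exactly the kind of constraint this sort of telemetry exists to track.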
Servers can
also require maintenance and hefty operational resources, so, like the
other hardware in our data centers, Big Sur was designed around
operational efficiency and serviceability. We've removed the components
that don't get used very much, and components that fail relatively
frequently — such as hard drives and DIMMs — can now be removed and
replaced in a few seconds. Touch points for technicians are all Pantone
375 C green, the same touch-point color as all of Facebook’s custom data
center hardware, which allows technicians to intuitively identify,
access and remove parts. No special training or service guide is
needed. Even the motherboard can be removed within a minute, whereas on
the original AI hardware platform it would take over an hour. In fact,
Big Sur is almost entirely toolless — the CPU heat sinks are the only
things you need a screwdriver for.
Collaboration through open source
We
plan to open-source Big Sur and will submit the design materials to the
Open Compute Project (OCP). Facebook has a culture of support for open
source software and hardware, and FAIR has continued that commitment by open-sourcing our code
and publishing our discoveries as academic papers freely available from
open-access sites. We're very excited to add hardware designed for AI
research and production to our list of contributions to the community.
We
want to make it a lot easier for AI researchers to share techniques and
technologies. As with all hardware systems that are released into the
open, it's our hope that others will be able to work with us to improve
it. We believe that this open collaboration helps foster innovation for
future designs, putting us all one step closer to building complex AI
systems that bring this kind of innovation to our users and, ultimately,
help us build a more open and connected world.
Thanks to all the people who helped make this happen, including William Arnold, Stephen Chan, Jia Ning, and Whitney Zhao.