2022 Annual Review | More Mature AI, More Disruptive Technology Carnival

2022 is coming to an end amid the sound of coughing. This year, AIGC became the biggest dark horse in artificial intelligence, and ScienceAI found more practical applications. We also lived through budget cuts and layoffs at major tech companies, along with a technology carnival among the survivors... In this article, let's review the groundbreaking research and development achievements in artificial intelligence in 2022.

data2vec

A general framework for self-supervised learning of speech, vision, and text

Publishing Agency: Meta AI

Release time: January 2022

Project address: https://github.com/facebookresearch/fairseq/tree/main/examples/data2vec

data2vec is a unified multimodal self-supervised learning model. It handles image, text, and speech tasks with high performance.

On December 16, data2vec 2.0 was released. At the same accuracy, it is 16 times faster than existing self-supervised algorithms for computer vision.

AlphaCode

Competition-level code generation AI

Publishing Agency: DeepMind

Release time: February 2022

Project address: https://github.com/deepmind/code_contests

AlphaCode uses a large language model to write code from the natural-language description of a problem. In Codeforces competitions, AlphaCode outperformed about 46% of human participants. The research not only appeared on the cover of Science, but was also named one of the top ten scientific breakthroughs of the year by the magazine.

Dall E 2

Text to image generation tool

Publishing Agency: OpenAI

Release time: April 2022

Project address: https://openai.com/dall-e-2/

Dall·E 2 creates realistic images and art from a text description. Compared with Dall·E, which OpenAI released in 2021, Dall·E 2 generates more realistic and accurate images at 4 times the resolution.

An example of an image generated by Dall·E 2 from the prompt "An astronaut riding a horse in a photorealistic style"

Gato

All-round intelligent agent

Publishing Agency: DeepMind

Release time: May 2022

Project address: https://www.deepmind.com/blog/a-generalist-agent

Gato is a generalist agent. It can play Atari games, caption images, and chat, deciding from context whether to output text, joint torques, or other tokens.

A general model of this kind handles many different tasks with a single network and may eventually surpass domain-specific models.

ESM Fold

Protein structure prediction models

Publishing Agency: Meta AI

Release time: July 2022

Project address: https://github.com/facebookresearch/esm

ESM Fold is a model for predicting protein structure. It performs highly accurate, end-to-end, atomic-level structure prediction directly from a single protein sequence, without a multiple sequence alignment, which greatly speeds up inference.

Single sequence structure prediction using ESM Fold

Make-A-Video

AI system that generates videos from text

Publishing Agency: Meta AI

Release time: September 2022

Project address: https://makeavideo.studio/

Make-A-Video is a text-to-video generation model. It learns what the world looks like and how it is commonly described from images paired with text descriptions, and it learns how things move from unlabeled videos.

The videos generated by Make-A-Video are diverse in style and stay faithful to the text prompt, making it a state-of-the-art model for short video generation at the time of its release.

Some examples of generating videos based on text descriptions

AlphaTensor

Improve matrix multiplication and increase calculation speed

Publishing Agency: DeepMind

Release time: October 2022

Project address: https://github.com/deepmind/alphatensor

AlphaTensor improves on the best previously known algorithm for 4×4 matrix multiplication and finds faster algorithms for more than 70 other matrix multiplications of different sizes. The work was featured on the cover of Nature and was named one of the top ten scientific breakthroughs of the year by Science magazine.
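To make "faster matrix multiplication" concrete, here is a minimal sketch of the classical Strassen scheme for 2×2 matrices, which uses 7 scalar multiplications instead of the usual 8. This is Strassen's 1969 algorithm, not one of the algorithms AlphaTensor discovered; it only illustrates the kind of saving being searched for. The function names are illustrative.

```python
def naive_2x2(a, b):
    """Textbook 2x2 matrix product: 8 scalar multiplications."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    return [
        [a11 * b11 + a12 * b21, a11 * b12 + a12 * b22],
        [a21 * b11 + a22 * b21, a21 * b12 + a22 * b22],
    ]


def strassen_2x2(a, b):
    """Strassen's scheme: the same product with 7 scalar multiplications."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    # Seven products of sums and differences of the inputs.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Recombine using additions and subtractions only.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(naive_2x2(a, b))
    print(strassen_2x2(a, b))
```

Applied recursively to 2×2 blocks, this scheme multiplies 4×4 matrices with 7 × 7 = 49 scalar multiplications. That is the baseline AlphaTensor improved on, in modular arithmetic, with an algorithm that needs 47.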

Magic 3D

Text-to-3D content creation tools

Publishing Agency: NVIDIA

Release time: November 2022

Project address: https://deepimagination.cc/Magic3D/

With Magic 3D, NVIDIA joins the AIGC field: the tool generates 3D mesh models from text descriptions. Combined with image conditioning techniques and prompt-based editing, it offers a new way to control 3D synthesis, making it possible to create high-quality 3D mesh models.

Magic 3D creates text-to-3D content in two stages.

ChatGPT

Super Conversation Model

Publishing Agency: OpenAI

Release time: November 2022

Project address: https://openai.com/blog/chatgpt/

ChatGPT is trained using RLHF (Reinforcement Learning from Human Feedback), the same method used by InstructGPT, with only slight differences in the data collection setting.
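RLHF typically begins by fitting a reward model to human comparisons between pairs of responses. The sketch below shows the pairwise preference loss commonly used for that step, as described for InstructGPT. It assumes the two scalar reward scores have already been computed; the function and argument names are illustrative and are not taken from OpenAI's code.

```python
import math


def pairwise_preference_loss(reward_preferred, reward_rejected):
    """Return -log(sigmoid(reward_preferred - reward_rejected)).

    The loss is small when the reward model scores the human-preferred
    response higher than the rejected one, and large otherwise.
    """
    diff = reward_preferred - reward_rejected
    # -log(sigmoid(diff)) equals log(1 + exp(-diff)); the two branches
    # avoid overflow when |diff| is large.
    if diff > 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


if __name__ == "__main__":
    print(pairwise_preference_loss(2.0, 0.0))  # ranking agrees with the labeler
    print(pairwise_preference_loss(0.0, 2.0))  # ranking disagrees
```

The trained reward model then supplies the reward signal for a reinforcement learning stage that fine-tunes the language model.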

ChatGPT can converse like a human and complete tasks such as writing emails, video scripts, and marketing copy, translating, and coding. Since its launch, it has drawn countless developers in China and abroad to try it out and debate it, arguably making it the technical project with the highest developer participation in 2022.

Point E

Generate 3D point cloud based on text description

Publishing Agency: OpenAI

Release time: December 2022

Project address: https://github.com/openai/point-e

Generating a 3D point cloud from a text prompt with Point·E takes three steps:

1. Generate a synthetic view based on the text prompt

2. Generate a coarse point cloud (1024 points) based on the synthetic view

3. Generate a fine point cloud (4096 points) based on the coarse point cloud and the synthetic view
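The three steps above can be sketched as a data-flow skeleton. This is not the Point·E API: every function below is a hypothetical placeholder that returns random data, and only the point counts (1024 and 4096) come from the text. The sketch shows what each stage consumes and produces, nothing more.

```python
import random

COARSE_POINTS = 1024  # size of the coarse point cloud (step 2)
FINE_POINTS = 4096    # size of the fine point cloud (step 3)


def text_to_view(prompt):
    """Step 1 (placeholder): stand-in for the text-to-image model."""
    return {"prompt": prompt}


def view_to_coarse_cloud(view, rng):
    """Step 2 (placeholder): stand-in for the image-conditioned model."""
    return [(rng.random(), rng.random(), rng.random())
            for _ in range(COARSE_POINTS)]


def upsample_cloud(coarse, view, rng):
    """Step 3 (placeholder): stand-in for the upsampler, which sees both
    the coarse cloud and the synthetic view. Here it only adds jittered
    copies of coarse points until the fine size is reached."""
    extra = []
    for _ in range(FINE_POINTS - len(coarse)):
        x, y, z = rng.choice(coarse)
        extra.append((x + rng.gauss(0, 0.01),
                      y + rng.gauss(0, 0.01),
                      z + rng.gauss(0, 0.01)))
    return coarse + extra


def text_to_point_cloud(prompt, seed=0):
    """Chain the three stages in the order listed above."""
    rng = random.Random(seed)
    view = text_to_view(prompt)
    coarse = view_to_coarse_cloud(view, rng)
    return upsample_cloud(coarse, view, rng)


if __name__ == "__main__":
    cloud = text_to_point_cloud("a red chair")
    print(len(cloud))
```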

Running Point·E on a single GPU, a 3D point cloud can be generated in about 1 to 2 minutes, so text-to-3D no longer has to consume enormous computing power.

Winter is gone and spring is here, let’s imagine 2023

2022 is coming to an end, and 2023 is destined to be a year full of unknowns. What new achievements will there be in the field of AIGC? How will ScienceAI respond to the challenges brought about by the intersection of basic science and AI? What new breakthroughs will be made in chip research and development and domestic operating systems?

What are your predictions for artificial intelligence technologies and applications in 2023? Feel free to leave a comment and join the discussion.

Chao Neuro has also published many articles on the development of artificial intelligence over the past year.

HyperAI

4 years ago

Information
