HyperAI

Dataset Guide

This guide summarizes how to download and use HyperAI datasets and answers common questions.

Download methods

HyperAI provides three ways to download datasets: torrent download (most common), HTTP(S) direct-link download, and Baidu Netdisk download.

* Dataset torrent download

Go to the dataset page and download the dataset. What you get is the dataset's torrent file, which must be opened in a BitTorrent (BT) client to download the actual data.

The following four clients are recommended for opening the torrent:

  1. Transmission: The most commonly used free BT client. It handles large resources (> 1 TB) well and performs excellently, so it is recommended. It supports macOS, Windows, and common Unix distributions. Transmission download mirror: Transmission installation package download.
  2. qBittorrent: Its interface is built on Qt, and it supports macOS, Windows, and common Unix distributions. It establishes connections quickly and has the best support for the WebSeed protocol, but its handling of large files is only average. It is recommended for small files and torrents with WebSeed.
  3. aria2: A lightweight command-line download tool written in C++ that supports the BitTorrent protocol and can be paired with many GUIs. It works well with HyperAI's WebSeed and is recommended. For detailed usage, please refer to aria2 Wizard.
  4. Thunder (Xunlei): The interface is intuitive, it runs on multiple platforms, and it parses torrents quickly. It is especially suitable for small files and torrents containing WebSeed, where it downloads efficiently. However, its download speed is unstable, it may run into performance problems with very large torrents, and files are occasionally corrupted after the download completes.
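As an illustration of the command-line route, the following is a minimal sketch of a torrent download with aria2. The helper name `fetch_torrent`, the torrent file name, and the target directory are assumptions for illustration, not part of the original guide; `--dir` and `--check-integrity` are standard aria2c options.

```shell
# Hypothetical helper: download a dataset torrent with aria2.
fetch_torrent() {
  torrent=$1      # path to the .torrent file saved from the dataset page
  dest=${2:-.}    # download directory (defaults to the current directory)
  # --check-integrity=true validates piece hashes of data already on disk
  # before the download continues.
  aria2c --dir="$dest" --check-integrity=true "$torrent"
}
```

A call might look like `fetch_torrent dataset.torrent /data/downloads`, where both arguments are placeholders.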

* HTTP(S) direct download

For some smaller datasets (< 100 MB), HTTP(S) direct-link download is recommended. Please check the specific dataset page to see whether a direct link is provided.

* Baidu Netdisk download

Some datasets are also stored on Baidu Netdisk to spread the download load.

FAQ

1. Why is the torrent downloading slow?

Because of the nature of dataset torrents, a BT download does not reach a stable speed right away the way other resources do, so please wait patiently for the tracker announce. A public IP is recommended for faster downloads. For example, running a Windows instance on Alibaba Cloud costs about 0.3 yuan/GB (excluding server rental fees).

If you suspect a connectivity issue with the HyperAI seeding servers, you can check the HyperAI Server Status page.

If you run into a problem downloading a dataset, please contact us on WeChat: Hyperai01.

2. What should I do if the torrent download speed is slow (< 10 Mbps)?

If you experience slow speeds, try:

  • Switch to a machine with a public IP (such as the Alibaba Cloud instance mentioned above) to reach more peers.
  • Turn off the system proxy, or use a self-hosted proxy that does not block BT traffic.
  • If you are located in mainland China, a proxy server that supports BT traffic can in theory give faster speeds. A US line optimized for the mainland (such as CN2 GT or CN2 GIA) is recommended and can reach a single-user download speed of 200+ Mbps.

3. How do I decompress a multi-volume archive?

If the dataset you downloaded is large (for example, AVSpeech), HyperAI packages and compresses it into volumes. This keeps any single file from exceeding the system's size limit and keeps the number of files in a single directory within the file system's limit.

Note: Before merging files, make sure that the target storage device has enough free space to store the merged files. For example, if the volume files to be merged occupy 200 GB of space, the current storage device must have at least 200 GB of free space to perform the merge operation.
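The space check described in the note can be scripted. The following is a minimal sketch, assuming the merged file will be written to the current directory; `has_space_for` is a hypothetical helper name, and `du -c` (grand total) is assumed to be available, as it is in the GNU and BSD versions of `du`.

```shell
# Hypothetical helper: succeed only if the filesystem holding the current
# directory has at least as much free space as the given volume files occupy.
has_space_for() {
  # Combined disk usage of the volume files, in KiB (last line of du -c).
  need_kb=$(du -ck "$@" | tail -n 1 | cut -f1)
  # Free space on the current directory's filesystem, in KiB.
  free_kb=$(df -Pk . | awk 'NR==2 {print $4}')
  [ "$free_kb" -ge "$need_kb" ]
}
```

For example, `has_space_for data.tar.* && echo "enough space to merge"` prints the message only when the check passes.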

3.1 If the files are in the form .z01 – .z0n + .zip

This is a split (multi-volume) ZIP archive. HyperAI creates it with the following parameters:

$ zip -0 -r data.zip dataset/ -s 10000m

The corresponding merge command is:

$ zip -s 0 data.zip --out data-combined.zip

After merging, extract the archive:

$ unzip data-combined.zip

3.2 If the files are in the form tar.0 – tar.n

This is a split (multi-volume) TAR archive. HyperAI creates the volumes with the following parameters:

$ split -b 10000m data-combined.tar data.tar.

The corresponding merge command is:

$ cat data.tar.0 data.tar.1 data.tar.2 data.tar.3 > data-combined.tar

After merging, extract the archive:

$ tar xvf data-combined.tar
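The merge command above names four volumes explicitly. For a dataset with a different number of volumes, a loop avoids listing each file and keeps the parts in numeric order (a plain `data.tar.*` glob would place `data.tar.10` before `data.tar.2`). The following is a minimal sketch, assuming the volumes are named `data.tar.0`, `data.tar.1`, and so on with no gaps, as in the example; `merge_volumes` is a hypothetical helper name.

```shell
# Hypothetical helper: concatenate <prefix>0, <prefix>1, ... into one file,
# stopping at the first missing index. Prints the number of volumes merged.
merge_volumes() {
  prefix=$1   # e.g. data.tar.
  out=$2      # e.g. data-combined.tar
  : > "$out"  # create or truncate the output file
  i=0
  while [ -f "${prefix}${i}" ]; do
    cat "${prefix}${i}" >> "$out"
    i=$((i + 1))
  done
  echo "$i"
}
```

With the file names from this guide, the call would be `merge_volumes data.tar. data-combined.tar`, followed by the `tar xvf` step shown above.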

If you are using Windows, 7-Zip recognizes multi-volume archives automatically.

4. What if the downloaded dataset is incomplete, contains errors, or is no longer available?

Please contact us on WeChat: Hyperai01.


Dataset seeding/preservation

  • 3 seedboxes located overseas (two in the US + one in the Netherlands, each with a 1 Gbps uplink)
  • 1 server located in China (China Unicom, 70 Mbps uplink)
  • 1 server located in China (China Telecom, 500 Mbps uplink)
  • 1 server located in China (China Telecom, 100 Mbps uplink, only provides seeding acceleration for some datasets)

For some smaller datasets (< 500 MB), we provide WebSeed to assist in seeding.


Community Open Source Agreement

All datasets in this community are shared by users and are for academic, scientific research, and teaching purposes only. Commercial use is not permitted. If any content infringes the rights of an individual or group, please contact WeChat "Hyperai01" to have it removed.