Datasets: mimir-lcm/fineweb-2-sentence-split
Modalities: Text
Formats: parquet
Languages: Arabic, Belarusian, Bulgarian, + 41
Size: 100M - 1B
ArXiv: arxiv:2506.20920, arxiv:2605.25263
Libraries: Datasets, Dask, Polars, + 1
License: odc-by
Branch: main
Path: fineweb-2-sentence-split/arb_Arab
Total size: 37.6 GB
1 contributor
History: 1 commit
Latest commit: 4c991ba (verified), "Upload folder using huggingface_hub", by m-elio, 13 days ago
File                                          Size     Storage  Last commit                          Updated
edu-multil-split_arb_Arab_000_00000.parquet   4.94 GB  xet      Upload folder using huggingface_hub  13 days ago
edu-multil-split_arb_Arab_000_00001.parquet   4.9 GB   xet      Upload folder using huggingface_hub  13 days ago
edu-multil-split_arb_Arab_000_00002.parquet   4.87 GB  xet      Upload folder using huggingface_hub  13 days ago
edu-multil-split_arb_Arab_000_00003.parquet   4.85 GB  xet      Upload folder using huggingface_hub  13 days ago
edu-multil-split_arb_Arab_000_00004.parquet   4.82 GB  xet      Upload folder using huggingface_hub  13 days ago
edu-multil-split_arb_Arab_000_00005.parquet   4.4 GB   xet      Upload folder using huggingface_hub  13 days ago
edu-multil-split_arb_Arab_000_00006.parquet   4.4 GB   xet      Upload folder using huggingface_hub  13 days ago
edu-multil-split_arb_Arab_000_00007.parquet   4.39 GB  xet      Upload folder using huggingface_hub  13 days ago