Dataset: mimir-lcm/fineweb-2-sentence-split

Modalities: Text
Formats: parquet
Languages: Arabic, Belarusian, Bulgarian
Size: 100M - 1B
ArXiv:
Libraries: Datasets, Dask, Polars
License:
fineweb-2-sentence-split / arb_Arab (37.6 GB)
  • 1 contributor
  • History: 1 commit
  • Latest commit: 4c991ba (verified) by m-elio, "Upload folder using huggingface_hub", 13 days ago
  • edu-multil-split_arb_Arab_000_00000.parquet (4.94 GB, xet)
  • edu-multil-split_arb_Arab_000_00001.parquet (4.9 GB, xet)
  • edu-multil-split_arb_Arab_000_00002.parquet (4.87 GB, xet)
  • edu-multil-split_arb_Arab_000_00003.parquet (4.85 GB, xet)
  • edu-multil-split_arb_Arab_000_00004.parquet (4.82 GB, xet)
  • edu-multil-split_arb_Arab_000_00005.parquet (4.4 GB, xet)
  • edu-multil-split_arb_Arab_000_00006.parquet (4.4 GB, xet)
  • edu-multil-split_arb_Arab_000_00007.parquet (4.39 GB, xet)
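
The shard names in the listing follow a regular pattern, so the paths can be generated rather than typed out. The following is a minimal sketch using only the Python standard library. The helper names (`shard_filename`, `shard_path`, `shard_url`, `all_shard_paths`) are hypothetical. It assumes the files live on the `main` branch and that the Hub's usual `resolve` download URL layout applies, neither of which is stated on this page. The sizes are copied from the listing; they sum to about 37.57 GB, which agrees with the 37.6 GB shown for the folder.

```python
REPO_ID = "mimir-lcm/fineweb-2-sentence-split"
SUBFOLDER = "arb_Arab"

# Sizes in GB as shown in the listing, in shard order 00000..00007.
LISTED_SIZES_GB = [4.94, 4.9, 4.87, 4.85, 4.82, 4.4, 4.4, 4.39]
NUM_SHARDS = len(LISTED_SIZES_GB)


def shard_filename(index: int) -> str:
    """Return the parquet file name for a shard index seen in the listing."""
    if not 0 <= index < NUM_SHARDS:
        raise ValueError(f"shard index must be in [0, {NUM_SHARDS}), got {index}")
    return f"edu-multil-split_arb_Arab_000_{index:05d}.parquet"


def shard_path(index: int) -> str:
    """Return the path of a shard relative to the dataset repository root."""
    return f"{SUBFOLDER}/{shard_filename(index)}"


def shard_url(index: int, revision: str = "main") -> str:
    """Build a download URL for a shard.

    Assumption: the revision defaults to "main"; the page only shows a
    short commit hash (4c991ba), not the branch name.
    """
    return (
        f"https://huggingface.co/datasets/{REPO_ID}"
        f"/resolve/{revision}/{shard_path(index)}"
    )


def all_shard_paths() -> list[str]:
    """Return the repository-relative paths of every listed shard."""
    return [shard_path(i) for i in range(NUM_SHARDS)]


if __name__ == "__main__":
    for path, size in zip(all_shard_paths(), LISTED_SIZES_GB):
        print(f"{path}  ({size} GB)")
    print(f"total listed: {sum(LISTED_SIZES_GB):.2f} GB")
```

Each path from `all_shard_paths()` could also be passed as `filename` to `huggingface_hub.hf_hub_download` with `repo_id=REPO_ID` and `repo_type="dataset"`. Since each shard is several gigabytes, fetching one shard at a time is likely more practical than downloading the whole folder.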