## Weights & Biases

### Weights & Biases

![rw-book-cover](https://readwise-assets.s3.amazonaws.com/static/images/article0.00998d930354.png)

#### Metadata

* Author: [[wandb.ai]]
* Full Title: Weights & Biases
* Category: #articles
* URL: <https://wandb.ai/wandb_gen/llm-data-processing/reports/Processing-Data-for-Large-Language-Models--VmlldzozMDg4MTM2>

#### Highlights

* LLMs work so effectively, in part, because of their size: They're trained on immense datasets and thus have a broader understanding than smaller models trained on smaller datasets.
* But because it is so expensive to perform manual review and curation on massive datasets, many of these datasets have quality issues. This has implications far beyond metrics like perplexity and validation loss, as learned models reflect the biases present in their training data.
* As data is the fuel driving growth for these LLMs, it is crucial to understand and document the composition of the datasets used to train large language models.