Python Debugging Q&A
Instruction-following pairs for debugging common Python errors. Ideal for fine-tuning code assistant models.
6,261 examples · JSONL · CC BY-SA 4.0
View on OpenDataBay
SynthCodeLab generates high-quality instruction-following and Q&A datasets using state-of-the-art 72B-parameter models. Ready to download, ready to train.
How It Works
Every example is generated using Qwen 2.5 72B Instruct running on dedicated GPU hardware, not cheap API calls.
Outputs are deduplicated, length-checked, and rejected if they contain refusals or hallucinations.
JSONL format, instruction-response pairs. Drop into your fine-tuning pipeline with zero preprocessing.
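As a rough illustration of the filtering described above, the sketch below parses JSONL text and applies an exact-duplicate check, a length check, and a simple refusal-phrase check. The field names `instruction` and `response`, the length bounds, and the refusal phrases are assumptions made for illustration, not SynthCodeLab's actual schema or thresholds. Hallucination detection is not shown because it cannot be reduced to a simple rule.

```python
import json

# Hypothetical refusal markers and length bounds; tune these for real data.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "as an ai")
MIN_CHARS, MAX_CHARS = 20, 4000


def parse_jsonl(text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def keep(record, seen):
    """Return True if the record passes the dedup, length, and refusal checks."""
    # Assumed field names: "instruction" and "response".
    instruction = record.get("instruction", "").strip()
    response = record.get("response", "").strip()

    # Exact-duplicate check on the case-folded pair.
    key = (instruction.lower(), response.lower())
    if key in seen:
        return False

    # Length check on the response only.
    if not (MIN_CHARS <= len(response) <= MAX_CHARS):
        return False

    # Crude refusal check: substring match against known phrases.
    if any(marker in response.lower() for marker in REFUSAL_MARKERS):
        return False

    seen.add(key)
    return True


def filter_records(records):
    """Apply keep() to each record in order, preserving the first of any duplicates."""
    seen = set()
    return [r for r in records if keep(r, seen)]
```

A file on disk could be handled the same way by passing its contents, read as UTF-8 text, to `parse_jsonl`.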
Datasets
Available on OpenDataBay and LabelSets.
Instruction-following pairs for debugging common Python errors. Ideal for fine-tuning code assistant models.
6,261 examples · JSONL · CC BY-SA 4.0
View on OpenDataBay
Question-answer pairs for identifying and correcting common SQL errors across major dialects.
10,000 examples · JSONL · CC BY-SA 4.0
View on OpenDataBay
Instruction-response pairs for reviewing Python code and providing constructive improvement suggestions.
10,000 examples · JSONL · CC BY-SA 4.0
View on OpenDataBay
Why SynthCodeLab
SynthCodeLab is a registered synthetic data provider specializing in code-domain training datasets. Our datasets are generated on dedicated GPU infrastructure running state-of-the-art open-source models, not distilled from proprietary APIs.
We target high-demand, narrow domains where quality and specificity matter. Every dataset ships with a datacard, license, and format spec.
Model Parameters: 72B
Generation Hardware: 4× RTX 3090
Duplication Rate: < 2%
Default License: CC BY-SA 4.0
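For readers who want to check a duplication figure on a downloaded file, one possible measurement is sketched below. It counts exact repeats after lowercasing and collapsing whitespace; the page does not say how its figure is computed, so this definition, along with the `instruction` and `response` field names, is an assumption.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def duplication_rate(records):
    """Fraction of records whose normalized pair already appeared earlier."""
    if not records:
        return 0.0
    seen = set()
    duplicates = 0
    for record in records:
        # Assumed field names: "instruction" and "response".
        key = (normalize(record["instruction"]), normalize(record["response"]))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(records)
```

Near-duplicate detection (for example, paraphrased instructions) would need a fuzzier comparison than this exact-match approach.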
Custom Datasets
We take custom generation requests for specific domains, formats, and volumes. Get in touch.
Or find us on OpenDataBay and LabelSets.