Speech Command Recognition (SCR) is rapidly gaining prominence due to its diverse applications, such as virtual assistants, smart homes, hands-free navigation, and voice-controlled industrial machinery. In this paper, we present a data-centric approach to creating SCR systems for low-resource languages, focusing in particular on the Kazakh language. By leveraging synthetic data generated by Text-to-Speech (TTS) and data extracted from a large-scale speech corpus, we created a Kazakh-language equivalent of the Google Speech Commands dataset. Moreover, we compiled the Kazakh Speech Commands dataset from recordings collected from 119 participants. This dataset was used to benchmark the Keyword-MLP model trained on our synthetic dataset. The model achieved 89.79% accuracy on the real-world data, demonstrating the efficacy of our approach. Our work can serve as a recipe for creating customized speech command datasets, including for low-resource languages, obviating the need for laborious and costly human data collection.