Deep learning-based approaches have achieved remarkable success in various image-based dietary assessment tasks, including food detection and portion size estimation. However, most existing methods address these tasks separately, limiting their practical applicability in real-world scenarios where simultaneous processing is required. This study presents an end-to-end multi-task method that integrates food detection and portion size estimation within a single model, leveraging the state-of-the-art YOLOv12 architecture. To support this approach, this study introduces the Food Portion Benchmark (FPB) dataset – a comprehensive and diverse collection of 14,083 images spanning 138 food classes, in which every image carries manually annotated bounding boxes and laboratory-measured component weights. To promote reproducibility and standardization, a public Food AI leaderboard has been deployed on Hugging Face, enabling researchers to benchmark their models for detection and weight estimation on the FPB dataset. In addition, this work provides pre-trained model weights and model-generated annotations, sparing researchers the resource-intensive measurement and annotation of food items. The proposed method achieves state-of-the-art performance, with a mean average precision (mAP50) of 0.978 for food detection and a mean absolute error (MAE) of 90.95 grams for portion size estimation on the FPB test set. These results highlight the effectiveness and practicality of an integrated model for accurate and comprehensive dietary analysis. The FPB dataset and the accompanying model are expected to serve as a foundational reference for dietary assessment, particularly for culturally relevant foods in Central Asia.
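For readers reproducing the weight-estimation evaluation, the metric reported above is the standard mean absolute error over matched detections. The following is a minimal sketch under that assumption; the function name and sample values are illustrative and are not part of the released FPB code or data.

```python
# Minimal sketch of the weight-estimation metric: mean absolute error (MAE),
# in grams, between predicted and laboratory-measured portion weights.
# The sample values below are hypothetical, not FPB measurements.

def mean_absolute_error(predicted, measured):
    """Average absolute difference between paired predictions and targets."""
    assert len(predicted) == len(measured) and predicted
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)

# Hypothetical per-item weight predictions (grams) for three detected foods.
predicted_g = [210.0, 95.5, 330.0]
measured_g = [250.0, 90.0, 400.0]

print(f"MAE: {mean_absolute_error(predicted_g, measured_g):.2f} g")  # MAE: 38.50 g
```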