We evaluated the performance of Fitbit Charge 3™ (FC3), a multi-sensor commercial sleep-tracker, for measuring sleep in adolescents against gold-standard laboratory polysomnography (PSG). Single-night PSG and FC3 sleep outcomes were compared in 39 adolescents (22 girls; 16-19 years), 12 of whom presented with clinical/subclinical DSM-5 insomnia symptoms (7 girls). Discrepancy analysis, Bland-Altman plots, and epoch-by-epoch analyses were used to evaluate FC3 performance. The influence of several factors potentially affecting FC3 performance (e.g., sex, age, body mass index, firmware version, and magnitude of heart rate changes between consecutive PSG epochs) was also tested. In the sample of healthy adolescents, FC3 systematically underestimated PSG total sleep time by about 11 min and sleep efficiency by 2.5%, and overestimated wake after sleep onset by 9 min. Proportional biases were detected for “light” and “deep” sleep duration, resulting in significant underestimation of these parameters in participants with longer PSG N1 + N2 and N3 durations, respectively. No significant systematic bias was detected for sleep efficiency and sleep onset latency. Epoch-by-epoch analysis showed sleep-stage sensitivity (average proportion of PSG epochs correctly classified by the device for a given sleep stage) of 68% for wake, 78% for “light” sleep, 59% for “deep” sleep, and 69% for rapid eye movement (REM) sleep in healthy sleepers. Similar results were found in the sample of adolescents with insomnia symptoms. Body mass index was positively associated with FC3-PSG discrepancies in wake after sleep onset (R = .16, p = .048). The magnitude of the heart rate acceleration/deceleration between consecutive PSG epochs was an important factor affecting FC3 classifications of sleep stages.
Our results are in line with a general trend in the literature suggesting that recently introduced multi-sensor devices perform better than motion-only devices, although further developments are needed to improve accuracy in sleep stage classification and wake detection. Further work is needed to identify the factors that affect device performance, in terms of both accuracy and reliability (consistency of performance over time), in different samples and conditions.
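The two core metrics named above, per-stage sensitivity and Bland-Altman bias, can be illustrated with a minimal Python sketch. This is not the authors' analysis pipeline. The function names, the stage labels, and the 1.96 × SD limits of agreement are assumptions made for illustration. The sketch scores a single night, whereas the reported sensitivities are averaged across participants, and it assumes a constant bias, so it does not model the proportional biases reported for “light” and “deep” sleep.

```python
from statistics import mean, stdev


def stage_sensitivity(psg_epochs, device_epochs):
    """Per-stage sensitivity for one night.

    For each stage scored by PSG, return the proportion of those PSG
    epochs that the device assigned to the same stage.
    """
    if len(psg_epochs) != len(device_epochs):
        raise ValueError("epoch sequences must be aligned and of equal length")
    totals, hits = {}, {}
    for ref, dev in zip(psg_epochs, device_epochs):
        totals[ref] = totals.get(ref, 0) + 1
        if ref == dev:
            hits[ref] = hits.get(ref, 0) + 1
    return {stage: hits.get(stage, 0) / n for stage, n in totals.items()}


def bland_altman(device_values, psg_values):
    """Mean device-minus-PSG difference (bias) and 95% limits of agreement.

    Assumes the bias is constant across the measurement range and that
    the differences are roughly normally distributed.
    """
    if len(device_values) != len(psg_values):
        raise ValueError("one device value is needed per PSG value")
    diffs = [d - p for d, p in zip(device_values, psg_values)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation, needs >= 2 participants
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical data, for illustration only (not taken from the study).
psg = ["wake", "light", "light", "deep", "deep", "rem", "rem", "light"]
fc3 = ["wake", "light", "deep", "deep", "light", "rem", "light", "light"]
print(stage_sensitivity(psg, fc3))

fc3_tst_min = [400, 410, 390]  # device total sleep time per participant
psg_tst_min = [410, 420, 405]  # PSG total sleep time per participant
print(bland_altman(fc3_tst_min, psg_tst_min))
```

A negative bias in this sketch means the device underestimates the PSG value, matching the direction of the total sleep time discrepancy described above.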