Human-AI mixed reality systems are becoming ubiquitous, ranging from day-to-day intelligent personal assistants to specialized, high-stakes AI-assisted decision-making tools. Although these systems are designed to improve the user experience, they cannot adapt to changing user needs, which places an increased cognitive burden on users. Understanding and minimizing cognitive effort through adaptive design is therefore critical to the widespread adoption of mixed reality technology. Recent approaches to adaptive interfaces monitor physiological signals with obtrusive, extraneous equipment. As eye-trackers become more readily available, pupillometry offers an unobtrusive pathway to cognitive load estimation. However, estimating cognitive load from pupillometry is challenging in unconstrained environments because of the confounding effects of the pupillary light reflex. Our approach aggregates human- and world-facing sensory data to disentangle the cognitive-load-induced pupil response from the pupillary light reflex. First, we design a user study that produces a dataset of pupil diameter changes recorded during an auditory N-back recall task under variable environmental conditions. Next, we develop a data processing pipeline that handles outliers and temporally aligns all target variables. Finally, we design a 3D-CNN architecture that serves as a proof-of-principle cognitive load estimation model.