Oak Ridge National Laboratory data engineer Katie Knight can now add another title to her name: doctor. She’s been a full-time student earning her PhD at the College of Communication and Information, all while working as a full-time employee at the lab, often applying her knowledge and research from school to work, and vice versa.
“I’m part of a group that works on tackling data management problems for the research data at the lab, so I think [my research] is pretty salient,” she said. “I work in a technical role, I’m kind of in between research and technical applications. I try to understand what problems research teams may have with data they’re creating, or they’ve inherited. I think about, OK, what are the problems they may have and what are practical solutions?”
Knight began her career as a reference librarian and ended up in her current job after starting out as a metadata librarian at the lab. In that position she taught herself how to code and quickly became interested in metadata and big data. That’s when she decided it was time to pursue a PhD so she could continue growing her knowledge in information sciences.
Knight’s research began with big data management and has evolved to exploring how to tackle the challenge of using large datasets in scientific research that inevitably retain human and domain perspectives.
While many people may have become aware of data and technology biases that result in harm to people—such as algorithms that produce racist results—Knight’s work is digging into the bias that can occur during data collection due to human assumptions or even just changes caused by new technology or knowledge. This unique approach to thinking about how to provide context for large datasets in order to reflect such biases earned her the Best Doctoral Dissertation award from CCI for 2024.
“There’s always going to be some sort of perspective in data that will be produced, and I think as more and more science goes towards using giant datasets, we need to be aware of how to adjust for that,” she said. “The perspective doesn’t need to be biased in a negative connotation, just that we’re human beings and we automatically filter things out.”
As an example, Knight points to how the pH balance of water was once tested: the water was taken from the field to a lab and measured with litmus paper. Now, there are much more precise ways to measure that pH balance in the field and produce a more accurate result. But if someone is using a huge dataset that includes older data produced with the litmus test, adjustments need to be made to account for that.
“The precision of our instruments has changed over time and what we think of as accuracy has also changed. The margin of error and the way different domains decide this is a good method and what is outdated changes,” she said. “This all is something that naturally changes over time but we don’t necessarily talk about how that’s reflected in data.”
The end goal of such research would be to establish a way to communicate those nuances and contexts to a machine that thinks in ones and zeroes, Knight said. While her dissertation doesn’t solve that issue, she did lay the theoretical basis for a model to frame this context.
“I call them unstated assumptions; between your method and the constructs you’re trying to measure, and once you measure them, how do you record them as concepts, and then how does that influence the methods you think are appropriate?” she queried.
Knight said she will continue going down this path of research as part of her position at ORNL, where she has the potential to apply it as she attempts to solve data management issues. When it comes to big data and all the nuances that occur when it is broken down into smaller, local datasets, Knight compares it to gathering doctors’ notes from a million physicians. There is no one standard for taking doctors’ notes, so if you compiled those notes, it would take a lot of work to turn them into data that is organized and manageable.
“A lot of the data management solutions you see out there assume you’re living in a world with spreadsheets and records and fields, and a lot of the data I see is not that; it is unstructured and not quite so simple as figuring out what your rows and columns are,” she said. “Yes, it’s talking about the qualitative nature of how we impart our perspectives on this, but what I’m interested in is how are we going to scale this? What can we do about it? There may not be anything we can do about it, but we could do something to help people understand how they may adjust or, even if they can’t adjust, provide a context for which the research happened.”
She plans to stay at ORNL for the foreseeable future, so not too much will change in that regard. But Knight is certainly looking forward to being just a full-time employee and taking “full-time student” off her plate after graduation.
“I love it at ORNL, my colleagues have been so supportive, it’s been just amazing working there. Every single one of my colleagues has been cheering me on and asking me about my research,” she said.