Tessian Data Scientists design the algorithms that are at the heart of what we do. We wouldn’t exist without our machine learning models, and they are what our clients rely on day-to-day. But what does this mean in practice?
Companies can leverage data science in a number of ways, and we think the role of a Data Scientist falls into three distinct categories:
1. You work for a business function analyzing & reporting on how to improve a key metric; e.g. increasing user conversion.
2. You are responsible for writing production models to enhance the main product; e.g. recommendation systems for e-commerce.
3. You build machine learning models, which are the product the company sells.
More specifically, we love massive enterprise email datasets. Email doesn’t have the best reputation with engineers: the protocol is ancient, poorly defined and even more poorly secured, and email isn’t Slack. As a Data Science team, we don’t think of email in terms of SMTP, but rather as a beautiful, dynamic and pretty huge JSON dataset that captures the intricacies of human-to-human communication. Email knows who you communicate with, what you communicate about, which clients you’re pitching, which projects you’ve just completed, who your team members are and what your company hierarchy looks like; excitingly, the list goes on.
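To make the "email as a JSON dataset" idea concrete, here is a minimal sketch in Python. The record fields (`from`, `to`, `subject`), the addresses and the `contact_counts` helper are all hypothetical, chosen purely for illustration; real enterprise email data is far richer, and this is not a description of our actual schema.

```python
import json
from collections import Counter, defaultdict

# Hypothetical, heavily simplified email records (illustrative only).
RAW_EMAILS = """
[
  {"from": "alice@example.com", "to": ["bob@example.com", "carol@client.example"], "subject": "Q3 pitch deck"},
  {"from": "alice@example.com", "to": ["bob@example.com"], "subject": "Project wrap-up"},
  {"from": "bob@example.com", "to": ["alice@example.com"], "subject": "Re: Project wrap-up"}
]
"""


def contact_counts(emails):
    """Count how often each sender writes to each recipient."""
    counts = defaultdict(Counter)
    for email in emails:
        for recipient in email["to"]:
            counts[email["from"]][recipient] += 1
    return counts


emails = json.loads(RAW_EMAILS)
counts = contact_counts(emails)

# Who does Alice write to most often?
print(counts["alice@example.com"].most_common())
```

Even a toy aggregation like this starts to surface who communicates with whom, which is the kind of relationship signal the paragraph above is describing.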
But that’s where we come in. Using machine learning and NLP, we build bespoke anomaly detection models that prevent threats carried out by humans (rather than code) and secure our clients’ communications. We also care deeply about the end user experience of our products, which sounds obvious, but is much more difficult (and in our opinion, important) when machine learning is involved due to its black box nature. This means we spend a lot of time on explaining our machine learning predictions to the end user. For example, why does this email look misdirected? Why does this email you’re receiving look malicious? Notifications with context are more effective than lazy boilerplate warnings (the industry standard).
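As a rough illustration of what "a notification with context" might look like, here is a toy sketch in Python. It is not our production approach: it swaps in a very simple heuristic (recipient co-occurrence counts) for a real anomaly detection model, and the function names, addresses and warning text are all assumptions made up for this example.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(history):
    """Count how often each pair of recipients appeared on the same email."""
    pairs = Counter()
    for recipients in history:
        # Sorting gives each pair a canonical (a, b) order with a < b.
        for a, b in combinations(sorted(set(recipients)), 2):
            pairs[(a, b)] += 1
    return pairs


def explain_unusual_recipients(recipients, pairs):
    """Return one contextual warning per recipient never seen with the others."""
    warnings = []
    unique = sorted(set(recipients))
    for candidate in unique:
        others = [r for r in unique if r != candidate]
        seen_together = sum(pairs[tuple(sorted((candidate, o)))] for o in others)
        if others and seen_together == 0:
            warnings.append(
                f"{candidate} has never been emailed together with "
                f"{', '.join(others)}. Is this the right recipient?"
            )
    return warnings


# Hypothetical sending history: recipient lists of previously sent emails.
history = [
    ["bob@example.com", "carol@example.com"],
    ["bob@example.com", "carol@example.com"],
    ["dave@other.example"],
]
pairs = build_cooccurrence(history)

draft = ["bob@example.com", "carol@example.com", "dave@other.example"]
warnings = explain_unusual_recipients(draft, pairs)
for warning in warnings:
    print(warning)
```

The point of the sketch is the shape of the output rather than the heuristic: the warning names the specific recipient and says why it looks unusual, instead of showing a generic "are you sure?" prompt.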
Another exciting part of being a Data Scientist at Tessian is that we are always thinking about future products we should be building. The great thing about email data is that it’s not “opinionated” about the problem we are trying to solve, meaning we can experiment with solving different problems using the exact same dataset. Sometimes this involves us trying to find signal in the noise, like when we discovered strong-form spear phishing impersonation attacks were getting past existing defenses. Other times it involves trying to solve specific threats and problems brought to us by our clients. The highlight of my week is the Data Science Brainstorming session where we discuss all of our ideas for new products, current product improvements, as well as any new papers/tools that we’ve read about that might help us further our research and models.
And that’s why at Tessian it’s impossible to talk about Data Science without talking about Engineering. To train our machine learning algorithms we need lots of data, and to deploy and run our models in production, we need this data processed with minimal latency. Our Data Science team own the data and what we do with it. But to process, store and scale this data, we call in the Engineering teams to help.
How Data Science and Engineering work together is a much-discussed topic and one for which I believe there is no out-of-the-box solution. We’re still figuring it out and tweaking our process, but currently we follow a model similar to the one Jeff Magnusson (Stitch Fix) describes in Engineers Shouldn’t Write ETL. The platform and Engineering teams leverage their domain knowledge to build and expose data frameworks, empowering our Data Scientists to have end-to-end ownership of their work. This has the added benefit of freeing up Engineering teams to focus on building and scaling our core APIs, rather than maintaining fiddly data pipelines.
We’re a friendly bunch of ambitious Engineers building breakthrough machine learning and natural language technologies to analyze, understand & protect enterprise communication networks.