Data Science At Tessian Is All About Passion And Curiosity

Written by Ed Bishop, Co-founder & CTO

Tessian Data Scientists design the algorithms that are at the heart of what we do. We wouldn’t exist without our machine learning models, and they are what our clients rely on day-to-day. But what does this mean in practice?

We believe Data Scientists fall into three valuable, but very different roles:

1. You work for a business function, analyzing and reporting on how to improve a key operating metric, e.g. increasing user website traffic conversion

2. You are responsible for writing production models to enhance the main product, e.g. recommendation systems for e-commerce

3. You are the product

First, we love email.

More specifically, we love massive enterprise email datasets. Email doesn’t have the best reputation with engineers: the protocol is ancient, poorly defined and even more poorly secured, and email isn’t Slack. As a Data Science team, we don’t think of email in terms of SMTP, but rather as a beautiful, dynamic and pretty huge JSON dataset that captures the intricacies of human-to-human communication. Email knows who you communicate with, what you communicate about, what clients you’re pitching, what projects you’ve just completed, who your team members are, your company hierarchy and (excitingly) the list goes on.

“Rule-based security systems are ineffective at detecting advanced threats on email. This is because the most advanced threats are either caused by, or exploit, human relationships, which by their nature are dynamic and constantly in flux. This is a very interesting challenge for us in the Data Science team, one that requires using advanced NLP and training models to detect deviations from a user’s normal behavior.” - AMINE SALEM

Of course, all this information is hidden, messy and unstructured.

But that’s where we come in. Using machine learning and NLP, we build bespoke anomaly detection models that secure our clients’ communications against threats executed by humans (rather than code). We also care deeply about the end user experience of our products, which sounds obvious, but is much more difficult (and, in our opinion, more important) when machine learning is involved, due to its black box nature. This means we spend a lot of time focusing on explaining our machine learning predictions back to the end user. For example, why does this email look misdirected? Why does this email you’re receiving look malicious? Notifications with context are more effective than lazy boilerplate warnings (the industry standard).
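To make the idea of a "notification with context" a little more concrete, here is a deliberately tiny Python sketch. It is not how our production models work; it is a toy heuristic, assuming a sender's history is available as plain lists of recipient addresses. The function names and the example addresses are hypothetical. It flags a recipient who has never appeared alongside the other recipients of a draft, and says why.

```python
from collections import Counter
from itertools import combinations


def build_pair_counts(sent_emails):
    """Count how often each pair of recipients appeared on the same email."""
    pair_counts = Counter()
    for recipients in sent_emails:
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(recipients)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def explain_unusual_recipients(recipients, pair_counts):
    """Return a readable warning for each recipient never seen with the others."""
    warnings = []
    unique = sorted(set(recipients))
    for candidate in unique:
        others = [r for r in unique if r != candidate]
        if not others:
            continue
        # Counter returns 0 for pairs that were never observed.
        seen_together = sum(
            pair_counts[tuple(sorted((candidate, other)))] for other in others
        )
        if seen_together == 0:
            warnings.append(
                f"{candidate} has never been emailed together with "
                f"{', '.join(others)}."
            )
    return warnings


# Hypothetical sending history for one user (illustrative data only).
history = [
    ["alice@client.example", "bob@client.example"],
    ["alice@client.example", "bob@client.example", "carol@client.example"],
    ["dave@other.example"],
]
pair_counts = build_pair_counts(history)

draft = ["alice@client.example", "bob@client.example", "dave@other.example"]
warnings = explain_unusual_recipients(draft, pair_counts)
for warning in warnings:
    print(warning)
```

The point of the sketch is the shape of the output rather than the heuristic itself: the warning names the specific recipient and the reason it looks out of place, instead of a generic "are you sure?" prompt.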

Another exciting part of being a Data Scientist at Tessian is that we are always thinking about future products we should be building. The great thing about email data is that it’s not “opinionated” about the problem we are trying to solve, meaning we can experiment with solving different problems using the exact same dataset. Sometimes this involves us trying to find signal in the noise, like when we discovered strong-form spear phishing impersonation attacks were getting past existing defenses (leading to the creation of Tessian Defender). Other times it involves trying to address specific threats and problems brought to us by our clients. The highlight of my week is the Data Science Brainstorming session, where we discuss all of our ideas for new products and current product improvements, as well as any new papers or tools we’ve come across that might help us further our research and models.

One thing I’ve touched on a lot, but not specifically discussed, is data.

And that’s why at Tessian it’s impossible to talk about Data Science without talking about Engineering. To train our machine learning algorithms we need lots of data, and to deploy and run our models in production, we need this data processed with minimal latency. Our Data Science team own the data and what we do with it. But to process, store and scale this data, we call in the Engineering teams to help.

How Data Science and Engineering work together is a much-discussed topic and one for which I believe there is no out-of-the-box solution. We’re still figuring it out and tweaking our process, but currently we follow a similar model to Jeff Magnusson’s (Stitch Fix). It’s explained in “Engineers Shouldn’t Write ETL”. The platform and Engineering teams leverage their domain knowledge to build and expose data frameworks, empowering our Data Scientists to have end-to-end ownership of their work. This has the added benefit of freeing up Engineering teams to focus on building and scaling our core APIs, rather than maintaining fiddly data pipelines.

If any of the above seems interesting to you, we’d love for you to check out some of our open roles across Data Science, Engineering & Product. We’re a friendly bunch of ambitious Engineers building breakthrough machine learning and natural language technologies to analyze, understand & protect enterprise communication networks.
