

Life at Tessian

Learn more about Tessian company news, events, and culture directly from different teams. Hear from engineering, product, customer success, and more.

Life at Tessian
Tessian’s 2022 DEI Report
By Tessian
28 June 2022
As a human-first company, we want Tessian to be a place where everyone has the opportunity to bring who they are to work, and be included and valued as they are. Diversity, equity and inclusion (DEI) is so important to us, not only because it’s the right thing to do, but also because it’s essential for our success. Diversity is necessary for innovation, so prioritizing it is a really important part of our future as a company.

We recently published our second annual DEI Report, and I’ve been reflecting on our journey over the last year and the three big lessons I’ve taken into this year’s strategy.
Data. Data. Data.

We can’t just guess how we’re doing on DEI; we need data. When we first launched our 2021 DEI Strategy, it was based on analysis of several different kinds of data that helped act as signposts towards our DEI Focus Areas. Since then, we have improved our data set to add anonymized candidate data, and employee data on many more personal attributes.

Anything we can explore, we do. It can be difficult to know where you’re going to find the most interesting and impactful insights before you start looking. Here’s how we do it:

We start off with a big pile of data, everything from representation to experience, compensation to retention, all split by the different personal attributes we collect voluntary data on. There are some standard measures we look at: pay gaps, representation vs benchmarks, significant variations in experience, etc., but these often open the door to further questions that require further data exploration. We do our best to turn over every single stone and ask ourselves: is something going on here? Usually the answer is no, but it’s important that we employ that rigour everywhere, so that when the answer is yes, we don’t miss it.

It’s easy to get distracted by what we assume the most significant DEI concerns are, often based on our own biases, so it’s key to start as objectively as possible. Don’t guess or intuit where you should be focusing attention! Start with as much data as you can get, and let that guide your thinking.
If you don’t actively pay attention, anything can slip

Focus is necessary, but it’s hard. Throughout this journey, we’ve been so conscious that there are infinite dimensions of diversity to consider, and infinite topics we could focus our attention on. But resources are finite, and if we want to make an impact, we need to focus on just a few things.

As hard as it feels, focus isn’t just about deciding where you are going to focus; it’s also about deciding where you’re not going to dedicate energy. In 2021, one of those “non-focus areas” for us was gender representation. We found that we were above the benchmark compared to other companies similar to us, and there was nothing to indicate that might drop. So we put our energy into other places.

Throughout 2021, our gender representation gradually fell by 7 percentage points as we happened to hire fewer women and people from underrepresented genders. By the end of the year, these few percentage points had put us below the benchmark compared to other similar companies.

Focusing on other kinds of representation and other DEI areas meant we didn’t notice this gradual change in our gender representation, and so didn’t get ahead of it. This was a really important lesson for us this year; this time around, we are paying more attention to movement in metrics even when they don’t directly relate to our focus areas for the year. This is key to keeping focus dynamic, and adapting to the information you have today.
DEI is necessarily a team activity

The final lesson I’ve taken from our DEI journey so far: DEI is necessarily a team activity. None of us can do it alone.

Once we have our focus areas, we develop tactics that we hope will address them. So far on our journey, accountability for these tactics has sat with the People & Talent team. But the more work we do, the more we realize we need the whole company 100% behind us, prioritizing this work.

Hiring is a great example of this: in a fast-growing business, representation often comes down to hiring. If you’re growing but you aren’t hiring diversely, then overall representation will fall. So one of our Focus Areas this year is hiring more people from underrepresented genders and ethnic backgrounds.

Of course, our brilliant Talent partners care deeply about this, and are moving heaven and earth to build up a diverse pipeline of candidates. But it isn’t always easy. Building a diverse pipeline in a notoriously non-diverse industry can take time, and this is often time we feel we don’t have in such a fast-moving company. Or there might be a particular experience level we feel a candidate should possess that limits the diversity of the candidate pool.

This is where the rest of the company comes in. In this case: the Hiring Manager and hiring team. Every single Tessian needs to be bought into our strategy so that we can resolve these challenges in the right way. One of our Tessian values is We Do The Right Thing, so it’s really important to us to take these tensions seriously and work together to make the best decisions for our people.

There are a few basic things we ask of all Tessians:

Help us reach diverse candidates by sharing our DEI work and our open roles widely… think LinkedIn, Discord, Slack, any communities our Tessians are a part of!

Continue to give us feedback on how they’re feeling, about DEI and our workplace more generally.
We use an employee engagement tool, Peakon, to collect this feedback so that people can stay anonymous if they choose.

And most importantly: get to know each other! Connection building is the core of belonging, so we encourage lots of ways for our people to connect deeply. This is especially important in a globally distributed, hybrid team – we have to OVER deliver on opportunities to get together, both in person and virtually.

What’s Next?

As with any journey like this, it’s far from over. We all have so much work to do in DEI, and there are a hundred new questions swimming around our heads on where we should focus next, and how to make our DEI Strategy more effective. For example…

Goals: Right now our DEI Goals sit with the People team. Should we transition our DEI Goals to the company level, so that every one of us is responsible for addressing them? We know accountability is key, but is the accountability in the right place for maximum impact?

Engagement: How much time and engagement should we be asking of our people? Do we need everyone to know every detail of our strategy? Or is it enough that they know their own role, and the WHY behind DEI at Tessian?

We’re committing to continuing to ask ourselves these hard questions and to hold ourselves accountable to the very highest standards of DEI. It’s not always easy, but it is the right thing to do.

Want to join us on our journey? We’re hiring – all open roles are here. What’s it like to work at Tessian? Here are 200 reasons you’ll love it.
Life at Tessian
Welcoming Our New Chief People Officer
By Andrew Webb
14 June 2022
We are welcoming Kelly Sheridan as Tessian’s new Chief People Officer! Kelly will be responsible for leading Tessian’s people strategy, with a key focus on attracting and growing talent, developing and evolving the company’s culture, and providing a great employee experience as the company grows and scales.    We sat down with Kelly to ask her a few questions and get to know her a little better.
Kelly, first things first: how did you get into the world of HR?

So, my path to Chief People Officer is certainly not the traditional route. I graduated with a liberal arts degree from Syracuse University, but I didn’t really have a “what I want to be when I grow up” moment. I knew I wanted to move to Boston, and it was there that I found myself landing a career in marketing. Over 12 years, I worked my way up at a variety of financial services companies and, in 2005, I joined the largest regional accounting firm in New England as their head of marketing. I loved every minute of the marketing stage of my career.

About a year after that, a new CEO came in and said he wanted to do some restructuring. He asked me to take over HR. I had zero experience, zero knowledge and, I thought, zero interest in HR. But he was certain it was where I needed to be, and he promised me support, training, consultants, etc.

Here I am, 17 years later, as a Chief People Officer. Needless to say, he was right; HR was my calling. He delivered on his promises and I still consider him a friend and mentor.
That’s an amazing story. So, what happened next?

The accounting firm was acquired by Grant Thornton and, as a result, HR was centralized in Chicago. So, in 2013 I left to pursue my next role as VP, Global HR at SharkNinja – a consumer goods brand which makes Shark Vacuums and Ninja Blenders. I had the chance to help grow both the People function and the global footprint, which saw me opening a design center in London and relocating to China for five months.

I later joined Bullhorn, the global leader in software for the recruitment industry, as its VP People. While I loved that role, I knew I wanted to take a step into a Chief People Officer (CPO) role and build a function from the ground up, and this is what I did at Nuvolo.

The last two and a half years have been a ride! We grew our employee headcount from 250 to over 500, hiring 285 people globally in 10 months in 2021, all while building the processes, programs, and policies that go along with scaling a fast-paced tech organization.
Sounds like your experience growing and scaling teams in fast-paced tech companies is perfectly suited to the Chief People Officer role at Tessian. So what made you decide to join our company?

There are a few reasons, but I think the single most compelling was the people I met – starting with Tim, the CEO. Every conversation I had during the hiring process felt genuine, authentic, and easy. Everyone was caring, and I could really get a sense of the energy and passion behind the work the people at Tessian do. Everyone is excited about what the future holds.

With that in mind, it’s clear that the culture at Tessian is a really strong one. I’m excited to join an organization that has already built something special, and I also see limitless opportunities ahead.
What do you see as the biggest opportunities for Tessian?

For me, it’s about building an incredible employee experience. There is no doubt that exists here; I’ve seen it throughout the interview and onboarding process. But as we grow and scale, there will be further opportunities to evolve and innovate so that we are providing programming, initiatives, coaching, learning, and experiences that help every employee at Tessian expand their careers, the business, and our brand.

We’re so happy to have you on board, Kelly. Now you’re here, what’s going to be your focus for the next 3-6 months?

I actually look at this in smaller blocks. My first 90 days will be about meeting people and trying to learn as much as I can about Tessian, the market, and our customers. Through listening and learning, I aim to find where there is room for improvement, and how we can enhance the employee experience and our business strategy.

Then, it’s about how we translate business objectives into our People strategy so that we are attracting, developing, and keeping our exceptional team!
Life at Tessian
Tessian Named One of the 2022 UK’s Best Workplaces™ for Wellbeing
By Tessian
17 February 2022
We’re excited to announce that Tessian has been named one of the 2022 UK’s Best Workplaces™ for Wellbeing.

So, what were the criteria? The Great Place to Work® culture experts analyzed thousands of employee surveys, assessing people’s holistic experiences of wellbeing at work through fundamental facets of employee wellbeing, like:

Work-life balance
Sense of fulfilment
Job satisfaction
Psychological safety
Financial security

Here are just some examples of how we support wellbeing at Tessian.

Refreshian Summer

Last year, we launched what we call “Refreshian Summer”. This sees everyone in the company down tools on specific days throughout the summer months to rest, recharge, and enjoy the sunshine. You can learn more about the initiative here.

Mental health support

We wouldn’t have made it onto this list if we didn’t take mental health seriously. Private healthcare in both the US and the UK, and instant access to support via Spill, help us take care of our people. With Spill, therapy sessions, mental health training, and feelings check-ins are just a click away.

Access to Employee Resource Groups (ERGs)

It’s important employees feel like they can bring their whole selves to work, and ERGs, or staff community groups, help ensure all Tessians can connect with their peers in a safe space. Two examples? Plus, an LGBTQ+ network, and Tes-She-An, a space created to support Tessians who identify as women.

In-depth Diversity and Inclusion (D&I) training

Over the last 18 months, Tessian has been taking steps towards creating a more diverse and inclusive place to work. Why? Well, it’s the right thing to do. Diversity is infinite, and everyone should feel valued for who they are and have the opportunity to bring this to work.

Hear why D&I is important to all of us below, and read more about how we created a strategy to maximize impact here.
“Choice First” working policy   Tessian’s ‘choice first’ working policy allows employees to choose where they work – remotely, in the office, or hybrid. Those working remotely get substantial budgets to set up their home offices, and those working in the office or hybrid have access to hubs in London, Boston, Austin, and San Francisco. 
Want to work at Tessian? See if we have a role that interests you today.
Engineering Team
Why Confidence Matters: Experimental Design
By Cassie Quek
19 January 2022
This post is part three of Why Confidence Matters, a series about how we improved Defender’s confidence score to unlock a number of important features. You can read part one here and part two here.

Bringing our series to a close, we explore the technical design of our research pipeline, which enabled our Data Scientists to iterate over models with speed. We aim to provide insight into how we solved issues pertaining to our particular dataset, and conclude with the impact this project had on our customers and product.

Why design a pipeline?

Many people think that a Data Scientist’s job is like a Kaggle competition – you throw some data at a model, get the highest scores, and boom, you’re done! In reality, building a product such as Tessian Defender was never going to be a one-off job. The challenge of making a useful machine learning (ML) model in production lies not only in its predictive power, but also in its speed of iteration, reproducibility, and ease of future improvement.

At Tessian, our Data Scientists oversee a project end-to-end, from conception and design all the way through to deployment in production, monitoring, and maintenance. Hence, our team started by outlining the above longer-term requirements, then sat down with our Engineers to design a research flow that would fulfill these objectives.

Here’s how we achieved the requirements we set out.
The research pipeline
The diagram above shows the design of the pipeline with its individual steps, starting from the top left. An overall configuration file specifies many parameters for the pipeline, such as the date range for the email data we’ll be using and the features we’ll compute. The research pipeline is then run on Amazon SageMaker, and takes care of everything from ingesting the checked email data from S3 (Collect Logs step) to training and evaluating the model (at the bottom of the diagram).

Because the pipeline is split into independent and configurable “steps”, each storing its output before the next picks it up, we were able to iterate quickly. This gave us the flexibility to configure and re-run from any step without having to re-run all the previous steps, which allowed for experimentation at speed.

In our experience, we only had to revise the slowest data collection and processing steps a couple of times to get them right (steps 1-3); most of the work and improvements involved experimenting with the feature and model training steps (steps 4-5). The later research steps take only a few minutes to run, as opposed to hours for the earlier steps, and allow us to test features and obtain answers about them quickly.
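Because each step persists its output keyed on its configuration, downstream steps can re-run without repeating the expensive upstream ones. Here is a minimal sketch of that pattern; the function names and cache layout are our illustration, not Tessian’s actual implementation:

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path("pipeline_cache")

def run_step(name, fn, config, upstream=None):
    """Run one pipeline step, reusing its stored output when the
    step's config (and its upstream input) hasn't changed."""
    key = hashlib.sha256(
        json.dumps({"config": config, "upstream": upstream},
                   sort_keys=True).encode()
    ).hexdigest()[:16]
    out_path = CACHE_DIR / f"{name}-{key}.json"
    if out_path.exists():
        # Step already ran with this exact configuration: skip the work.
        return json.loads(out_path.read_text())
    result = fn(config, upstream)
    CACHE_DIR.mkdir(exist_ok=True)
    out_path.write_text(json.dumps(result))
    return result
```

With this layout, changing only a late step’s configuration leaves the cached outputs of the earlier, slower steps untouched, so a re-run effectively starts from the step that changed.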
Five Key Steps within the Pipeline

Some of these will be familiar to any Data Science practitioner. We’ll leave out general descriptions of these well-known ML steps, and instead focus on the specific adjustments we made to ensure the confidence model worked well for the product.

1. Collect Logs

This step collects all email logs with user responses from S3 and transforms them to a format suitable for later use, stored separately per customer. These logs contain information on decisions made by Tessian Defender, using data available at the time of the check. We also look up and store additional information at this stage to enrich and add context to the dataset.

2. Split Data

The way we choose to create the training and test datasets is very important to the model outcome. As mentioned before, consistency in model performance across different cuts of the data is a major concern and success criterion.

In designing our cross-validation strategy, we utilized both time-period hold-outs and a tenant hold-out. The time-period hold-out allows us to confirm that the model generalizes well across time even as the threat landscape changes, while testing on a tenant hold-out ensures the model generalizes well across all our customers, which are spread across industries and geographical regions. Having this consistency means that we can confidently onboard new tenants and maintain similar predictive power of Tessian Defender on their email traffic.

However, the downside to having multiple hold-outs is that we effectively throw out data that does not fit within both constraints for each dataset, as demonstrated in the chart below.
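To make the double hold-out concrete, here is a sketch under assumed inputs (a flat list of email records carrying a tenant and a timestamp; all names are illustrative):

```python
def split_emails(emails, heldout_tenants, test_start):
    """Split email records using both a tenant hold-out and a
    time-period hold-out. Records satisfying only one of the two
    constraints fit neither split cleanly and are discarded."""
    train, test, discarded = [], [], []
    for email in emails:
        tenant_held_out = email["tenant"] in heldout_tenants
        time_held_out = email["timestamp"] >= test_start
        if tenant_held_out and time_held_out:
            test.append(email)       # unseen tenant AND unseen time period
        elif not tenant_held_out and not time_held_out:
            train.append(email)      # known tenant AND training period
        else:
            discarded.append(email)  # fits only one hold-out
    return train, test, discarded
```

The `discarded` bucket is exactly the data loss the paragraph above describes: emails from a held-out tenant in the training period (or vice versa) can serve neither split.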
We eventually compromised by allowing a slight overlap between train and validation tenants (but not test tenants), minimizing the data discarded where possible.

3. Labels Aggregation

In part two, we also highlighted that one of the challenges of the user-response dataset is mislabeled data. Greymail and spam are often wrongly labeled as phishing, which can cause the undesired effect of the model prioritizing spam, making the confidence score less meaningful for admins. Users also often disagree on whether the same email is safe or malicious. This step takes care of these concerns by cleaning out spam and aggregating the labels.

In order to assess the quality of user feedback, we first estimated the degree of agreement between user labels and security-expert labels using a sample of emails, and found that user labels and expert labels matched in around 85% of cases. We addressed the most systematic bias observed in this exercise by developing a few simple heuristics to correct cases where users reported spam emails as malicious.

Where we have different labels for copies of the same email sent to multiple users, we applied an aggregation formula to derive a final label for the group. This formula is configurable, and carefully assessed to provide the most accurate labels.

4. Features

This step is where most of the research took place – trialing new feature ideas and iterating on them based on feature analysis and metrics from the final step.

The feature computation actually consisted of two independently configurable steps: one for batch features and another for individually computed features. The batch features consisted of some natural language processing (NLP) vectorizations, which were computed faster as a batch and were more or less static after initial configuration. Splitting them out simplified the structure and maximized our flexibility.
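Tessian’s actual aggregation formula isn’t published; as a purely hypothetical illustration, a configurable majority-style vote over the per-recipient labels might look like:

```python
def aggregate_labels(labels, malicious_threshold=0.5):
    """Collapse the per-recipient labels for copies of one email into a
    single label. `malicious_threshold` is the configurable knob: at 0.5
    a simple majority of 'malicious' votes wins; lowering it biases the
    final label toward flagging the email."""
    if not labels:
        raise ValueError("need at least one label")
    malicious_fraction = labels.count("malicious") / len(labels)
    return "malicious" if malicious_fraction >= malicious_threshold else "safe"
```

Making the threshold a parameter is what lets the formula be “configurable, and carefully assessed” as described above: it can be tuned against expert-labeled samples rather than fixed up front.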
Other features based on stateful values (dependent on the time of the check), such as domain reputations and information from external datasets, were computed or extracted individually – for example, whether any of the URL domains in the email was registered recently.

5. Model Training and Evaluation

In the final and arguably most exciting step of the pipeline, the model is created and evaluated.

Here, we configure the model type and its various hyperparameters before training the model. Then, based on the validation data, the “bucket” thresholds are defined. As mentioned in part two, we defined five confidence buckets that simplified communication and understanding with users and stakeholders. These buckets range in priority from Very Low to Very High. In addition, this step produces the key metrics we use to compare models. These include both generic ML metrics and Tessian Defender product-specific metrics, as mentioned in part two, computed against each of the data splits.

Using MLflow, we can keep track of the results of our experiments neatly, logging the hyperparameters and metrics, and even storing certain artifacts that would be relevant if we needed to reproduce the model. The interface allowed us to easily compare models based on their metrics.

Our team held a weekly review meeting to discuss what we’d tried and the metrics produced before agreeing on next steps and experiments to try. We found this practice very effective, as the Data Science team rallied together to meet a deadline each week, and product managers could easily keep track of the project’s progress. During this process, we also kept in close contact with several beta users to gather quick feedback on the work-in-progress models, ensuring that the product was being developed with their needs in mind.
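One plausible way to derive the bucket thresholds from validation data, as described above, is to place four cut-points on the validation score distribution and map each new score to a bucket. The quantile fractions below are our assumption for illustration; the post doesn’t give the actual rule:

```python
import bisect

BUCKETS = ["Very Low", "Low", "Medium", "High", "Very High"]

def fit_thresholds(validation_scores, quantiles=(0.5, 0.75, 0.9, 0.97)):
    """Pick four cut-points from the validation scores so the five
    buckets cover fixed quantiles of the score distribution."""
    ordered = sorted(validation_scores)
    return [ordered[int(q * (len(ordered) - 1))] for q in quantiles]

def to_bucket(score, thresholds):
    """Map a model confidence score to one of the five priority buckets."""
    return BUCKETS[bisect.bisect_right(thresholds, score)]
```

Fitting the cut-points on validation data (rather than hard-coding score values) keeps the bucket sizes stable even when retraining shifts the model’s score distribution.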
The improved confidence score

The new priority model was only deployed when we hit the success criteria we set out to meet.

As set out in part two, besides the many metrics, such as AUC-ROC, that we tracked internally to give us direction and compare models, our main goal was always to optimize the user experience. That meant the success criteria depended on product-centric metrics: the precision and number of quarantined emails for a client, the rate at which we could improve overall warning precision, and consistency of performance across different slices of data (time, tenants, threat types).

On the unseen test data, the precision of our highest-priority bucket more than doubled with our newest priority model. This greatly improved the user experience of Tessian Defender: a security admin can now find malicious emails more easily and act on them more quickly, and quarantining emails without compromising users’ workflows became a possibility.
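Per-bucket precision, the headline metric here, is simply the fraction of emails assigned to a bucket that really were malicious. A quick sketch, with the pair format assumed for illustration:

```python
def bucket_precision(predictions):
    """Compute precision per priority bucket from (bucket, is_malicious)
    pairs: of the emails placed in each bucket, what fraction were
    genuinely malicious?"""
    totals, true_positives = {}, {}
    for bucket, is_malicious in predictions:
        totals[bucket] = totals.get(bucket, 0) + 1
        true_positives[bucket] = true_positives.get(bucket, 0) + int(is_malicious)
    return {bucket: true_positives[bucket] / totals[bucket] for bucket in totals}
```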
Product Impact

As a Data Scientist working on a live app like Tessian Defender, rolling out a new model is always the most exciting part of the process. We get to observe the product impact of the model instantly, and get feedback through the monitoring we have in place, or by speaking directly with Defender customers.

As a result of the improved precision in the highest-priority bucket, we unlocked the ability to quarantine with confidence. We are assured that the model can quarantine a significant number of threats for all clients at a low rate of false positives, massively reducing risk exposure for the company and saving employees precious time, as well as the burden and responsibility of discerning malicious mails.

We also understand that not all false positives are equal – for example, accidentally quarantining a safe newsletter has almost zero impact compared to quarantining an urgent legal document that requires immediate attention. Therefore, prior to roll-out, our team also made inquiries to quantify this inconvenience factor, ensuring that quarantining a highly important, time-sensitive email was highly unlikely. All of this meant that the benefit of turning on auto-quarantine and protecting the user from a threat far outweighs the risk of interrupting the user’s workflow and any vital business operations.
With this new model, Tessian Defender-triggered events are also being sorted more effectively.

Admins who log in to the Tessian portal will find the most likely malicious threats at the top, allowing them to act upon the threats instantly. Admins can quickly review the suspicious elements highlighted by Tessian Defender and gain valuable insights about the email, such as:

its origin
how often the sender has communicated with the organization’s users
how users have responded to the warning

They can then take action, such as removing the email from all users’ inboxes, or adding the sender to a denylist. Thus, even in a small team, security administrators are able to respond effectively to external threats, even in the face of a large number of malicious mails, all the while continuing to educate users in the moment on any phishy-looking emails.
Lastly, with the more robust confidence model, we are able to improve the accuracy of our warnings. By ensuring high warning precision overall, users pay attention to every individual suspicious event, reap the full benefits of the in-situ training, and are more likely to pause and evaluate the trustworthiness of the email. As the improved confidence model provides a more reliable estimate of the likelihood that an email is malicious, we are able to cut back on warnings for less phishy emails that a user would learn little from.

This concludes our 3-part series on Why Confidence Matters. Thank you for reading! We hope this series has given you some insight into how we work here at Tessian, and the types of problems we try to solve. To us, software and feature development is more than just endless coding and optimizing metrics in vain – we want to develop products that actually solve people’s problems. If this work sounds interesting to you, we’d love for like-minded Data Scientists and Developers to join us on our mission to secure the Human Layer! Check out our open roles and apply today.

(Co-authored by Gabriel Goulet-Langlois and Cassie Quek)
Engineering Team Life at Tessian
Engineering Spotlight: Meet Our 2021 Cohort of Associate Engineers
17 January 2022
We’ve believed for a long time that without finding ways to bring new talent into our industry, we’ll never overcome the lack of diversity in tech. But this only works if you can bring in diverse groups of people to begin with.
So, how did we aim to tackle this? Last year, as part of our Diversity, Equity, and Inclusion (DEI) roadmap, we kicked off a recruitment process for five new, entry-level Associate Engineer positions. To widen the pool of talent, we removed some of the historical prerequisites you often see, like “Must have a degree in Computer Science”, and instead added “code-campers and career-changers welcome” to encourage more potentially great engineers to seize the opportunity.

The process represented a couple of firsts for us:

This was the first time we mass recruited and onboarded 5 candidates into the same role.
We reviewed over 900 applicants, took over 300 through to the first stage, and one-on-one interviewed 53 candidates over the course of 3 weeks. Talk about Craft at Speed.

We had the opportunity to connect with so many awesome engineers, and are really excited to introduce you to the 5 Tessians who officially joined us at the end of 2021.

As you’d expect, every person has a different story to tell… Meet the team:
Nash

Nash has not one but two degrees under his belt. First he achieved a PhD in Cinema History before going on to get his MSc in Computing at Cardiff University. If that wasn’t enough, before that he spent two years teaching English in Japan.

Why Tessian?

“The role was much too attractive not to apply to! From the statements about the work culture, to the blogs and podcasts about the company and its mission, to the clear and impactful use cases of the product, it felt like an incredible place to start a new career. I especially loved the ‘Engineering at Tessian’ YouTube video – it really helped clarify what to expect from life in the company as a part of the engineering team.”

What’s the coolest thing you’ve done in your first month?

“While there have been lots of great moments, from Fernet Fridays to team lunches to the thoughtful and well-paced onboarding week, I would say my highlight was the first WIG (weekly interdepartmental gathering) meeting. It was great to share a room – both physically and virtually – with the whole company, to introduce myself, and hear everyone’s fun facts about themselves. I really felt like a part of the Tessian community.”
Dhruv

Dhruv moved to the UK from New Delhi, India to complete his Computer Science degree at the University of Manchester before moving to London to join Tessian. Although he enjoyed his time in Manchester, he loves exploring the parks and restaurants of London, as well as catching some live cricket action.

Why Tessian?

“Two things. One, the unique products they offer and the cutting-edge technology behind building them. I have a keen interest in Software Development, Machine Learning, and Natural Language Processing. Tessian effectively uses these technologies to make emails safer! And two, I feel aligned to the values, and some of the benefits stood out – Refreshian Summer, Taste of Tessian (lunch paid for every Friday), private healthcare, and ClassPass, among other things.”

What’s the coolest thing you’ve done in your first month?

“Definitely the WIG. The most fun and terrifying thing so far was introducing myself in front of the whole company and telling everyone a fun fact about myself. My fun fact was that I partly decided to go to the University of Manchester because I support Manchester United. To avoid spending all my money on tickets, I started working as a steward in the Theatre of Dreams and got paid to watch the games! This was an awesome experience that really helped me build my confidence, and I got to hear some really funny stories about my colleagues.”
Rahul

Rahul currently commutes from Essex to our office in Liverpool Street. Before this, he achieved an Engineering (Information and Computer Engineering) degree at the University of Cambridge.

Why Tessian?

“After connecting with Tessian, I very quickly became interested in the products and realized how essential email security really is. I’m glad I applied. From start to finish, it was probably the fastest and most efficient process of all the companies I applied to. Everyone was very friendly, and it made me even more eager to join the team.”

What’s the coolest thing you’ve done in your first month?

“At the end of the first week, we had an Engineers’ social at the office. It was also the last Friday of Refreshian Summer, so the social started at lunch with pizza and drinks. Time flew by and the social went well into the evening. It was a chance to get to know a lot more people in a very relaxed way.”
Claire
Not only has Claire moved countries (from Colorado to London) but she’s also made a career change. Talk about big moves! Before coming to Tessian, Claire was a project manager at a construction firm. Although she’s now switched to a more technical role, if you ever need advice on how much your house foundation will cost or whether your plumber is indeed making fun of you behind your back, she’s got your back.   Why Tessian?   “I was looking for a career change. My goal was to become a software engineer, and I’m particularly interested in cybersecurity and data privacy. I had to move here for the role and I came to London not knowing anyone, so it’s been great to enjoy spending time with coworkers on and off the clock. (Another plus: I’ve become a big fan of the pint and pie deal at my local pub.)”   What’s the coolest thing you’ve done in your first month?   “I’m looking forward to continuing to learn in a supportive environment. My manager says, “We create an environment where people feel supported to tackle hard projects,” and I feel like that couldn’t be truer. I can’t emphasize enough that working here is truly amazing. I am also incredibly excited to connect with other women in STEM and want to become more involved in Tessian’s empowering culture!”   Want to get a better idea of what Claire is working on? Check out her Day in the Life post here.
Nicholas
From Switzerland to the UK, Nicholas studied Computer Science, earning his BSc at Exeter University before completing his Masters degree at St Andrews. From Scotland, he has now joined us in London.   Why Tessian?   “I came for the tech, and stayed for the product. When I applied, I was already pretty familiar with the languages, tools, and platforms Tessian uses. I hadn’t given email security much thought, though. But when I started to look into exactly what Tessian did, I gradually became a lot more interested in what they were building. I’ve seen misdirected emails and spear phishing attempts, and I liked what they were doing to prevent them.”   What’s the coolest thing you’ve done in your first month?   “Shortly after onboarding I got to start making changes and additions to our product. These changes were then swiftly deployed to our customers, and it was nice to see how quickly I could start working with the team to make a better product. Our team just released a new product, Architect. I look forward to working on it and making it into the best damn email filtering tool out there. Also, I’m enjoying spending time with July, Claire’s dog, who hangs out in the office.”
Great news! After a successful cohort in 2021, we have another five entry-level positions available this year. Plus, we have plenty more opportunities for you to join Tessian, in Engineering and across our other teams. Apply now.
Why Confidence Matters: How Good is Tessian Defender’s Scoring Model?
10 January 2022
This post is part two of Why Confidence Matters, a series about how we improved Defender’s confidence score to unlock a number of important features. You can read part one here.   In this part, we will focus on how we measured the quality of confidence scores generated by Tessian Defender. As we’ll explain later, a key consideration when deciding on metrics and setting objectives for our research was a strong focus on product outcomes.   Part 2.1 – Confidence score fundamentals   Before we jump into the particular metrics and objectives we used for the project, it’s useful to discuss the fundamental attributes that constitute a good scoring model.   1. Discriminatory power   The discriminatory power of a score tells us how good the score is at separating positive (i.e. phishy) examples from negative (i.e. safe) ones. The chart below illustrates this idea.   For each of two models, the image shows a histogram of the model’s predicted scores on a sample of safe and phishing emails, where 0 means the model is sure the email is safe and 1 means it is certain the email is phishing.   While both models are generally likely to assign a higher score to a phishing email than to a safe one, the example on the left shows a clearer distinction between the most likely scores for phishing vs safe emails.
 
Discriminatory power is very important in the context of phishing because it determines how well we can differentiate between phishing and safe emails, providing a meaningful ranking of flags from most to least likely to be malicious. This confidence also unlocks the ability for Tessian Defender to quarantine emails which are likely to be phishing, and reduce flagging on emails we are least confident about, improving the precision of our warnings.  
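To make the ranking idea concrete, here is a minimal sketch (the emails, IDs, and scores are invented for illustration, not Tessian’s real data model) of ordering flagged emails from most to least likely to be malicious:

```python
# Hypothetical flagged emails with confidence scores (0 = likely safe, 1 = likely phishing).
flags = [
    {"id": "e1", "score": 0.12},
    {"id": "e2", "score": 0.97},
    {"id": "e3", "score": 0.55},
]

# A score with good discriminatory power makes this ordering meaningful:
# admins review the riskiest flags first.
ranked = sorted(flags, key=lambda f: f["score"], reverse=True)
print([f["id"] for f in ranked])  # → ['e2', 'e3', 'e1']
```

The same ordering is what lets the high end of the list feed a quarantine policy and the low end be suppressed to reduce noise.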
2. Calibration Calibration is another important attribute of the confidence score. A well-calibrated score will reliably reflect the probability that a sample is positive. Calibration is normally assessed using a calibration curve, which looks at the precision of unseen samples across different confidence scores (see below).
The above graph shows two example calibration curves. The gray line shows what a perfectly calibrated model would look like: the confidence score predicted for samples (x-axis) always matches the observed proportion of phishy emails (y-axis) at that score. In contrast, the poorly-calibrated red line shows a model that is underconfident at lower scores (the model predicts a lower score than the observed precision) and overconfident at high scores.   From the end-user’s perspective, calibration is key to making the score interpretable, and it matters most when the score is exposed to the user.
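As a rough sketch of how such a curve is computed (toy data; this is not Tessian’s actual evaluation code), one can bin predictions by score and compare each bin’s average score against the observed phish rate:

```python
def calibration_curve(scores, labels, n_bins=10):
    """Bin predictions by score; compare each bin's mean score to its observed phish rate."""
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        idx = min(int(s * n_bins), n_bins - 1)  # clamp score 1.0 into the top bin
        bins[idx].append((s, y))
    curve = []
    for b in bins:
        if b:  # skip empty bins
            mean_score = sum(s for s, _ in b) / len(b)
            observed = sum(y for _, y in b) / len(b)
            curve.append((mean_score, observed))
    return curve

# A well-calibrated model: emails scored ~0.9 really are phishing ~90% of the time.
scores = [0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9]
labels = [0,   0,   0,   0,   1,   1,   1,   1,   1,   0]
print(calibration_curve(scores, labels, n_bins=2))  # roughly [(0.1, 0.2), (0.9, 0.8)]
```

Plotting these (mean score, observed rate) pairs against the diagonal gives the calibration curve described above.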
3. Consistency  A good score will also generalize well across different cuts of the samples it applies to. In the context of Tessian Defender, we needed a score that would be comparable across different types of phishing: we should expect the scoring to work just as well for an Account Takeover (ATO) as it does for a Brand Impersonation. We also had to make sure that the score generalized well across different customers, who operate in different industries and send and receive very different types of emails. For example, a financial services firm may receive a phishing email in the form of a spoofed financial newsletter, but such an email would not appear in the inbox of someone working in the healthcare sector.
Metrics  How do we then quantify the above attributes of a good score? This is where metrics come into play – it is important to design appropriate metrics that are technically robust, yet easily understandable and translatable to a positive user experience.   A good metric for capturing the overall discriminatory power of a model is the area under the ROC curve (AUC-ROC), or the average precision of the model at different thresholds, both of which capture the performance of the model across all possible thresholds. Calibration can be measured with metrics that estimate the error between the predicted score and the true probability, such as the Adaptive Calibration Error (ACE).   While these out-of-the-box metrics are commonly used to assess machine learning (ML) models, a few challenges make them hard to use in a business context.   First, they are quite difficult to explain to stakeholders who are not familiar with statistics and ML. For example, an AUC-ROC score doesn’t tell most people how well they should expect a model to behave. Second, it’s difficult to translate real product requirements into AUC-ROC scores. Even for those who understand these metrics, it’s not easy to specify what increase in these scores would be required to achieve a particular outcome for the product.
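To make these metrics concrete, here is a small pure-Python sketch on toy data (real evaluations would use library implementations, e.g. scikit-learn's): AUC-ROC computed as the probability that a randomly chosen phishing email outscores a randomly chosen safe one, and an ACE-style calibration error over equal-mass score bins.

```python
def auc_roc(scores, labels):
    """AUC-ROC: probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def adaptive_calibration_error(scores, labels, n_bins=2):
    """ACE-style error: mean |average score - observed positive rate| over equal-mass bins."""
    pairs = sorted(zip(scores, labels))
    size = len(pairs) // n_bins
    errors = []
    for i in range(n_bins):
        chunk = pairs[i * size:] if i == n_bins - 1 else pairs[i * size:(i + 1) * size]
        mean_score = sum(s for s, _ in chunk) / len(chunk)
        pos_rate = sum(y for _, y in chunk) / len(chunk)
        errors.append(abs(mean_score - pos_rate))
    return sum(errors) / len(errors)

# A toy model that separates the classes perfectly:
print(auc_roc([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))  # → 1.0
```

An AUC of 1.0 means every phishing email outscored every safe email; 0.5 would mean the score carries no ranking information at all.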
Defender product-centric metrics   While we still use AUC-ROC scores within the team and compare models by this metric, the above limitations meant that we had to also design metrics that could be understood by everyone at Tessian, and directly translatable to a user’s product feature experience.    First, we defined five simpler-to-understand priority buckets that were easier to communicate with stakeholders and users (from Very Low to Very High). We aimed to be able to quarantine emails in the highest priority bucket, so we calibrated each bucket to the probability of an email being malicious. This makes each bucket intuitive to understand, and allows us to clearly translate to our users’ experience of the quarantine feature.    For the feature to be effective, we also defined a minimum number of malicious emails to prevent reaching the inbox, as a percentage of the company’s inbound email traffic. Keeping track of this metric prevents us from over-optimizing the accuracy of the Very-High bucket at the expense of capturing most of the malicious emails (recall), which would greatly limit the feature’s usefulness.   While good precision in the highest confidence bucket is important, so is accuracy on the lower end of the confidence spectrum.    A robust lower end score will allow us to stop warning on emails we are not confident in, unlocking improvements in overall precision to the Defender algorithm. Hence, we also set targets for accuracy amongst emails in the Very-Low/Low buckets.    For assurance of consistency, the success of this project also depended on achieving the above metrics across slices of data – the scores would have to be good across the different email threat types we detect, and different clients who use Tessian Defender.
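A sketch of what such a bucket mapping might look like (the cut-points below are invented for illustration; Tessian’s calibrated boundaries are not public):

```python
# Illustrative cut-points on a calibrated 0-1 score; real boundaries would be
# chosen so each bucket reflects the observed probability of maliciousness.
PRIORITY_BUCKETS = [
    (0.20, "Very Low"),
    (0.40, "Low"),
    (0.65, "Medium"),
    (0.90, "High"),
]

def priority_bucket(score):
    for upper, name in PRIORITY_BUCKETS:
        if score < upper:
            return name
    return "Very High"  # quarantine candidates live here

print(priority_bucket(0.95))  # → Very High
```

Because the underlying score is calibrated, each bucket name translates directly into a statement like “emails in this bucket are malicious roughly X% of the time”, which is what makes the buckets communicable to stakeholders.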
Part 2.2 – Our Data: Leveraging User Feedback After identifying the metrics, we can now look at the data we used to train and benchmark our improvements to the confidence score. Having the right data is key to any ML application, and this is particularly difficult for phishing detection. Specifically, most ML applications rely on labelled datasets to learn from.   We found building a labelled dataset of phishing and non-phishing emails especially challenging for a few reasons:
Data challenges Phishing is a highly imbalanced problem. On the whole, phishing emails are extremely low in volume compared to the legitimate email traffic the average user sends and receives. Over 300 billion emails are sent and received around the world every day, according to recent statistics. This means that trying to label emails manually is highly ineffective, like finding a needle in a haystack.   Also, phishing threats and techniques are constantly evolving, such that even thousands of emails labelled today would quickly become obsolete. The datasets we use to train phishing detection models must constantly be updated to reflect new types of attacks.   Email data is also very sensitive by nature. Our clients trust us to process their emails, many of which contain sensitive data, in a very secure manner. For good reasons, this means we control who can access email data very strictly, which makes labelling harder.   All these challenges make it quite difficult to collect large amounts of labelled data to train end-to-end ML models to detect phishing.
User feedback and why it’s so useful   As you may remember from part one of this series, end-users have the ability to provide feedback about Tessian Defender warnings. We collect thousands of these user responses weekly, providing us with invaluable data about phishing.   User responses help address a number of the challenges mentioned above.   First, they provide a continually updated view of changes in the attack landscape. Unlike a static email dataset labelled at a particular point in time, user response labels capture information about the latest phishing trends as we collect them, day-in and day-out. With each iteration of model retraining on the newest user labels, user feedback is automatically incorporated into the product. This creates a positive feedback loop, allowing the product to evolve in response to users’ needs.   Relying on end-users to label their own emails also helps alleviate concerns related to data sensitivity and security. In addition, end-users have the most context about the particular emails they receive. Combined with the explanations provided by Tessian warnings, they are more likely to provide accurate feedback.   These benefits neatly address the challenges described earlier, but user feedback is not without its limitations.   For one, the difference between phishing, spam and graymail is not always clear to users, so spam and graymail are often labelled as malicious. Several recipients of the same email can also disagree on whether it is malicious. Secondly, user feedback data may not be a uniform representation of the email threat landscape – we often receive more feedback from some clients, or about certain types of phishing. Neglecting to address this under-representation would result in a model that performs better for some clients than others, something we absolutely need to avoid in order to ensure consistency in the quality of our product for all new and existing clients.
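One simple way to counter that kind of under-representation (a sketch of a standard technique, not necessarily Tessian’s actual method) is to weight training labels by the inverse frequency of their source client, so that clients who submit lots of feedback don’t dominate the training signal:

```python
from collections import Counter

def client_weights(client_ids):
    """Inverse-frequency weights: each client contributes equally in aggregate."""
    counts = Counter(client_ids)
    n_clients = len(counts)
    total = len(client_ids)
    return {c: total / (n_clients * counts[c]) for c in counts}

# Client "a" sent 3x the feedback of client "b"; each of its labels gets 1/3 the weight,
# so both clients contribute the same total weight to training.
print(client_weights(["a", "a", "a", "b"]))  # → {'a': 0.666..., 'b': 2.0}
```

The same idea applies to rebalancing across attack types when one kind of phishing dominates the feedback.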
In the last part of the series Why Confidence Matters, we’ll discuss how we navigated the above challenges, delve deeper into the technical design of the research pipeline used to build the confidence-scoring model, and the impact that this has brought to our customers.
(Co-authored by Gabriel Goulet-Langlois and Cassie Quek)
Why Confidence Matters: How We Improved Defender’s Confidence Scores to Fight Phishing Attacks
04 January 2022
‘Why Confidence Matters’ is a weekly three-part series. In this first article, we’ll explore why a reliable confidence score is important for our users. In part two, we’ll explain more about how we measured improvements in our scores using responses from our users. And finally, in part three, we’ll go over the pipeline we used to test different approaches and the resulting impact in production.   Part One: Why Confidence Matters   Across many applications of machine learning (ML), being able to quantify the uncertainty associated with the prediction of a model is almost as important as the prediction itself.   Take, for example, chatbots designed to resolve customer support queries. A bot that provides an answer when it is very uncertain will likely confuse and dissatisfy users. In contrast, a bot that can quantify its own uncertainty, admit it doesn’t understand a question, and ask for clarification is much less likely to generate nonsense messages and cause frustration amongst its users.
The importance of quantifying uncertainty   Almost no ML model gets every prediction right every time – there’s always some uncertainty associated with a prediction. For many product features, the cost of errors can be quite high. For example, mislabelling an important email as phishing and quarantining it could result in a customer missing a crucial invoice, and mislabelling a bank transaction as fraudulent could result in an abandoned purchase for an online merchant.   Hence, ML models that make critical decisions need to predict two key pieces of information: (1) the best answer to provide a user, and (2) a confidence score to quantify uncertainty about that answer. Quantifying the uncertainty associated with a prediction can help us decide if, and what, actions should be taken.
How does Tessian Defender work?   Every day, Tessian Defender checks millions of emails to prevent phishing and spear phishing attacks. In order to maximise coverage,  Defender is made up of multiple machine learning models, each contributing to the detection of a particular type of email threat (see our other posts on phishing, spear phishing, and account takeover).      Each model identifies phishing emails based on signals relevant to the specific type of attack it targets. Then, beyond this primary binary classification task, Defender also generates two key outputs for any email that is identified as potentially malicious across any of the models:   A confidence score, which is related to the probability that the email flagged is actually a phishing attack. This score is a value between 0 (most likely safe) and 1 (most certainly phishing), which is then broken down into 4 categories of Priority (from Low to Very High). This score is important for various reasons, which we further expand on in the next section. An explanation of why Defender flagged the email. This is an integral part of Tessian’s approach to Human Layer Security: we aim not only to detect phishy emails, but also to educate users in-the-moment so they can continually get better at spotting future phishing emails. In the banner, we aim to concisely explain the type of email attack, as well as why Defender thinks it is suspicious. Users who see these emails can then provide feedback about whether they think the email is indeed malicious or not. Developing explainable AI is a super interesting challenge which probably deserves its own content, so we won’t focus on it in this particular series. Watch this space!   
Why Confidence Scores Matter    Beyond Defender’s capability to warn on suspicious emails, there were several key product features we wanted to unlock for our customers that could only be delivered with a robust confidence score. These were: Email quarantine Based on the score, Defender first aims to quarantine the highest priority emails to prevent malicious emails from ever reaching employees’ mailboxes. This not only reduces the company’s risk exposure from an employee potentially interacting with a malicious email; it also removes the burden and responsibility of making a decision from the user, and reduces interruption to their work.   Therefore, for malicious emails that we’re most confident about, quarantining is extremely useful. In order for quarantine to work effectively, we must:   Identify malicious emails with very high precision (i.e. very few false positives). We understand how much our customers rely on email to conduct their business, so we needed to make sure that any important communications still come through to their inboxes unimpeded. This was very important so that Tessian Defender can secure the human layer without security getting in our users’ way. Identify a large enough subset of high confidence emails to quarantine. It would be easy to achieve very high precision by quarantining only a few emails with a very high score (a low recall), but this would greatly limit the impact of quarantine on how many threats we can prevent. In order to be a useful tool, Defender needs to quarantine a sizable volume of malicious emails.   Both these objectives directly depend on the quality of the confidence score. A good score allows a large proportion of flags to be quarantined with high precision.
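A sketch of that precision/recall trade-off (toy scores and labels, and an invented precision target): sweep thresholds from the top of the score range downwards, and keep the lowest one that still meets the precision bar, which maximizes how much can be quarantined.

```python
def pick_quarantine_threshold(scores, labels, target_precision=0.99):
    """Sweep thresholds from highest to lowest; keep the lowest one that still
    meets the precision target, which maximizes recall (volume quarantined)."""
    best = None
    for t in sorted(set(scores), reverse=True):
        flagged = [(s, y) for s, y in zip(scores, labels) if s >= t]
        tp = sum(y for _, y in flagged)
        precision = tp / len(flagged)
        recall = tp / sum(labels)
        if precision >= target_precision:
            best = (t, precision, recall)
        else:
            break  # one simple policy: stop at the first threshold that misses the bar
    return best

scores = [0.99, 0.97, 0.90, 0.60, 0.30]
labels = [1,    1,    1,    0,    0]
print(pick_quarantine_threshold(scores, labels))  # → (0.9, 1.0, 1.0)
```

With a well-calibrated, discriminative score, the chosen threshold sits low enough to quarantine a sizable share of threats while keeping false positives out of quarantine.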
Prioritizing phishy emails In today’s threat landscape, suspicious emails come into inboxes in large volumes, with varying levels of importance. That means it’s critical to provide security admins who review these flagged emails with a meaningful way to order and prioritize the ones that they need to act upon. A good score will provide a useful ranking of these emails, from most to least likely to be malicious, ensuring that an admin’s limited time is focused on mitigating the most likely threats, while having the assurance that Defender continues to warn and educate users on other emails that contain suspicious elements.   The bottom line: Being able to prioritize emails makes Defender a much more intelligent tool that is effective at improving workflows and saving our customers time, by drawing their attention to where it is most needed.  
Removing false positives We want to make sure that all warnings Tessian Defender shows employees are relevant and help prevent real attacks.   False positives occur when Defender warns on a safe email. If this happens too often, warnings become a distraction, which can have a big impact on productivity for both security admins and email users. Beyond a certain point, a high false positive rate could mean that warnings lose their effectiveness altogether, as users may ignore them completely. Being aware of these risks, we take extra care to minimize the number of false positives flagged by Defender.   As with quarantine, a good confidence score can be used to filter out false positives without impacting the number of malicious emails detected. For example, emails with a confidence score below a given threshold could be dropped to avoid showing employees unnecessary warnings.
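As a toy illustration of that filtering (the flags, labels, and the 0.2 floor below are all invented for the example):

```python
# Hypothetical Defender flags with confidence scores and ground-truth labels.
flags = [
    {"score": 0.05, "malicious": 0},
    {"score": 0.10, "malicious": 0},
    {"score": 0.60, "malicious": 1},
    {"score": 0.90, "malicious": 1},
]

def precision(fs):
    """Fraction of flagged emails that were genuinely malicious."""
    return sum(f["malicious"] for f in fs) / len(fs)

# Dropping warnings below a confidence floor removes the two false positives
# without losing either real threat: precision goes from 0.5 to 1.0.
kept = [f for f in flags if f["score"] >= 0.2]
print(precision(flags), precision(kept))  # → 0.5 1.0
```

This only works if the score is well calibrated at the low end too, which is why the lower buckets get their own accuracy targets.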
What’s next?   Overall, you can see there were plenty of important use cases for improving Tessian Defender’s confidence score. The next thing we had to do was look at how we could measure any improvements to the score. You can find a link to part two in the series below.   (Co-authored by Gabriel Goulet-Langlois and Cassie Quek)
Tessian’s 2021 Was Action Packed, Here’s What We Got Up To…
By Andrew Webb
20 December 2021
Well, 2021 was certainly a year to remember! Here’s just some of what we’ve achieved in the last 12 months… Tessian in numbers Scanned nearly 5 billion emails Identified over half a million malicious emails Stopped close to 30,000 account takeover attempts Prevented over 100,000 data breaches due to a misdirected email Donated $13,220 to charities chosen by our customers during the winter holidays Promoted 39 people internally Hired 155 new employees, with the highest proportion going to engineering and sales (we’re still hiring!) Expanded our senior team to include roles such as CISO, Head of Threat Intelligence, Trust & Compliance Lead, Chief Product Officer, and Chief Strategy Officer Announced five new partner integrations, including Okta, KnowBe4 and Sumo Logic Secured 995 pieces of news coverage in both mainstream and trade media Welcomed nearly 6,000 attendees to our three virtual Human Layer Security Summits Hosted and sponsored 104 virtual and physical events globally
January We kicked off January with our How to Hack a Human research report, and followed with our new mission video. On the product side, Tessian Defender began protecting against External Account Takeover.
February Tessian Guardian continued to evolve as February saw us launch our ‘Misattached Files’ feature, which uses machine learning to automatically detect and prevent people accidentally sharing the wrong files via email. Which, according to our research, 48% of employees have done…   March To celebrate International Women’s Day, we launched the second installment of our Opportunity in Cybersecurity report, highlighting that nearly half of women working in cybersecurity (49%) say the COVID-19 pandemic has affected their career in a positive way. On the people side, we welcomed Matt Smith as Chief Strategy Officer. And Tessian Guardian continued to add new features, with even more customization settings to fine-tune it to your organization’s specific requirements. Finally, we launched our springtime Human Layer Security Summit.
April April saw the launch of our Diversity and Inclusion strategy, with the long-term aim of growing and expanding the entry-level talent pool by creating junior jobs for people entering the tech industry, whether that’s in Sales or Engineering. On the product side, we also launched our Human Layer Risk Hub.   May We hit the jackpot in May when, after much hard work, we raised $74m in Series C plus extension funding. To announce the news, we took over the famous billboard in Times Square. We also welcomed Sumo Logic CEO Ramin Sayar to Tessian’s Board of Directors. And knowing how important rest and time away from work is to our staff, we launched Refreshian Summer, giving every employee Friday afternoon off during July and August.   June June saw no signs of slowing down as we hosted our summer Human Layer Security Summit and added Human Layer Security Intelligence to our platform to give you more visibility and insight into your human layer risks. And, as the world came out of various lockdowns, we launched our Back to Work Report.
July A highlight of July was our summer social event, where staff could let their hair down and party (see below). We also (re)opened new and existing offices in London, UK; Boston, MA; and Austin, TX. We were named a Representative Vendor in the 2021 Gartner Market Guide for Data Loss Prevention. And we were recognized as one of the top three medium-sized companies in the UK’s Best Workplaces™ for Women.   August August saw us set up shop at Black Hat USA 2021 and hire Josh Yavor as our Chief Information Security Officer.   September After a relaxing summer, it was ‘back to school’ in September, when we launched our Spear Phishing Threat Landscape 2021 report. Over a 12-month period, Tessian detected nearly two million malicious emails that slipped past legacy phishing solutions. We also hired our 200th Tessian, and were voted Best Place to Work in Tech UK. We also held our first internal TES Talk – where once a month anyone in the company can give a short talk about a passion project, subject or something they’ve worked on.
October As the fall rolled around, October saw us launch Architect, a powerful policy engine for real-time email data loss prevention. Gartner recognized Tessian as a Representative Vendor in the 2021 Market Guide for Email Security. And we were named one of Rover’s best dog-friendly companies of 2021 🐾. We announced our integration with Okta to help organizations protect against the biggest threats to enterprise security – people’s identities and behaviors. The end of October also saw Central London reverberate to ghostly screams and wails as we hosted our Halloween karaoke social night…
November The penultimate month of the year saw our final HLS Summit of 2021. We also recognized how hard and stressful being a CISO can be in our CISO Lost Hours report.   More people joined us, including Allen Lieberman as Chief Product Officer. A commissioned study conducted by Forrester Consulting on behalf of Tessian showed that security and risk leaders feel little control over risks posed by employees, which you can read here. And the silverware kept coming, as Fast Company named us one of the best innovators in AI and Data, and Deloitte recognized our epic growth in their Fast 50 for 2021 list. The product team were kept busy with our integration with Sumo Logic.   December After an exciting year, it was once again time for a party, with those based in London meeting up in person for drinks and games, while others attended our online virtual event. Another month, another integration, as we paired up with our good friends at KnowBe4.   We reached another milestone when our podcast, RE: The Human Layer, hit 5,000 downloads. And we launched humanlayersecurity.com, our new online magazine for security leaders. Finally, our marketing team met up in person in Austin, TX to plan out how we’re going to top what was a challenging but epic year for Tessian! So, as we come to the end of 2021, we’d just like to say thank you to those of you who’ve been on this amazing journey with us, and as Frank Sinatra once sang, the best is yet to come. See you in 2022… Merry Christmas and a Happy New Year!
Tessian Named One of ‘Next Big Things in AI and Data’ by Fast Company
By Laura Brooks
18 November 2021
We’ve been recognized in Fast Company’s inaugural Next Big Things in AI and Data list.   The list honors technology breakthroughs that promise to shape the future of their industries, and includes global giants, intrepid startups, and research that is fresh from the labs.   In all, our approach to Human Layer Security joins 64 other technologies, products, and services that will have a positive impact for consumers, businesses, and society at large in the next five years.   If you’ve read this blog or any of our reports, you’ll know our approach to cybersecurity is designed to protect people, not just machines and data.   Why? Because 95% of today’s data breaches are caused by human error. Using machine learning to understand people’s communication patterns and behaviors online, Tessian automatically stops data breaches caused by employees on email and continuously drives people towards safer email behavior, thanks to in-the-moment training.   “It just takes one mistake, one carefully crafted phishing email, or one moment when an employee lets their guard down for company security to be compromised,” says Tim Sadler, CEO and co-founder of Tessian. “Those ‘Oh Sh*t!’ security moments cost people their jobs and businesses their reputations – but they can be stopped. Our technology empowers employees to make safe cybersecurity decisions in-the-moment and prevents mistakes before they turn into breaches. In today’s threat landscape, this people-first approach to security has never been more important and I’m so proud to be recognized by Fast Company for our work.”   “Fast Company is thrilled to highlight cutting-edge technologies that are solving real-world problems in unexpected ways. From climate change and public health crises to machine learning and security, these technologies will certainly have a profound impact on the future, and we’re honored to bring attention to them today,” says Stephanie Mehta, editor-in-chief of Fast Company.
You can see the full list here
Tessian Announces Allen Lieberman as its Chief Product Officer
By Tessian
01 November 2021
We are very pleased to welcome Allen Lieberman as Tessian’s new Chief Product Officer, who will head up the continued development of the industry’s first and leading Intelligent Cloud Email Security platform. Allen joins us from VMware Carbon Black, where he worked for nearly 9 years and held roles including Senior Director of Product Marketing and VP of Product Management. He has spent the vast majority of the last 20 years in the Software-as-a-Service space. We took a few minutes to get to know Allen and find out what he’s looking forward to in his new role.   Allen, hi! Let’s start off with an easy question: why did you decide to join Tessian?  A combination of reasons, really.  First, the mission. Tessian has set out on a compelling mission that is critical to customers’ ability to scale and defend their enterprise in the modern threat and communications landscape. People can – and should – be a security team’s best asset. By enabling the employee community to help protect and defend the enterprise, security teams are better positioned to scale and protect their organizations. Until now, securing the human layer has been underserved. But as the enterprise and communications landscape evolves, putting people first is critical to the success of modern security programs. Tessian has set out on a mission to make this a reality.   Second, the culture and team at Tessian are world class. Having been in the trenches with key members of the team, I understand the culture that is being cultivated and feel good about the high level of diverse talent we have. At Tessian, there is a focus on doing the right thing, staying positive, persevering through challenges, and keeping people at the center of what we do. Having the culture aligned to my core values was critical in my decision.  And third, the time is right. Security teams today are dealing with unprecedented levels of cybercrime.
As organizations have become more distributed and cloud-first, as employees communicate over emerging channels, and as attackers evolve to meet employees where they are, now is the time for a better solution to help enable every employee to protect the enterprise. It’s rare to find a company that has all three of these things.   What do you see as the top benefit Tessian offers to customers?  The sea change that Tessian enables is turning the employee base into a security team’s best asset, while reducing overhead on the security team.  Tessian automates the protection of critical communication channels like email while helping people understand their role in protecting the enterprise – which is unlike so many other security solutions. The ability to embed security communication and training ‘in-the-moment’, when an employee needs it most, helps build a collaborative culture between staff and security teams while reducing breach responses. It’s great when employees really feel that security teams ‘have their back’, and that’s what Tessian enables.   What do you see as the biggest opportunity for Tessian?  Our biggest opportunity is to shift our customers’ mindset from security being seen as something that security teams do, to security being something that all employees do.  When we accomplish that – i.e. when employees become part of the new perimeter and all employees are truly extended parts of security teams – we will have changed the security game. I think that’s the biggest opportunity we have.   What’s your focus for the next 3-6 months?  I’ll be very much focused on learning over the next few months. While I’m coming into Tessian with many years of experience, there is so much to take in as I think about prioritizing and executing on the opportunity to drive change ahead.  My intent is to learn from our team, from our customers and from our partners.
I'm excited to understand more about the challenges our customers face, the opportunities we have to address them and, of course, to learn much more about our team.

And finally, can you summarize Tessian's mission in 25 words or less?

Sure: Tessian Cloud Email Security intelligently prevents advanced email threats and protects against data loss, to strengthen email security and build smarter security cultures in modern enterprises.
Engineering Team
A Solution to HTTP 502 Errors with AWS ALB
By Samson Danziger
01 October 2021
At Tessian, we have many applications that interact with each other using REST APIs. We noticed in the logs that at random times, uncorrelated with traffic and seemingly unrelated to any code we had actually written, we were getting a lot of HTTP 502 "Bad Gateway" errors.

Now that the issue is fixed, I wanted to explain what this error means, how you get it, and how to solve it. My hope is that if you're having to solve this same issue, this article will explain why and what to do.

First, let's talk about load balancing
In a development system, you usually run one instance of a server and communicate directly with it. You send HTTP requests, it returns responses, everything is golden.

For a production system running at any non-trivial scale, this doesn't work. Why? Because the amount of traffic going to the server is much greater, and you need it to stay up even when there are tens of thousands of users. Servers typically have a maximum number of connections they can support; above that number, new clients can't connect until an existing connection is freed up. In the old days, the solution might have been a bigger machine, with more resources and more available connections.

Now we use a load balancer to manage connections from clients to multiple instances of the server. The load balancer sits in the middle and routes each client request to any available server in the pool that can handle it. If one server goes down, traffic is automatically routed to the others in the pool; if a new server is added, traffic is automatically routed to it too, reducing load on the rest.
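The routing logic a load balancer applies can be as simple as round-robin. Here is a toy sketch of that idea in Python (the pool addresses and function names are illustrative only – this is not how an ALB is configured):

```python
from itertools import cycle

# Hypothetical pool of backend servers (an ALB calls this a "target group").
POOL = ["10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"]

def make_round_robin(pool):
    """Return a function that hands out the next server in the pool, in turn."""
    servers = cycle(pool)
    def next_server():
        return next(servers)
    return next_server

next_server = make_round_robin(POOL)
print([next_server() for _ in range(4)])
# The fourth request wraps around to the first server in the pool.
```

Real load balancers layer health checks and connection tracking on top of this, so an unhealthy server is simply skipped by the selection step.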
What are 502 errors?

On the web, there are a variety of HTTP status codes sent in response to requests to let the user know what happened. Some might be pretty familiar:

200 OK – Everything is fine.
301 Moved Permanently – I don't have what you're looking for, try here instead.
403 Forbidden – I understand what you're looking for, but you're not allowed here.
404 Not Found – I can't find whatever you're looking for.
503 Service Unavailable – I can't handle the request right now, probably too busy.

4xx and 5xx both deal with errors. 4xx are client errors, where the user has done something wrong. 5xx, on the other hand, are server errors, where something is wrong on the server and it's not your fault.

All of these are specified by a standard called RFC 7231. For 502 it says:

The 502 (Bad Gateway) status code indicates that the server, while acting as a gateway or proxy, received an invalid response from an inbound server it accessed while attempting to fulfill the request.

The load balancer sits in the middle, between the client and the actual service you want to talk to. Usually it acts as a dutiful messenger, passing requests and responses back and forth. But if the service returns an invalid or malformed response, instead of relaying that nonsensical information to the client, it sends back a 502 error instead. This lets the client know that the response the load balancer received was invalid.
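To make the "invalid response" case concrete, here is a minimal sketch of the decision a gateway makes. The function name and the parsing rules are illustrative assumptions, not the ALB's actual logic:

```python
def gateway_status(upstream_response: bytes) -> int:
    """Return the status code a gateway would relay to the client.

    If the upstream status line parses, forward its code; if it is
    malformed, answer 502 Bad Gateway instead of forwarding garbage.
    """
    try:
        status_line = upstream_response.split(b"\r\n", 1)[0].decode("ascii")
        version, code = status_line.split(" ")[:2]
        if not version.startswith("HTTP/"):
            return 502
        return int(code)
    except (ValueError, UnicodeDecodeError):
        return 502

print(gateway_status(b"HTTP/1.1 200 OK\r\n\r\n"))  # 200: forwarded as-is
print(gateway_status(b"\x00\xffnot-http-at-all"))  # 502: upstream reply was invalid
```

The point is that 502 is generated *by the middleman*: the backend never sent a 502, it sent something the gateway could not treat as a valid HTTP response.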
The actual issue

Adam Crowder has done a full analysis of this problem, tracking it all the way down to TCP packet captures to assess what's going wrong. That's a bit out of scope for this post, but here's a brief summary of what's happening:

At Tessian, we have lots of interconnected services, some of which have Application Load Balancers (ALBs) managing the connections to them. In order to make an HTTP request, the client must open a TCP socket to the server. Opening a socket involves performing a three-way handshake with the server before either side can send any data, and once we've finished sending data, the socket is closed with a four-step process. These three- and four-step processes can be a large overhead when not much actual data is sent. Instead of opening and then closing one socket per HTTP request, we can keep a socket open for longer and reuse it for multiple HTTP requests. This is called HTTP Keep-Alive. Either the client or the server can then initiate a close of the socket with a FIN segment, either because it has finished or due to an idle timeout.
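Keep-Alive is easy to observe with Python's standard library: against an HTTP/1.1 server, http.client reuses one TCP socket across requests, so only one three-way handshake happens. A self-contained sketch using a throwaway local server (this demonstrates Keep-Alive itself, not the ALB's internals):

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps connections alive by default

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/")
conn.getresponse().read()
first_socket = conn.sock  # remember the underlying TCP socket

conn.request("GET", "/")  # no new handshake: the socket is reused
conn.getresponse().read()
print("same socket reused:", conn.sock is first_socket)
server.shutdown()
```

On a connection like this, either end may later decide the socket has been idle too long and send a FIN – and that is where the race described next comes from.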
The 502 Bad Gateway error is caused when the ALB sends a request to a service at the same time that the service closes the connection by sending the FIN segment to the ALB socket. The ALB socket receives FIN, acknowledges, and starts a new handshake procedure.   Meanwhile, the socket on the service side has just received a data request referencing the previous (now closed) connection. Because it can’t handle it, it sends an RST segment back to the ALB, and then the ALB returns a 502 to the user.   The diagram and table below show what happens between sockets of the ALB and the Server.
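The race therefore comes down to which side's idle timeout fires first. A toy model of that ordering (the function and its return strings are illustrative, not AWS behavior; the 60-second default is from the ALB documentation):

```python
def idle_close_outcome(service_timeout_s: float, alb_idle_timeout_s: float = 60.0) -> str:
    """Model which side sends FIN first on an idle Keep-Alive connection.

    If the service's timeout is not strictly longer than the ALB's, the
    service can close a connection the ALB is about to reuse, producing
    an RST on the reused request and a 502 for the client.
    """
    if service_timeout_s <= alb_idle_timeout_s:
        return "service closes first: race window, intermittent 502s"
    return "ALB closes first: safe"

print(idle_close_outcome(60))  # equal timeouts still leave a race window
print(idle_close_outcome(65))  # service outlives the ALB's idle timeout
```

This is why the fix is purely about timeout ordering rather than about any request or response content.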
How to fix 502 errors

It's fairly simple. Just make sure that the service doesn't send the FIN segment before the ALB sends a FIN segment to the service. In other words, make sure the service doesn't close the HTTP Keep-Alive connection before the ALB does. The default idle timeout for the AWS Application Load Balancer is 60 seconds, so we changed the service timeouts to 65 seconds. Barring two hiccups shortly after deploying, this has totally fixed it.

The actual configuration change

I have included the configuration for common Python and Node server frameworks below. If you are using one of those, you can just copy and paste. If not, these should at least point you in the right direction.

uWSGI (Python), as a config file:

# app.ini
[uwsgi]
...
harakiri = 65
add-header = Connection: Keep-Alive
http-keepalive = 1
...

Or as command line arguments:

--add-header "Connection: Keep-Alive" --http-keepalive --harakiri 65

Gunicorn (Python), as a command line argument:

--keep-alive 65

Express (Node), specifying the time in milliseconds on the server object:

const express = require('express');
const app = express();
const server = app.listen(80);
server.keepAliveTimeout = 65000;
Looking for more tips from engineers and other cybersecurity news? Keep up with our blog and follow us on LinkedIn.