Philosophical Thoughts

Hiring is inherently an extremely noisy prediction problem. Therefore, there are two main things every interview process has to define:

  • What are the signals that correlate to future job performance?
  • How do I implement noise reduction in my process?

There are many thoughts out there that attempt to answer the former. However, the hiring process (and the signals an organization should look for) is inherently specific to the organization and to how it evaluates job performance. Ultimately, any hiring manager should develop their interview process based on how they would actually evaluate the candidate if hired. Consider how your current employees would do on your interview process, and how well that correlates with how you currently rank their performance. If your best employees (based on your performance assessment) would not also perform the best on your interview process, then you have a problem with one of these things!

Let’s discuss some examples of commonly-used-but-negative signals:

  • Github profiles: A large portfolio on Github/Gitlab/etc… can be evidence of one’s coding ability and overall body of work. However, this is an incredibly noisy signal. Growth since the code was written, looser standards for personal projects, forks, etc… make it very difficult to evaluate how someone will perform as a software professional right now. Github profiles with lots of stars, or repositories tied to a project mentioned on their resume (e.g. a major OSS contribution), can provide additional insight, but the absence of one should not tank a candidate’s profile. It is a supporting characteristic at best, and should not be part of the core evaluation of a candidate.

  • Any questions that take more than a couple of minutes to answer: As a hiring group or manager, you have an incredibly small amount of time with a candidate to predict their performance. Unless you feel that a long question has been uncovered as “THE” signal for job performance (and if you really have done that, then I’d also like your lottery numbers and stock predictions), you should avoid long, detailed questions. I’ve been guilty of this myself, thinking that diving deeper into a single long question would provide better insight into a candidate’s depth of knowledge, but in reality, unless I’m hiring a candidate for that one specific thing, I’ve missed my opportunity to test them on how they will perform on everything else. In all likelihood, your signals are not very good on their own, so your best chance at engineering useful features in your interview process is to sample as many (good) signals as you can in your 30- or 45-minute interview. To do this, you must ask shorter questions. Every second you as the interviewer are talking is an opportunity cost to gaining better information on your candidate.

  • Live coding interviews: This is a big shift for me personally. I previously bought into the idea of the coding interview as a meritocratic evaluation of writing software. I was never convinced it was the optimal method, but I thought it was the best one we as an industry had. I’ve been convinced otherwise recently. From my own experience, coding interviews are a skill in and of themselves, as evidenced by the amount of resources out there that teach how to “git gud” at them, e.g. “Cracking the Coding Interview”. This privileges certain candidates (often more junior candidates fresh out of a CS education), and writing an algorithm or solving a function on the fly does not actually indicate how well they write understandable, testable, and maintainable code. Returning to the concept that our interview signals should correlate with job performance criteria: how much weight do you put on your existing employees’ ability to write code on a whiteboard or a screen share while others watch? I’ve also recently been introduced to the idea that coding interviews are another ingress point for cultural bias, since code can often be opinionated. Are you evaluating how someone writes code? Or are you evaluating how well they write code that looks like ours? I don’t think this is a conscious bias, but nonetheless I’m too conscious of how often I look back on my own code days/weeks/months later and think “Who the heck wrote this and why is it so bad?” to believe that how I code should be considered an objective measure.

  • Experience with hyper-specific technologies in your stack: Experience is something that can’t be ignored. However, a lot of how this is evaluated is essentially a boolean search of a resume. Recruiters, screeners, and hiring managers are all guilty of this. If you’re looking for someone to build data visualizations, and your organization uses Tableau, are you evaluating resumes for data visualization skills and experience, or for Tableau expertise? Ultimately, if a candidate has produced beautiful and informative visualizations with Looker, D3.js, or PowerBI, does it matter if they don’t list Tableau on their resume? Unless you are in the slowest-moving organization on planet earth, I guarantee your existing employees have had to learn new things that relate to what they already know, and I’d also go out on a limb and say that you likely score employees positively when they can quickly learn and adapt to new technologies. I certainly understand the concern around ensuring that a candidate can become productive quickly, and the learning process inserts some friction into that, but I think situations like the Looker/Tableau example above actually give you an opportunity to mine two signals in one question. Personally, I would ask something like “I see you have great experience with x and y tools. If I asked you to learn z tool, how would you do it?” Ideally, a candidate could discuss their experience in the overall space, and then discuss how they would leverage that experience to become productive with a new tool.

  • Previous work at well-known companies: In all honesty, I get excited when I see a resume with FAANG or FAANG-adjacent experience. I’m trying to work on correcting that bias, but it’s understandable that folks may consider this a generally positive signal based on the reputation of these companies. As anyone who has worked in a larger enterprise can attest, there can be a massive variance in the technical skills of different teams, and this phenomenon occurs even in high-reputation companies like FAANG. This is not to say that there aren’t amazing candidates with experience at those companies, but the company name itself is not a signal. Not only that, but if we as hiring managers treat it as a signal, we are transitively subjecting ourselves to those companies’ own hiring biases and bad signals. At the end of the day, I’d suggest ignoring company names on the resume and focusing on experience and responsibility.

What are some good signals? Well…

  • More short questions: As discussed above, question length is an opportunity cost to evaluating the breadth of skills the position requires. For example, if it takes 20 minutes to work through an intricate scenario question about a specific model, you likely won’t have time to ask questions about the candidate’s SQL knowledge, business acumen, etc… Shorter questions also allow you to diversify your signals. If Python is the primary skill needed in the role, then 2-3 shorter questions touching on OOP, performance tuning, and testing likely give you a better signal about overall Python knowledge than one long question.

  • Code Reading: This takes some pre-work to find good examples, but I think giving a candidate a bit of code (say, a couple of related functions or a class) and asking them to read it, explain what it is doing, and suggest any design improvements is a strong signal. Compared to live coding questions, code reading is often quicker, which allows for more questions, and it is more representative of what a technical professional does on a day-to-day basis (a hypothetical example of the sort of snippet one might use follows this list). Recall Robert C. Martin’s words in Clean Code: “…time spent reading versus writing is well over 10 to 1. We are constantly reading old code as part of the effort to write new code.” I’m sure we’ve all heard the trope of the “10X” engineer. Consider that if we read 10x as much code as we write, then the candidates who can quickly grasp a codebase may actually be the ones who are 10X as productive.

  • Systems Design Questions: Systems thinking is a skill that bridges all technical disciplines. It’s extremely easy to find someone who can accomplish a technical task in a vacuum. It’s harder to find someone who can accomplish a technical task while considering testability, architecture, scalability, and maintainability. Given the same task, the solutions these two hypothetical engineers/scientists come up with would look wildly different. Asking questions like “You need to move data from an on-premise server to the cloud. New data is produced every hour. How do you build this?” will likely give good insight into how a candidate decomposes a technical problem and how they select their tools. These types of questions are intended to be language- and tool-agnostic, which ideally lets you understand the candidate’s approach rather than their knowledge of any esoteric tooling.

  • Experience in a business context: In any business, the point is to turn a profit. Everything you do as a Data Science or Software team is intended to drive either profit generation or expense reduction. Thus, there should be some business context to everything we are doing (if you can’t identify what that is, you need to be reading a very different blog). Likewise, there should be a business context to the work that our candidates have done. More importantly, candidates should be able to communicate the business context of their accomplishments, both on their resume and in their interview answers. Unless you work in an organization with a culture that rewards pure engineering wins, there is likely not a recognition/promotion path for “Optimized an algorithm to reduce run time from 10ms to 5ms”. However, “Optimized an algorithm that allowed us to serve 50% more customer requests” is much more likely to resonate with a company’s leadership. Therefore, I tend to look for candidates who can place their work in a business context as a signal of their understanding of technology beyond technology for its own sake.
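
As a concrete (and entirely hypothetical) illustration of the code-reading idea above, the snippet below is the kind of thing one might hand a candidate: it works, but it leaves plenty of room to discuss naming, error handling, use of the csv module and context managers, and the redundant second pass over the file.

```python
# Hypothetical code-reading prompt: functional, but deliberately imperfect.
def load(path):
    # Reads a comma-separated file of (id, name, amount) rows into dicts.
    rows = open(path).read().splitlines()
    out = []
    for r in rows:
        parts = r.split(",")
        if len(parts) == 3:
            out.append({"id": parts[0], "name": parts[1], "amount": float(parts[2])})
    return out


def total(path):
    # Re-reads and re-parses the entire file just to sum a single field.
    return sum(row["amount"] for row in load(path))
```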

Noise Reduction

Now that we’re aware of which signals to avoid and which to emphasize, let’s discuss noise reduction in the interview process. We can approach this similarly to feature engineering in modeling. Consider that each of n interviews produces m signals. We can construct an m x n matrix to represent the signals from the total interview process. From there we can begin to apply concepts from Independent Component Analysis (ICA) to interviewing. ICA is a computational method for separating independent signals from mixed observations. The linked paper provides the statistical explanation of ICA, but there are some core principles of ICA that we can leverage to improve the interview process.
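
To make the analogy concrete, here is a minimal sketch using scikit-learn’s FastICA. All of the numbers are simulated: a few latent candidate traits are mixed into the observed question scores, and ICA attempts to recover them as independent components.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Simulated data: 40 interviews, 3 latent candidate traits mixed into 6 question scores.
latent_traits = rng.uniform(size=(40, 3))
mixing = rng.uniform(size=(3, 6))
question_scores = latent_traits @ mixing + 0.05 * rng.normal(size=(40, 6))

# ICA tries to recover statistically independent components from the mixed observations.
ica = FastICA(n_components=3, random_state=0)
independent_signals = ica.fit_transform(question_scores)
print(independent_signals.shape)  # (40, 3): three recovered signals per interview
```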

Inspired by information theory, one of the main approaches to ICA estimation is minimization of mutual information. We use panel interviews, but I observe that the same signals are often uncovered by multiple interviewers: a debrief session frequently amounts to 3-4 people voicing the same opinion of a candidate. If we consider each panelist’s interview questions as the m in the previously defined matrix, then we need to audit the mutual information across those questions. In our status quo interviews, mutual information is high. Ironically, although no information is shared between panelists until the end of the process, it is that lack of coordination that creates the overlap: each panelist independently ends up probing the same handful of signals.
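
One rough way to audit this is to measure the mutual information between the scores different panelists hand back. The sketch below uses scikit-learn’s normalized_mutual_info_score on invented 1-5 ratings; in practice the scores would come from your own structured interview records.

```python
from sklearn.metrics import normalized_mutual_info_score

# Hypothetical 1-5 ratings from two panelists across the same ten candidates.
panelist_1 = [4, 3, 5, 2, 4, 3, 5, 1, 2, 4]
panelist_2 = [4, 3, 5, 2, 5, 3, 4, 1, 2, 4]

redundancy = normalized_mutual_info_score(panelist_1, panelist_2)
print(f"Normalized mutual information: {redundancy:.2f}")  # close to 1.0 means redundant signals
```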

So how do we address the mutual information issue? The solution is two-fold. First is restructuring how the panel interviews occur. I suggest considering each of the panelists as a meta-signal, in that their interviews should focus on a single overall component that you are intending to evaluate. In practice, this means having each of the interviewers focus on a specific topic. Second, a structured interview is a must. Note that this does not imply that there is an objective set of interview topics or questions that work for every role. As described previously, one should audit their performance evaluation and map that to interview topics.

As an example, for my Data Engineering team, I set performance goals that encourage the following:

  • Technical Excellence
  • Timely, High-Quality Delivery
  • Insurance/Business Acumen
  • Professional Development
  • Potential

Each of these can be difficult to evaluate, but the goal is to define an interview structure that allows for sufficient time to dive into the specific aspects of the role, while also minimizing redundant questions.

For example, consider the current interview process for a Data Engineering Role in my current organization:

  • Recruiter Screen (30 minutes)
  • Hiring Manager Screen (45 minutes)
  • Panelist 1 (30 minutes)
  • Panelist 2 (30 minutes)
  • Panelist 3 (30 minutes)

This results in spending 2 hours and 45 minutes with a candidate. However, in an unstructured process, each panelist is likely asking 1-2 questions about technical capability, 1-2 about how the candidate handles challenging tasks, and perhaps 1-2 about interpersonal communication. What this ultimately translates to is functionally the same 30-minute interview run four different times. All this does is introduce more noise!

So how do we fix this? Well, let’s establish a framework for structured Data Science/Engineering interviews:

1. Delegate specific responsibilities to your panel and develop a standard set of questions

Rather than the status quo approach outlined above, I suggest a new framework that delegates evaluation of specific categories to each member of the panel. This requires you to deeply trust the people you select for your panels, and it also requires some preparation on both the panelists’ and the hiring manager’s part. For example:

  • Recruiter Screen: 1-2 core questions to serve as a filter
  • Hiring Manager Screen: Team Fit
  • Panelist 1: Technical Skills
  • Panelist 2: Business Scenarios
  • Panelist 3: Career Potential/Leadership

I would argue this results in a better interview experience for both the candidate and the interviewer. In the current “everyone-interviews-everything” approach, both the interviewer and the candidate have to context-shift every few minutes, and there is often not enough time to go in-depth on any particular area when there is so much ground to cover. I’m not going to dive deep into the specific questions that should be asked, but that does lead me into the second point of our framework. I should also note that the particular focus areas you choose are specific to your organization and the role you are hiring for; the areas above are simply examples.

2. Collect Data on your Questions

Interview questions are an entire discussion in and of themselves. This article has already outlined some philosophical thoughts on what sort of questions to ask, which can serve as a starting point for a hiring manager to create their own. Ultimately, hiring is a game of matching signals to outcomes, and we can’t “feature engineer” our questions if we don’t keep good data on them. I would suggest each interviewer maintain their own page of questions focused on their particular area. These questions should be asked consistently of all candidates, and the interviewer should score the responses (at least briefly) during the interview, as the questions are being asked, to ensure the most robust feedback possible. Beyond that consistency, A/B testing your interview questions, if done systematically, can help improve your interview outcomes; this is something companies like Google do quite a bit of. Each interviewer should be able to provide a written record of the questions asked in the interview and how they scored that particular candidate.
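
As one possible (and purely illustrative) way to keep that record, the sketch below appends each scored question to a CSV file; the field names and identifiers are hypothetical, not a prescribed schema.

```python
import csv
import os
from datetime import date

PATH = "interview_scores.csv"
FIELDS = ["date", "candidate_id", "interviewer", "focus_area", "question_id", "score", "notes"]

def record_scores(rows):
    # Append scored questions to a running log, writing the header on first use.
    write_header = not os.path.exists(PATH)
    with open(PATH, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)

record_scores([{
    "date": date.today().isoformat(),
    "candidate_id": "C-1042",          # hypothetical identifiers
    "interviewer": "panelist_1",
    "focus_area": "technical",
    "question_id": "python_testing_01",
    "score": 4,
    "notes": "Clear explanation of fixtures vs. mocks.",
}])
```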

3. Use your data to make decisions

Ultimately, the data you collect is useless if you ignore it. I’d suggest evaluating candidates horizontally across a single question area rather than only at a high level. This helps calibrate your expectations and also provides some internal consistency, especially if you are placing a higher weight on some questions than on others. You should also evaluate the aggregated data on a regular basis. In particular, once a candidate has been hired, correlate your interview data with their performance after 6 months in the role. This is an opportunity to modify your interview questions, or even change up your panel - it’s good to keep interviews somewhat fresh, but ultimately you should be optimizing toward a correlative maximum. This review should also evaluate your own internal consistency: if you place emphasis on certain skills, yet continually hire candidates who performed weakly on the evaluation of those skills, then you either need to recalibrate your process or reconsider what you are hiring for.
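
A minimal sketch of that six-month look-back, assuming you have already joined interview scores to a later performance rating (the column names and values below are invented):

```python
import pandas as pd

# Hypothetical hires: interview scores per focus area plus a rating after six months.
df = pd.DataFrame({
    "technical_score":  [4, 2, 5, 3, 4, 1],
    "business_score":   [3, 4, 4, 2, 5, 2],
    "leadership_score": [2, 3, 5, 3, 4, 2],
    "performance_6mo":  [3, 2, 5, 3, 4, 1],
})

# Which interview signals actually track later performance? (Real samples will be
# small and noisy, so treat these correlations as directional at best.)
print(df.corr()["performance_6mo"].sort_values(ascending=False))
```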

Final Thoughts

Interviewing is challenging. I know that’s a crazy statement, but even the best of companies struggle to hire effectively. I suggest that treating it as a noisy prediction problem gives us some pathways to reduce the noise in our interview process, and hopefully makes it more enjoyable and efficient for both the interviewer and the candidate. The top candidates will judge your organization on just about everything, including your hiring process, and improving that process is ultimately a way to improve your positioning within the talent market.