I always feel bad when I try out a new coding problem for interviews because I feel I’m going to offend candidates with such an easy problem (I interview mostly for senior positions). And I’m always shocked by how few are able to solve it. The current problem I use requires splitting a text into words as a first step. I show them the text, and it’s the entire text of a book, not just some simple sentence. I don’t think I’ve had a single candidate do that correctly yet (most just split on a single space character even though they’ve seen it’s a whole book with newlines, punctuation, quotes, parentheses, etc.).
I am curious how you’d deal with the ambiguity of contractions vs. ending single quotes. I guess that character between letters can be assumed to be part of the word, but not if it’s between a letter and a space, for example. If you ignore contractions, hyphenated words, and accented characters, you could just match on /[a-zA-Z]+/.
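The rule described above (an apostrophe or hyphen counts as part of a word only when it sits between two letters) could be sketched as a single regular expression. The `tokenize` name and the sample sentence are made up for illustration, and allowing accented letters via `\p{L}` is an assumption, not something the thread settles:

```javascript
// Hypothetical helper: a word is a run of letters, optionally joined by
// an apostrophe (straight or curly) or a hyphen that sits between letters.
// \p{L} matches any Unicode letter, so the `u` flag is required.
const WORD = /\p{L}+(?:['’-]\p{L}+)*/gu;

function tokenize(text) {
  // match() returns null when nothing matches, so fall back to [].
  return text.match(WORD) ?? [];
}

const sample = "“Don’t stop,” she said (quietly). ‘It’s a well-known café.’";
console.log(tokenize(sample));
```

The closing single quote at the end of the sample is dropped because no letter follows it. A word-final apostrophe, as in “the dogs’ bowls”, would be dropped too, so this does not resolve the ambiguity; it just picks one side of it.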
That is totally a non-trivial problem, which requires a lot more conception before it can be solved. Even for English, this is not well defined: Does “don’t” consist of one or two words? Should “www.google.com” be split into three parts? Etc.
And don’t get me started on other languages:
In French, “qu’est-ce que” is one word (what).
In the German sentence “Ruf mich an.”, the “Ruf an” is one word (call) while mich is another word (me).
In Chinese, you usually don’t even have spaces between words.
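For text without spaces between words, matching runs of letters is not enough. One option, offered as a sketch and not as what any interviewer expects, is the built-in `Intl.Segmenter`, which delegates word boundaries to the runtime's locale data. The `segmentWords` helper and the sample sentence are illustrative, and the exact segmentation can differ between runtimes:

```javascript
// Hypothetical helper built on the standard Intl.Segmenter API.
function segmentWords(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  // Each segment reports isWordLike, which is false for spaces and punctuation.
  return Array.from(segmenter.segment(text))
    .filter((s) => s.isWordLike)
    .map((s) => s.segment);
}

console.log(segmentWords("我喜欢读书。", "zh"));
```

This still applies one particular definition of a word, the one baked into the locale data, so it does not settle questions such as whether “Ruf … an” counts as one word.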
If I got that feature request in a ticket, I’d send it back to conception. If you asked me this question in an interview, I’d ask if you wanted a programmer, a requirements analyst, or a linguist, and why you invite people for a job interview if you don’t even know what role you are hiring for.
That is totally a non-trivial problem, which requires a lot more conception before it can be solved.
Most candidates don’t realize that. And when I say they split by single space I mean split(' '). Not even split(/\s+/).
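To make the difference concrete, here is a small sketch with a made-up sample string. Neither call is presented as a correct tokenizer:

```javascript
// Made-up sample: a double space, a newline, a tab, and punctuation.
const text = "Hello,  world!\n(Second\tline.)";

// Splitting on one space leaves an empty string for the double space
// and never breaks at the newline or tab.
const bySpace = text.split(' ');

// Splitting on runs of whitespace separates the chunks, but punctuation
// stays glued to the words.
const byWhitespace = text.trim().split(/\s+/);

console.log(bySpace);
console.log(byWhitespace);
```

The `trim()` call is there because leading or trailing whitespace would otherwise produce empty strings at the ends of the array.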
Does “don’t” consist of one or two words? Should “www.google.com” be split into three parts? Etc.
Yes, asking those questions is definitely what you should be doing when tackling a problem like this.
If I got that feature request in a ticket, I’d send it back to conception.
If I got it, I’d work together with the product team to figure out what we want and what’s best for the users.
If you asked me this question in an interview, I’d ask if you wanted a programmer, a requirements analyst, or a linguist, and why you invite people for a job interview if you don’t even know what role you are hiring for.
That would be useful too. Personality, attitude, and ability to work with others in a team are also factors we look at, so your answer would tell me to look elsewhere.
But to answer that question, I’m definitely not looking for someone who just executes on very clear requirements; that’s a junior dev. It’s what you do when faced with ambiguity that matters. I don’t need a human ChatGPT.
Also, I’m not looking for someone who solves that problem perfectly, because it doesn’t even have a single clear solution. It’s the process of arriving at a solution that matters. What questions do you ask? Which edge cases did you consider and which ones did you miss? How do you iterate on your solution and debug issues you run into on the way? And so on.
There is no single definition of what a word is, which is exactly my point. Some linguists even argue that “word” is inherently undefinable and refuse to use it as a category.
One common (but still ambiguous) definition, though, is that a word is the smallest unit in a language that can stand on its own and convey a meaning. By that definition, “Ruf … an” is one word, as “an” is not a word by itself here. It might not be too obvious, as “an” can also be a word by itself, just not in this context.
Another example, where it’s more obvious, is “innehalten”. “Inne” is not a word, it has no meaning by itself, it cannot be used on its own, so in the sentence “halte kurz inne”, “halte inne” is one word. Another example would be “Stelle etwas dar”, where “dar” is obviously not a word by itself.
Fun fact: “verb” comes from the Latin “verbum”, which literally means “word”, so saying they are part of the same verb but not the same word is kind of an oxymoron.
That’s the thing, nobody even asks this question.
That would already put you in the top 10% of solutions I’ve seen so far on this problem.
My confidence in my job security and general programming abilities has skyrocketed after visiting this thread.
They’re both parts of the verb anrufen, but I’ve never heard someone say they’re still a single word when there’s a space (or more) in between.