Data Science Manager Weekly #3
Our current setup is not ideal. Data might be able to help.
Welcome, folks, to Data Science Manager Weekly. It’s a weekly dose of curated articles to help you get better at being a data science manager.
This week, we’re looking at how COVID-19 is breaking our models, machine learning technical debt, data augmentation in NLP, building diverse teams and using data to build a remote organization.
If you received this from a friend and want to receive this every week, don’t forget to subscribe below:
Data Science in the Real World
Our weird behavior during the pandemic is messing with AI models
Our models rest on a key assumption: the data they are fed will have a similar distribution to the training set. Given that COVID-19 is changing people’s behavior, it makes sense to keep an eye on our models and check whether they’re still performing as expected. Read more
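One way to keep an eye on this is to compare the distribution of a feature in recent data against the training set. Here is a minimal sketch using the Population Stability Index (PSI). This is my own illustration, not something taken from the article: the function name, the synthetic data and the bin count are all assumptions, and the commonly quoted cutoffs (below 0.1 is stable, above 0.25 is a large shift) are rules of thumb rather than hard limits.

```python
import math
import random


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples (hypothetical helper)."""
    ref_sorted = sorted(reference)
    # Bin edges are the interior quantiles of the reference (training) sample.
    edges = [ref_sorted[int(len(ref_sorted) * i / n_bins)] for i in range(1, n_bins)]

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            # The number of edges at or below v is the index of v's bin.
            idx = sum(1 for e in edges if v >= e)
            counts[idx] += 1
        # eps keeps empty bins from causing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Synthetic example: a feature before and after a change in behavior.
rng = random.Random(0)
train = [rng.gauss(100, 15) for _ in range(5000)]
same = [rng.gauss(100, 15) for _ in range(5000)]
shifted = [rng.gauss(130, 25) for _ in range(5000)]

psi_same = psi(train, same)        # same distribution, should be close to 0
psi_shifted = psi(train, shifted)  # shifted distribution, should be much larger
```

Running a check like this per feature on a schedule gives an early warning before the drop shows up in model metrics, which often lag because labels arrive late.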
Creating a Multisegment Scrollytelling Data Story From Scratch: Key Moments and Lessons
An interesting look behind the scenes of a data story. Although it’s JavaScript instead of the usual Python and R, it’s interesting to see the considerations you have to keep in mind. Read more
Nitpicking Machine Learning Technical Debt
A really comprehensive post about technical debt in machine learning and the best practices for avoiding it. While it covers a lot of the problems we face in the machine learning context, one of the biggest takeaways for me is that ML engineering is a relatively young field, and that the frameworks we have now are just as young and probably won’t last long. Read more
Powered by AI: Advancing product understanding and building new shopping experiences
Facebook has pushed some new improvements to better tag items in their Marketplace for both the items in the future and the attributes of the product for better searchability. Read more
Training a Neural Network Can Emit More Than 600,000 Pounds of CO2. But Not for Long
It’s not surprising that neural networks have a large carbon footprint given how compute-intensive they are. What surprised me was that it’s “almost five times the amount of carbon dioxide emitted by the average car during its lifetime”. Great to see that there are efforts in this space to make neural networks more environmentally friendly. Read more
Monitoring Data Quality at Scale with Statistical Modeling
Once you reach a certain scale, it becomes hard to keep up with all the possible ways the data could be wrong. Using anomaly detection to catch data quality issues seems like a more scalable way to handle the problem. Read more
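To make the idea concrete, here is a minimal sketch of anomaly detection on a single data quality metric, daily row counts for a table. It is not the method from the article: I’m swapping in a simple robust z-score based on the median and the median absolute deviation (MAD). The function name, the 3.5 threshold and the numbers are all assumptions for illustration.

```python
import statistics


def flag_anomalies(values, threshold=3.5):
    """Return indexes of values whose robust z-score exceeds the threshold.

    Hypothetical helper. The median and MAD are used instead of the mean and
    standard deviation so that one bad day does not hide itself by inflating
    the spread.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Every value sits on the median, so there is no spread to compare against.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Made-up daily row counts; the load on day 6 was mostly lost.
daily_row_counts = [10120, 9980, 10050, 10210, 9940, 10090, 312, 10030, 10160, 9890]
anomalies = flag_anomalies(daily_row_counts)
print(anomalies)  # [6]
```

The appeal is that the same check can run across many tables and metrics (row counts, null rates, distinct counts) without anyone hand-writing a rule for each one.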
Data Science Tools and Techniques
DeepFaceLab
Deepfake software that’s accessible and doesn’t require knowledge of deep learning. Although it’s debatable whether this technology should be made accessible, bad actors will always find a way regardless. Read more
OpenTPOD
Another tool that tries to make deep learning applications accessible. OpenTPOD makes it easy to do object detection without having to learn deep learning. Read more
Eighth International Conference on Learning Representations
The videos from ICLR 2020 are up. Watch here
ALMa: Active Learning (data) Manager
An interesting tool to manage the bookkeeping when labeling data sets. Read more
25 Hot New Data Tools and What They DON’T Do
With all the new tools popping up, it helps to be able to distinguish between what they do and what they don’t. Some of these tools are in closed beta, while others like dbt have had decent adoption. Read more
A Visual Survey of Data Augmentation in NLP
While data augmentation is heavily used on image data sets, it isn’t as common to hear about it applied to NLP. This article shows us some of the approaches used for data augmentation in the NLP context in an accessible manner. Read more
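For a feel of what text augmentation can look like, here is a minimal sketch of two simple token-level operations, random swap and random deletion. These are my own illustration and not necessarily the approaches the article covers; the function names, the sample sentence and the parameters are assumptions.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Swap two randomly chosen positions, n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    # If everything was dropped, fall back to a single random token.
    return kept if kept else [rng.choice(tokens)]


rng = random.Random(42)  # seeded so the augmented samples are reproducible
sentence = "the delivery arrived two days late".split()

swapped = random_swap(sentence, 1, rng)
deleted = random_deletion(sentence, 0.2, rng)
```

Each augmented sentence keeps the original label, so a small labeled set can be stretched into a larger one. Operations like these can change the meaning of a sentence, so it is worth spot-checking the output before training on it.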
Managing Data Science Teams
Product for Internal Platforms
It’s interesting to see how products are managed to serve internal customers. This is pretty similar to the setup for data scientists serving internal customers. The key takeaway is to really be customer-focused and partner with the teams you are serving. Read more
Just some red flags. No big deal. Just ignore them.
Onboarding is how your employees experience their first few days at the company, and it sets the tone for their first few weeks. This article shows how onboarding can go wrong, and I found the author’s experience pretty relatable. It’s a good reminder of why we should care about the onboarding experience. Read more
Build a Diverse Team to Solve the AI Riddle
More diverse teams are supposed to perform better, but does that make sense in our context? As the article points out, yes. In this case, hiring English majors for an NLP project makes perfect sense, and collaborating with a linguist gives NLP teams a domain expert to deal with edge cases that are normally not considered. Read more
When “Grin and Bear It” Isn't the Right Answer — This Behavioral Scientist Shares What to Do Instead
COVID-19 has had a negative effect on people’s mental health, and it’s no surprise that these mental health issues can carry over to our work. This article tells us that we need a combined strategy of acceptance and distraction to cope with the growing mental health problem. Read more
The Key to Building a Successful Remote Organization? Data.
As companies are forced to work remotely, context becomes much more important as modes of communication change. In-person interactions have been reduced, so organizations are now more dependent on data for decision making. Making this data available can help bridge the gap in context and allow people to be more autonomous in decision making. Read more
Hope you enjoyed reading these hand-picked articles.
If you liked this issue and would like more in the future, subscribe below:
If you know anyone who would benefit from this, whether it’s a friend, a colleague or even your boss, please share this with them.
