Clean messy data with OpenRefine

#data-manipulation

While not new, OpenRefine (previously Google Refine) is an insanely powerful tool for researching and working with messy data. It helps you clean and transform data and can even connect to web services and external data sources.

Free, open source, and available for Mac, Windows, and Linux.

OpenRefine

The Life of a Data Byte

#data-storage

Did you know NASA's Apollo 11 mission used a type of storage called "rope memory"? It would take a technician a few minutes to "weave" just one byte! Not to mention it could only hold 72 kilobytes in total. Learn how physical data storage has changed over time in this well-written and fascinating article. (20-minute read)

The Life of a Data Byte (Ramblings from Jessie)

COVID-19

Since the COVID-19 pandemic is still very much in our collective attention (and possibly even ramping up), I’m going to keep this callout going. The more data we share about this outbreak, the better future generations will be prepared.

COVID-19 Cases by US State and County (Microsoft Power BI) - This is one of the most helpful interactive map-based dashboards I’ve seen. You can filter down to the county level and even see numbers relative to population. The accompanying Towards Data Science article (Medium) by Shankar Lakshmanan is also worth a read.

COVID-19 Kaggle community contributions (Kaggle) - The Kaggle community produced some interesting insights from 44,000 research papers as part of their weekly machine learning and data analysis competition.