Data Science

Data Science core team lead at ZenBusiness

During the last year I was a Sr Data Analyst at ZenBusiness. As part of the Data Science team, I was asked to lead the core team, which was in charge of administering and supporting all reporting platforms. My team had to lay down guidelines and rules for the various reporting platforms used by the company. We created a lot of documentation, training, and rules of engagement for those platforms. We also cleaned up and administered existing dashboards and reports. As a nimble member of the core team, I collaborated a lot with the marketing, product, and finance data science teams. I also became a go-to person for data visualization within data science.

During my time at ZenBusiness I became very familiar with various dashboarding platforms, database managers, and visualization tools. I used mostly SQL for everyday needs, although I sometimes also used Python for specific analysis tasks. I also had to become familiar with R in order to debug analysis code written by other team members. My close interaction with the Data Engineering team led to some work with dbt and Airflow as well.

I was part of the DEI committee at ZenBusiness, which held bi-weekly meetings. As an active member of the committee, I prepared and presented various analyses of diversity in our company. I also served on the Grant Committee for one cycle, which picks grant winners among new entrepreneur ZenBusiness customers.

Step count analysis

I obtained pedometer data from a phone to analyze walking habits. I used several Python packages for analysis and plotting. I chose pandas for the time series analysis, but seaborn did not give me all the plotting flexibility I needed, so I switched to plain matplotlib for that. I realized that for elderly subjects it was necessary to keep only days with a minimum number of steps, in order to exclude days when the phone was left at home or not carried around most of the day. The code can be found here.
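The minimum-steps filter described above could look roughly like the sketch below. It is not the original code: the column names ("timestamp", "steps"), the function names, and the 500-step threshold are all assumptions chosen for illustration.

```python
import pandas as pd


def daily_totals(records: pd.DataFrame) -> pd.Series:
    """Sum raw pedometer records into one step total per calendar day.

    Assumes hypothetical columns "timestamp" and "steps".
    """
    timestamps = pd.to_datetime(records["timestamp"])
    return records.set_index(timestamps)["steps"].resample("D").sum()


def filter_low_step_days(daily_steps: pd.Series, min_steps: int = 500) -> pd.Series:
    """Drop days below min_steps, e.g. days when the phone stayed at home.

    The default threshold is an illustrative assumption, not a measured value.
    """
    return daily_steps[daily_steps >= min_steps]


# Made-up sample records, for illustration only.
records = pd.DataFrame(
    {
        "timestamp": [
            "2018-03-01 08:00",
            "2018-03-01 18:30",
            "2018-03-02 09:15",
            "2018-03-03 07:45",
            "2018-03-03 20:10",
        ],
        "steps": [3200, 4100, 120, 2500, 3900],
    }
)

totals = daily_totals(records)
kept = filter_low_step_days(totals, min_steps=500)
```

With this made-up sample, the second day (120 steps in total) is dropped and the other two days are kept. In practice the threshold would have to be tuned for each subject.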

My analysis

In my own data, I discovered that I have to pay particular attention to my step count on weekends. Those turn out to be my most sedentary days. I also realized that all the days with maximum counts in the last few years were spent with the same friend. It’s clear from the histograms and timeline that my step count improved after I started consciously tracking steps in 2018.

I was able to create equivalent plots for a few other subjects who kindly shared their data with me. Even though these are only a handful of examples, one can see that walking habits vary a fair amount, but some things are common to all datasets: large peaks tend to happen in summer and during holidays, while winter and Sundays are when the fewest steps are recorded for most subjects. From this small sample, it looks like reaching an average of 10,000 steps every day is an unrealistic goal, but a minor change of habits could let people reach 7,500 daily steps relatively easily… unless they are senior citizens, for whom a 5,000-step goal should be feasible.
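Comparisons like these (which weekday is lowest, how often a daily goal is met) can be sketched with pandas as follows. This is a minimal illustration, not the original analysis: the function names and the sample numbers are invented, and the input is assumed to be a Series of daily totals indexed by date.

```python
import pandas as pd

WEEKDAYS = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def mean_steps_by_weekday(daily_steps: pd.Series) -> pd.Series:
    """Average daily steps for each weekday, ordered Monday to Sunday."""
    by_day = daily_steps.groupby(daily_steps.index.day_name()).mean()
    return by_day.reindex(WEEKDAYS)


def share_of_days_reaching(daily_steps: pd.Series, goal: int) -> float:
    """Fraction of recorded days with at least `goal` steps."""
    return float((daily_steps >= goal).mean())


# Two weeks of invented daily totals, starting on a Monday.
dates = pd.date_range("2018-01-01", periods=14, freq="D")
daily_steps = pd.Series(
    [8000, 7000, 7500, 8200, 9000, 5000, 3000,
     8400, 7200, 7700, 8000, 9400, 5400, 3400],
    index=dates,
)

by_weekday = mean_steps_by_weekday(daily_steps)
lowest_day = by_weekday.idxmin()
share_7500 = share_of_days_reaching(daily_steps, 7500)
```

The same grouping idea extends to months or seasons by grouping on the month of the date index instead of the weekday name.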

Experience and tools

As a professional astronomer I have had to use dozens of different types of software to deal with data coming from a variety of instruments and telescopes. Most of these systems can be scripted and are run via Unix or specific GUIs. I have used a fair amount of awk for scripting over the years, since it is so easy to combine with other tools. I have coded in Fortran, IDL, and more recently Python. I plotted with SuperMongo for years, but eventually I had to move to IDL and now Python. I typed my thesis and every journal paper using LaTeX, but of course the world demands the use of MS Word type editors. Even though my first presentations as a student were done with projector transparencies, I have used MS PowerPoint type editors for decades. The same goes for various types of spreadsheets.

In the last few months, I have trained myself in SQL and Machine Learning techniques (mostly within Python). I have also used SQL to access astronomical databases, which only recently moved to using it as a standard.

I am comfortable picking up new tools as needed, since I have been doing this all my life.