If you are seeking a career in data science or analytics, then you’ve undoubtedly already begun to familiarize yourself with SQL (structured query language) and its many functions. Whether or not you are well versed in SQL at this point, it is important that your skillset be expanded and strengthened.
Understanding the dataset
Before you can analyze or manipulate the data, you first need to understand what it is. Counting how many tweets Elon Musk has posted in the last 24 hours, or generating a list of popular websites in Italy, requires some understanding and knowledge of the dataset. Here are a few go-to queries:
Counting rows and items
One of the most frequently used aggregate functions in SQL is COUNT(). COUNT() takes one argument, i.e., the column name whose values need to be counted.
select count(*) from myTable;
SUM() function is used to calculate the sum of a numerical column.
select sum(myRow) as total_sum from myTable;
To calculate the average, we use AVG().
select avg(myRow) as total_average from myTable;
The DISTINCT keyword in SQL is used to remove duplicate data and keep the first occurrence of each value.
select disctinct myRow from myTable;
Understanding highs and lows
Next, it is helpful to understand the various values in your dataset. Are they similar? To they differ? Do they make sense at all? Here are a few queries to help you get started:
Calculate standard deviation
The standard deviation can give you a better idea of the variation of the values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.
select stddev(myRow) as std_dev from myTable;
Extreme value identification
Finding the min and max will further help you understand how much the data varies, both for given rows, but also for the data overall.
select max(myRow) as max_spend from myTable;
select min(myRow) as min_spend from myTable;
Sorting and limiting
Lastly, to help us put our data into perspective, we can sort and limit. Here is how you can do so in SQL:
This query selects the first five rows in our table. LIMIT is used when dealing with large amounts of data and when you'd like to see what the intial values look like.
select * from myTable limit 10;
If you just want to select only the fourth and fifth rows, we can use the OFFSET clause to skip 2 rows.
select * from myTable limit 10 offset 2;
We can also sort our data into ascending and descending order by using the keyword order by. In addition, we can use asc and desc to accurately sort.
select * from myTable order by myRow;
// We can also do it in ascending and descending order
select * from myTable order by myRow desc;
select * from myTable order by myRow asc;
I hope this blogpost is a helpful first step to help you work with your data in SQL. In Arctype, you can make sure to save your queries if you find yourself running them often. Simply create a new query and make sure to save it (see below)!