leggingsfortheworld











This is my very first post written as an ipython notebook.
Thanks to my dear friends at leggingsfortheworld.com,
I obtained their entire customer purchase database.

Here is a quick exploratory data analysis,
mainly to practice some python skills:

In [25]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display, HTML

%matplotlib inline
color = sns.color_palette()

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

from subprocess import check_output
#print(check_output(["ls", "../input"]).decode("utf8"))

# Any results you write to the current directory are saved as output.
In [26]:
df = pd.read_csv('dataSmall.csv',encoding="ISO-8859-1",dtype={'CustomerID': str,'InvoiceID': str})
df.InvoiceDate = pd.to_datetime(df.InvoiceDate, format="%m/%d/%Y %H:%M")
df.info() 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8678 entries, 0 to 8677
Data columns (total 8 columns):
InvoiceNo      8678 non-null object
StockCode      8678 non-null object
Description    8652 non-null object
Quantity       8678 non-null int64
InvoiceDate    8678 non-null datetime64[ns]
UnitPrice      8678 non-null float64
CustomerID     6518 non-null object
Country        8678 non-null object
dtypes: datetime64[ns](1), float64(1), int64(1), object(5)
memory usage: 542.5+ KB
In [27]:
order_canceled = df['InvoiceNo'].apply(lambda x:int('C' in x))
n1 = order_canceled.sum()
n2 = df.shape[0]
print('Number of orders canceled: {}/{} ({:.2f}%) '.format(n1, n2, n1/n2*100))
Number of orders canceled: 185/8678 (2.13%) 

There is 209 transaction that have negative quantity or unitPrice

In [28]:
# Item sale with Quantity <=0 or unitPrice < 0
print (((df['Quantity'] <= 0) | (df['UnitPrice'] < 0)).value_counts())

#Delete the negative values 
df = df.loc[(df['Quantity'] > 0) | (df['UnitPrice'] >= 0)]
False    8469
True      209
dtype: int64
In [29]:
%%timeit
df['yearmonth'] = df['InvoiceDate'].apply(lambda x: (100*x.year) + x.month)
df['Week'] = df['InvoiceDate'].apply(lambda x: x.strftime('%W'))
df['day'] = df['InvoiceDate'].apply(lambda x: x.strftime('%d'))
df['Weekday'] = df['InvoiceDate'].apply(lambda x: x.strftime('%w'))
df['hour'] = df['InvoiceDate'].apply(lambda x: x.strftime('%H'))
335 ms ± 83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Number of Invoice

Number Invoice group by Month

In [30]:
plt.figure(figsize=(12,6))
plt.title("Frequency of order by Month", fontsize=15)
InvoiceDate = df.groupby(['InvoiceNo'])['yearmonth'].unique()
InvoiceDate.value_counts().sort_index().plot.bar()
Out[30]:
<matplotlib.axes._subplots.AxesSubplot at 0x28b6ac04710>
In [31]:
# 2010-12 Dataset end at
df.loc[df['yearmonth'] == 201012]['InvoiceDate'].max()
Out[31]:
NaT
In [32]:
# 2011-12 Dataset end at
df.loc[df['yearmonth'] == 201112]['InvoiceDate'].max()
Out[32]:
NaT

Number Invoice group by Week

In [33]:
plt.figure(figsize=(12,6))
plt.title("Frequency of order by Week", fontsize=15)
InvoiceDate = df.groupby(['InvoiceNo'])['Week'].unique()
InvoiceDate.value_counts().sort_index().plot.bar()
Out[33]:
<matplotlib.axes._subplots.AxesSubplot at 0x28b6947dc50>

Number Invoice group by day

In [34]:
plt.figure(figsize=(12,6))
plt.title("Frequency of order by Day", fontsize=15)
InvoiceDate = df.groupby(['InvoiceNo'])['day'].unique()
InvoiceDate.value_counts().sort_index().plot.bar()
Out[34]:
<matplotlib.axes._subplots.AxesSubplot at 0x28b692ce0b8>
In [35]:
grouped_df  = df.groupby(["day", "hour"])["InvoiceNo"].unique().reset_index()
grouped_df["InvoiceNo"] = grouped_df["InvoiceNo"].apply(len)

#grouped_df = df.groupby(["Weekday", "hour"])["InvoiceNo"].aggregate("count").reset_index()
grouped_df = grouped_df.pivot('day', 'hour', 'InvoiceNo')

plt.figure(figsize=(12,6))
sns.heatmap(grouped_df)
plt.title("Frequency of Day Vs Hour of day")
plt.show()

Number Invoice group by Weekday

[0] ~ Sunday

In [36]:
#[0] ~ Sunday
plt.figure(figsize=(12,6))
plt.title("Frequency of order by Weekday", fontsize=15)
InvoiceDate = df.groupby(['InvoiceNo'])['Weekday'].unique()
InvoiceDate.value_counts().sort_index().plot.bar()
Out[36]:
<matplotlib.axes._subplots.AxesSubplot at 0x28b692c4588>

Number Invoice group by Hour

In [37]:
plt.figure(figsize=(12,6))
plt.title("Frequency of order by hour of day", fontsize=15)
InvoiceDate = df.groupby(['InvoiceNo'])['hour'].unique()
(InvoiceDate.value_counts()).iloc[0:-1].sort_index().plot.bar()
Out[37]:
<matplotlib.axes._subplots.AxesSubplot at 0x28b69e4c3c8>

Frequency of Day of week Vs Hour of day

In [38]:
grouped_df  = df.groupby(["Weekday", "hour"])["InvoiceNo"].unique().reset_index()
grouped_df["InvoiceNo"] = grouped_df["InvoiceNo"].apply(len)

#grouped_df = df.groupby(["Weekday", "hour"])["InvoiceNo"].aggregate("count").reset_index()
grouped_df = grouped_df.pivot('Weekday', 'hour', 'InvoiceNo')

plt.figure(figsize=(12,6))
sns.heatmap(grouped_df)
plt.title("Frequency of Day of week Vs Hour of day")
plt.show()

Total Sales (First order vs Reorder)

In [39]:
df = pd.read_csv('dataSmall.csv',encoding="ISO-8859-1",dtype={'CustomerID': str,'InvoiceID': str})
df.InvoiceDate = pd.to_datetime(df.InvoiceDate, format="%m/%d/%Y %H:%M")

#remove the negative values and replace with nan
df.loc[df['Quantity'] <= 0, 'Quantity'] = np.nan
df.loc[df['UnitPrice'] < 0, 'UnitPrice'] = np.nan

df.dropna(inplace=True)

df['total_dollars'] = df['Quantity']*df['UnitPrice']

df['yearmonth'] = df['InvoiceDate'].apply(lambda x: (100*x.year) + x.month)
df['Week'] = df['InvoiceDate'].apply(lambda x: x.strftime('%W'))
df['day'] = df['InvoiceDate'].apply(lambda x: x.strftime('%d'))
df['Weekday'] = df['InvoiceDate'].apply(lambda x: x.strftime('%w'))
df['hour'] = df['InvoiceDate'].apply(lambda x: x.strftime('%H'))
In [40]:
#First Item Order
df_sort = df.sort_values(['CustomerID', 'StockCode', 'InvoiceDate'])
df_sort_shift1 = df_sort.shift(1)
df_sort_reorder = df_sort.copy()
df_sort_reorder['reorder'] = np.where(df_sort['StockCode'] == df_sort_shift1['StockCode'], 1,0)

Sales by Month

In [41]:
notreorder = (df_sort_reorder[df_sort_reorder['reorder'] == 0 ].groupby(['yearmonth'])['total_dollars'].sum())
reorder = (df_sort_reorder[df_sort_reorder['reorder'] == 1 ].groupby(['yearmonth'])['total_dollars'].sum())

yearmonth = pd.DataFrame([notreorder , reorder], index=['First Buy', 'Reorder']).transpose()

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(14,9))

yearmonth.plot.bar(stacked=True, ax=axes[0])
yearmonth.plot.box(ax=axes[1])
Out[41]:
<matplotlib.axes._subplots.AxesSubplot at 0x28b6c8d6d68>

Sales by Week

In [42]:
notreorder = (df_sort_reorder[df_sort_reorder['reorder'] == 0 ].groupby(['Week'])['total_dollars'].sum())
reorder = (df_sort_reorder[df_sort_reorder['reorder'] == 1 ].groupby(['Week'])['total_dollars'].sum())

yearmonth = pd.DataFrame([notreorder , reorder], index=['First Buy', 'Reorder']).transpose()

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(14,9))

yearmonth.plot.bar(stacked=True, ax=axes[0])
yearmonth.plot.box(ax=axes[1])
Out[42]:
<matplotlib.axes._subplots.AxesSubplot at 0x28b6cb37128>

Sales by Day

In [43]:
notreorder = (df_sort_reorder[df_sort_reorder['reorder'] == 0 ].groupby(['day'])['total_dollars'].sum())
reorder = (df_sort_reorder[df_sort_reorder['reorder'] == 1 ].groupby(['day'])['total_dollars'].sum())

yearmonth = pd.DataFrame([notreorder , reorder], index=['First Buy', 'Reorder']).transpose()

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(14,9))

yearmonth.plot.bar(stacked=True, ax=axes[0])
yearmonth.plot.box(ax=axes[1])
Out[43]:
<matplotlib.axes._subplots.AxesSubplot at 0x28b6ad34978>

Customizing map background in Tableau

Tableau Desktop includes a connection to Tableau’s map server, which provides an extensive selection of maps optimized for use with Tableau. If you prefer to use your own maps, the easiest approach is to connect to a map server that supports the WMS standard. For more information, go to Working with WMS Servers topic in Tableau Desktop Help and Mapping Data with WMS article.

Requirements for a TMS Connection

To connect to your map server from the TMS, your map server must have the following features:

  • Maps are returned as a collection of tiles
  • Tiles are in Web Mercator projection
  • Tiles can be addressed by URL using the same numbering scheme as common web mapping services. For more information, see the <url-format> section under Variables in the TMS File for more information.

Create a Simple TMS File

To connect to the TMS, you must create a TMS file. A TMS file is a simple text file that you can create in a text editor.

Open a text editor.Copy and paste the following XML into the text editor.

Copy and paste the following XML into the text editor.<?xml version="1.0" encoding="utf-8"?>
<mapsource inline="<boolean>" version="8.1">
<connection class="OpenStreetMap" port="80" server="<server-url>" url-format="<url-format>" />
<layers>
<layer display-name='Base' name='base' show-ui='false' type='features' request-string='/' />
</layers>
</mapsource>

Replace <boolean>, <server-url>, and <url-format> variables as described in the Required Variables in the TMS File section in this article.

Save the TMS file with a .tms extension to the Mapsources folder of Tableau Desktop or Tableau Server.The default location for the Mapsources folder:

For Tableau Desktop on the Mac – /Users/<user>/Documents/My Tableau Repository/Mapsources

For Tableau Desktop on Windows – C:\Users\<user>\Documents\My Tableau Repository\Mapsources

For Tableau Server – C:\Program Files\Tableau\Tableau Server\<version>\vizqlserver\mapsources

Open Tableau Desktop.

Connect to a workbook that contains location information.

Select Map > Background Maps, and then select the background map from the map server you configured in the TMS file.

(Optional) If you added the TMS file to the Mapsources folder in Tableau Server, publish the workbook to Tableau Server and see the background map you configured in the TMS file.

Required Variables in the TMS File

Only the following variables can be changed in the XML:

<boolean>: Replace the <boolean> with either a true or false value.

A true value allows Tableau Desktop to save the configuration specified in the TMS file with the workbook. Use this value if your workbook is being published to Tableau Online or Tableau Public.

A false value requires Tableau Desktop or Tableau Server to have access to the TMS file saved in the Mapsources folder to display the maps from your map server.

<server-url>: Replace <server-url> with the URL of your map server.

<url-format>: Replace <url-format> with additional URL fragments that your map server requires. This might include the following tags:

{Z}: The {Z} tag indicates the zoom level. A zoom level of 0 displays the entire world in one map tile. The TMS will fetch map tiles up to level 16.

{X} and {Y}: The {X} and {Y} tags indicate the map tile coordinates. For more information about map tiles, refer to the following web pages:

OpenStreetMaps wiki page

Bing Maps web page

OSM XML

Suppose you want to connect to a sample map server provided by OpenStreetMaps. The TMS file may look like the following:

<?xml version="1.0" encoding="utf-8"?>
<mapsource inline="true" version="8.1">
<connection class="OpenStreetMap" port="80" server="http://a.tile.openstreetmap.org" url-format="/{Z}/{X}/{Y}.png" />
<layers>
<layer display-name='Base' name='base' show-ui='false' type='features' request-string='/' />
</layers>
</mapsource>

Google maps XML

The TMS file for the google map tile server looks like the following:

<?xml version="1.0" encoding="utf-8"?>
<mapsource inline="true" version="8.1">
<connection class="OpenStreetMap" port="80" <connection class="OpenStreetMap" port="80" server="http://mt1.google.com" url-format="/vt/lyrs=m&amp;x={X}&amp;y={Y}&amp;z={Z}" />
<layers>
<layer display-name='Base' name='base' show-ui='false' type='features' request-string='/' />
</layers>
</mapsource>

Stamen Toner XML

Suppose you want to connect to a sample map server provided by OpenStreetMaps. The TMS file may look like the following:

<?xml version="1.0" encoding="utf-8"?>
<mapsource inline="true" version="8.1">
<connection class="OpenStreetMap" port="80" <connection class="OpenStreetMap" port="80"
server="http://tile.stamen.com" url-format="/toner/{Z}/{X}/{Y}.png" />
<layers>
<layer display-name='Base' name='base' show-ui='false' type='features' request-string='/' />
</layers>
</mapsource>

Stamen WaterColor XML

Suppose you want to connect to a sample map server provided by OpenStreetMaps. The TMS file may look like the following:

<?xml version="1.0" encoding="utf-8"?>
<mapsource inline="true" version="8.1">
<connection class="OpenStreetMap" port="80" <connection class="OpenStreetMap" port="80"
<connection class="OpenStreetMap" port="80" server="http://tile.stamen.com" url-format="/watercolor/{Z}/{X}/{Y}.jpg" />
<layers>
<layer display-name='Base' name='base' show-ui='false' type='features' request-string='/' />
</layers>
</mapsource>

 

Offline maps: local map tile server

As described previously (e.g. http://rgooglemaps.r-forge.r-project.org/OfflineMaps-RgoogleMaps-leaflets.html) we can use the RgoogleMaps package to (i) download map tiles and store them locally and (ii) launch a local Web server (in python or in R) to serve the map tiles to ANY mapping application.

To achieve this in Tableau, you would simply follow the instructions from the link above and then use e.g. this XML file:

<?xml version="1.0" encoding="utf-8"?>
<mapsource inline="true" version="8.1">
<connection class="OpenStreetMap" port="80" <connection class="OpenStreetMap" port="80"
<connection class="OpenStreetMap" port="80" server="
http:/localhost:8000" url-format="/mapTiles/watercolor/{Z}/{X}/{Y}.jpg" />
<layers>
<layer display-name='Base' name='base' show-ui='false' type='features' request-string='/' />
</layers>
</mapsource>