Develop a Python script (it cannot be command-line driven) that lets me quickly cleanse and transform datasets of varying sizes for use in other analytics systems. Using a Jupyter Notebook, I want to import complex datasets and wrangle them for use in virtually any target system.
Key capabilities include:
- Import from flat file
- Locate and remove or modify missing or mismatched data
- Unnest complex data structures
- Identify statistical outliers in the data for review and management
- Perform lookups from one dataset into another reference dataset
- Aggregate columnar data using a variety of aggregation functions
- Merge datasets with joins
- Append one dataset to another through union operations
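To make the capabilities above more concrete, here is a minimal pandas sketch of how each one might look in a notebook cell. The sample data, the column names (`order_id`, `customer_id`, `amount`, `tags`, `region`), the median fill, and the 1.5 x IQR outlier rule are all assumptions chosen for illustration. They are not requirements of this project.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for an imported flat file.
ORDERS_CSV = """order_id,customer_id,amount,tags
1,C1,10.5,a|b
2,C2,,c
3,C1,abc,
4,C3,12.0,a
5,C2,11.0,b|c
6,C1,500.0,a
"""
orders = pd.read_csv(io.StringIO(ORDERS_CSV))
# Hypothetical reference dataset used for the lookup and join below.
customers = pd.DataFrame({"customer_id": ["C1", "C2"], "region": ["East", "West"]})

# Missing or mismatched data: coerce non-numeric amounts to NaN,
# count the gaps, then fill them with the column median (one possible policy).
orders["amount"] = pd.to_numeric(orders["amount"], errors="coerce")
n_missing = int(orders["amount"].isna().sum())
orders["amount"] = orders["amount"].fillna(orders["amount"].median())

# Outliers: flag values outside 1.5 * IQR of the quartiles for review.
q1, q3 = orders["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = ~orders["amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outliers = orders[is_outlier]

# Unnest: split the delimited "tags" field into lists, then one row per tag.
unnested = orders.assign(tags=orders["tags"].str.split("|")).explode("tags")

# Lookup: pull "region" from the reference dataset by key.
# Keys missing from the reference (C3 here) come back as NaN.
region_by_customer = customers.set_index("customer_id")["region"]
orders["region"] = orders["customer_id"].map(region_by_customer)

# Aggregate: several aggregation functions over one column.
summary = orders.groupby("customer_id")["amount"].agg(["count", "sum", "mean"])

# Join: left join, with an indicator column showing unmatched rows.
joined = orders.drop(columns="region").merge(
    customers, on="customer_id", how="left", indicator=True
)

# Union: append another dataset with the same columns.
more_orders = pd.DataFrame(
    {"order_id": [7], "customer_id": ["C2"], "amount": [9.0],
     "tags": ["b"], "region": ["West"]}
)
combined = pd.concat([orders, more_orders], ignore_index=True)
```

In a notebook, displaying `outliers`, `summary`, or `joined` at the end of a cell is enough to review each result. No separate front-end is needed.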
This is not intended to be a web app of any kind. There is really no front-end to speak of... I simply want to be able to interact with the Jupyter Notebook to pull all this off.
In general, the flow is as follows:
1. Import data: Bring in data from a variety of sources.
2. Profile our data: Before, during, and after we transform our data, we can use the visual profiling tools to quickly analyze it and make decisions about it.
3. Build transform recipes: Use the various views in the Transformers to build our transform recipes and preview the results on sampled data.
4. Generate results: Launch a task to run our recipe on the full dataset. Review results and iterate as needed.
5. Export results: Export the generated results data for use outside of the script running in Jupyter Notebook.
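The five steps above could be sketched in a notebook roughly as follows. This is only an illustration under stated assumptions: the helper names (`profile`, `run_recipe`, `tidy_names`, `drop_missing_scores`), the sample data, and the idea of representing a recipe as a plain list of functions are hypothetical, and the export goes to a temporary directory just so the example is self-contained.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical raw data standing in for a flat-file source.
RAW_CSV = """name,score
 alice ,90
BOB,
carol,75
"""

# 1. Import data.
raw = pd.read_csv(io.StringIO(RAW_CSV))

# 2. Profile: a per-column summary of dtype, missing count, and unique count.
def profile(df):
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),
    })

# 3. Build a transform recipe: an ordered list of DataFrame -> DataFrame steps.
def tidy_names(df):
    # Trim whitespace and normalize capitalization.
    return df.assign(name=df["name"].str.strip().str.title())

def drop_missing_scores(df):
    return df.dropna(subset=["score"])

recipe = [tidy_names, drop_missing_scores]

def run_recipe(df, steps):
    # Apply each step in order; the input frame is left unmodified.
    for step in steps:
        df = step(df)
    return df

# Preview the recipe on a small sample before running it on everything.
preview = run_recipe(raw.sample(n=2, random_state=0), recipe)

# 4. Generate results on the full dataset.
result = run_recipe(raw, recipe)

# 5. Export results for use outside the notebook.
out_path = Path(tempfile.mkdtemp()) / "result.csv"
result.to_csv(out_path, index=False)
```

Calling `profile(raw)` and `profile(result)` in separate cells gives a before-and-after view, and editing the `recipe` list and re-running the later cells covers the "iterate as needed" part of step 4.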
By the end of the steps above, we will have imported, cleansed, transformed, and possibly enhanced our data for use in the next step of our analytics pipeline.
Here are further details on what we expect from this solution:
We expect that most of the functions contained within Pandas will suffice for what we need.
However, each column in an imported Pandas dataframe needs to have all of the operations below available, to be applied if a user selects that column:
^^^Please See Uploaded Document for More Details^^^
Hello! Any manipulations done in Jupyter Notebooks are part of my day job as a bioinformatics analyst.
Relevant Skills and Experience
Python, Jupyter, Data processing
Proposed Milestones
$294 USD - All
I have good experience working with advanced R and Python. I have solid knowledge of deep learning and ML algorithms, and I have also developed dashboards and Shiny web applications.
Relevant Skills and Experience
I understand the project requirement and will deliver the desired product within the time specified.
Proposed Milestones
$155 USD - milestone
I am a Python expert in data analytics.
Relevant Skills and Experience
Python, Jupyter Notebook, Excel, Data Processing
Proposed Milestones
$133 USD - full task