#!/usr/bin/perl -w
###############################################################################
=pod

=head1 NAME

kmig - git-like program for tracking and manipulating legacy data files for
migrations. This variant is geared toward the Koha ILS and MySQL/MariaDB.

=head1 SYNOPSIS

B<kmig> <command> [argument] [...]

=head1 DESCRIPTION

B<kmig> is used to track and manipulate CSV or CSV-like text files exported from
legacy systems for migration into Evergreen. It can be a wrapper for some
other migration tools and tracks state using a MySQL table for a given database.

For most commands, if the current working directory falls outside of the
directory specified by MIGWORKDIR, then kmig will assume that the environment is
also incorrect and bail before doing any actual work.

Only the B<env> and B<help> commands work without the MIGDATABASE environment
variable being set.

=head1 OVERVIEW

Using B<kmig> should go something like this:

=over 15

=item kmig env create m_foo # sets up the environment

=item kmig env use m_foo # spawns a shell using the configured environment

=item kmig init # creates any needed auxiliary tables

=item kmig add patrons.tsv # tracks an incoming data file; repeat for additional files

=item kmig iconv patrons.tsv # converts it to UTF-8, creating patrons.tsv.utf8

=item kmig clean patrons.tsv # cleans the file, creating patrons.tsv.utf8.clean

=item kmig link patrons.tsv borrowers # models the soon-to-be staging table after table 'borrowers'

=item kmig convert patrons.tsv # creates a .sql file for staging the data

=item kmig stage patrons.tsv # loads said .sql file

=item kmig mapper patrons.tsv # interactive tool for analyzing/mapping the staging table

=item kmig analysis patrons.tsv # writes a summary .tsv file of mapped/flagged fields from the staging table

=item kmig map patrons.tsv # applies configured mappings

=item kmig write_prod patrons.tsv # creates a .sql file for pushing the staging data into production

=item kmig reporter --analyst "Foo Fooer" --report_title "Foo Load Analysis"
#creates an asciidoc report

=item kmig gsheet --pull foo_tab_name OR --push foo_pg_table_name

=item kmig stagebibs --file foo.xml

=back

=head1 COMMANDS

=over 15

=item B<help> [command]

Display this very same documentation, or specific documentation for one of the
commands listed here.

=item B<env> [arguments...]

Invokes B<kmig-env> with the same arguments. I<kmig-env> can set important
environment variables and spawn a shell with those variables, and it also does
some directory creation and symlinking.

=item B<init>

Create or re-create the PostgreSQL tracking table for the schema specified by
the MIGDATABASE environment variable. If needed, create the migration schema
itself and run migration_tools.init() and build() if the migration_tools schema
exists.

=item B<status> [file] [...]

Show status information for either the specified files or all tracked files if
no argument is given.

=item B<add> [--no-headers|--headers] <file> [file|--no-headers|--headers] [...]

Add the specified files to the migration tracker. Until --no-headers is
specified, the tracker will assume the files have headers.

The flags can be interleaved with file names, so you can do crazy stuff like
B<kmig add file1 --no-headers file2 file3 --headers file4>

=item B<remove> <file> [file] [...]

Remove the specified files from the migration tracker.

=item B<iconv> <file> [other arguments...]

Attempts to invoke B<iconv> on the specified tracked file, placing the output in
<file>.utf8

If given no other arguments, the invocation will look like

=over 5

iconv -f ISO-8859-1 -t UTF-8 -o <file>.utf8 <file>

=back

otherwise, the arguments will be passed through like so

=over 5

iconv [other arguments...] -o <file>.utf8 <file>

=back

=item B<skip-iconv> <file>

If this is used instead of B<iconv>, then B<kmig> will look for an existing
<file>.utf8 and use it instead of attempting to create one.

=item B<clean> <file> [other arguments...]

Attempts to invoke B<clean_csv> on the iconv-converted specified tracked file,
placing the output in <file>.utf8.clean

If given no other arguments, the invocation will look like

=over 5

clean_csv --config scripts/clean.conf --fix --apply <--create-headers> <file>

=back

otherwise, the arguments will be passed through like so

=over 5

clean_csv [other arguments...] <file>


=back

=item B<skip-clean> <file>

If this is used instead of B<clean>, then B<kmig> will look for an existing
<file>.utf8.clean and use it instead of attempting to create one.

=item B<link> <file> <parent table>

Associate the specified file with a parent table within the migration schema.

Linking multiple files to the same parent table is not allowed currently.

=item B<unlink> <file>

Removes any association between the specified file and a parent table within
the migration schema.

=item B<convert> <file>

Attempts to invoke B<csv2sql> on the .utf8.clean version of the specified
tracked file, creating either [file].utf8.clean.stage.sql or
<parent table>_stage.sql depending on whether the file has been linked to a
parent table within the migration schema or not.

If given no other arguments, the invocation will look like

=over 5

csv2sql --config scripts/clean.conf --add-x-migrate --schema <schema> [--parent <parent table>] -o <[<file>.utf8.clean.stage.sql]|[parent_table_stage.sql]> <file>.utf8.clean

=back

otherwise, the arguments will be passed through like so

=over 5

csv2sql [other arguments...] -o <[<file>.utf8.clean.stage.sql]|[parent_table_stage.sql]> <file>.utf8.clean

=back

=item B<stage> <file> [other arguments...]

Load the SQL-converted version of the specified file into the migration schema.

Extra arguments are passed to the underlying call to psql

=item B<mapper> <file>

Interactive session for analyzing, flagging, and mapping legacy field data to
Evergreen fields.

Upon exit, generate either [file].clean.map.sql or <parent table>_map.sql. The
SQL generated will be UPDATEs for setting the Evergreen-specific columns for a
given file's staging tables, and TRUNCATEs and INSERTs for auxiliary tables.
The files will have \include hooks for pulling in additional mapping files
(for example, end-user mappings for circ modifiers, etc.)

=item B<analysis> [file]

Writes a MIGDATABASE.tsv file containing a break-down of mapped and flagged
fields from the specified file, or all staged files if no file is specified.

The main goal of the tsv file is to present end-user mappable data for circ
modifiers, shelving locations, patron profiles, etc.
We use spreadsheets for this now but may move to a dedicated UI in the future.

=item B<map> [file]

Applies the mapping sql to the migration schema for the specified mapped file,
or for all mapped files if no file is specified.

=item B<write_prod> [file]

Generates <parent table>_prod.sql for the specified linked and mapped file, or
all such files if no file is specified.

=item B<sql> [arguments...]

A wrapper around the psql command. At some point the plan is to shove
mig-tracked variables into psql sessions.

=item B<reporter> --analyst "Analyst Name" --report_title "Report Title"

Generates an asciidoc file in the git working directory that can be converted to
any appropriate format. The analyst and report title parameters are required.

Optional parameters are:

--added_page_title and --added_page_file

If one is used, both must be. The added page file can be plain text or
asciidoc. This adds an extra arbitrary page of notes to the report. kmig
assumes the page file is in the kmig git directory.

--tags

This will define a set of tags to use. If not set, it will default to Circs,
Holds, Actors, Bibs, Assets & Money.

--debug

Gives more information about what is happening.

--reports_xml

Allows you to override the default evergreen_staged_report.xml in the mig-xml
folder.

=item B<gsheet> --pull or --push spreadsheet_tab

This uses the gsheet_tracked_table and gsheet_tracked_column tables to map
Google Docs spreadsheet tabs to Postgres tables in the kmig schema. The
spreadsheet is assumed to share its name with the kmig schema. Tab names must
be unique. Each spreadsheet column needs a header that matches the column name
in the matching table. An OAuth session key is also needed for your Google
account, and kmig gsheet will look for it in the .kmig directory.


=back

=cut

###############################################################################

use strict;
use Switch;
use Env qw(
    HOME PGHOST PGPORT PGUSER PGDATABASE MIGDATABASE
    MIGBASEWORKDIR MIGBASEGITDIR MIGGITDIR MIGWORKDIR
);
use Pod::Usage;
use FindBin;
my $mig_bin = "$FindBin::Bin/kmig.d/bin/";
use lib "$FindBin::Bin/kmig.d/bin";
use KMig;

pod2usage(-verbose => 2) if ! $ARGV[0];

switch($ARGV[0]) {
    case "help" {
        if (defined $ARGV[1]) {
            my $cmd = $mig_bin . "mig-$ARGV[1]";
            if (-e $cmd) {
                system( $mig_bin . "mig-$ARGV[1]", '--help' );
            } else {
                pod2usage(-verbose => 2);
            }
        } else {
            pod2usage(-verbose => 2);
        }
    }
    case "map" {
    }
    case "load" {
    }
    case "wdir" {
        print "$MIGWORKDIR\n";
    }
    case "gdir" {
        print "$MIGBASEGITDIR\n";
    }
    case "sdir" {
        print "$MIGGITDIR\n";
    }
    else {
        standard_invocation(@ARGV);
    }
}

sub standard_invocation {
    my $cmd = shift;

    if ($cmd ne 'env') { Mig::die_if_no_env_migschema(); }
    if (-e $mig_bin . "kmig-$cmd") {
        system( $mig_bin . "kmig-$cmd", @_ );
    } else {
        system( "kmig-$cmd", @_ ) == 0 or die pod2usage(1);
    }
}