###############################################################################
mig - git-like program for tracking and manipulating legacy data files for
B<mig> <command> [argument] [...]
B<mig> is used to track and manipulate CSV or CSV-like text files exported from
legacy systems for migration into Evergreen. It can be a wrapper for some
other migration tools and tracks state using a PostgreSQL table in a given
It makes use of certain environment variables that may be set by the B<mig-env>
tool: PGHOST, PGPORT, PGUSER, PGDATABASE, MIGSCHEMA, and MIGWORKDIR.
For most commands, if the current working directory falls outside of the
directory specified by MIGWORKDIR, then mig will assume that the environment is
also incorrect and bail before doing any actual work.
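As a rough illustration of the working-directory check described above, the
following Perl sketch tests whether the current directory lies at or beneath
MIGWORKDIR. The helper name is_within is hypothetical and is not taken from
B<mig> itself; the actual check may be implemented differently.

```perl
use strict;
use warnings;
use Cwd qw(abs_path getcwd);

# Hypothetical helper (not part of mig): returns 1 if $dir is $base itself
# or lies somewhere beneath it, 0 otherwise. Both paths are resolved first
# so that symlinks and relative components do not confuse the comparison.
sub is_within {
    my ($dir, $base) = @_;
    my $path = abs_path($dir);
    my $root = abs_path($base);
    return 0 unless defined $path && defined $root;
    return 1 if $path eq $root;
    # Append a trailing slash so that /work/foobar is not treated as
    # being inside /work/foo.
    $root .= '/' unless $root =~ m{/\z};
    return index($path, $root) == 0 ? 1 : 0;
}

# Assumed usage: bail out early when the shell is in the wrong place.
if (defined $ENV{MIGWORKDIR} && length $ENV{MIGWORKDIR}) {
    die "Current directory is outside MIGWORKDIR ($ENV{MIGWORKDIR})\n"
        unless is_within(getcwd(), $ENV{MIGWORKDIR});
}
```

Comparing resolved paths with a trailing slash on the base avoids false
matches between sibling directories that share a name prefix.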
~/.pgpass should also be configured, as B<mig> will not prompt for a database
Only the B<env> and B<help> commands work without the MIGSCHEMA environment
Using B<mig> should go something like this:
=item mig env create m_foo # sets up the environment
=item mig env use m_foo # spawns a shell using the configured environment
=item mig init # creates the m_foo schema in the database if needed, and other tables
=item mig add patrons.tsv # tracks an incoming data file; repeat for additional files
=item mig iconv patrons.tsv # converts it to UTF-8, creating patrons.tsv.utf8
=item mig clean patrons.tsv # cleans the file, creating patrons.tsv.utf8.clean
=item mig link patrons.tsv actor_usr # makes the soon-to-be staging table a child of m_foo.actor_usr
=item mig convert patrons.tsv # creates a .sql file for staging the data
=item mig stage patrons.tsv # loads said .sql file
=item mig mapper patrons.tsv # interactive tool for analyzing/mapping the staging table
=item mig analysis patrons.tsv # writes a summary .tsv file of mapped/flagged fields from the staging table
=item mig map patrons.tsv # applies configured mappings
=item mig write_prod patrons.tsv # creates a .sql file for pushing the staging data into production
=item B<help> [command]
Display this very same documentation, or specific documentation for one of the
=item B<env> <create|use|show> <schema>
Invokes B<mig-env> with the same arguments. I<mig-env> can set important
environment variables and spawn a shell with those variables, and it also does
some directory creation and symlinking.
Create or re-create the PostgreSQL tracking table for the schema specified by
the MIGSCHEMA environment variable. If needed, create the migration schema
itself and run migration_tools.init() and build() if the migration_tools schema
=item B<status> [file] [...]
Show status information for either the specified files or all tracked files if
=item B<add> [--no-headers|--headers] <file> [file|--no-headers|--headers] [...]
Add the specified files to the migration tracker. Until --no-headers is
specified, the tracker will assume the files have headers.
You can do crazy stuff like
B<mig add file1 --no-headers file2 file3 --headers file4>
=item B<remove> <file> [file] [...]
Remove the specified files from the migration tracker.
=item B<iconv> <file> [other arguments...]
Attempts to invoke B<iconv> on the specified tracked file, placing the output in
If given no other arguments, the invocation will look like
iconv -f ISO-8859-1 -t UTF-8 -o <file>.utf8 <file>
Otherwise, the arguments will be passed through like so
iconv [other arguments...] -o <file>.utf8 <file>
=item B<skip-iconv> <file>
If this is used instead of B<iconv>, then B<mig> will look for an existing
<file>.utf8 and use it instead of attempting to create one.
=item B<clean> <file> [other arguments...]
Attempts to invoke B<clean_csv> on the iconv-converted version of the specified
tracked file, placing the output in <file>.utf8.clean.
If given no other arguments, the invocation will look like
clean_csv --config scripts/clean.conf --fix --apply <--create-headers> <file>
Otherwise, the arguments will be passed through like so
clean_csv [other arguments...] <file>
=item B<skip-clean> <file>
If this is used instead of B<clean>, then B<mig> will look for an existing
<file>.utf8.clean and use it instead of attempting to create one.
=item B<link> <file> <parent table>
Associate the specified file with a parent table within the migration schema.
Linking multiple files to the same parent table is currently not allowed.
=item B<unlink> <file>
Removes any association between the specified file and a parent table within
the migration schema.
=item B<convert> <file>
Attempts to invoke B<csv2sql> on the .utf8.clean version of the specified
tracked file, creating either <file>.utf8.clean.stage.sql or
<parent table>_stage.sql, depending on whether the file has been linked to a
parent table within the migration schema.
If given no other arguments, the invocation will look like
csv2sql --config scripts/clean.conf --add-x-migrate --schema <MIGSCHEMA> [--parent <PARENT TABLE>] -o <[<file>.utf8.clean.stage.sql]|[parent_table_stage.sql]> <file>.utf8.clean
Otherwise, the arguments will be passed through like so
csv2sql [other arguments...] -o <[<file>.utf8.clean.stage.sql]|[parent_table_stage.sql]> <file>.utf8.clean
=item B<stage> <file> [other arguments...]
Load the SQL-converted version of the specified file into the migration schema.
Extra arguments are passed to the underlying call to psql.
=item B<mapper> <file>
Interactive session for analyzing, flagging, and mapping legacy field data to
Upon exit, generate either [file].clean.map.sql or <parent table>_map.sql. The
SQL generated will be UPDATEs for setting the Evergreen-specific columns for a
given file's staging tables, and TRUNCATEs and INSERTs for auxiliary tables.
The files will have \include hooks for pulling in additional mapping files
(for example, end-user mappings for circ modifiers, etc.)
=item B<analysis> [file]
Writes a MIGSCHEMA.tsv file containing a breakdown of mapped and flagged
fields from the specified file, or all staged files if no file is specified.
The main goal of the TSV file is to present end-user mappable data for circ
modifiers, shelving locations, patron profiles, etc. We use spreadsheets for
this now but may move to a dedicated UI in the future.
Applies the mapping SQL to the migration schema for the specified mapped file,
or for all mapped files if no file is specified.
=item B<write_prod> [file]
Generates <parent table>_prod.sql for the specified linked and mapped file, or
all such files if no file is specified.
=item B<sql> [arguments...]
A wrapper around the psql command. At some point the plan is to shove mig-tracked variables into psql sessions.
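To illustrate how the --headers and --no-headers switches of the B<add>
command could be read, here is a minimal Perl sketch. It assumes that each
switch applies to every file that follows it until the opposite switch
appears, which is what the B<mig add> example above suggests. The function
name parse_add_args and the returned hash keys are hypothetical and are not
taken from B<mig> itself.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of mig): walk the argument list from left to
# right and pair each file name with the header setting in effect at that
# point. Files are assumed to have headers until --no-headers is seen.
sub parse_add_args {
    my @args = @_;
    my $has_headers = 1;
    my @files;
    for my $arg (@args) {
        if ($arg eq '--headers') {
            $has_headers = 1;
        } elsif ($arg eq '--no-headers') {
            $has_headers = 0;
        } else {
            push @files, { file => $arg, has_headers => $has_headers };
        }
    }
    return @files;
}

# Same argument list as the mig add example in the documentation.
for my $f (parse_add_args(qw(file1 --no-headers file2 file3 --headers file4))) {
    printf "%s: %s\n", $f->{file}, $f->{has_headers} ? 'headers' : 'no headers';
}
```

Under that assumption, file1 and file4 are treated as having headers, while
file2 and file3 are not.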
###############################################################################
HOME PGHOST PGPORT PGUSER PGDATABASE MIGSCHEMA
MIGBASEWORKDIR MIGBASEGITDIR MIGGITDIR MIGWORKDIR
my $mig_bin = "$FindBin::Bin/mig-bin/";
use lib "$FindBin::Bin/mig-bin";
pod2usage(-verbose => 2) if ! $ARGV[0];
if (defined $ARGV[1]) {
my $cmd = $mig_bin . "mig-$ARGV[1]";
system( $mig_bin . "mig-$ARGV[1]", '--help' );
pod2usage(-verbose => 2);
pod2usage(-verbose => 2);
standard_invocation(@ARGV);
Mig::die_if_no_env_migschema();
standard_invocation(@ARGV);
Mig::die_if_no_env_migschema();
standard_invocation(@ARGV);
Mig::die_if_no_env_migschema();
standard_invocation(@ARGV);
Mig::die_if_no_env_migschema();
standard_invocation(@ARGV);
Mig::die_if_no_env_migschema();
standard_invocation(@ARGV);
Mig::die_if_no_env_migschema();
standard_invocation(@ARGV);
Mig::die_if_no_env_migschema();
standard_invocation(@ARGV);
Mig::die_if_no_env_migschema();
standard_invocation(@ARGV);
Mig::die_if_no_env_migschema();
standard_invocation(@ARGV);
Mig::die_if_no_env_migschema();
standard_invocation(@ARGV);
Mig::die_if_no_env_migschema();
standard_invocation(@ARGV);
Mig::die_if_no_env_migschema();
standard_invocation(@ARGV);
Mig::die_if_no_env_migschema();
standard_invocation(@ARGV);
Mig::die_if_no_env_migschema();
standard_invocation(@ARGV);
Mig::die_if_no_env_migschema();
standard_invocation(@ARGV);
Mig::die_if_no_env_migschema();
standard_invocation(@ARGV);
Mig::die_if_no_env_migschema();
Mig::die_if_no_env_migschema();
print "$MIGWORKDIR\n";
print "$MIGBASEGITDIR\n";
print "$MIGGITDIR\n";
sub standard_invocation {
system( $mig_bin . "mig-$cmd", @_ );