Formatting JSON

Examples on this page use Python.

Formatting JSON to send
- Dates to send
- Format existing JSON
Formatting json downloaded from Figshare API

Formatting JSON to send

Authors

Format authors as part of creating a new item.

import json
#assume you have a list of json formatted records which includes the first and last names labeled as 'first_name' and 'last_name'.
authors = []
for item in json_metadata:
	for name in item['authors']:
		authorname = {"name" : name['first_name'] + " " + name['last_name']}
		authors.append(authorname)

Dates to send

Format a partial date for use in Figshare (partial date support is on the roadmap)

#assume you have created a dictionary called my_dict that holds the Figshare formatted metadata and the harvested metadata only includes a year. This makes the date January 1 of that year.

my_dict['timeline'] =  {"firstOnline" : str(item['year']) + "-01-01"} #year only

Format existing JSON

Here is an example of formatting existing json metadata for upload to Figshare:

existing_metadata = [
    {"record_id":123,
     "title":"blah blah blah",
     "authors": [{"first_name":"Allisa", "last_name":"Morris"},{"first_name":"Alex", "last_name":"Dannold"}],
     "doi":"https://doi.org/10.38743/384738"
    },
    {"record_id":456,
     "title":"yadda yadda",
     "authors": [{"first_name":"Andrew", "last_name":"Yol"},{"first_name":"Rebecca", "last_name":"Jane"}],
     "doi":"https://doi.org/10.29033/6575"}
]

formatted_metadata = []
for record in existing_metadata:
    record_dict = {}
    record_dict['id'] = record['record_id']
    record_dict['title'] = record['title']
    #Format authors
    authors = []
    for name in record['authors']:
        authorname = {"name" : name['first_name'] + " " + name['last_name']}
        authors.append(authorname)
    record_dict['authors'] = authors
    #Format doi to just the doi vaalue and not URL
    doi = str(record['doi'])[16:]
    record_dict['doi'] = doi
    #Add record to the growing list
    formatted_metadata.append(record_dict)
    
#See formatted metadata:
formatted_metadata

Output:

[{'id': 123,
  'title': 'blah blah blah',
  'authors': [{'name': 'Allisa Morris'}, {'name': 'Alex Dannold'}],
  'doi': '10.38743/384738'},
 {'id': 456,
  'title': 'yadda yadda',
  'authors': [{'name': 'Andrew Yol'}, {'name': 'Rebecca Jane'}],
  'doi': '10.29033/6575'}]

Formatting JSON downloaded from Figshare API

Dates for retrieved information

If you downloaded metadata and want to format the dates in a dataframe:

import pandas as pd

#This assumes you have a dictionary called item_metadata and a dataframe of that metadata called df.

#The dates are all contained within one column called 'timeline'. Flatten that column and associate the values with the proper article id in a new dataframe

temp_date_list = []

for item in item_metadata:
    dateitem = item['timeline']
    dateitem['id'] = item['id']
    temp_date_list.append(dateitem)

df_dates = pd.json_normalize(
    temp_date_list 
)

#Merge the dataframes
df_formatted = df.merge(df_dates, how='outer', on='id')

print("Dates split out and merged")

Group information

In the downloaded metadata, the group is identified with its id. If you are a repository administrator use this script to retrieve the group names and merge them with your dataframe of metadata. In this example, the df_formatted dataframe is the downloaded metadata and must have a group_id column. Groups can be nested so this script will add the group name matching the group id in the downloaded metadata as well as the parent group.

import pandas as pd

#Get list of groups.
s=requests.get(BASE_URL + '/account/institution/groups', headers=api_call_headers)
groups=json.loads(s.text)

#Create a dataframe of groups
df_groups = pd.json_normalize(groups)

df_groups_parent = df_groups[['id','name']] #Create reference dataframe
df_groups = df_groups.rename(columns={'id': 'group_id','name': 'group_name'}) #Rename id col in main dataframe
df_groups_parent = df_groups_parent.rename(columns={'name': 'parent_group_name'}) #Rename name col in reference dataframe

df_groups = df_groups.sort_values(by=['parent_id'])
top_group_id = df_groups.iloc[0]['group_id'] #Store the group id for top group

df_groups.loc[df_groups['parent_id'] == 0, 'parent_id'] = top_group_id #For top level group, replace the zero value parent id with top level group id

df_groups = df_groups.merge(df_groups_parent, how='inner',left_on=['parent_id'], right_on=['id']) #Add parent group name

df_groups = df_groups[['group_id','group_name','parent_group_name']] #Pare down to needed columns


#Merge the dataframes
df_formatted = df_formatted.merge(df_groups, how='inner', on='group_id') #If you use 'outer' it will include a blank record for each group with no records

print("Names for",len(df_groups),"different groups were added to the metadata records")

Custom fields

The custom field information is stored as a JSON dictionary. The following script converts the JSON to columns in a dataframe which is then merged back into the main metadata dataframe. The metadata dataframe, df_formatted, needs to have the item id column named ‘id’.

import pandas as pd

#The custom fields are all contained within one column called 'custom_fields'. Flatten that column and associate the values
#with the proper article id in a new dataframe
custom = pd.json_normalize(
    item_metadata,
    record_path =['custom_fields'],
    meta=['id']
)

if len(custom) > 0:
    #This reshapes the data so that metadata field names are columns and each row is an id.
    custom = custom.pivot(index="id", columns="name", values="value")

    #Merge the dataframes so that all the custom fields are visible along with all the other metadata
    df_formatted = df_formatted.merge(custom, how='outer', on='id') #Outer merge keeps records that have no custom metadata.
    print('Custom fields split out and merged')
else:
    print('No custom fields to split out')