Mentions_df = pd.read_xml(url, stylesheet=mentions_xsl) Specifically, each stylesheet drills down to the most granular node and then by the ancestor axis pulls higher level information as sibling columns. No migration to dictionaries for JSON handling.Īs a special purpose language written in XML, XSLT can transform your nested structure to flatter format for migration to data frame. Since the URL really contains two data sections under each, specifically (which appear to be aggregate vote data) and (which are granular person-level data) ( pardon my French), consider building two separate data frames using the new IO method, pandas.read_xml, which supports XSLT 1.0 (via the third-party lxml package). csv (semi-colon separated) as a resentative sample of what the dataframe should look like Code du département Libellé du département Code de la commune Libellé de la commune Etat saisie Inscrits Abstentions % Abs/Ins Votants % Vot/Ins Blancs % Blancs/Ins % Blancs/Vot Nuls % Nuls/Ins % Nuls/Vot Exprimés % Exp/Ins % Exp/Vot N°Panneau Sexe Nom Prénom Voix % Voix/Ins % Voix/ExpĠ1 Ain 001 L'Abergement-Clémenciat Complet 645 108 16,74 537 83,26 16 2,48 2,98 1 0,16 0,19 520 80,62 96,83 1 F ARTHAUD Nathalie 3 0,47 0,58 2 M ROUSSEL Fabien 6 0,93 1,15 3 M MACRON Emmanuel 150 23,26 28,85 4 M LASSALLE Jean 18 2,79 3,46 5 F LE PEN Marine 149 23,10 28,65 6 M ZEMMOUR Éric 43 6,67 8,27 7 M MÉLENCHON Jean-Luc 66 10,23 12,69 8 F HIDALGO Anne 5 0,78 0,96 9 M JADOT Yannick 30 4,65 5,77 10 F PÉCRESSE Valérie 26 4,03 5,00 11 M POUTOU Philippe 3 0,47 0,58 12 M DUPONT-AIGNAN Nicolas 21 3,26 4,04 ![]() I pasted the headers and the first line in the form of a. The final dataframe should contain a lot of columns when fully flatten and is expected to be very close of the CSV below. Ideally the dataframe should look like ID's, names for geographic entities and vote results and names of election candidates. How can I improve the code to fully flatten the XML ? import pandas as pdĭoc = xmltodict.parse(fd.read(), encoding='ISO-8859-1', process_namespaces=False) ![]() Json_normalize documentation left me confused. The XML structure seems complex and is beyond my knowledge. I have downloaded the XML as a file and called it '058com.xml' and run the code below, though the last column of the resulting dataframe is a mess of data arranged as multiple OrderedDict. How can I convert this XML file at this address into a pandas dataframe?
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |